
Pretraining deberta-v3 with a larger context length #153


Description

@sherlcok314159

Hi! I see that DeBERTa-v3 uses relative position embeddings, so it can take in a larger context than traditional BERT. Have you tried pretraining DeBERTa-v3 with a context length of 1024 or larger?

If I want to pretrain DeBERTa-v3 from scratch with a larger context length (e.g., 1024), are there any modifications I should make besides the training script?
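For reference, this is the kind of change I have in mind. It is only a rough sketch using the Hugging Face port (transformers' `DebertaV2Config`) rather than this repo's training code, so the parameter names below may not map one-to-one onto the official script:

```python
# Sketch: configuring a 1024-token DeBERTa-v3 via the Hugging Face port.
# Assumptions (not taken from this repo): the relevant knobs are
# max_position_embeddings, position_buckets, and position_biased_input.
from transformers import DebertaV2Config, DebertaV2ForMaskedLM

config = DebertaV2Config.from_pretrained("microsoft/deberta-v3-base")
config.max_position_embeddings = 1024  # allow 1024-token inputs in the data pipeline

# As far as I can tell, DeBERTa-v3 sets position_biased_input=False, so there
# is no absolute position-embedding table to resize; relative attention uses
# bucketed distances (position_buckets=256 by default).

model = DebertaV2ForMaskedLM(config)  # fresh weights for from-scratch pretraining
# Note: the official DeBERTa-v3 objective is ELECTRA-style replaced token
# detection, not plain MLM; MLM is used here only to keep the sketch short.
```

Concretely: is changing `max_position_embeddings` (plus the max sequence length in the data pipeline) enough, or do the relative-position buckets also need to be enlarged for 1024-token inputs?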

Thanks for any help!
