Hi,
thanks for sharing the code. I have tried to use your repo with bitsandbytes for model quantization. Unfortunately, the training process does not work: the layers defined in modelling_llama.py as
```python
self.dropout = nn.Dropout(classifier_dropout)
self.classifier = nn.Linear(config.hidden_size, config.num_labels)
```
do not get trained, and after finetuning they contain only NaN values. I suspect it is a data type conflict, since the hidden layers are loaded in 4/8 bit while the classifier is still kept in memory as float16. Any clue/plan on how to fix that?
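For reference, here is a rough sketch of the kind of workaround I had in mind, not a tested fix: keep the non-quantized classification head in float32 and make sure it stays trainable while the quantized backbone is frozen. The checkpoint name, num_labels, and the head attribute names ("classifier" from your modelling_llama.py, "score" in stock transformers) are assumptions on my side.

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# 4-bit bitsandbytes config; float16 compute dtype matches what I described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint, not from your repo
    num_labels=2,                # placeholder
    quantization_config=bnb_config,
)

# Usual peft preparation step for k-bit training: freezes parameters, upcasts the
# small non-quantized modules to float32, enables gradient checkpointing.
model = prepare_model_for_kbit_training(model)

# Force the classification head itself to float32 and re-enable its gradients so
# it actually trains; "classifier" is the name in your modelling_llama.py,
# "score" is the name used by the stock transformers Llama classification model.
for name, module in model.named_modules():
    if name.split(".")[-1] in {"classifier", "score"}:
        module.to(torch.float32)
        for p in module.parameters():
            p.requires_grad = True
```

Does something along these lines match what you had planned, or is there a different fix on your roadmap?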