FSDP: GPT2LMHeadModel object has no attribute model #57

@daire-byrne

First of all, thank you so much for putting together these tutorials! I am slowly working through them and trying to better understand how it all fits together.

I have the DDP example working well on my single-node server with 4 x L40S GPUs, but I can't get the FSDP example to work on that same single node (maybe running on a single node is my problem?).

# torchrun --nproc-per-node gpu train_llm.py -d tatsu-lab/alpaca -m openai-community/gpt2 --cpu-offload
[rank1]: Traceback (most recent call last):
[rank1]:   File "/workspace/distributed-training-guide/04-fully-sharded-data-parallel/train_llm.py", line 389, in <module>
[rank1]:     main()
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
[rank1]:     return f(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^
[rank1]:   File "/workspace/distributed-training-guide/04-fully-sharded-data-parallel/train_llm.py", line 88, in main
[rank1]:     for decoder in model.model.layers:
[rank1]:                    ^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
[rank1]:     raise AttributeError(
[rank1]: AttributeError: 'GPT2LMHeadModel' object has no attribute 'model'
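
One possible reading of the traceback: the loop at line 88 expects the decoder blocks at `model.model.layers`, while Hugging Face GPT-2 models keep theirs under `model.transformer.h`, so the failure may come from the choice of model rather than from running on a single node. Below is a minimal sketch of a lookup that tolerates both layouts. The helper name `find_decoder_layers`, the `CANDIDATE_PATHS` tuple, and the stand-in objects are hypothetical and only illustrate the idea; they are not part of the tutorial script.

```python
from functools import reduce
from types import SimpleNamespace

# Candidate attribute paths to the list of decoder blocks.
# "model.layers" is the path the traceback shows the script using;
# "transformer.h" is where Hugging Face GPT-2 models keep their blocks.
CANDIDATE_PATHS = ("model.layers", "transformer.h")


def find_decoder_layers(model, paths=CANDIDATE_PATHS):
    """Return the object at the first path in `paths` that resolves on `model`.

    Hypothetical helper, not part of the tutorial code.
    """
    for path in paths:
        try:
            # Walk the dotted path one attribute at a time.
            return reduce(getattr, path.split("."), model)
        except AttributeError:
            continue
    raise AttributeError(
        f"{type(model).__name__} has none of the expected layer paths: {paths}"
    )


# Stand-in objects that mimic only the attribute layout (assumption:
# real models would be torch.nn.Module instances with the same paths).
gpt2_like = SimpleNamespace(transformer=SimpleNamespace(h=["block0", "block1"]))
llama_like = SimpleNamespace(model=SimpleNamespace(layers=["layer0"]))

gpt2_layers = find_decoder_layers(gpt2_like)
llama_layers = find_decoder_layers(llama_like)
```

With a lookup like this, the loop could iterate over `find_decoder_layers(model)` instead of hard-coding `model.model.layers`. Whether the rest of the script makes other assumptions about the model layout is not something the traceback shows.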

I haven't gotten as far as testing multi-node yet, but that was my next step.

I think I have all the correct requirements (transformers=4.57.0), but the PyTorch version is the somewhat customised 2.8.0 build that ships in the NVIDIA PyTorch container (2.8.0a0+5228986c39.nv25.06).

The model and dataset are downloaded, cached and verified working with the DDP example (even "offline").

Off topic: the inclusion of the llama-405b tutorial is great, but I doubt many people will be able to run that! It would be awesome if you could also include a more modest training example that is still larger than gpt2, for example for people with a few nodes of dual or quad 48 GB GPUs.

Anyway, thanks again for putting this together, much appreciated.
