Open
Description
I've noticed that the main simulation procedure does not implement the new architecture feature DSA (the sparse attention used in DeepSeek v3.2).
model_config does contain "index_topk", but not the related fields (index_head_dim, index_n_heads, etc.). There is also a kernel_sim program for DSA (sparse_mla_fp8.py) and some CSV result files, but they seem to be unconnected to the main and model code.
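To make the config gap concrete, here is a minimal sketch of the kind of check I mean. It assumes the model config can be loaded as a plain dict. The helper name `missing_dsa_fields` and the sample value are hypothetical, and the field list contains only the names mentioned in this issue, so the real set may be longer.

```python
# Hypothetical helper: report which DSA indexer fields a config dict lacks.
# The field names are only those mentioned in this issue; the real list may be longer.
DSA_INDEXER_FIELDS = ("index_topk", "index_head_dim", "index_n_heads")


def missing_dsa_fields(model_config: dict) -> list[str]:
    """Return the DSA indexer fields absent from model_config, in a fixed order."""
    return [name for name in DSA_INDEXER_FIELDS if name not in model_config]


# Illustrative config with a made-up value; only index_topk is present.
example_config = {"index_topk": 2048}
print(missing_dsa_fields(example_config))
```

With a config that only defines index_topk, the helper lists the other indexer fields as missing.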
Is the DSA feature on the roadmap?