Describe the bug
There is a silent bug in the evaluation agent when trying to evaluate models. It might not be triggered by every script, but it can be quite annoying to deal with. WDYT?
What I did
Test script:
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from pruna import smash, SmashConfig
from pruna.data.pruna_datamodule import PrunaDataModule
from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.evaluation.task import Task
from pruna.evaluation.metrics import (
    TotalTimeMetric,
    LatencyMetric,
    ThroughputMetric,
    TotalParamsMetric,
    TotalMACsMetric,
)

os.environ["TOKENIZERS_PARALLELISM"] = "false"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B").to(device)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Configure quantization and compilation
smash_config = SmashConfig(device=device)
smash_config["quantizer"] = "hqq"
smash_config["hqq_weight_bits"] = 4
smash_config["hqq_compute_dtype"] = "torch.bfloat16"
smash_config["compiler"] = "torch_compile"
smash_config["torch_compile_fullgraph"] = True
smash_config["torch_compile_dynamic"] = True
smash_config["torch_compile_mode"] = "max-autotune"

# Smash model
smashed_model = smash(model, smash_config)

# Set up evaluation data
datamodule = PrunaDataModule.from_string(
    dataset_name="WikiText",
    tokenizer=tokenizer,
    collate_fn_args={"max_seq_len": 512},
    dataloader_args={"batch_size": 8, "num_workers": 0},
)
datamodule.limit_datasets(5)

# Create metrics and evaluate
metrics = [
    TotalTimeMetric(),
    LatencyMetric(),
    ThroughputMetric(),
    TotalParamsMetric(),
    TotalMACsMetric(),
]
task = Task(metrics, datamodule=datamodule)
eval_agent = EvaluationAgent(task)

# Run evaluation - bug appears on script exit after this
results = eval_agent.evaluate(smashed_model)
print(f"Evaluation complete: {len(results)} metrics")
# Bug appears here when Python exits

Traceback:
WARNING - Argument cache_dir not found in config file. Skipping...
INFO - Could not load HQQ model using pipeline, trying generic HQQ pipeline...
INFO - Using best available device: 'cuda'
WARNING - Argument cache_dir not found in config file. Skipping...
100%|████████████████████████████████| 111/111 [00:00<00:00, 40618.37it/s]
0%| | 0/253 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/pruna/evaluation/evaluation_agent.py", line 109, in evaluate
results.extend(self.compute_stateless_metrics(model, stateless_metrics))
File "/pruna/evaluation/evaluation_agent.py", line 276, in compute_stateless_metrics
results.append(metric.compute(model, self.task.dataloader))
File "/pruna/evaluation/metrics/metric_memory.py", line 386, in compute
return self.metric.compute(model, dataloader)
File "/pruna/evaluation/metrics/metric_memory.py", line 154, in compute
metric_model = self._load_and_prepare_model(str(save_path), model_cls)
File "/pruna/evaluation/metrics/metric_memory.py", line 327, in _load_and_prepare_model
model = model_cls.from_pretrained(model_path)
File "/pruna/telemetry/metrics.py", line 218, in wrapper
result = func(*args, **kwargs)
File "/pruna/engine/pruna_model.py", line 367, in from_pretrained
model, smash_config = load_pruna_model(model_source, **kwargs)
File "/pruna/engine/load.py", line 75, in load_pruna_model
model = LOAD_FUNCTIONS[smash_config.load_fns[0]](model_path, smash_config, **kwargs)
File "/pruna/engine/load.py", line 568, in __call__
return self.value(*args, **kwargs)
File "/pruna/engine/load.py", line 398, in load_hqq
quantized_model = algorithm_packages["AutoHQQHFModel"].from_quantized(...)
[... HQQ loading details ...]
NotImplementedError: Cannot copy out of meta tensor; no data!
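
The first failure comes from the stateless memory metrics: metric_memory.py saves the smashed model and reloads it via model_cls.from_pretrained (pruna_model.py), and the HQQ reload path then fails with the meta-tensor error above. One hedged way to narrow this down (not a fix) is to run only the two memory metrics against the already-smashed model; the snippet below reuses datamodule and smashed_model from the script above, and the isolation step itself is my assumption, not something from pruna's docs:

from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.evaluation.metrics import TotalMACsMetric, TotalParamsMetric
from pruna.evaluation.task import Task

# Run only the metrics that go through the save/reload path in
# metric_memory.py; the timing metrics are left out on purpose.
memory_only_task = Task([TotalParamsMetric(), TotalMACsMetric()], datamodule=datamodule)
memory_only_agent = EvaluationAgent(memory_only_task)
# Expected to hit the same NotImplementedError if the reload path is the culprit.
memory_only_results = memory_only_agent.evaluate(smashed_model)
print(memory_only_results)

In addition, a second error is printed while Python shuts down:
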
Exception ignored in: <function SmashConfig.__del__ at 0x715001cdff40>
Traceback (most recent call last):
File "/pruna/config/smash_config.py", line 122, in __del__
File "/pruna/config/smash_config.py", line 141, in cleanup_cache_dir
File "/python3.10/pathlib.py", line 1290, in exists
TypeError: 'NoneType' object is not callable
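
The "Exception ignored in SmashConfig.__del__" part looks like a classic interpreter-shutdown problem: the destructor calls Path.exists() via cleanup_cache_dir while pathlib's module state is already being torn down, which surfaces as TypeError: 'NoneType' object is not callable. Below is a minimal sketch of one possible mitigation, assuming the cleanup really happens in __del__ as the traceback suggests; SmashConfigSketch and its attributes are hypothetical stand-ins, not pruna's actual implementation:

import sys
from pathlib import Path


class SmashConfigSketch:
    """Hypothetical stand-in for SmashConfig; only the cleanup path is sketched."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir

    def cleanup_cache_dir(self) -> None:
        # Path.exists() can fail late in interpreter shutdown because module
        # state it relies on may already have been cleared.
        if self.cache_dir is not None and Path(self.cache_dir).exists():
            ...  # remove temporary files here

    def __del__(self) -> None:
        # Skip cleanup while the interpreter is finalizing, and swallow
        # teardown-time errors so they are not reported as "Exception ignored".
        if sys.is_finalizing():
            return
        try:
            self.cleanup_cache_dir()
        except Exception:
            pass

An explicit cleanup call (or a context manager) before the script exits would avoid relying on __del__ ordering at shutdown altogether.
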
Expected behavior
Model evaluation should complete without errors.
Environment
- pruna version: 0.2.10
- python version: 3.11
- Operating System: 5.15.0-1084-aws-x86_64-with-glibc2.31