Describe the bug
There is a silent bug in the evaluation agent when trying to evaluate models. It might not be triggered by every script, but it can be quite annoying to deal with. WDYT?
What I did
Test script:
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from pruna import smash, SmashConfig
from pruna.data.pruna_datamodule import PrunaDataModule
from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.evaluation.task import Task
from pruna.evaluation.metrics import (
    TotalTimeMetric,
    LatencyMetric,
    ThroughputMetric,
    TotalParamsMetric,
    TotalMACsMetric,
)

os.environ["TOKENIZERS_PARALLELISM"] = "false"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B").to(device)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Configure quantization and compilation
smash_config = SmashConfig(device=device)
smash_config["quantizer"] = "hqq"
smash_config["hqq_weight_bits"] = 4
smash_config["hqq_compute_dtype"] = "torch.bfloat16"
smash_config["compiler"] = "torch_compile"
smash_config["torch_compile_fullgraph"] = True
smash_config["torch_compile_dynamic"] = True
smash_config["torch_compile_mode"] = "max-autotune"

# Smash model
smashed_model = smash(model, smash_config)

# Set up evaluation data
datamodule = PrunaDataModule.from_string(
    dataset_name="WikiText",
    tokenizer=tokenizer,
    collate_fn_args={"max_seq_len": 512},
    dataloader_args={"batch_size": 8, "num_workers": 0},
)
datamodule.limit_datasets(5)

# Create metrics and evaluate
metrics = [
    TotalTimeMetric(),
    LatencyMetric(),
    ThroughputMetric(),
    TotalParamsMetric(),
    TotalMACsMetric(),
]
task = Task(metrics, datamodule=datamodule)
eval_agent = EvaluationAgent(task)

# Run evaluation - bug appears on script exit after this
results = eval_agent.evaluate(smashed_model)
print(f"Evaluation complete: {len(results)} metrics")
# Bug appears here when Python exits

Traceback:
WARNING - Argument cache_dir not found in config file. Skipping...
INFO - Could not load HQQ model using pipeline, trying generic HQQ pipeline...
INFO - Using best available device: 'cuda'
WARNING - Argument cache_dir not found in config file. Skipping...
100%|████████████████████████████████| 111/111 [00:00<00:00, 40618.37it/s]
0%| | 0/253 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/pruna/evaluation/evaluation_agent.py", line 109, in evaluate
results.extend(self.compute_stateless_metrics(model, stateless_metrics))
File "/pruna/evaluation/evaluation_agent.py", line 276, in compute_stateless_metrics
results.append(metric.compute(model, self.task.dataloader))
File "/pruna/evaluation/metrics/metric_memory.py", line 386, in compute
return self.metric.compute(model, dataloader)
File "/pruna/evaluation/metrics/metric_memory.py", line 154, in compute
metric_model = self._load_and_prepare_model(str(save_path), model_cls)
File "/pruna/evaluation/metrics/metric_memory.py", line 327, in _load_and_prepare_model
model = model_cls.from_pretrained(model_path)
File "/pruna/telemetry/metrics.py", line 218, in wrapper
result = func(*args, **kwargs)
File "/pruna/engine/pruna_model.py", line 367, in from_pretrained
model, smash_config = load_pruna_model(model_source, **kwargs)
File "/pruna/engine/load.py", line 75, in load_pruna_model
model = LOAD_FUNCTIONS[smash_config.load_fns[0]](model_path, smash_config, **kwargs)
File "/pruna/engine/load.py", line 568, in __call__
return self.value(*args, **kwargs)
File "/pruna/engine/load.py", line 398, in load_hqq
quantized_model = algorithm_packages["AutoHQQHFModel"].from_quantized(...)
[... HQQ loading details ...]
NotImplementedError: Cannot copy out of meta tensor; no data!
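
The first failure comes from the stateless memory metrics: metric_memory.py saves the smashed model and reloads it via model_cls.from_pretrained (pruna_model.py), and the HQQ reload path then fails with the meta-tensor error above. One hedged way to narrow this down (not a fix) is to run only the two memory metrics against the already-smashed model; the snippet below reuses datamodule and smashed_model from the script above, and the isolation step itself is my assumption, not something from pruna's docs:

from pruna.evaluation.evaluation_agent import EvaluationAgent
from pruna.evaluation.metrics import TotalMACsMetric, TotalParamsMetric
from pruna.evaluation.task import Task

# Run only the metrics that go through the save/reload path in
# metric_memory.py; the timing metrics are left out on purpose.
memory_only_task = Task([TotalParamsMetric(), TotalMACsMetric()], datamodule=datamodule)
memory_only_agent = EvaluationAgent(memory_only_task)
# Expected to hit the same NotImplementedError if the reload path is the culprit.
memory_only_results = memory_only_agent.evaluate(smashed_model)
print(memory_only_results)

In addition, a second error is printed while Python shuts down:
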
Exception ignored in: <function SmashConfig.__del__ at 0x715001cdff40>
Traceback (most recent call last):
File "/pruna/config/smash_config.py", line 122, in __del__
File "/pruna/config/smash_config.py", line 141, in cleanup_cache_dir
File "/python3.10/pathlib.py", line 1290, in exists
TypeError: 'NoneType' object is not callable
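
The "Exception ignored in SmashConfig.__del__" part looks like a classic interpreter-shutdown problem: the destructor calls Path.exists() via cleanup_cache_dir while pathlib's module state is already being torn down, which surfaces as TypeError: 'NoneType' object is not callable. Below is a minimal sketch of one possible mitigation, assuming the cleanup really happens in __del__ as the traceback suggests; SmashConfigSketch and its attributes are hypothetical stand-ins, not pruna's actual implementation:

import sys
from pathlib import Path


class SmashConfigSketch:
    """Hypothetical stand-in for SmashConfig; only the cleanup path is sketched."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir

    def cleanup_cache_dir(self) -> None:
        # Path.exists() can fail late in interpreter shutdown because module
        # state it relies on may already have been cleared.
        if self.cache_dir is not None and Path(self.cache_dir).exists():
            ...  # remove temporary files here

    def __del__(self) -> None:
        # Skip cleanup while the interpreter is finalizing, and swallow
        # teardown-time errors so they are not reported as "Exception ignored".
        if sys.is_finalizing():
            return
        try:
            self.cleanup_cache_dir()
        except Exception:
            pass

An explicit cleanup call (or a context manager) before the script exits would avoid relying on __del__ ordering at shutdown altogether.
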
Expected behavior
Model evaluation should complete without errors.
Environment
- pruna version: 0.2.10
- python version: 3.11
- Operating System: 5.15.0-1084-aws-x86_64-with-glibc2.31