Description
Hello!
I built a Docker image from your Dockerfile, then converted it to a Singularity image to run on an HPC system.
When attempting to run
python -m evo2.test.test_evo2_generation --model_name evo2_7b
from within the container, I get the error:
AssertionError: libcuda.so cannot found!
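Anyone using the container to avoid installing the dependencies by hand will run into this, so libcuda.so should be made available in the image as well (for example through the symlink that the error message suggests). I've pasted the full error traceback below.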
Thanks!
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "/usr/local/lib/python3.12/dist-packages/evo2/__init__.py", line 1, in <module>
    from .models import Evo2
  File "/usr/local/lib/python3.12/dist-packages/evo2/models.py", line 12, in <module>
    from vortex.model.model import StripedHyena
  File "/usr/local/lib/python3.12/dist-packages/vortex/model/model.py", line 15, in <module>
    from vortex.model.layers import (
  File "/usr/local/lib/python3.12/dist-packages/vortex/model/layers.py", line 10, in <module>
    from transformer_engine.pytorch import Linear
  File "/usr/local/lib/python3.12/dist-packages/transformer_engine/__init__.py", line 13, in <module>
    from . import pytorch
  File "/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/__init__.py", line 95, in <module>
    from transformer_engine.pytorch.permutation import (
  File "/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/permutation.py", line 11, in <module>
    import transformer_engine.pytorch.triton.permutation as triton_permutation
  File "/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/triton/permutation.py", line 158, in <module>
    _permute_kernel = triton.autotune(
                      ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/autotuner.py", line 368, in decorator
    return Autotuner(fn, fn.arg_names, configs, key, reset_to_zero, restore_value, pre_hook=pre_hook,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/autotuner.py", line 130, in __init__
    self.do_bench = driver.active.get_benchmarker()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
                ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/driver.py", line 9, in _create_driver
    return actives[0]()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 450, in __init__
    self.utils = CudaUtils() # TODO: make static
                 ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 80, in __init__
    mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
    so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
                                        ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 45, in library_dirs
    return [libdevice_dir, *libcuda_dirs()]
                            ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/driver.py", line 39, in libcuda_dirs
    assert any(os.path.exists(os.path.join(path, 'libcuda.so.1')) for path in dirs), msg
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: libcuda.so cannot found! Possible files are located at ['/usr/local/cuda/compat/lib/libcuda.so.1'].Please create a symlink of libcuda.so to any of the files.
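
In case it helps anyone debugging the same thing: the assertion at the bottom of the traceback only passes if `libcuda.so.1` actually exists in one of the directories Triton collected. Below is a small diagnostic sketch that repeats that existence check from inside the container. The helper name `dirs_with_libcuda` and the candidate list are my own assumptions for illustration (the compat path is taken from the error message, the rest from `LD_LIBRARY_PATH`); this is not Triton's own lookup code.

```python
import os


def dirs_with_libcuda(dirs, name="libcuda.so.1"):
    """Return the directories in `dirs` that contain `name`.

    os.path.exists follows symlinks, so a dangling symlink
    counts as missing, the same as in the failing assertion.
    """
    return [d for d in dirs if os.path.exists(os.path.join(d, name))]


if __name__ == "__main__":
    # Hypothetical candidate list: the path from the error message
    # plus whatever LD_LIBRARY_PATH contains inside the container.
    candidates = ["/usr/local/cuda/compat/lib"]
    candidates += [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]

    found = dirs_with_libcuda(candidates)
    if found:
        print("libcuda.so.1 found in:", found)
    else:
        print("libcuda.so.1 not found in any of:", candidates)
```

If this reports nothing for `/usr/local/cuda/compat/lib`, the path named in the error message is likely a missing file or a dangling symlink inside the converted image, which would explain why the assertion fails even though a candidate location is listed.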