Description
What would you like to see added?
Some software that users build or run on Cheaha compute nodes writes temporary files to /tmp. The /tmp directory is node-local storage with limited capacity, and its contents are automatically deleted when the job ends or the node reboots. When a job fills /tmp, it can fail with errors such as: No space left on device
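To confirm that a failing job is running out of node-local space, it can help to check how much room is left in /tmp on the compute node where the job runs. The sketch below uses Python's standard `shutil.disk_usage`; the function names `free_bytes` and `has_room` are hypothetical illustrations, not an existing Research Computing tool.

```python
import shutil


def free_bytes(path: str = "/tmp") -> int:
    """Return the bytes available to the current user on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def has_room(path: str, needed_bytes: int) -> bool:
    """Hypothetical pre-flight check: True if `path` has at least `needed_bytes` free."""
    return free_bytes(path) >= needed_bytes


if __name__ == "__main__":
    # Run this on the compute node itself: /tmp is local to each node,
    # so the login node reports a different (unrelated) value.
    gib = free_bytes("/tmp") / 1024**3
    print(f"/tmp free on this node: {gib:.1f} GiB")
```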
Proposal
Create a response template for users whose jobs fill the /tmp directory on compute nodes.
Draft response
We noticed that some of your jobs are filling the /tmp directory on Cheaha compute nodes with temporary files named similar to x. From the jobs you submitted today, it looks like you are running an array job (job ID: xxx). Based on the job name xyz, it appears you are running a xy workflow.
Quick Note: /tmp is local storage on each compute node and is limited in size. If it fills up, jobs can fail with “No space left on device” errors because many programs use it for temporary files during processing. Redirecting temp files to larger storage avoids these issues.
Could you share your job script? We can help you redirect temporary files to your home directory or scratch space so that /tmp does not fill up. For more details, please see our documentation on temporary file issues: https://docs.rc.uab.edu/data_management/cheaha_storage_gpfs/temporary_files/
You can also join our Zoom office hours (https://docs.rc.uab.edu/#how-to-contact-us), or I can schedule a meeting to go over it with you.
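For users whose workflow is driven by Python, the redirection described in the draft response might look like the minimal sketch below. It relies on the common convention that many programs (including Python's `tempfile` module) honor the `TMPDIR` environment variable, and on Slurm's `SLURM_JOB_ID` variable to give each job its own directory. The helper name `redirect_tmp` and the `tmp_<jobid>` directory layout are hypothetical; the scratch location to pass in is an assumption and should be checked against the Cheaha storage documentation.

```python
import os
import tempfile
from pathlib import Path


def redirect_tmp(base: str) -> str:
    """Hypothetical helper: create a per-job temp directory under `base`
    and point both TMPDIR and Python's tempfile module at it."""
    # SLURM_JOB_ID is set inside Slurm jobs; fall back for interactive use.
    job_id = os.environ.get("SLURM_JOB_ID", "interactive")
    tmp = Path(base) / f"tmp_{job_id}"
    tmp.mkdir(parents=True, exist_ok=True)
    # Child processes started after this point inherit the new TMPDIR.
    os.environ["TMPDIR"] = str(tmp)
    # tempfile caches its default directory, so override it explicitly.
    tempfile.tempdir = str(tmp)
    return str(tmp)
```

In a job, this would be called once, before anything else creates temporary files, for example `redirect_tmp(f"/scratch/{os.environ['USER']}")` if that is the user's scratch directory (an assumption). Unlike node-local /tmp, a directory on shared storage is not cleared when the job ends, so it should be removed once the job no longer needs it.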
Where to Include / Update
Can we add a new "HPC Best Practices" page?