Commit d00a90b

Merge branch 'main' of github.com:EESSI/software-layer-scripts into zstd_tarballs
2 parents 55aacf6 + f2a06d0 commit d00a90b

9 files changed: +157 -245 lines changed

EESSI-extend-easybuild.eb

Lines changed: 6 additions & 2 deletions
@@ -70,7 +70,7 @@ modextravars = {
 # EASYBUILD_INSTALLPATH=${EESSI_PREFIX}/software/${EESSI_OS_TYPE}/${EESSI_SOFTWARE_SUBDIR}
 # EASYBUILD_SOURCEPATH=${WORKDIR}/easybuild/sources:${EESSI_SOURCEPATH}
 #
-# And also some optional ones based on the kind of installation
+# And also some optional ones based on the installation mode
 # EASYBUILD_SET_GID_BIT
 # EASYBUILD_GROUP_WRITABLE_INSTALLDIR
 # EASYBUILD_UMASK
@@ -212,7 +212,11 @@ easybuild_version = os.getenv("EBVERSIONEASYBUILD") or easybuild_version
 eessi_version = os.getenv("EESSI_VERSION") or "2023.06"

 -- Set environment variables that are EasyBuild version specific
-if convertToCanonical(easybuild_version) > convertToCanonical("4") then
+-- Do unload unconditionally, so that even if EB versions were switched in the meantime, this gets unset
+-- This avoids issues where EESSI-extend is first loaded with EB >= 5.1 (which set these vars)
+-- but then EB is swapped for a version < 5.1 and then EESSI-extend is unloaded (which would not unset
+-- these vars if we did it conditional on the EB version)
+if convertToCanonical(easybuild_version) >= convertToCanonical("5.1") or mode() == "unload" then
     setenv ("EASYBUILD_STRICT_RPATH_SANITY_CHECK", "1")
     setenv ("EASYBUILD_CUDA_SANITY_CHECK_ERROR_ON_FAILED_CHECKS", "1")
     setenv ("EASYBUILD_FAIL_ON_MOD_FILES_GCCCORE", "1")
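
The guard in this hunk can be read as a small truth table: the `setenv` calls run when EasyBuild is at least 5.1, and they also run on every unload, where Lmod reverses `setenv` and so clears the variables regardless of which EasyBuild version is active at that point. Below is a minimal Python sketch of that logic. The names `parse_version` and `touches_strict_vars` are hypothetical, and plain tuple comparison of numeric dotted versions is an assumption that stands in for Lmod's `convertToCanonical` without reproducing it.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as '5.1.1' into (5, 1, 1)."""
    return tuple(int(part) for part in version.split("."))


def touches_strict_vars(easybuild_version: str, mode: str) -> bool:
    """Mirror the guard: EasyBuild >= 5.1, or any unload."""
    new_enough = parse_version(easybuild_version) >= parse_version("5.1")
    return new_enough or mode == "unload"


if __name__ == "__main__":
    for version, mode in [("5.1.1", "load"), ("4.9.4", "load"), ("4.9.4", "unload")]:
        print(version, mode, touches_strict_vars(version, mode))
```

The third case is the one the commit comments describe: the module was loaded under a newer EasyBuild, the EasyBuild module was then swapped for an older one, and the unload must still reach the `setenv` lines.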

EESSI-install-software.sh

Lines changed: 28 additions & 3 deletions
@@ -150,6 +150,29 @@ else
         # make sure the software and modules directory exist
         # (since it's expected by init/eessi_environment_variables when using archdetect and by the EESSI module)
         mkdir -p ${EESSI_PREFIX}/software/${EESSI_OS_TYPE}/${EESSI_SOFTWARE_SUBDIR_OVERRIDE}/{modules,software}
+
+        # If EESSI_ACCELERATOR_TARGET_OVERRIDE is defined, we are building for an accelerator target
+        # In that case, make sure the modulepath for the accelerator subdir exists, otherwise the EESSI module will not
+        # set EESSI_ACCELERATOR_TARGET and the if-condition later in this script which checks if EESSI_ACCELERATOR_TARGET
+        # is equal to EESSI_ACCELERATOR_TARGET_OVERRIDE will fail
+        # See https://github.com/EESSI/software-layer-scripts/pull/59#issuecomment-3173593882
+        if [ -n "$EESSI_ACCELERATOR_TARGET_OVERRIDE" ]; then
+            # Note that ${EESSI_PREFIX}/software/${EESSI_OS_TYPE}/${EESSI_SOFTWARE_SUBDIR_OVERRIDE}/${EESSI_ACCELERATOR_TARGET_OVERRIDE}/modules/all
+            # is only the correct path if EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE is not set
+            if [ -z "$EESSI_ACCEL_SOFTWARE_SUBDIR_OVERRIDE" ]; then
+                mkdir -p ${EESSI_PREFIX}/software/${EESSI_OS_TYPE}/${EESSI_SOFTWARE_SUBDIR_OVERRIDE}/${EESSI_ACCELERATOR_TARGET_OVERRIDE}/modules/all
+            else
+                # At runtime, one might want to use a different CPU subdir for a given accelerator. E.g. one could use
+                # a zen2 CPU subdir on a zen4 node if the required GPU software isn't available in the zen4 tree.
+                # At build time, this doesn't make a lot of sense: we'd probably build in a CPU prefix that is different
+                # from what the code will be optimized for, and we wouldn't want that
+                # So this message _should_ never be printed...
+                msg="When building, the software subdirectory for the CPU should almost certainly be that of the host."
+                msg="$msg If you think this is incorrect, please implement behaviour that makes sense in "
+                msg="$msg EESSI-software-installation.sh, essentially replacing this error."
+                fatal_error "$msg"
+            fi
+        fi
     )
 fi
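
The branch added in this hunk makes three decisions: do nothing when no accelerator target is requested, create the accelerator module directory when only the target override is set, and abort when a separate CPU subdirectory for the accelerator is also requested. Here is a minimal Python sketch of the same decision. The function name, the use of `RuntimeError` in place of `fatal_error`, and the example values in the usage part are all assumptions made for illustration.

```python
import os
import tempfile


def ensure_accel_modulepath(prefix, os_type, cpu_subdir,
                            accel_target_override="",
                            accel_software_subdir_override=""):
    """Create <prefix>/software/<os>/<cpu>/<accel>/modules/all when building for an accelerator.

    Returns the created path, or None when no accelerator target is requested.
    """
    if not accel_target_override:
        return None  # CPU-only build: nothing to prepare
    if accel_software_subdir_override:
        # Stand-in for fatal_error: at build time the CPU subdir should be the host's
        raise RuntimeError("CPU software subdirectory override is not supported when building")
    path = os.path.join(prefix, "software", os_type, cpu_subdir,
                        accel_target_override, "modules", "all")
    os.makedirs(path, exist_ok=True)  # same effect as mkdir -p
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical values, chosen only to show the resulting layout
        print(ensure_accel_modulepath(tmp, "linux", "x86_64/amd/zen2", "accel/nvidia/cc80"))
```

Treating an empty string and an unset value the same way keeps the "is it defined?" test unambiguous, which is the property the shell conditions rely on as well.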

@@ -294,6 +317,7 @@ source $TOPDIR/load_eessi_extend_module.sh ${EESSI_VERSION}
 echo "DEBUG: after loading EESSI-extend // EASYBUILD_INSTALLPATH='${EASYBUILD_INSTALLPATH}'"

 # Install full CUDA SDK and cu* libraries in host_injections
+# (This is done *before* configuring EasyBuild as it may rely on an older EB version)
 # Hardcode this for now, see if it works
 # TODO: We should make a nice yaml and loop over all CUDA versions in that yaml to figure out what to install
 # Allow skipping CUDA SDK install in e.g. CI environments
@@ -315,6 +339,7 @@ if nvidia_gpu_available; then
     ${EESSI_PREFIX}/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
 fi

+
 if [ ! -z "${shared_fs_path}" ]; then
     shared_eb_sourcepath=${shared_fs_path}/easybuild/sources
     echo ">> Using ${shared_eb_sourcepath} as shared EasyBuild source path"
@@ -323,7 +348,7 @@ fi

 # if an accelerator target is specified, we need to make sure that the CPU-only modules are also still available
 if [ ! -z ${EESSI_ACCELERATOR_TARGET} ]; then
-    CPU_ONLY_MODULES_PATH=$(echo $EASYBUILD_INSTALLPATH | sed "s@/accel/${EESSI_ACCELERATOR_TARGET}@@g")/modules/all
+    CPU_ONLY_MODULES_PATH=$(echo $EASYBUILD_INSTALLPATH | sed "s@/${EESSI_ACCELERATOR_TARGET}@@g")/modules/all
     if [ -d ${CPU_ONLY_MODULES_PATH} ]; then
         module use ${CPU_ONLY_MODULES_PATH}
     else
@@ -414,7 +439,7 @@ lmod_rc_file="$LMOD_CONFIG_DIR/lmodrc.lua"
 echo "DEBUG: lmod_rc_file='${lmod_rc_file}'"
 if [[ ! -z ${EESSI_ACCELERATOR_TARGET} ]]; then
     # EESSI_ACCELERATOR_TARGET is set, so let's remove the accelerator path from $lmod_rc_file
-    lmod_rc_file=$(echo ${lmod_rc_file} | sed "s@/accel/${EESSI_ACCELERATOR_TARGET}@@")
+    lmod_rc_file=$(echo ${lmod_rc_file} | sed "s@/${EESSI_ACCELERATOR_TARGET}@@")
     echo "Path to lmodrc.lua changed to '${lmod_rc_file}'"
 fi
 lmodrc_changed=$(cat ${pr_diff} | grep '^+++' | cut -f2 -d' ' | sed 's@^[a-z]/@@g' | grep '^create_lmodrc.py$' > /dev/null; echo $?)
@@ -427,7 +452,7 @@ fi
 lmod_sitepackage_file="$LMOD_PACKAGE_PATH/SitePackage.lua"
 if [[ ! -z ${EESSI_ACCELERATOR_TARGET} ]]; then
     # EESSI_ACCELERATOR_TARGET is set, so let's remove the accelerator path from $lmod_sitepackage_file
-    lmod_sitepackage_file=$(echo ${lmod_sitepackage_file} | sed "s@/accel/${EESSI_ACCELERATOR_TARGET}@@")
+    lmod_sitepackage_file=$(echo ${lmod_sitepackage_file} | sed "s@/${EESSI_ACCELERATOR_TARGET}@@")
     echo "Path to SitePackage.lua changed to '${lmod_sitepackage_file}'"
 fi
 sitepackage_changed=$(cat ${pr_diff} | grep '^+++' | cut -f2 -d' ' | sed 's@^[a-z]/@@g' | grep '^create_lmodsitepackage.py$' > /dev/null; echo $?)
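
The three `sed` changes in this file all do the same thing: they derive a CPU-only path by deleting the accelerator component from an accelerator-specific path. Dropping the literal `/accel` from the pattern suggests that `EESSI_ACCELERATOR_TARGET` now already carries that prefix, though that is an inference from the diff and not something it states. Here is a minimal Python sketch under that assumption. The function names and the example paths are hypothetical.

```python
def strip_accel_subdir(path: str, accel_target: str) -> str:
    """Remove every '/<accel_target>' component from path (like sed with the g flag)."""
    return path.replace("/" + accel_target, "")


def cpu_only_modules_path(install_path: str, accel_target: str) -> str:
    """Derive the CPU-only module directory from an accelerator install path."""
    return strip_accel_subdir(install_path, accel_target) + "/modules/all"


if __name__ == "__main__":
    # Hypothetical values: the target is assumed to include the 'accel/' prefix
    install_path = "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80"
    print(cpu_only_modules_path(install_path, "accel/nvidia/cc80"))
```

One difference to keep in mind: `str.replace` matches the target literally, whereas `sed` interprets it as a regular expression, so a target containing characters such as `.` would behave differently in the shell version.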

bot/build.sh

Lines changed: 6 additions & 2 deletions
@@ -267,11 +267,15 @@ TARBALL_STEP_ARGS+=("--resume" "${BUILD_TMPDIR}")
 timestamp=$(date +%s)
 # to set EESSI_VERSION we need to source init/eessi_defaults now
 source $software_layer_dir/init/eessi_defaults
-# Note: iff ${EESSI_DEV_PROJECT} is defined (building for dev.eessi.io), then we
+# Note: if ${EESSI_DEV_PROJECT} is defined (building for dev.eessi.io), then we
 # append the project (subdirectory) name to the end tarball name. This is information
 # then used at the ingestion stage. If ${EESSI_DEV_PROJECT} is not defined, nothing is
 # appended
-export TGZ=$(printf "eessi-%s-software-%s-%s-%b%d.tar.gz" ${EESSI_VERSION} ${EESSI_OS_TYPE} ${EESSI_SOFTWARE_SUBDIR_OVERRIDE//\//-} ${EESSI_DEV_PROJECT:+$EESSI_DEV_PROJECT-} ${timestamp})
+if [[ -z ${EESSI_ACCELERATOR_TARGET_OVERRIDE} ]]; then
+    export TGZ=$(printf "eessi-%s-software-%s-%s-%b%d.tar.gz" ${EESSI_VERSION} ${EESSI_OS_TYPE} ${EESSI_SOFTWARE_SUBDIR_OVERRIDE//\//-} ${EESSI_DEV_PROJECT:+$EESSI_DEV_PROJECT-} ${timestamp})
+else
+    export TGZ=$(printf "eessi-%s-software-%s-%s-%s-%b%d.tar.gz" ${EESSI_VERSION} ${EESSI_OS_TYPE} ${EESSI_SOFTWARE_SUBDIR_OVERRIDE//\//-} ${EESSI_ACCELERATOR_TARGET_OVERRIDE//\//-} ${EESSI_DEV_PROJECT:+$EESSI_DEV_PROJECT-} ${timestamp})
+fi

 # Export EESSI_DEV_PROJECT to use it (if needed) when making tarball
 echo "bot/build.sh: EESSI_DEV_PROJECT='${EESSI_DEV_PROJECT}'"
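
The two `printf` format strings describe one naming scheme with two optional fields: the accelerator target, and the dev project followed by a dash. Here is a minimal Python sketch of the scheme the format strings appear to intend. The function name and the example values are hypothetical, and the sketch does not reproduce how bash's `printf` consumes arguments when an unquoted expansion disappears.

```python
def tarball_name(eessi_version: str, os_type: str, software_subdir: str,
                 timestamp: int, accel_target: str = "", dev_project: str = "") -> str:
    """Build eessi-<version>-software-<os>-<subdir>[-<accel>]-[<project>-]<timestamp>.tar.gz."""
    # Slashes in subdirectories become dashes, as with ${VAR//\//-} in bash
    parts = ["eessi", eessi_version, "software", os_type, software_subdir.replace("/", "-")]
    if accel_target:
        parts.append(accel_target.replace("/", "-"))
    name = "-".join(parts) + "-"
    if dev_project:
        name += dev_project + "-"
    return f"{name}{timestamp:d}.tar.gz"


if __name__ == "__main__":
    # Hypothetical values for illustration
    print(tarball_name("2023.06", "linux", "x86_64/amd/zen2", 1700000000,
                       accel_target="accel/nvidia/cc80"))
```

Building the name from a list of parts keeps the optional accelerator field in a single code path, where the shell version needs two separate format strings.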

create_lmodsitepackage.py

Lines changed: 22 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -123,13 +123,32 @@
123123
local refer_to_docs = "For more information on how to do this, see https://www.eessi.io/docs/site_specific_config/gpu/.\\n"
124124
if packagesList[simpleName] then
125125
-- simpleName is a module in packagesList
126-
-- get the full host_injections path
126+
-- first, check the old host_injections path prior to https://github.com/EESSI/software-layer-scripts/pull/59
127+
-- If that exists, print a more targetted, explanatory warning
128+
local previousHostInjections = string.gsub(os.getenv('EESSI_SOFTWARE_PATH') or "", 'versions', 'host_injections')
129+
local previousPackageEasyBuildDir = previousHostInjections .. "/software/" .. t.modFullName .. "/easybuild"
130+
local previousPackageDirExists = isDir(previousPackageEasyBuildDir)
131+
132+
-- get the host_injections path, and add only the EESSI_CPU_FAMILY at the end
133+
local strip_suffix = os.getenv('EESSI_VERSION') .. "/software/" .. os.getenv('EESSI_OS_TYPE') .. "/"
134+
strip_suffix = strip_suffix .. os.getenv('EESSI_SOFTWARE_SUBDIR')
127135
local hostInjections = string.gsub(os.getenv('EESSI_SOFTWARE_PATH') or "", 'versions', 'host_injections')
136+
hostInjections = string.gsub(hostInjections, strip_suffix, os.getenv('EESSI_CPU_FAMILY'))
128137
129138
-- build final path where the software should be installed
130139
local packageEasyBuildDir = hostInjections .. "/software/" .. t.modFullName .. "/easybuild"
131140
local packageDirExists = isDir(packageEasyBuildDir)
132-
if not packageDirExists then
141+
if previousPackageDirExists and not packageDirExists then
142+
local advice = "but while the module file exists, the actual software is not entirely shipped with EESSI "
143+
advice = advice .. "due to licencing. You will need to install a full copy of the " .. simpleName .. " package where EESSI "
144+
advice = advice .. "can find it.\\n"
145+
advice = advice .. "Note that a full copy is installed at " .. previousHostInjections .. "/software/" .. t.modFullName .. ". "
146+
advice = advice .. "However, EESSI expects it in a different location since Aug'25, namely at "
147+
advice = advice .. hostInjections .. "/software/" .. t.modFullName .. ". "
148+
advice = advice .. "Please re-install the package at the new location. "
149+
advice = advice .. refer_to_docs
150+
LmodError("\\nYou requested to load ", simpleName, " ", advice)
151+
elseif not packageDirExists then
133152
local advice = "but while the module file exists, the actual software is not entirely shipped with EESSI "
134153
advice = advice .. "due to licencing. You will need to install a full copy of the " .. simpleName .. " package where EESSI "
135154
advice = advice .. "can find it.\\n"
@@ -293,7 +312,7 @@ def error(msg):
293312
# the install path (if it exists)
294313
accel_subdir = os.getenv("EESSI_ACCELERATOR_TARGET")
295314
if accel_subdir:
296-
sitepackage_path = sitepackage_path.replace("/accel/%s" % accel_subdir, '')
315+
sitepackage_path = sitepackage_path.replace("/%s" % accel_subdir, '')
297316
try:
298317
os.makedirs(os.path.dirname(sitepackage_path), exist_ok=True)
299318
with open(sitepackage_path, 'w') as fp:
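
The Lua hook in the first hunk of this file computes two candidate directories (the pre-PR#59 location and the new one, keyed only on the CPU family) and picks an error message based on which of them exists. Here is a minimal Python sketch of that path arithmetic and the three-way decision. The function names, the returned labels and the example values are hypothetical. The sketch also uses literal string replacement, whereas Lua's `string.gsub` treats its second argument as a pattern in which characters such as `.` and `-` are special.

```python
def host_injections_dirs(software_path, eessi_version, os_type,
                         software_subdir, cpu_family, mod_full_name):
    """Return (previous_dir, current_dir) for a package's easybuild directory."""
    previous_root = software_path.replace("versions", "host_injections")
    # The new layout drops '<version>/software/<os>/<subdir>' and keeps only the CPU family
    strip_suffix = f"{eessi_version}/software/{os_type}/{software_subdir}"
    current_root = previous_root.replace(strip_suffix, cpu_family)
    previous_dir = f"{previous_root}/software/{mod_full_name}/easybuild"
    current_dir = f"{current_root}/software/{mod_full_name}/easybuild"
    return previous_dir, current_dir


def classify(previous_exists: bool, current_exists: bool) -> str:
    """Pick the message branch: old-location hint, generic install advice, or no error."""
    if previous_exists and not current_exists:
        return "reinstall-at-new-location"
    if not current_exists:
        return "install-required"
    return "ok"


if __name__ == "__main__":
    # Hypothetical environment values for illustration
    old, new = host_injections_dirs(
        "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2",
        "2023.06", "linux", "x86_64/amd/zen2", "x86_64", "CUDA/12.1.1")
    print(old)
    print(new)
```

Checking the old location first is what allows the more specific message: a user who already installed the package before the move is told to reinstall at the new path, and is not shown the generic advice.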

easystacks/README.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+WARNING: in principle _all_ easystack files should go into EESSI/software-layer, not in EESSI/software-layer-scripts. Easystack files are only added in EESSI/software-layer-scripts by exception, for example when the (re)deployment of the software has to be done synchronously with a change in EESSI/software-layer-scripts.
+
+Here, we list past deployments for which this was the case (and why):
+
+[PR#59](https://github.com/EESSI/software-layer-scripts/pull/59): modified the prefix in which `install_cuda_and_libraries.sh` installs the CUDA toolkit within `host_injections`. Also, updated the Lmod SitePackage.lua to print an informative message in case the CUDA Toolkit is found in the old location. This requires synchronous deployment of new CUDA and cuDNN installations in the software layer, because the symlinks from these installations should be redirected to the new prefix in `host_injections`.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# In https://github.com/EESSI/software-layer-scripts/pull/59 we introduced a new location for
+# installing the CUDA toolkit within the host_injections directory. This requires reinstallation
+# of CUDA and cuDNN to make sure all symlinks point to these new locations
+easyconfigs:
+  - CUDA-12.1.1.eb:
+      options:
+        accept-eula-for: CUDA
+  - CUDA-12.4.0.eb:
+      options:
+        accept-eula-for: CUDA
+  - cuDNN-8.9.2.26-CUDA-12.1.1.eb:
+      options:
+        accept-eula-for: cuDNN
