
Refactors DeepSSM for reliability, memory efficiency, and testability #2486

Open
akenmorris wants to merge 45 commits into master from deepssm_refactor2

Conversation

@akenmorris
Contributor

  • Streaming data loaders - load images on-demand to reduce memory usage
  • Robustness - config validation, graceful handling of empty meshes, clear error messages on missing files
  • Testing - GTest harness with 2 configurations (~90 sec), result verification
  • Bug fixes - command exit code, PyTorch 2.6 compatibility, toMesh pipeline, bounding box calculation

* Add constants.py to centralize magic strings (file names, loader names, device strings) for improved maintainability
* Add set_seed() function in net_utils.py for reproducible training by seeding Python random, NumPy, PyTorch CPU/CUDA, and cuDNN (a minimal sketch follows below)
* Update loaders.py, trainer.py, model.py, eval.py to use constants
* Export constants and set_seed from __init__.py

Verified: test outputs are identical before and after refactoring.
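
A minimal sketch of what such a `set_seed()` can look like; the actual implementation in net_utils.py may differ in details:

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed every RNG source used during training for reproducibility."""
    random.seed(seed)                  # Python's built-in RNG
    np.random.seed(seed)               # NumPy
    torch.manual_seed(seed)            # PyTorch CPU
    torch.cuda.manual_seed_all(seed)   # PyTorch CUDA (all devices; no-op without CUDA)
    # Make cuDNN deterministic; this can cost some performance.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```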
* Add DataLoadingError exception with descriptive messages including
  file paths and line numbers for debugging

* Validate inputs in get_particles, get_images, get_all_train_data,
  get_validation_loader, and get_test_loader
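
A sketch of the exception-plus-validation pattern described in the two items above; the exact messages and call sites in loaders.py may differ:

```python
import os


class DataLoadingError(Exception):
    """Raised when DeepSSM training inputs are missing or malformed."""


def get_particles(particle_files):
    # Hypothetical validation along the lines described above.
    if not particle_files:
        raise DataLoadingError("get_particles: no particle files provided")
    for path in particle_files:
        if not os.path.isfile(path):
            raise DataLoadingError(f"get_particles: particle file not found: {path}")
    # ... load and return the particle arrays ...
```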

* Add --exact_check flag with save/verify modes for platform-specific
  refactoring verification
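
A hypothetical sketch of the save/verify flow; the baseline file name, tolerance, and JSON structure here are assumptions, not the actual implementation:

```python
import json


def exact_check(mode, mean_distance, baseline_path="exact_check_baseline.json", tol=1e-6):
    """Save a baseline mean distance, or verify the current run against it."""
    if mode == "save":
        with open(baseline_path, "w") as f:
            json.dump({"mean_distance": mean_distance}, f)
    elif mode == "verify":
        with open(baseline_path) as f:
            baseline = json.load(f)["mean_distance"]
        if abs(mean_distance - baseline) > tol:
            raise SystemExit(
                f"exact check failed: {mean_distance} vs baseline {baseline}")
```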

* Return mean_distance from process_test_predictions for exact checking
  - Add --tl_net flag to enable TL-DeepSSM network testing
  - Fix PyTorch 2.6 compatibility: add weights_only=False to torch.load
    calls in trainer.py and model.py for DataLoader loading (sketched
    after this list)
  - Fix eval.py returning wrong file path for tl_net mode
  - Fix deep_ssm.py path handling for local predictions directory
  - Add Testing/DeepSSMTests/ with C++ test harness and shell scripts
  - Add deepssm_test_data.zip (6MB) containing femur meshes, images,
    constraints, and pre-configured project files
  - Fix bug in Commands.cpp where DeepSSM command returned false (exit
    code 1) on success instead of true (exit code 0)
  - Remove --tl_net argument from Python use case since testing different
    DeepSSM configurations is now done via project files
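
The PyTorch 2.6 fix in a sketch: PyTorch 2.6 changed the default of `torch.load` to `weights_only=True`, which refuses to unpickle arbitrary objects such as a saved DataLoader. The function name here is illustrative:

```python
import torch


def load_saved_loader(loader_path):
    # These files are produced by ShapeWorks itself, so explicitly
    # opting out of weights_only restores the pre-2.6 behavior safely.
    return torch.load(loader_path, weights_only=False)
```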
  Add verify_deepssm_results.py script that validates test output by
  checking mean surface-to-surface distance from test_distances.csv.
  Uses loose tolerance (0-300) for quick 1-epoch tests to catch
  catastrophic failures while keeping tests fast. Supports
  --exact_check save/verify for platform-specific refactoring
  verification with tighter tolerances.
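
A sketch of the tolerance check; the column name in test_distances.csv is an assumption here:

```python
import csv
import sys


def verify(csv_path, low=0.0, high=300.0):
    """Fail loudly if the mean surface-to-surface distance is out of band."""
    with open(csv_path) as f:
        distances = [float(row["distance"]) for row in csv.DictReader(f)]
    if not distances:
        sys.exit(f"FAIL: no distances found in {csv_path}")
    mean = sum(distances) / len(distances)
    if not (low <= mean <= high):
        sys.exit(f"FAIL: mean distance {mean:.3f} outside [{low}, {high}]")
    print(f"PASS: mean distance {mean:.3f}")
```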
  - Add README.md with instructions for running tests and exact check mode
  - Add run_exact_check.sh to verify all quick test configurations
  - Add run_extended_tests.sh to run tests on a directory of projects
  - Add --baseline_file option to verify script for per-project baselines
- Improve toMesh() pipeline in Image.cpp: add TriangleFilter to handle
  degenerate cells from vtkContourFilter, CleanPolyData to remove
  duplicates, and ConnectivityFilter to extract the largest region (see
  the sketch after this list)
- Add empty mesh validation in Groom after toMesh()
- Add empty segmentation check before crop operation
- Check both source and reference mesh in ICP transforms
- Add validation in Mesh::extractLargestComponent() for empty/degenerate cells
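
The actual change is C++ in Image.cpp; this is a Python-VTK sketch of the same filter chain, with `image_data` standing in for the input volume:

```python
import vtk

image_data = vtk.vtkImageData()               # placeholder; the real input is the Image's volume

contour = vtk.vtkContourFilter()              # existing isosurface stage
contour.SetInputData(image_data)
contour.SetValue(0, 0.5)

tri = vtk.vtkTriangleFilter()                 # split degenerate cells from contouring
tri.SetInputConnection(contour.GetOutputPort())

clean = vtk.vtkCleanPolyData()                # merge duplicate points and cells
clean.SetInputConnection(tri.GetOutputPort())

conn = vtk.vtkPolyDataConnectivityFilter()    # keep only the largest connected region
conn.SetInputConnection(clean.GetOutputPort())
conn.SetExtractionModeToLargestRegion()
conn.Update()

mesh = conn.GetOutput()                       # the cleaned vtkPolyData surface
```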
When createICPTransform receives empty source or target meshes, return
an identity transform with a warning instead of throwing an exception.
This allows batch processing to continue gracefully when some shapes
fail to generate valid meshes.
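
A sketch of that fallback; the real createICPTransform is C++, and vtkIterativeClosestPointTransform here stands in for the actual ICP machinery:

```python
import vtk


def create_icp_transform(source, target):
    """Return an ICP transform, or identity with a warning for empty inputs."""
    if source.GetNumberOfPoints() == 0 or target.GetNumberOfPoints() == 0:
        print("Warning: empty mesh passed to ICP; returning identity transform")
        return vtk.vtkTransform()  # a freshly constructed vtkTransform is the identity
    icp = vtk.vtkIterativeClosestPointTransform()
    icp.SetSource(source)
    icp.SetTarget(target)
    icp.Update()
    return icp
```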
Instead of loading all images into memory when creating DataLoaders,
use streaming datasets that load images on demand during training. This
significantly reduces memory usage for large datasets (a minimal sketch
follows the list of key changes below).

Key changes:
  - DeepSSMdatasetStreaming class loads images lazily from disk
  - Training/validation/test loaders save metadata instead of full data
  - load_data_loader() reconstructs loaders from metadata
  - get_loader_info() extracts dimensions without loading full dataset
  - Backward compatible with legacy pre-loaded loaders
  - Use world particle positions for bounding box calculation instead of
    transformed groomed meshes. World particles reflect actual aligned
    positions including optimization transforms.
  - Add periodic garbage collection during training image grooming
  - Add try/except around validation/test image registration to continue
    processing even if individual subjects fail
  - Skip missing validation/test images gracefully with warnings
  - Skip test subjects without predictions during post-processing
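
A minimal sketch of the streaming idea, assuming images stored as .npy files; the actual class in loaders.py reads ShapeWorks image files and carries additional metadata:

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class DeepSSMdatasetStreaming(Dataset):
    """Stores file paths instead of pixel data; loads each image on demand."""

    def __init__(self, image_paths, particles):
        self.image_paths = image_paths   # per-subject image files on disk
        self.particles = particles       # pre-loaded particle targets

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = np.load(self.image_paths[idx])   # lazy load from disk
        return torch.from_numpy(img).float(), self.particles[idx]
```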
Run only default and tl_net_fine_tune tests, which together cover
all code paths (standard DeepSSM, TL-DeepSSM, and fine tuning).
Cuts test time from ~3 minutes to ~90 seconds.
Resolve #2487 - Auto subset size in grooming should pick a smart auto value:
auto (-1) now defaults to a subset of 30 to avoid O(n^2) pairwise ICP on
large datasets (a sketch of the policy follows below).
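
A sketch of the intended policy; the constant and function names are illustrative only:

```python
AUTO_SUBSET_CAP = 30  # assumed constant; keeps pairwise ICP tractable


def resolve_subset_size(requested: int, num_shapes: int) -> int:
    """auto (-1) caps the grooming subset at 30, since pairwise ICP grows as O(n^2)."""
    if requested == -1:
        return min(num_shapes, AUTO_SUBSET_CAP)
    return min(requested, num_shapes)
```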