
Refactors DeepSSM for reliability, memory efficiency, and testability #2486

Open
akenmorris wants to merge 45 commits into master from deepssm_refactor2

Conversation

@akenmorris
Contributor

  • Streaming data loaders - load images on-demand to reduce memory usage
  • Robustness - config validation, graceful handling of empty meshes, clear error messages on missing files
  • Testing - GTest harness with 2 configurations (~90 sec), result verification
  • Bug fixes - command exit code, PyTorch 2.6 compatibility, toMesh pipeline, bounding box calculation

* Add constants.py to centralize magic strings (file names, loader names, device strings) for improved maintainability
* Add set_seed() function in net_utils.py for reproducible training by seeding Python random, NumPy, PyTorch CPU/CUDA, and cuDNN (a minimal sketch follows below)
* Update loaders.py, trainer.py, model.py, eval.py to use constants
* Export constants and set_seed from __init__.py

Verified: test outputs are identical before and after refactoring.
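
A minimal sketch of what such a `set_seed()` can look like; the actual implementation in net_utils.py may differ in details:

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed every RNG source used during training for reproducibility."""
    random.seed(seed)                  # Python's built-in RNG
    np.random.seed(seed)               # NumPy
    torch.manual_seed(seed)            # PyTorch CPU
    torch.cuda.manual_seed_all(seed)   # PyTorch CUDA (all devices; no-op without CUDA)
    # Make cuDNN deterministic; this can cost some performance.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```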
* Add DataLoadingError exception with descriptive messages including
  file paths and line numbers for debugging

* Validate inputs in get_particles, get_images, get_all_train_data,
  get_validation_loader, and get_test_loader
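
A sketch of the exception-plus-validation pattern described in the two items above; the exact messages and call sites in loaders.py may differ:

```python
import os


class DataLoadingError(Exception):
    """Raised when DeepSSM training inputs are missing or malformed."""


def get_particles(particle_files):
    # Hypothetical validation along the lines described above.
    if not particle_files:
        raise DataLoadingError("get_particles: no particle files provided")
    for path in particle_files:
        if not os.path.isfile(path):
            raise DataLoadingError(f"get_particles: particle file not found: {path}")
    # ... load and return the particle arrays ...
```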

* Add --exact_check flag with save/verify modes for platform-specific
  refactoring verification
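
A hypothetical sketch of the save/verify flow; the baseline file name, tolerance, and JSON structure here are assumptions, not the actual implementation:

```python
import json


def exact_check(mode, mean_distance, baseline_path="exact_check_baseline.json", tol=1e-6):
    """Save a baseline mean distance, or verify the current run against it."""
    if mode == "save":
        with open(baseline_path, "w") as f:
            json.dump({"mean_distance": mean_distance}, f)
    elif mode == "verify":
        with open(baseline_path) as f:
            baseline = json.load(f)["mean_distance"]
        if abs(mean_distance - baseline) > tol:
            raise SystemExit(
                f"exact check failed: {mean_distance} vs baseline {baseline}")
```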

* Return mean_distance from process_test_predictions for exact checking
  - Add --tl_net flag to enable TL-DeepSSM network testing
  - Fix PyTorch 2.6 compatibility: add weights_only=False to torch.load
    calls in trainer.py and model.py for DataLoader loading (sketched
    after this list)
  - Fix eval.py returning wrong file path for tl_net mode
  - Fix deep_ssm.py path handling for local predictions directory
  - Add Testing/DeepSSMTests/ with C++ test harness and shell scripts
  - Add deepssm_test_data.zip (6MB) containing femur meshes, images,
    constraints, and pre-configured project files
  - Fix bug in Commands.cpp where DeepSSM command returned false (exit
    code 1) on success instead of true (exit code 0)
  - Remove --tl_net argument from Python use case since testing different
    DeepSSM configurations is now done via project files
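
The PyTorch 2.6 fix in a sketch: PyTorch 2.6 changed the default of `torch.load` to `weights_only=True`, which refuses to unpickle arbitrary objects such as a saved DataLoader. The function name here is illustrative:

```python
import torch


def load_saved_loader(loader_path):
    # These files are produced by ShapeWorks itself, so explicitly
    # opting out of weights_only restores the pre-2.6 behavior safely.
    return torch.load(loader_path, weights_only=False)
```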
  Add verify_deepssm_results.py script that validates test output by
  checking mean surface-to-surface distance from test_distances.csv.
  Uses loose tolerance (0-300) for quick 1-epoch tests to catch
  catastrophic failures while keeping tests fast. Supports
  --exact_check save/verify for platform-specific refactoring
  verification with tighter tolerances.
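
A sketch of the tolerance check; the column name in test_distances.csv is an assumption here:

```python
import csv
import sys


def verify(csv_path, low=0.0, high=300.0):
    """Fail loudly if the mean surface-to-surface distance is out of band."""
    with open(csv_path) as f:
        distances = [float(row["distance"]) for row in csv.DictReader(f)]
    if not distances:
        sys.exit(f"FAIL: no distances found in {csv_path}")
    mean = sum(distances) / len(distances)
    if not (low <= mean <= high):
        sys.exit(f"FAIL: mean distance {mean:.3f} outside [{low}, {high}]")
    print(f"PASS: mean distance {mean:.3f}")
```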
  - Add README.md with instructions for running tests and exact check mode
  - Add run_exact_check.sh to verify all quick test configurations
  - Add run_extended_tests.sh to run tests on a directory of projects
  - Add --baseline_file option to verify script for per-project baselines
- Improve toMesh() pipeline in Image.cpp: add TriangleFilter to handle
  degenerate cells from vtkContourFilter, CleanPolyData to remove
  duplicates, and ConnectivityFilter to extract the largest region (see
  the sketch after this list)
- Add empty mesh validation in Groom after toMesh()
- Add empty segmentation check before crop operation
- Check both source and reference mesh in ICP transforms
- Add validation in Mesh::extractLargestComponent() for empty/degenerate cells
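
The actual change is C++ in Image.cpp; this is a Python-VTK sketch of the same filter chain, with `image_data` standing in for the input volume:

```python
import vtk

image_data = vtk.vtkImageData()               # placeholder; the real input is the Image's volume

contour = vtk.vtkContourFilter()              # existing isosurface stage
contour.SetInputData(image_data)
contour.SetValue(0, 0.5)

tri = vtk.vtkTriangleFilter()                 # split degenerate cells from contouring
tri.SetInputConnection(contour.GetOutputPort())

clean = vtk.vtkCleanPolyData()                # merge duplicate points and cells
clean.SetInputConnection(tri.GetOutputPort())

conn = vtk.vtkPolyDataConnectivityFilter()    # keep only the largest connected region
conn.SetInputConnection(clean.GetOutputPort())
conn.SetExtractionModeToLargestRegion()
conn.Update()

mesh = conn.GetOutput()                       # the cleaned vtkPolyData surface
```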
When createICPTransform receives empty source or target meshes, return
an identity transform with a warning instead of throwing an exception.
This allows batch processing to continue gracefully when some shapes
fail to generate valid meshes.
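
A sketch of that fallback; the real createICPTransform is C++, and vtkIterativeClosestPointTransform here stands in for the actual ICP machinery:

```python
import vtk


def create_icp_transform(source, target):
    """Return an ICP transform, or identity with a warning for empty inputs."""
    if source.GetNumberOfPoints() == 0 or target.GetNumberOfPoints() == 0:
        print("Warning: empty mesh passed to ICP; returning identity transform")
        return vtk.vtkTransform()  # a freshly constructed vtkTransform is the identity
    icp = vtk.vtkIterativeClosestPointTransform()
    icp.SetSource(source)
    icp.SetTarget(target)
    icp.Update()
    return icp
```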
Instead of loading all images into memory when creating DataLoaders,
use streaming datasets that load images on demand during training. This
significantly reduces memory usage for large datasets (a minimal sketch
follows the list of key changes below).

Key changes:
  - DeepSSMdatasetStreaming class loads images lazily from disk
  - Training/validation/test loaders save metadata instead of full data
  - load_data_loader() reconstructs loaders from metadata
  - get_loader_info() extracts dimensions without loading full dataset
  - Backward compatible with legacy pre-loaded loaders
  - Use world particle positions for bounding box calculation instead of
    transformed groomed meshes. World particles reflect actual aligned
    positions including optimization transforms.
  - Add periodic garbage collection during training image grooming
  - Add try/except around validation/test image registration to continue
    processing even if individual subjects fail
  - Skip missing validation/test images gracefully with warnings
  - Skip test subjects without predictions during post-processing
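
A minimal sketch of the streaming idea, assuming images stored as .npy files; the actual class in loaders.py reads ShapeWorks image files and carries additional metadata:

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class DeepSSMdatasetStreaming(Dataset):
    """Stores file paths instead of pixel data; loads each image on demand."""

    def __init__(self, image_paths, particles):
        self.image_paths = image_paths   # per-subject image files on disk
        self.particles = particles       # pre-loaded particle targets

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = np.load(self.image_paths[idx])   # lazy load from disk
        return torch.from_numpy(img).float(), self.particles[idx]
```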
Run only default and tl_net_fine_tune tests, which together cover
all code paths (standard DeepSSM, TL-DeepSSM, and fine tuning).
Cuts test time from ~3 minutes to ~90 seconds.
Resolve #2487 - Auto subset size in grooming should pick a smart auto value:
auto (-1) now defaults to a subset of 30 to avoid O(n^2) pairwise ICP on
large datasets (a sketch of the policy follows below).
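
A sketch of the intended policy; the constant and function names are illustrative only:

```python
AUTO_SUBSET_CAP = 30  # assumed constant; keeps pairwise ICP tractable


def resolve_subset_size(requested: int, num_shapes: int) -> int:
    """auto (-1) caps the grooming subset at 30, since pairwise ICP grows as O(n^2)."""
    if requested == -1:
        return min(num_shapes, AUTO_SUBSET_CAP)
    return min(requested, num_shapes)
```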