-
Notifications
You must be signed in to change notification settings - Fork 28
Perf: accelerate feature_to_block with torch_scatter #302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf: accelerate feature_to_block with torch_scatter #302
Conversation
📝 WalkthroughWalkthroughThe pull request refactors the Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dptb/data/interfaces/ham_to_feature.py (1)
420-437: Critical bug in self-loop block storage: wrong index logic inatom_i == atom_jcase.Lines 426-431 have inverted logic. When processing a self-loop edge (i, i, R_shift), the code checks if the reverse key (i, i, -R_shift) exists in blocks, then conditionally stores or accumulates—but always at the wrong index:
- If r_index doesn't exist: stores block at
blocks[block_index]✓- If r_index exists: stores at
blocks[r_index]instead ofblocks[block_index]✗This means the block for (i, i, R_shift) is never properly stored; it either overwrites or gets incorrectly accumulated into the opposite R_shift entry. For Hermitian systems with periodic boundaries, self-loop blocks with different shifts corrupt each other.
Fix: Treat self-loops like other edges—always store/accumulate at the current edge's block_index:
elif atom_i == atom_j: if blocks.get(block_index, None) is None: blocks[block_index] = block else: blocks[block_index] += block
🧹 Nitpick comments (4)
dptb/data/transforms.py (2)
848-905: Consider specifying device for precomputed index tensors.The precomputed index tensors are created on CPU by default. When used in
feature_to_block, they're moved to the target device on every call (lines 373-377 in ham_to_feature.py). For better performance, consider either:
- Creating these tensors on
self.deviceduring precomputation, or- Caching the device-specific versions after first use
Option 1: Create on self.device during precomputation
self._node_feature_to_block_indices[symbol] = { - 'src': torch.tensor(src_indices, dtype=torch.long), - 'dst': torch.tensor(dst_indices, dtype=torch.long), - 'dst_T': torch.tensor(dst_indices_T, dtype=torch.long), - 'is_diag': torch.tensor(is_diag, dtype=torch.bool), + 'src': torch.tensor(src_indices, dtype=torch.long, device=self.device), + 'dst': torch.tensor(dst_indices, dtype=torch.long, device=self.device), + 'dst_T': torch.tensor(dst_indices_T, dtype=torch.long, device=self.device), + 'is_diag': torch.tensor(is_diag, dtype=torch.bool, device=self.device), 'norb': norb }
907-969: Consider specifying device for precomputed index tensors.Similar to the node indices, the edge index tensors are created on CPU by default and moved to device on every call. Consider creating them on
self.deviceduring precomputation for better performance.Proposed fix
self._edge_feature_to_block_indices[bond_type] = { - 'src': torch.tensor(src_indices, dtype=torch.long), - 'dst': torch.tensor(dst_indices, dtype=torch.long), - 'scale': torch.tensor(scale_factors, dtype=torch.float32), + 'src': torch.tensor(src_indices, dtype=torch.long, device=self.device), + 'dst': torch.tensor(dst_indices, dtype=torch.long, device=self.device), + 'scale': torch.tensor(scale_factors, dtype=torch.float32, device=self.device), 'norb_i': norb_i, 'norb_j': norb_j }dptb/data/interfaces/ham_to_feature.py (2)
362-365: Optimize atom symbol lookup to avoid repeated single-element untransform calls.The current implementation calls
idp.untransform()for each atom individually within a list comprehension. Sinceuntransformsupports batch operations, you can compute all symbols in one call and then convert to chemical symbols.Proposed optimization
# Pre-compute atom symbols for all atoms (vectorized lookup) atom_types = data[_keys.ATOM_TYPE_KEY] + atomic_numbers = idp.untransform(atom_types) atom_symbols = [ - ase.data.chemical_symbols[idp.untransform(atom_types[i].reshape(-1))] + ase.data.chemical_symbols[int(atomic_numbers[i])] for i in range(len(atom_types)) ]
373-377: Index device movement happens on every call.The precomputed indices are moved to the target device on every
feature_to_blockcall for each symbol/bond type. While the comment at line 372 says "only on first use per symbol," the implementation doesn't cache device-specific versions. This could impact performance when processing multiple batches.Consider implementing device-specific caching or creating indices on the target device during precomputation (as suggested in the transforms.py review).
Also applies to: 408-412
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dptb/data/interfaces/ham_to_feature.pydptb/data/transforms.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (5)
dptb/data/transforms.py (2)
845-846: LGTM!The blank line improves readability by visually separating method definitions.
947-959: Clarify the scale factor logic and comment inconsistency.The comment on line 929 states "0.5 for diagonal pairs," but line 947 checks
is_same_basis(whether the basis pair is identical), not whether an element is on the matrix diagonal. These are not equivalent. Additionally, the edge version uses scale factors to handle symmetry, whereas the node version uses explicit transposed indices—the rationale for this design choice and the specific 0.5 factor for same basis pairs should be documented.dptb/data/interfaces/ham_to_feature.py (3)
9-9: LGTM!Correctly removed unused
anglrMIdimport after refactoring to scatter-based operations. The angular momentum handling is now encapsulated in the precomputed index methods.
323-335: LGTM!Clear and comprehensive docstring that explains the vectorized approach and performance benefits.
391-391: No issues found. The block_index format is consistent with the auto-detection logic inblock_to_feature. Line 391 writes 0-indexed blocks ([atom, atom, 0, 0, 0]), and thestart_idlogic at lines 50–58 correctly detects this format and retrieves blocks with matching indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes the feature_to_block function by introducing pre-computed scatter indices for vectorized operations. The optimization replaces nested Python loops with efficient PyTorch scatter operations, improving performance when converting feature vectors to Hamiltonian/overlap block matrices.
Key changes:
- Added two new caching methods (
get_node_feature_to_block_indicesandget_edge_feature_to_block_indices) to pre-compute scatter index mappings - Refactored
feature_to_blockto use vectorized scatter operations instead of nested loops - Removed unused
anglrMIdimport fromham_to_feature.py
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| dptb/data/transforms.py | Added get_node_feature_to_block_indices and get_edge_feature_to_block_indices methods to pre-compute and cache scatter indices for vectorized block matrix construction |
| dptb/data/interfaces/ham_to_feature.py | Refactored feature_to_block function to use pre-computed scatter indices with vectorized operations; removed unused import; added comprehensive docstring |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dptb/data/interfaces/ham_to_feature.py`:
- Around line 360-365: The loop building atom_symbols is indexing
ase.data.chemical_symbols with a tensor returned from idp.untransform(), causing
a TypeError; fix it by converting the tensor to a Python integer scalar before
using it as an index (e.g., call .item() or int(...) on the result of
idp.untransform(atom_types[i].reshape(-1))) so ase.data.chemical_symbols[...]
yields a Python string, and ensure the list atom_symbols contains those hashable
strings (so set(atom_symbols) will work).
| # Pre-compute atom symbols for all atoms (vectorized lookup) | ||
| atom_types = data[_keys.ATOM_TYPE_KEY] | ||
| atom_symbols = [ | ||
| ase.data.chemical_symbols[idp.untransform(atom_types[i].reshape(-1))] | ||
| for i in range(len(atom_types)) | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, find the untransform function definition
rg -n "def untransform" -A 5 dptb/data/Repository: deepmodeling/DeePTB
Length of output: 1445
🏁 Script executed:
# Also check imports in ham_to_feature.py to understand idp
rg -n "import.*idp\|from.*idp" dptb/data/interfaces/ham_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
# Check the actual context around lines 360-365
sed -n '350,375p' dptb/data/interfaces/ham_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 1072
🏁 Script executed:
# Search for ATOM_TYPE_KEY definition and its type
rg -n "ATOM_TYPE_KEY" dptb/data/Repository: deepmodeling/DeePTB
Length of output: 2262
🏁 Script executed:
# Search for other uses of untransform and how the result is used
rg -n "untransform\(" -B2 -A2 dptb/data/interfaces/ham_to_feature.py | head -50Repository: deepmodeling/DeePTB
Length of output: 1202
🏁 Script executed:
# Check if .item() is used elsewhere in the file
rg -n "\.item\(\)" dptb/data/interfaces/ham_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
# Look for ase.data.chemical_symbols usage patterns
rg -n "ase\.data\.chemical_symbols" dptb/Repository: deepmodeling/DeePTB
Length of output: 1306
🏁 Script executed:
# Check what type atom_types actually is and how it flows
sed -n '355,370p' dptb/data/interfaces/ham_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 634
🏁 Script executed:
# Check how atomic_numbers is typically used when obtained from untransform
sed -n '75,85p' dptb/data/interfaces/ham_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 565
🏁 Script executed:
# Look at the broader context - is ATOMIC_NUMBERS_KEY available at line 360?
sed -n '320,370p' dptb/data/interfaces/ham_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 2079
🏁 Script executed:
# Check if there are any tests for this code path
find dptb -name "*test*.py" -type f | xargs rg -l "ham_to_feature\|feature_to_block" 2>/dev/nullRepository: deepmodeling/DeePTB
Length of output: 47
🏁 Script executed:
# Verify the exact type of untransform return value and how .reshape(-1) behaves
rg -n "def untransform" -A 10 dptb/data/transforms.py | head -30Repository: deepmodeling/DeePTB
Length of output: 1326
🏁 Script executed:
# Check if the code at line 363 is actually executed - look for test patterns
rg -n "feature_to_block" dptb/ -A 3 | grep -E "test|def|feature_to_block"Repository: deepmodeling/DeePTB
Length of output: 4322
🏁 Script executed:
# Try to understand what atom_types[i].reshape(-1) produces and if it can be indexed
python3 << 'EOF'
import torch
# Simulate what happens
atom_types = torch.tensor([[1], [2], [3]]) # Example shape
result = atom_types[0].reshape(-1) # This would be a 1D tensor of shape [1]
print(f"Result shape: {result.shape}")
print(f"Result type: {type(result)}")
print(f"Result dtype: {result.dtype}")
# Try to use this as an index into a tuple
try:
test_tuple = ("H", "C", "N", "O")
index = result
val = test_tuple[index]
print(f"Indexing worked: {val}")
except Exception as e:
print(f"Error when indexing tuple with tensor: {type(e).__name__}: {e}")
# Try with .item()
try:
val = test_tuple[result.item()]
print(f"Indexing with .item() worked: {val}")
except Exception as e:
print(f"Error with .item(): {e}")
EOFRepository: deepmodeling/DeePTB
Length of output: 149
🏁 Script executed:
# Check how atom_types are structured when passed to feature_to_block
rg -n "feature_to_block\(" dptb/ -B 5 | head -40Repository: deepmodeling/DeePTB
Length of output: 2692
🏁 Script executed:
# Check the test more carefully to see if it actually hits the atom_symbols code
sed -n '80,100p' dptb/tests/test_block_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 1189
🏁 Script executed:
# Let me verify what happens when you try to index with a tensor
python3 << 'EOF'
import ase.data
# Simulate the exact operation from the code
test_index = __import__('torch').tensor([6]) # Carbon has atomic number 6
try:
result = ase.data.chemical_symbols[test_index]
print(f"Success: {result}")
except TypeError as e:
print(f"TypeError: {e}")
except Exception as e:
print(f"{type(e).__name__}: {e}")
# Try with .item()
try:
result = ase.data.chemical_symbols[test_index.item()]
print(f"With .item() success: {result}")
except Exception as e:
print(f"With .item() error: {e}")
EOFRepository: deepmodeling/DeePTB
Length of output: 178
🏁 Script executed:
# Check if there were recent changes to this function
git log --oneline -n 20 -- dptb/data/interfaces/ham_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 116
🏁 Script executed:
# Check the recent commit to understand what changed
git show d97a847 --statRepository: deepmodeling/DeePTB
Length of output: 50375
🏁 Script executed:
# Get more details about the specific change
git show d97a847 | head -100Repository: deepmodeling/DeePTB
Length of output: 3455
🏁 Script executed:
# Look at the actual error - try to understand the code path better
rg -n "atom_symbols\[" dptb/data/interfaces/ham_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 202
🏁 Script executed:
# Check if set(atom_symbols) is actually called and what could fail
sed -n '360,380p' dptb/data/interfaces/ham_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 947
🏁 Script executed:
# Check if the test actually covers atom_symbols code path - look at test data
sed -n '85,105p' dptb/tests/test_block_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 1395
🏁 Script executed:
# Verify if ATOMIC_NUMBERS_KEY is set before feature_to_block is called
sed -n '30,50p' dptb/tests/test_block_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 541
🏁 Script executed:
# Check the actual structure to understand data flow before feature_to_block
sed -n '1,50p' dptb/tests/test_block_to_feature.pyRepository: deepmodeling/DeePTB
Length of output: 1480
🏁 Script executed:
# Try to simulate what happens when indexing with a tensor
python3 << 'EOF'
# Verify that tensor indexing on tuple fails
try:
import torch
test_tuple = ("H", "C", "N", "O")
tensor_idx = torch.tensor([2])
result = test_tuple[tensor_idx]
print(f"Success (unexpected): {result}")
except TypeError as e:
print(f"TypeError (expected): {e}")
except Exception as e:
print(f"Other error: {type(e).__name__}: {e}")
# Test with .item()
try:
result = test_tuple[tensor_idx.item()]
print(f"With .item() success: {result}")
except Exception as e:
print(f"With .item() error: {e}")
EOFRepository: deepmodeling/DeePTB
Length of output: 173
🏁 Script executed:
# Check if ATOMIC_NUMBERS_KEY is being used anywhere in feature_to_block or its callsites
sed -n '322,450p' dptb/data/interfaces/ham_to_feature.py | grep -n "ATOMIC_NUMBERS\|atom_symbols"Repository: deepmodeling/DeePTB
Length of output: 275
Fix tensor indexing when deriving atom symbols from idp.untransform().
idp.untransform() returns a tensor, which cannot index the ase.data.chemical_symbols tuple. This will cause a TypeError at runtime. Convert the tensor to an integer scalar before indexing, and ensure set(atom_symbols) receives hashable values.
♻️ Suggested fix
atom_types = data[_keys.ATOM_TYPE_KEY]
atom_symbols = [
- ase.data.chemical_symbols[idp.untransform(atom_types[i].reshape(-1))]
+ ase.data.chemical_symbols[int(idp.untransform(atom_types[i]).item())]
for i in range(len(atom_types))
]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Pre-compute atom symbols for all atoms (vectorized lookup) | |
| atom_types = data[_keys.ATOM_TYPE_KEY] | |
| atom_symbols = [ | |
| ase.data.chemical_symbols[idp.untransform(atom_types[i].reshape(-1))] | |
| for i in range(len(atom_types)) | |
| ] | |
| # Pre-compute atom symbols for all atoms (vectorized lookup) | |
| atom_types = data[_keys.ATOM_TYPE_KEY] | |
| atom_symbols = [ | |
| ase.data.chemical_symbols[int(idp.untransform(atom_types[i]).item())] | |
| for i in range(len(atom_types)) | |
| ] |
🤖 Prompt for AI Agents
In `@dptb/data/interfaces/ham_to_feature.py` around lines 360 - 365, The loop
building atom_symbols is indexing ase.data.chemical_symbols with a tensor
returned from idp.untransform(), causing a TypeError; fix it by converting the
tensor to a Python integer scalar before using it as an index (e.g., call
.item() or int(...) on the result of idp.untransform(atom_types[i].reshape(-1)))
so ase.data.chemical_symbols[...] yields a Python string, and ensure the list
atom_symbols contains those hashable strings (so set(atom_symbols) will work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| operations and added device-side caching to avoid redundant transfers. | ||
| """ | ||
|
|
||
| import pytest |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'pytest' is not used.
| import pytest |
| from dptb.nn.nnsk import NNSK | ||
| from dptb.data.transforms import OrbitalMapper | ||
| from dptb.data.build import build_dataset | ||
| from dptb.data import AtomicDataset, DataLoader, AtomicDataDict, AtomicData |
Copilot
AI
Feb 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'AtomicDataset' is not used.
Import of 'AtomicDataDict' is not used.
| from dptb.data import AtomicDataset, DataLoader, AtomicDataDict, AtomicData | |
| from dptb.data import DataLoader, AtomicData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@dptb/tests/test_feature_to_block_consistency.py`:
- Around line 230-270: The test_performance_improvement benchmark is flaky:
switch timing from time.time() to time.perf_counter() for higher-resolution
clocks when measuring optimized_time and reference_time; gate the test with a
pytest skip/marker that checks an env var (e.g., skip unless
os.environ.get("RUN_PERF_TESTS") == "1") so it only runs in perf-aware CI or
locally; and change the unnecessary f-string print(f"\nPerformance comparison:")
to a normal print("\nPerformance comparison:"). Update references to the
measured functions feature_to_block and feature_to_block_reference and the
test_performance_improvement function accordingly.
- Around line 49-64: The non-overlap branch currently only sets blocks when
NODE_FEATURES_KEY exists and silently continues otherwise, risking an
UnboundLocalError; make it symmetric with the overlap branch by explicitly
checking for both _keys.NODE_FEATURES_KEY and _keys.EDGE_FEATURES_KEY at the
start of the overlap=False path, and if either is missing raise a KeyError
(similar to the overlap=True path that checks
_keys.NODE_OVERLAP_KEY/_keys.EDGE_OVERLAP_KEY); when both exist, assign
node_features, edge_features, set has_block = True and initialize blocks = {}.
🧹 Nitpick comments (1)
dptb/tests/test_feature_to_block_consistency.py (1)
147-194: Defer dataset construction to test setup to avoid import-time work and shared state.
Building datasets/loaders at class definition runs during collection (even when tests are deselected) and keeps mutable config dicts shared across tests. Moving this intosetup_classor a fixture reduces collection cost and avoids accidental mutation leaks; considerClassVarannotations if you keep class-level dicts.♻️ Suggested refactor using setup_class
- # Build test dataset - train_datasets = build_dataset(**data_options, **data_options["train"], **common_options) - train_loader = DataLoader(dataset=train_datasets, batch_size=1, shuffle=False) - batch = next(iter(train_loader)) - batch = AtomicData.to_AtomicDataDict(batch) - - # Create OrbitalMapper instances - idp = OrbitalMapper(basis=common_options['basis'], method="sktb") + `@classmethod` + def setup_class(cls): + cls.train_datasets = build_dataset(**cls.data_options, **cls.data_options["train"], **cls.common_options) + cls.train_loader = DataLoader(dataset=cls.train_datasets, batch_size=1, shuffle=False) + cls.batch = AtomicData.to_AtomicDataDict(next(iter(cls.train_loader))) + cls.idp = OrbitalMapper(basis=cls.common_options["basis"], method="sktb")
| has_block = False | ||
| if not overlap: | ||
| if data.get(_keys.NODE_FEATURES_KEY, None) is not None: | ||
| node_features = data[_keys.NODE_FEATURES_KEY] | ||
| edge_features = data[_keys.EDGE_FEATURES_KEY] | ||
| has_block = True | ||
| blocks = {} | ||
| else: | ||
| if data.get(_keys.NODE_OVERLAP_KEY, None) is not None: | ||
| node_features = data[_keys.NODE_OVERLAP_KEY] | ||
| edge_features = data[_keys.EDGE_OVERLAP_KEY] | ||
| has_block = True | ||
| blocks = {} | ||
| else: | ||
| raise KeyError("Overlap features not found in data.") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cat -n dptb/tests/test_feature_to_block_consistency.py | head -100Repository: deepmodeling/DeePTB
Length of output: 5121
🏁 Script executed:
wc -l dptb/tests/test_feature_to_block_consistency.pyRepository: deepmodeling/DeePTB
Length of output: 114
🏁 Script executed:
cat -n dptb/tests/test_feature_to_block_consistency.py | sed -n '49,150p'Repository: deepmodeling/DeePTB
Length of output: 5712
🏁 Script executed:
cat -n dptb/tests/test_feature_to_block_consistency.py | sed -n '140,200p'Repository: deepmodeling/DeePTB
Length of output: 2142
🏁 Script executed:
cat -n dptb/tests/test_feature_to_block_consistency.py | sed -n '195,250p'Repository: deepmodeling/DeePTB
Length of output: 2855
🏁 Script executed:
cat -n dptb/tests/test_feature_to_block_consistency.py | sed -n '250,298p'Repository: deepmodeling/DeePTB
Length of output: 2802
Make error handling symmetric for non-overlap and overlap feature paths.
The overlap=False path only initializes blocks when NODE_FEATURES_KEY exists, but silent skipping causes UnboundLocalError at the return statement if features are missing. The overlap=True path explicitly raises KeyError for missing features (line 63). Restructure the non-overlap path to check both features upfront and raise KeyError explicitly, matching the overlap path behavior and providing clearer failure modes.
Proposed fix
has_block = False
if not overlap:
- if data.get(_keys.NODE_FEATURES_KEY, None) is not None:
- node_features = data[_keys.NODE_FEATURES_KEY]
- edge_features = data[_keys.EDGE_FEATURES_KEY]
- has_block = True
- blocks = {}
+ node_features = data.get(_keys.NODE_FEATURES_KEY)
+ edge_features = data.get(_keys.EDGE_FEATURES_KEY)
+ if node_features is None or edge_features is None:
+ missing = _keys.NODE_FEATURES_KEY if node_features is None else _keys.EDGE_FEATURES_KEY
+ raise KeyError(missing)
+ has_block = True
+ blocks = {}🧰 Tools
🪛 Ruff (0.14.14)
[warning] 63-63: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@dptb/tests/test_feature_to_block_consistency.py` around lines 49 - 64, The
non-overlap branch currently only sets blocks when NODE_FEATURES_KEY exists and
silently continues otherwise, risking an UnboundLocalError; make it symmetric
with the overlap branch by explicitly checking for both _keys.NODE_FEATURES_KEY
and _keys.EDGE_FEATURES_KEY at the start of the overlap=False path, and if
either is missing raise a KeyError (similar to the overlap=True path that checks
_keys.NODE_OVERLAP_KEY/_keys.EDGE_OVERLAP_KEY); when both exist, assign
node_features, edge_features, set has_block = True and initialize blocks = {}.
| def test_performance_improvement(self): | ||
| """ | ||
| Benchmark test to verify that optimized implementation is faster | ||
| than the reference implementation. | ||
| """ | ||
| # Create NNSK model and generate features | ||
| hamiltonian = SKHamiltonian(idp_sk=self.idp, onsite=True) | ||
| nnsk = NNSK(**self.common_options, **self.model_options["nnsk"], transform=False) | ||
|
|
||
| with torch.no_grad(): | ||
| data = nnsk(self.batch.copy()) | ||
| data = hamiltonian(data) | ||
|
|
||
| # Warm-up runs | ||
| for _ in range(3): | ||
| _ = feature_to_block(data, nnsk.idp, overlap=False) | ||
| _ = feature_to_block_reference(data, nnsk.idp, overlap=False) | ||
|
|
||
| # Benchmark optimized implementation | ||
| num_runs = 10 | ||
| start_time = time.time() | ||
| for _ in range(num_runs): | ||
| _ = feature_to_block(data, nnsk.idp, overlap=False) | ||
| optimized_time = (time.time() - start_time) / num_runs | ||
|
|
||
| # Benchmark reference implementation | ||
| start_time = time.time() | ||
| for _ in range(num_runs): | ||
| _ = feature_to_block_reference(data, nnsk.idp, overlap=False) | ||
| reference_time = (time.time() - start_time) / num_runs | ||
|
|
||
| speedup = reference_time / optimized_time | ||
| print(f"\nPerformance comparison:") | ||
| print(f" Optimized: {optimized_time*1000:.3f} ms") | ||
| print(f" Reference: {reference_time*1000:.3f} ms") | ||
| print(f" Speedup: {speedup:.2f}x") | ||
|
|
||
| # Assert that optimized version is faster (or at least not significantly slower) | ||
| # Allow for some variance in timing measurements | ||
| assert optimized_time <= reference_time * 1.1, \ | ||
| f"Optimized implementation is slower: {optimized_time:.4f}s vs {reference_time:.4f}s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Check if the file exists and get its line count
fd -t f "test_feature_to_block_consistency.py" dptb/tests/
wc -l dptb/tests/test_feature_to_block_consistency.py 2>/dev/null || echo "File not found at expected path"Repository: deepmodeling/DeePTB
Length of output: 162
🏁 Script executed:
# Read the specific lines mentioned in the review (230-270)
sed -n '230,270p' dptb/tests/test_feature_to_block_consistency.pyRepository: deepmodeling/DeePTB
Length of output: 1885
🏁 Script executed:
# Check imports at the top of the file to see what's available
head -30 dptb/tests/test_feature_to_block_consistency.pyRepository: deepmodeling/DeePTB
Length of output: 1013
🏁 Script executed:
# Check if pytest and os are imported
rg "^import (time|pytest|os)" dptb/tests/test_feature_to_block_consistency.pyRepository: deepmodeling/DeePTB
Length of output: 98
Performance test assertion is likely to be flaky in CI.
Wall-clock timing with a tight 1.1× bound is noisy under variable system load. Use time.perf_counter() for benchmarking instead of time.time(), gate the test behind a pytest marker (e.g., skip unless RUN_PERF_TESTS=1), and remove the unused f-string prefix on the static print statement.
Stabilization changes
+ `@pytest.mark.skipif`(os.getenv("RUN_PERF_TESTS") != "1", reason="Performance checks are noisy; run in perf CI job.")
def test_performance_improvement(self):
"""
Benchmark test to verify that optimized implementation is faster
than the reference implementation.
"""
# Create NNSK model and generate features
hamiltonian = SKHamiltonian(idp_sk=self.idp, onsite=True)
nnsk = NNSK(**self.common_options, **self.model_options["nnsk"], transform=False)
with torch.no_grad():
data = nnsk(self.batch.copy())
data = hamiltonian(data)
# Warm-up runs
for _ in range(3):
_ = feature_to_block(data, nnsk.idp, overlap=False)
_ = feature_to_block_reference(data, nnsk.idp, overlap=False)
# Benchmark optimized implementation
num_runs = 10
- start_time = time.time()
+ start_time = time.perf_counter()
for _ in range(num_runs):
_ = feature_to_block(data, nnsk.idp, overlap=False)
- optimized_time = (time.time() - start_time) / num_runs
+ optimized_time = (time.perf_counter() - start_time) / num_runs
# Benchmark reference implementation
- start_time = time.time()
+ start_time = time.perf_counter()
for _ in range(num_runs):
_ = feature_to_block_reference(data, nnsk.idp, overlap=False)
- reference_time = (time.time() - start_time) / num_runs
+ reference_time = (time.perf_counter() - start_time) / num_runs
speedup = reference_time / optimized_time
- print(f"\nPerformance comparison:")
+ print("\nPerformance comparison:")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_performance_improvement(self): | |
| """ | |
| Benchmark test to verify that optimized implementation is faster | |
| than the reference implementation. | |
| """ | |
| # Create NNSK model and generate features | |
| hamiltonian = SKHamiltonian(idp_sk=self.idp, onsite=True) | |
| nnsk = NNSK(**self.common_options, **self.model_options["nnsk"], transform=False) | |
| with torch.no_grad(): | |
| data = nnsk(self.batch.copy()) | |
| data = hamiltonian(data) | |
| # Warm-up runs | |
| for _ in range(3): | |
| _ = feature_to_block(data, nnsk.idp, overlap=False) | |
| _ = feature_to_block_reference(data, nnsk.idp, overlap=False) | |
| # Benchmark optimized implementation | |
| num_runs = 10 | |
| start_time = time.time() | |
| for _ in range(num_runs): | |
| _ = feature_to_block(data, nnsk.idp, overlap=False) | |
| optimized_time = (time.time() - start_time) / num_runs | |
| # Benchmark reference implementation | |
| start_time = time.time() | |
| for _ in range(num_runs): | |
| _ = feature_to_block_reference(data, nnsk.idp, overlap=False) | |
| reference_time = (time.time() - start_time) / num_runs | |
| speedup = reference_time / optimized_time | |
| print(f"\nPerformance comparison:") | |
| print(f" Optimized: {optimized_time*1000:.3f} ms") | |
| print(f" Reference: {reference_time*1000:.3f} ms") | |
| print(f" Speedup: {speedup:.2f}x") | |
| # Assert that optimized version is faster (or at least not significantly slower) | |
| # Allow for some variance in timing measurements | |
| assert optimized_time <= reference_time * 1.1, \ | |
| f"Optimized implementation is slower: {optimized_time:.4f}s vs {reference_time:.4f}s" | |
| `@pytest.mark.skipif`(os.getenv("RUN_PERF_TESTS") != "1", reason="Performance checks are noisy; run in perf CI job.") | |
| def test_performance_improvement(self): | |
| """ | |
| Benchmark test to verify that optimized implementation is faster | |
| than the reference implementation. | |
| """ | |
| # Create NNSK model and generate features | |
| hamiltonian = SKHamiltonian(idp_sk=self.idp, onsite=True) | |
| nnsk = NNSK(**self.common_options, **self.model_options["nnsk"], transform=False) | |
| with torch.no_grad(): | |
| data = nnsk(self.batch.copy()) | |
| data = hamiltonian(data) | |
| # Warm-up runs | |
| for _ in range(3): | |
| _ = feature_to_block(data, nnsk.idp, overlap=False) | |
| _ = feature_to_block_reference(data, nnsk.idp, overlap=False) | |
| # Benchmark optimized implementation | |
| num_runs = 10 | |
| start_time = time.perf_counter() | |
| for _ in range(num_runs): | |
| _ = feature_to_block(data, nnsk.idp, overlap=False) | |
| optimized_time = (time.perf_counter() - start_time) / num_runs | |
| # Benchmark reference implementation | |
| start_time = time.perf_counter() | |
| for _ in range(num_runs): | |
| _ = feature_to_block_reference(data, nnsk.idp, overlap=False) | |
| reference_time = (time.perf_counter() - start_time) / num_runs | |
| speedup = reference_time / optimized_time | |
| print("\nPerformance comparison:") | |
| print(f" Optimized: {optimized_time*1000:.3f} ms") | |
| print(f" Reference: {reference_time*1000:.3f} ms") | |
| print(f" Speedup: {speedup:.2f}x") | |
| # Assert that optimized version is faster (or at least not significantly slower) | |
| # Allow for some variance in timing measurements | |
| assert optimized_time <= reference_time * 1.1, \ | |
| f"Optimized implementation is slower: {optimized_time:.4f}s vs {reference_time:.4f}s" |
🧰 Tools
🪛 Ruff (0.14.14)
[error] 262-262: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@dptb/tests/test_feature_to_block_consistency.py` around lines 230 - 270, The
test_performance_improvement benchmark is flaky: switch timing from time.time()
to time.perf_counter() for higher-resolution clocks when measuring
optimized_time and reference_time; gate the test with a pytest skip/marker that
checks an env var (e.g., skip unless os.environ.get("RUN_PERF_TESTS") == "1") so
it only runs in perf-aware CI or locally; and change the unnecessary f-string
print(f"\nPerformance comparison:") to a normal print("\nPerformance
comparison:"). Update references to the measured functions feature_to_block and
feature_to_block_reference and the test_performance_improvement function
accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dptb/tests/test_block_to_feature.py`:
- Line 448: The print statement uses an unnecessary f-string: replace the call
to print(f"\nPerformance comparison:") with a normal string print("\nPerformance
comparison:") to remove the unused f-string prefix (locate the print in
dptb/tests/test_block_to_feature.py where the output line "Performance
comparison:" is printed).
🧹 Nitpick comments (2)
dptb/tests/test_block_to_feature.py (2)
335-371: Consider annotating mutable class attributes withClassVar.The static analyzer flagged that
common_options,model_options, anddata_optionsshould be annotated withtyping.ClassVarsince they are mutable class attributes. However, this pattern is consistent with the existingTestBlock2Featureclass above, so this is a minor nitpick.
436-456: Consider usingtime.perf_counter()and watch for CI flakiness.
time.perf_counter()provides higher resolution timing thantime.time()and is recommended for benchmarking.The 10% tolerance assertion (
optimized_time <= reference_time * 1.1) may be flaky in CI environments where system load varies. Consider either:
- Marking this test with
@pytest.mark.slowor similar to skip in CI- Removing the assertion and keeping only the informational print
- Increasing the tolerance
♻️ Suggested change for perf_counter
- start_time = time.time() + start_time = time.perf_counter() for _ in range(num_runs): _ = feature_to_block(data, nnsk.idp, overlap=False) - optimized_time = (time.time() - start_time) / num_runs + optimized_time = (time.perf_counter() - start_time) / num_runs # Benchmark reference implementation - start_time = time.time() + start_time = time.perf_counter() for _ in range(num_runs): _ = feature_to_block_reference(data, nnsk.idp, overlap=False) - reference_time = (time.time() - start_time) / num_runs + reference_time = (time.perf_counter() - start_time) / num_runs
| reference_time = (time.time() - start_time) / num_runs | ||
|
|
||
| speedup = reference_time / optimized_time | ||
| print(f"\nPerformance comparison:") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary f-string prefix.
This f-string has no placeholders and should be a regular string.
🔧 Proposed fix
- print(f"\nPerformance comparison:")
+ print("\nPerformance comparison:")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"\nPerformance comparison:") | |
| print("\nPerformance comparison:") |
🧰 Tools
🪛 Ruff (0.14.14)
[error] 448-448: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@dptb/tests/test_block_to_feature.py` at line 448, The print statement uses an
unnecessary f-string: replace the call to print(f"\nPerformance comparison:")
with a normal string print("\nPerformance comparison:") to remove the unused
f-string prefix (locate the print in dptb/tests/test_block_to_feature.py where
the output line "Performance comparison:" is printed).
Summary by CodeRabbit
Release Notes
New Features
Performance
Tests