refact(dpmodel,pt_expt): embedding net #5205

wanghan-iapcm wants to merge 36 commits into deepmodeling:master from
Conversation
Code Review
This pull request is a significant and well-executed refactoring that converts EmbeddingNet from a factory-generated class to a concrete class, enabling a new, more robust pt_expt backend. The introduction of a registry-based auto-conversion mechanism is a great architectural improvement for maintainability. Additionally, this PR includes several important bug fixes related to thread management and device handling for GPU tensors, which greatly improves the correctness and robustness of the system. The changes are thoroughly tested, ensuring a high-quality contribution.
```diff
 intra_nthreads, inter_nthreads = get_default_nthreads()
 if inter_nthreads > 0:  # the behavior of 0 is not documented
-    torch.set_num_interop_threads(inter_nthreads)
+    # torch.set_num_interop_threads can only be called once per process.
+    # Guard to avoid RuntimeError when multiple backends are imported.
+    try:
+        if torch.get_num_interop_threads() != inter_nthreads:
+            torch.set_num_interop_threads(inter_nthreads)
+    except RuntimeError as err:
+        log.warning(f"Could not set torch interop threads: {err}")
 if intra_nthreads > 0:
-    torch.set_num_threads(intra_nthreads)
+    # torch.set_num_threads can also fail if called after threads are created.
+    try:
+        if torch.get_num_threads() != intra_nthreads:
+            torch.set_num_threads(intra_nthreads)
+    except RuntimeError as err:
+        log.warning(f"Could not set torch intra threads: {err}")
```
This change fixes a critical bug where inter_nthreads and intra_nthreads were swapped, leading to incorrect thread settings for PyTorch. The addition of try...except blocks and checking the current thread counts before setting them also significantly improves the robustness of the environment setup, especially when multiple backends might be imported. Great improvement!
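The "check first, then guard with try/except" pattern above can be sketched generically; the `OneShot` class below simulates a setting that, like `torch.set_num_interop_threads`, may only be changed once per process (the names are illustrative stand-ins, not the real torch API):

```python
def guarded_set(getter, setter, target, warnings):
    # Only attempt the set when the current value differs, and log
    # rather than crash if the runtime forbids a second set.
    try:
        if getter() != target:
            setter(target)
    except RuntimeError as err:
        warnings.append(f"Could not set: {err}")

class OneShot:
    """Simulates a value that can only be set once per process."""
    def __init__(self):
        self.value = 0
        self.frozen = False
    def get(self):
        return self.value
    def set(self, v):
        if self.frozen:
            raise RuntimeError("can only be set once per process")
        self.value = v
        self.frozen = True

s = OneShot()
warnings = []
guarded_set(s.get, s.set, 4, warnings)  # first call succeeds
guarded_set(s.get, s.set, 4, warnings)  # no-op: already at target
guarded_set(s.get, s.set, 8, warnings)  # re-set forbidden, warning logged
assert s.get() == 4 and len(warnings) == 1
```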
```diff
 except (ValueError, AttributeError, TypeError):
     xp = array_api_compat.array_namespace(x)
     # to fix BufferError: Cannot export readonly array since signalling readonly is unsupported by DLPack.
-    x = xp.asarray(x, copy=True)
+    # Move to CPU device to ensure numpy compatibility
+    x = xp.asarray(x, device="cpu", copy=True)
 return np.from_dlpack(x)
```
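The surrounding helper can be sketched end-to-end as below; this is a hedged simplification of the conversion function in the diff (the fallback branch, which needs `array_api_compat`, is only taken for non-numpy array namespaces):

```python
import numpy as np

def to_numpy_array(x):
    # Simplified sketch: fast path through np.asarray, DLPack fallback.
    if x is None:
        return None
    try:
        return np.asarray(x)
    except (ValueError, AttributeError, TypeError):
        import array_api_compat
        xp = array_api_compat.array_namespace(x)
        # copy=True avoids BufferError on readonly arrays;
        # device="cpu" ensures the DLPack capsule is importable by numpy.
        x = xp.asarray(x, device="cpu", copy=True)
        return np.from_dlpack(x)

assert to_numpy_array(None) is None
assert to_numpy_array([1.0, 2.0]).shape == (2,)
```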
📝 Walkthrough

The PR replaces the dynamic factory-generated EmbeddingNet with a concrete class, introduces a registry-driven conversion system to wrap dpmodel objects as PyTorch modules, and adds PyTorch-compatible wrapper classes for descriptors, networks, and utilities to enable dpmodel integration with pt_expt.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant dpmodel as dpmodel Component
    participant registry as Registry
    participant converter as Converter
    participant pt_expt as pt_expt Module
    User->>dpmodel: Create dpmodel object (e.g., EmbeddingNet)
    dpmodel-->>User: dpmodel instance
    User->>registry: Call try_convert_module(dpmodel_instance)
    registry->>registry: Look up registered converter for type
    alt Converter registered
        registry->>converter: Invoke converter callable
        converter->>converter: Deserialize or instantiate
        converter-->>pt_expt: Create pt_expt wrapper
        pt_expt-->>registry: pt_expt module instance
        registry-->>User: Return converted module
    else Converter not registered
        registry-->>User: Return None
    end
    alt Attribute assignment flow
        User->>pt_expt: setattr(obj, name, dpmodel_value)
        pt_expt->>pt_expt: __setattr__ calls dpmodel_setattr
        dpmodel_setattr->>registry: try_convert_module(value)
        registry-->>dpmodel_setattr: converted pt_expt module or None
        alt Conversion succeeded
            dpmodel_setattr-->>pt_expt: Use converted module instead
        else Conversion failed or not registered
            dpmodel_setattr->>pt_expt: Handle as numpy/torch/default
        end
    end
    User->>pt_expt: Use wrapped module for torch training/export
```
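The conversion flow in the diagram can be sketched as a minimal type-keyed registry. The names below are illustrative stand-ins for the PR's `try_convert_module`/registration machinery, not the actual deepmd API:

```python
from typing import Any, Callable, Optional

_REGISTRY: dict[type, Callable[[Any], Any]] = {}

def register_converter(dpmodel_cls: type, converter: Callable[[Any], Any]) -> None:
    """Map a dpmodel class to a callable producing its torch wrapper."""
    _REGISTRY[dpmodel_cls] = converter

def try_convert_module(value: Any) -> Optional[Any]:
    """Return the converted module, or None when no converter is registered."""
    converter = _REGISTRY.get(type(value))
    return None if converter is None else converter(value)

class DPNet:            # stands in for a dpmodel object
    pass

class TorchNetWrapper:  # stands in for a pt_expt wrapper module
    def __init__(self, wrapped):
        self.wrapped = wrapped

register_converter(DPNet, TorchNetWrapper)
net = DPNet()
converted = try_convert_module(net)
assert isinstance(converted, TorchNetWrapper) and converted.wrapped is net
assert try_convert_module(42) is None  # unregistered type falls through
```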
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@deepmd/backend/pt_expt.py`:
- Around line 37-42: The class-level features tuple in pt_expt.py declares
Backend.Feature.DEEP_EVAL, NEIGHBOR_STAT, and IO but the corresponding property
methods deep_eval, neighbor_stat, serialize_hook, and deserialize_hook raise
NotImplementedError; either remove those flags from the features tuple or
implement the properties consistent with the patterns in
pytorch.py/tensorflow.py: provide a working deep_eval method, return a proper
neighbor_stat callable/object, and implement serialize_hook/deserialize_hook to
handle IO serialization for the backend, ensuring the symbols features,
deep_eval, neighbor_stat, serialize_hook, and deserialize_hook are updated
together so the advertised features match actual implementations.
In `@deepmd/dpmodel/utils/network.py`:
- Around line 889-896: The code crashes when obj.layers is empty because it does
type(obj.layers[0]); update cls.deserialize to guard that access: if obj.layers
is empty then if serialized layers (the local variable layers) are also empty
just return obj, otherwise determine the correct layer-deserializer from the
serialized data or a class-level fallback (e.g. a cls._layer_type or metadata
stored in each serialized layer) and use that to build obj.layers via obj.layers
= type(obj.layers)([layer_deserializer.deserialize(layer) for layer in layers]);
otherwise (obj.layers not empty) keep the existing logic using
type(obj.layers[0]) to preserve MRO behavior.
In `@deepmd/pt_expt/utils/env.py`:
- Around line 33-38: The fallback using os.cpu_count() can return None and cause
TypeError in NUM_WORKERS = int(os.environ.get("NUM_WORKERS", min(4, ncpus))):
ensure ncpus is a concrete int after the except path by handling None (e.g., set
ncpus = os.cpu_count(); if ncpus is None: ncpus = 1 or another safe default) so
min(4, ncpus) always receives an int; update the except block where ncpus is
assigned (reference symbols: os.sched_getaffinity, os.cpu_count, ncpus,
NUM_WORKERS) and keep NUM_WORKERS computation the same.
🧹 Nitpick comments (9)
deepmd/dpmodel/utils/network.py (2)
819-819: Mutable default argument for `neuron`.

Ruff B006 flags this. While the same pattern exists in the factory version (line 714), it's worth noting that the list default `[24, 48, 96]` is shared across all callers. If any caller mutates it (unlikely but possible), it would affect subsequent calls.

Suggested fix

```diff
 def __init__(
     self,
     in_dim: int,
-    neuron: list[int] = [24, 48, 96],
+    neuron: list[int] | None = None,
     activation_function: str = "tanh",
     resnet_dt: bool = False,
     precision: str = DEFAULT_PRECISION,
     seed: int | list[int] | None = None,
     bias: bool = True,
     trainable: bool | list[bool] = True,
 ) -> None:
+    if neuron is None:
+        neuron = [24, 48, 96]
     layers = []
```
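The pitfall Ruff B006 guards against can be reproduced in isolation; the default list is created once at function definition time and shared across calls:

```python
def append_to(item, bucket=[]):  # mutable default: one shared list
    bucket.append(item)
    return bucket

assert append_to(1) == [1]
assert append_to(2) == [1, 2]  # state from the first call leaks in

def append_to_fixed(item, bucket=None):  # None-sentinel rewrite
    if bucket is None:
        bucket = []  # fresh list per call
    bucket.append(item)
    return bucket

assert append_to_fixed(1) == [1]
assert append_to_fixed(2) == [2]  # no cross-call leakage
```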
791-853: Significant code duplication between `EmbeddingNet` and the factory `EN` class.

The concrete `EmbeddingNet` (lines 791–896) largely duplicates the `__init__`, `serialize`, and `deserialize` of the factory-produced `EN` class (lines 688–788). The factory is retained for backends like JAX (per relevant snippet `deepmd/jax/utils/network.py:68-69`), so both must coexist. Consider whether the concrete class could inherit from the factory output (like JAX does) to reduce duplication, or extract shared logic into a mixin/helper.

Also applies to: 688-788

source/tests/pt_expt/utils/test_common.py (1)
source/tests/pt_expt/utils/test_common.py (1)
14-23: Tests cover the happy path well. Consider adding aNoneinput test.The
to_torch_arrayfunction also handlesNoneinput (returningNone), which isn't exercised here. A one-liner likeassert to_torch_array(None) is Nonewould complete the coverage for the three documented input branches.source/tests/pt_expt/utils/test_network.py (1)
163-164: Non-deterministic RNG in test — consider seeding for reproducibility.

Unlike other tests in this file that use `GLOBAL_SEED`, this RNG is unseeded. While correctness isn't affected (both backends receive the same `x_np`), a failing test would be hard to reproduce.

Suggested fix

```diff
-    rng = np.random.default_rng()
+    rng = np.random.default_rng(GLOBAL_SEED)
```

source/tests/consistent/descriptor/test_se_r.py (1)
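The reproducibility argument is easy to demonstrate: two `default_rng` instances with the same seed produce identical draws, while an unseeded one does not replay (the literal seed below stands in for the test suite's `GLOBAL_SEED` constant):

```python
import numpy as np

a = np.random.default_rng(20240513).standard_normal(4)
b = np.random.default_rng(20240513).standard_normal(4)
assert np.array_equal(a, b)  # same seed, same sequence

c = np.random.default_rng().standard_normal(4)  # unseeded: not replayable
assert a.shape == c.shape
```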
92-100: Unused variables in `skip_pt_expt` — matches existing pattern in `skip_pt`.

Static analysis flags `resnet_dt`, `excluded_types`, and `precision` as unused. This mirrors the identical pattern in `skip_pt` (Lines 83-90), so this is consistent with the codebase style. A future cleanup could use `_` prefixes for both properties, but that's out of scope here.

source/tests/consistent/descriptor/test_se_e2_a.py (1)
546-566: `eval_pt_expt` uses `env.DEVICE` from the PT backend, not from pt_expt.

Lines 551–555 reference `env.DEVICE` imported under the `INSTALLED_PT` guard (line 32–34). This works in practice because pt_expt likely depends on pt being installed, but it's conceptually wrong for a pt_expt test path to rely on PT's device. If the two backends ever diverge in device selection, this would silently use the wrong device.

Consider importing pt_expt's env and using its `DEVICE` instead, or at minimum add a comment clarifying the intentional reuse.

deepmd/pt_expt/common.py (1)
45-82: No duplicate-registration guard.

`register_dpmodel_mapping` silently overwrites if called twice with the same `dpmodel_cls`. This is fine for the current module-level registration pattern, but if registration ever moves to a more dynamic context, a warning or error on duplicate keys would help catch bugs.

deepmd/pt_expt/utils/env.py (1)
22-29: `force=True` on `set_start_method` may conflict with other backends.

If both `deepmd.pt.utils.env` and `deepmd.pt_expt.utils.env` are imported in the same process (which happens in tests and when mixing backends), `force=True` will override whatever start method was previously set. The try/except mitigates runtime errors, but the forced override could cause subtle issues if another module intentionally set a different method.

Consider using `force=False` or checking `multiprocessing.get_start_method(allow_none=True)` first.

deepmd/pt_expt/utils/network.py (1)
27-37: Remove unused `noqa` directive.

Ruff reports `PYI034` is not enabled, so the `# noqa: PYI034` on line 28 is unnecessary.

Proposed fix

```diff
-    def __new__(  # noqa: PYI034
+    def __new__(
         cls, data: Any = None, requires_grad: bool = True
     ) -> "TorchArrayParam":
```
```python
features: ClassVar[Backend.Feature] = (
    Backend.Feature.ENTRY_POINT
    | Backend.Feature.DEEP_EVAL
    | Backend.Feature.NEIGHBOR_STAT
    | Backend.Feature.IO
)
```
Remove undeclared feature flags or implement the properties.
features declares DEEP_EVAL, NEIGHBOR_STAT, and IO, but the corresponding properties (deep_eval, neighbor_stat, serialize_hook, deserialize_hook) all raise NotImplementedError. Code paths like deepmd/entrypoints/neighbor_stat.py will crash at runtime if a user selects "pt-expt" as the backend because it passes the feature filter but fails when accessing the property.
Either remove these feature flags until implementations are ready, or implement the properties (see how pytorch.py and tensorflow.py do it).
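The mismatch between advertised flags and implemented hooks can be illustrated with a minimal `Flag` enum: a feature filter passes any backend whose mask contains the flag, so the fix is to advertise only what exists (the enum below is a simplified stand-in for `Backend.Feature`):

```python
from enum import Flag, auto

class Feature(Flag):
    ENTRY_POINT = auto()
    DEEP_EVAL = auto()
    NEIGHBOR_STAT = auto()
    IO = auto()

# Advertise only implemented capabilities (hypothetical corrected mask):
features = Feature.ENTRY_POINT

# A feature filter like get_backends_by_feature would now skip this
# backend for DEEP_EVAL workflows instead of crashing later:
assert features & Feature.ENTRY_POINT
assert not (features & Feature.DEEP_EVAL)
assert not (features & Feature.IO)
```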
```python
obj = cls(**data)
# Reinitialize layers from serialized data, using the same layer type
# that __init__ created (respects subclass overrides via MRO).
layer_type = type(obj.layers[0])
obj.layers = type(obj.layers)(
    [layer_type.deserialize(layer) for layer in layers]
)
return obj
```
`type(obj.layers[0])` will raise `IndexError` if `neuron` is empty.

If `deserialize` is called with data where `neuron=[]`, `__init__` produces an empty `layers` list, and line 892 crashes on `obj.layers[0]`. The factory version's `deserialize` (line 785) doesn't have this issue since it calls `super().__init__(layers)` directly.
Suggested fix

```diff
     obj = cls(**data)
     # Reinitialize layers from serialized data, using the same layer type
     # that __init__ created (respects subclass overrides via MRO).
-    layer_type = type(obj.layers[0])
-    obj.layers = type(obj.layers)(
-        [layer_type.deserialize(layer) for layer in layers]
-    )
+    if obj.layers:
+        layer_type = type(obj.layers[0])
+        obj.layers = type(obj.layers)(
+            [layer_type.deserialize(layer) for layer in layers]
+        )
+    else:
+        obj.layers = type(obj.layers)([])
     return obj
```
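The failure mode is easy to reproduce in isolation: indexing the first element of an empty list raises, while the guarded rebuild returns an empty container of the same type:

```python
layers: list = []  # what neuron=[] produces
caught = False
try:
    layer_type = type(layers[0])  # the crashing access
except IndexError:
    caught = True
assert caught

# Guarded rebuild, mirroring the suggested fix above:
obj_layers = type(layers)([]) if not layers else layers
assert obj_layers == [] and type(obj_layers) is list
```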
```python
try:
    # only linux
    ncpus = len(os.sched_getaffinity(0))
except AttributeError:
    ncpus = os.cpu_count()
NUM_WORKERS = int(os.environ.get("NUM_WORKERS", min(4, ncpus)))
```
`os.cpu_count()` can return `None`, causing `TypeError` in `min(4, None)`.

On some platforms (or in restricted environments), `os.cpu_count()` returns `None`. The fallback on line 37 assigns `None` to `ncpus`, and then `min(4, ncpus)` on line 38 will raise a `TypeError`.
Proposed fix

```diff
 try:
     # only linux
     ncpus = len(os.sched_getaffinity(0))
 except AttributeError:
-    ncpus = os.cpu_count()
+    ncpus = os.cpu_count() or 1
 NUM_WORKERS = int(os.environ.get("NUM_WORKERS", min(4, ncpus)))
```
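Both the crash and the `or 1` guard can be checked directly:

```python
import os

# min(4, None) is exactly the failure described above:
raised = False
try:
    min(4, None)
except TypeError:
    raised = True
assert raised

# "or 1" coerces a None cpu_count to a safe integer default:
ncpus = os.cpu_count() or 1
assert isinstance(ncpus, int) and ncpus >= 1
assert isinstance(min(4, ncpus), int)
```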
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ba8e7abfaa
```python
class PyTorchExportableBackend(Backend):
    """PyTorch exportable backend."""

    name = "PyTorch Exportable"
```
Make pt-expt backend name key-compatible

`main.py` maps CLI aliases to `backend.name.lower()` and then validates the result against registered backend keys, so setting `name` to `"PyTorch Exportable"` makes `--backend pt-expt` resolve to `"pytorch exportable"`, which is not a registered key (`pt-expt`/`pytorch-exportable`). This causes train/freeze-style commands to fail immediately with `Unknown backend` whenever users select this backend via CLI.
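The lookup mismatch can be sketched as follows; the set of registered keys here is an assumption for illustration, not the actual deepmd registry:

```python
# Hypothetical registered backend keys:
registered_keys = {"pt-expt", "pytorch-exportable"}

name = "PyTorch Exportable"
assert name.lower() == "pytorch exportable"
assert name.lower() not in registered_keys  # resolves to "Unknown backend"

# A key-compatible name would round-trip through the lookup:
compatible_name = "pt-expt"
assert compatible_name.lower() in registered_keys
```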
```python
features: ClassVar[Backend.Feature] = (
    Backend.Feature.ENTRY_POINT
    | Backend.Feature.DEEP_EVAL
    | Backend.Feature.NEIGHBOR_STAT
    | Backend.Feature.IO
)
```
Remove unimplemented features from pt-expt backend
This backend advertises DEEP_EVAL, NEIGHBOR_STAT, and IO support, but the corresponding hooks (deep_eval, neighbor_stat, serialize_hook, deserialize_hook) currently raise NotImplementedError. Any workflow that selects the backend for .pte files (for example dp test, dp show, or convert-backend) will deterministically crash when those hooks are accessed, so the feature flags and suffix exposure are currently misleading and break user flows.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #5205      +/-   ##
==========================================
+ Coverage   81.99%   82.00%   +0.01%
==========================================
  Files         724      724
  Lines       73807    73828      +21
  Branches     3616     3617       +1
==========================================
+ Hits        60519    60546      +27
+ Misses      12124    12118       -6
  Partials     1164     1164
==========================================
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@source/tests/pt_expt/descriptor/test_se_t_tebd.py`:
- Around line 153-158: The test constructs an inputs tuple that passes
type_embedding as the 4th positional argument, which binds to the forward(...)
parameter extended_atype_embd rather than the intended type_embedding; update
the test in test_se_t_tebd.py so the call matches the forward signature — either
supply extended_atype_embd explicitly (e.g. None) and pass type_embedding as a
keyword (type_embedding=type_embedding) or pass both optional args by keyword
(extended_atype_embd=..., type_embedding=...) when calling the module's forward
method to remove the positional mismatch.
- Around line 58-59: The test assigns torch.Tensor values to dd0.davg and
dd0.dstd which bypass dpmodel_setattr (it only converts numpy arrays) and
therefore do not update the inner descriptor block used by serialization;
instead set the statistics on the inner block dd0.se_ttebd (e.g., via its
mapping interface) so serialization reads the intended values—replace
assignments to dd0.davg/dd0.dstd with assignments into dd0.se_ttebd (updating
"davg" and "dstd"), ensuring dpmodel_setattr and dd0.serialize will observe the
non-default statistics and the deserialized dd1 will reflect them.
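The positional-binding pitfall in the first item can be demonstrated generically; the `forward` signature below is a simplified stand-in for the descriptor's, keeping only the two optional parameters that matter:

```python
def forward(nlist, coord, atype, extended_atype_embd=None, type_embedding=None):
    # Return both optionals so we can see where the argument landed.
    return extended_atype_embd, type_embedding

# Passed positionally, the 4th argument binds to extended_atype_embd:
embd, temb = forward(0, 0, 0, "T")
assert embd == "T" and temb is None

# Passed by keyword, it binds where the test intends:
embd, temb = forward(0, 0, 0, type_embedding="T")
assert embd is None and temb == "T"
```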
🧹 Nitpick comments (2)
source/tests/consistent/descriptor/test_se_t.py (1)
99-107: The `skip_pt_expt` property unpacks `self.param` without using any of the unpacked variables.

Ruff flags `resnet_dt`, `excluded_types`, `precision`, and `env_protection` as unused. This mirrors the same pattern in `skip_pt` and `skip_dp`, so it's a pre-existing convention in this file. Consider prefixing with underscores if you want to silence the linter across all three properties in a follow-up.

source/tests/pt_expt/descriptor/test_se_t_tebd.py (1)
66-71: `type_embedding` keyword is accepted but immediately deleted in `forward`.

The descriptor computes its own type embedding internally via `self.type_embedding.call()`. The externally passed `type_embedding` is discarded (see `del extended_atype_embd, mapping, type_embedding` in `forward`). This is not a test bug per se — the consistency check is still valid — but it means the test isn't exercising external type-embedding injection as the setup comments suggest.
```python
dd0.davg = torch.tensor(davg, dtype=dtype, device=self.device)
dd0.dstd = torch.tensor(dstd, dtype=dtype, device=self.device)
```
dd0.davg and dd0.dstd assignments do not propagate to the inner descriptor block's statistics.
When dd0.davg = torch.tensor(...) is assigned, dpmodel_setattr does not handle torch.Tensor values (it only converts numpy.ndarray). The assignment falls through to standard attribute creation on the outer wrapper, leaving dd0.se_ttebd.mean and dd0.se_ttebd.stddev at their defaults (zeros/ones). During serialization, dd0.serialize() reads statistics from the inner block (not the outer attributes), so the random values are lost. Both dd0 and the deserialized dd1 use identical default statistics, which is why the test passes—but it doesn't actually test with non-trivial statistics as intended.
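The fall-through described above can be sketched in isolation. This is a minimal, hypothetical stand-in (the real `dpmodel_setattr` lives in `deepmd/pt_expt/common.py` and is more involved); it only shows why a value that is not a `numpy.ndarray` silently becomes a plain attribute on the outer object:

```python
import numpy as np


class Wrapper:
    """Hypothetical stand-in for the pt_expt descriptor wrapper."""

    def __init__(self):
        self.stats = {}  # converted statistics end up here

    def __setattr__(self, name, value):
        # Mirrors the described behavior: only numpy arrays are
        # intercepted and routed into the converted-statistics store.
        if isinstance(value, np.ndarray) and name in ("davg", "dstd"):
            self.__dict__.setdefault("stats", {})[name] = value
            return
        # Anything else (including a torch.Tensor) falls through to a
        # plain attribute on the outer object, never reaching the store.
        super().__setattr__(name, value)


w = Wrapper()
w.davg = np.zeros(3)      # intercepted: stored in w.stats
w.dstd = [1.0, 1.0, 1.0]  # non-ndarray: becomes a plain attribute
print("davg" in w.stats)  # True
print(hasattr(w, "dstd"))  # True -- silently bypassed the store
```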
Assign directly to the inner block instead:

```python
dd0.se_ttebd["davg"] = torch.tensor(davg, dtype=dtype, device=self.device)
dd0.se_ttebd["dstd"] = torch.tensor(dstd, dtype=dtype, device=self.device)
```

🤖 Prompt for AI Agents
In `@source/tests/pt_expt/descriptor/test_se_t_tebd.py` around lines 58-59: the test assigns torch.Tensor values to dd0.davg and dd0.dstd, which bypass dpmodel_setattr (it only converts numpy arrays) and therefore do not update the inner descriptor block used by serialization. Instead, set the statistics on the inner block dd0.se_ttebd (e.g., via its mapping interface) so serialization reads the intended values: replace the assignments to dd0.davg/dd0.dstd with assignments into dd0.se_ttebd (updating "davg" and "dstd"), ensuring dpmodel_setattr and dd0.serialize observe the non-default statistics and the deserialized dd1 reflects them.
```python
inputs = (
    torch.tensor(self.coord_ext, dtype=dtype, device=self.device),
    torch.tensor(self.atype_ext, dtype=int, device=self.device),
    torch.tensor(self.nlist, dtype=int, device=self.device),
    type_embedding,
)
```
Positional argument mismatch: type_embedding tensor is passed as extended_atype_embd.
The forward signature is (extended_coord, extended_atype, nlist, extended_atype_embd=None, mapping=None, type_embedding=None). Passing type_embedding as the 4th positional arg maps it to extended_atype_embd, not type_embedding. Since forward deletes all optional args and computes type embedding internally, this doesn't cause a runtime error, but it's semantically misleading for a test.
Consider using a keyword argument or a tuple that matches the signature:
Proposed fix
```diff
 inputs = (
     torch.tensor(self.coord_ext, dtype=dtype, device=self.device),
     torch.tensor(self.atype_ext, dtype=int, device=self.device),
     torch.tensor(self.nlist, dtype=int, device=self.device),
-    type_embedding,
+    None,  # extended_atype_embd
+    None,  # mapping
+    type_embedding,  # type_embedding
 )
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
inputs = (
    torch.tensor(self.coord_ext, dtype=dtype, device=self.device),
    torch.tensor(self.atype_ext, dtype=int, device=self.device),
    torch.tensor(self.nlist, dtype=int, device=self.device),
    None,  # extended_atype_embd
    None,  # mapping
    type_embedding,  # type_embedding
)
```
🤖 Prompt for AI Agents
In `@source/tests/pt_expt/descriptor/test_se_t_tebd.py` around lines 153-158: the test constructs an inputs tuple that passes type_embedding as the 4th positional argument, which binds to the forward(...) parameter extended_atype_embd rather than the intended type_embedding. Update the test so the call matches the forward signature: either supply extended_atype_embd explicitly (e.g., None) and pass type_embedding as a keyword (type_embedding=type_embedding), or pass both optional args by keyword (extended_atype_embd=..., type_embedding=...) when calling the module's forward method.
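The positional-binding pitfall flagged above can be demonstrated with `inspect.signature` against a hypothetical function mirroring the quoted forward signature (the function below is a stand-in, not the actual descriptor's `forward`):

```python
import inspect


# Hypothetical mirror of the quoted signature:
def forward(extended_coord, extended_atype, nlist,
            extended_atype_embd=None, mapping=None, type_embedding=None):
    pass


sig = inspect.signature(forward)

# A 4th positional argument binds to extended_atype_embd ...
bound = sig.bind("coord", "atype", "nlist", "tebd")
print(bound.arguments["extended_atype_embd"])  # tebd

# ... while a keyword argument lands where intended.
bound = sig.bind("coord", "atype", "nlist", type_embedding="tebd")
print(bound.arguments["type_embedding"])  # tebd
```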
…le backend." This reverts commit 621c7cc.
What does it mean? What's the benefit of this PR?
EmbeddingNet Refactoring: Factory Function to Concrete Class
Summary
This refactoring converts `EmbeddingNet` from a factory-generated dynamic class to a concrete class in the dpmodel backend. This change enables the auto-detection registry mechanism in pt_expt to work seamlessly with `EmbeddingNet` attributes.

This PR is considered after #5194 and #5204.
Motivation
Before: `EmbeddingNet` was created by a factory function `make_embedding_network(NativeNet, NativeLayer)`, producing a dynamically-typed class `make_embedding_network.<locals>.EN`. This caused two problems:

- The dynamic class has no stable type identity, so it could not be registered in the pt_expt type-based auto-conversion registry
- pt_expt had to special-case `name == "embedding_net"` in `__setattr__` instead of using the type-based auto-detection mechanism

After: `EmbeddingNet` is now a concrete class that can be registered in the pt_expt auto-conversion registry, eliminating the need for name-based special cases.
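The registry mechanism this change enables can be sketched as follows. All names here are illustrative, not the actual pt_expt API; the point is that a dict keyed on the dpmodel class requires a stable, importable class, which a factory-local `make_embedding_network.<locals>.EN` is not:

```python
# Illustrative type-based auto-conversion registry.
_CONVERSION_REGISTRY = {}


def register_conversion(dp_cls):
    """Class decorator: map a dpmodel class to its torch wrapper."""
    def wrap(torch_cls):
        _CONVERSION_REGISTRY[dp_cls] = torch_cls
        return torch_cls
    return wrap


def try_convert(obj):
    """Convert a dpmodel object to its registered wrapper, if any."""
    torch_cls = _CONVERSION_REGISTRY.get(type(obj))
    return torch_cls.from_dp(obj) if torch_cls is not None else obj


class EmbeddingNetDP:  # stand-in for the concrete dpmodel class
    pass


@register_conversion(EmbeddingNetDP)
class EmbeddingNetTorch:  # stand-in for the pt_expt wrapper
    @classmethod
    def from_dp(cls, dp_obj):
        return cls()


converted = try_convert(EmbeddingNetDP())
print(type(converted).__name__)  # EmbeddingNetTorch
```

With this pattern, a wrapper's `__setattr__` can simply run every assigned value through `try_convert`, with no attribute-name special cases.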
Changes

1. dpmodel: Concrete `EmbeddingNet` class

File: `deepmd/dpmodel/utils/network.py`

- `EmbeddingNet` is now a concrete `EmbeddingNet(NativeNet)` class with an explicit `__init__`
- `deserialize` changed to use `type(obj.layers[0])` instead of hardcoding `super(EmbeddingNet, obj)`, allowing the pt_expt subclass to preserve its converted torch layers
- The `make_embedding_network` factory is kept for the pt/pd backends that use different base classes (MLP)
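Why the `type(obj.layers[0])` change matters can be shown with a simplified sketch (class names here are hypothetical, not the actual deepmd-kit classes): when restoring state, dispatching on the runtime layer type instead of a hardcoded base class lets a subclass keep its own layer type.

```python
class BaseLayer:
    @classmethod
    def deserialize(cls, data):
        # Rebuild a layer of whatever class this is called on.
        layer = cls()
        layer.data = data
        return layer


class TorchLayer(BaseLayer):  # stand-in for a pt_expt torch layer
    pass


class Net:
    def __init__(self, layer_cls, n):
        self.layers = [layer_cls() for _ in range(n)]

    def restore(self, serialized):
        # Key point: dispatch on the runtime layer type rather than a
        # hardcoded base class, so a subclass keeps its own layers.
        layer_cls = type(self.layers[0])
        self.layers = [layer_cls.deserialize(d) for d in serialized]


net = Net(TorchLayer, 2)
net.restore([{"w": 1}, {"w": 2}])
print(type(net.layers[0]).__name__)  # TorchLayer
```

Hardcoding `BaseLayer.deserialize` in `restore` would silently replace the torch layers with plain ones, which is the failure mode the PR's `deserialize` fix avoids.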
File:
deepmd/pt_expt/utils/network.pyEmbeddingNet(EmbeddingNetDP, torch.nn.Module)wrapperNativeLayer(torch modules) in__init__3. TypeEmbedNet: Simplified to use registry
File:
deepmd/pt_expt/utils/type_embed.pyembedding_netcheck in__setattr__dpmodel_setattrwhich auto-converts via registrynetworkmodule to ensureEmbeddingNetregistration happens firstTests
dpmodel tests

File: `source/tests/common/dpmodel/test_network.py`

Added to the `TestEmbeddingNet` class:

- `test_is_concrete_class`: Verifies `EmbeddingNet` is now a concrete class, not factory output
- `test_forward_pass`: Tests that the dpmodel forward pass produces correct shapes
- `test_trainable_parameter_variants`: Tests different trainable configurations (all trainable, all frozen, mixed)

(The existing `test_embedding_net` test already covers the serialization/deserialization round-trip.)
File:
source/tests/pt_expt/utils/test_network.pyCreated
TestEmbeddingNetRefactortest suite with 8 tests:test_pt_expt_embedding_net_wraps_dpmodel: Verifies pt_expt wrapper inherits correctly and converts layerstest_pt_expt_embedding_net_forward: Tests pt_expt forward pass returns torch.Tensortest_serialization_round_trip_pt_expt: Tests pt_expt serialize/deserializetest_deserialize_preserves_layer_type: Tests the key fix -deserializeusestype(obj.layers[0])to preserve pt_expt's torch layerstest_cross_backend_consistency: Tests numerical consistency between dpmodel and pt_expttest_registry_converts_dpmodel_to_pt_expt: Teststry_convert_moduleauto-converts dpmodel to pt_expttest_auto_conversion_in_setattr: Testsdpmodel_setattrauto-converts EmbeddingNet attributestest_trainable_parameter_handling: Tests trainable vs frozen parameters work correctly in pt_exptVerification
All tests pass:
Benefits
__setattr__Backward Compatibility
make_embedding_networkfactory kept for pt/pd backendsFiles Changed
Modified
deepmd/dpmodel/utils/network.py: Concrete EmbeddingNet class + deserialize fixdeepmd/pt_expt/utils/network.py: EmbeddingNet wrapper + registrationdeepmd/pt_expt/utils/type_embed.py: Simplified to use registrysource/tests/common/dpmodel/test_network.py: Added dpmodel EmbeddingNet tests (3 new tests)source/tests/pt_expt/utils/test_network.py: Added pt_expt integration tests (8 new tests)No changes required