refact(dpmodel,pt_expt): fitting net #5207
wanghan-iapcm wants to merge 37 commits into deepmodeling:master from
Conversation
Check notice (Code scanning / CodeQL): Unused local variable (Note, test). Reported five times in the consistency tests, each on a `self.param` unpacking block of the form:

```python
(
    tebd_dim,
    tebd_input_mode,
    resnet_dt,
    excluded_types,
    precision,
    use_econf_tebd,
    use_tebd_bias,
) = self.param
return CommonTest.skip_pt_expt
```
Code Review
This pull request is a significant and well-executed refactoring. It successfully converts FittingNet to a concrete class, mirroring the previous EmbeddingNet refactoring, which is a great step towards a more consistent and maintainable codebase. The introduction of the pt_expt backend with its automatic wrapping mechanism is a clever design that will simplify future development. The changes to make the code more backend-agnostic using array_api_compat and ensuring device consistency are also excellent. The added tests are comprehensive and provide good coverage for the new features. I've found a couple of minor areas for improvement, which I've detailed in the comments.
```python
    dtype=des["dstd"].dtype,
    device=array_api_compat.device(des["dstd"]),
)
xp = array_api_compat.array_namespace(des["davg"])
```

```python
else:
    pass
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e263270ca2
```python
class PyTorchExportableBackend(Backend):
    """PyTorch exportable backend."""

    name = "PyTorch Exportable"
```
Use a registered key as backend canonical name
Setting name to "PyTorch Exportable" breaks CLI backend selection because main_parser() canonicalizes --backend/--pt-expt to backend.name.lower(), yielding "pytorch exportable", but deepmd/main.py only accepts registered backend keys (e.g. pt-expt, pytorch-exportable). As a result, invoking dp --pt-expt ... or dp --backend pt-expt ... fails early with Unknown backend before dispatch.
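The mismatch can be reproduced in a few lines. This is a toy illustration: `REGISTERED_KEYS` is a hypothetical stand-in for the keys accepted by deepmd/main.py, and only the lowercasing behavior comes from the comment above:

```python
# Hypothetical set of registered backend keys (stand-in for deepmd/main.py).
REGISTERED_KEYS = {"pt-expt", "pytorch-exportable"}

def canonicalize(display_name: str) -> str:
    # mirrors the backend.name.lower() canonicalization described above
    return display_name.lower()

# "PyTorch Exportable" lowercases to a string that is not a registered key,
# so selection by display name fails.
assert canonicalize("PyTorch Exportable") == "pytorch exportable"
assert canonicalize("PyTorch Exportable") not in REGISTERED_KEYS
# A name equal to a registered key survives canonicalization.
assert canonicalize("pt-expt") in REGISTERED_KEYS
```

Using one of the registered keys as the canonical `name` avoids the mismatch entirely.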
```python
    type[DeepEvalBackend]
        The Deep Eval backend of the backend.
    """
    raise NotImplementedError
```
Stop advertising unsupported deep-eval and IO hooks
This backend is marked with DEEP_EVAL/NEIGHBOR_STAT/IO and .pte suffix support, so dispatcher paths can select it (for example .pte model detection in DeepEval), but deep_eval, neighbor_stat, and serialization hooks still raise NotImplementedError. That makes backend-selected inference/inspection flows fail at runtime instead of gracefully reporting unsupported capabilities.
```python
obj = cls(**data)
# Reinitialize layers from serialized data, using the same layer type
# that __init__ created (respects subclass overrides via MRO).
layer_type = type(obj.layers[0])
```
Handle empty embedding layers in deserialize
EmbeddingNet.deserialize() now indexes obj.layers[0] to infer layer_type, but EmbeddingNet.__init__ allows neuron=[], which creates an empty layer list. Deserializing such a serialized network now raises IndexError before loading layers, whereas the previous implementation handled empty-layer round trips by reinitializing directly from serialized layers.
📝 Walkthrough

Introduces a PyTorch exportable backend (pt_expt) with device-aware array operations in core descriptors. Refactors network factories into explicit classes with serialization support. Adds wrappers converting dpmodel components to PyTorch modules via registry-based mapping.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant DP as DPModel Descriptor
    participant Reg as Registry (common.py)
    participant Conv as Converter Function
    participant PT as PT Expt Module
    participant PyTorch as PyTorch Framework
    DP->>Reg: register_dpmodel_mapping(DP_class, converter)
    Note over Reg: Store mapping: DP_class → converter
    DP->>Reg: try_convert_module(dp_instance)
    Reg->>Reg: Check if dp_instance type in registry
    Reg->>Conv: Found converter, call it
    Conv->>PT: Create PT_expt wrapper with dual inheritance
    PT->>PT: Initialize DP base + torch.nn.Module
    PT->>PT: Register buffers/parameters via dpmodel_setattr
    Reg-->>Conv: PT_expt module
    Conv-->>Reg: Converted module
    Reg-->>DP: PT Expt module returned
    DP->>PT: Assign to attribute (via dpmodel_setattr)
    PT->>PyTorch: Synchronize with PyTorch state_dict
    PyTorch-->>PT: Module registered for export/device movement
```

```mermaid
sequenceDiagram
    participant App as Application
    participant PT as PT Expt Descriptor
    participant DP as DPModel Compute
    participant XP as Array API Backend
    App->>PT: forward(extended_coord, extended_atype, nlist, ...)
    PT->>PT: __call__ routes through nn.Module.__call__
    PT->>DP: self.call(extended_coord, extended_atype, nlist, mapping)
    DP->>XP: xp.asarray(array, device=target_device, dtype=target_dtype)
    XP-->>DP: Device-aware array
    DP->>XP: xp.take_along_axis(..., indices.astype(xp.int64))
    XP-->>DP: Gathered values
    DP-->>PT: (descrpt, rot_mat, g2, h2, sw)
    PT-->>App: Return 5-tuple of torch.Tensor
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution: some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepmd/dpmodel/descriptor/dpa1.py (1)
921-927: ⚠️ Potential issue | 🟡 Minor

Initialize `self.stats = None` in `DescrptBlockSeAtten.__init__` to prevent `AttributeError`. `self.stats` is only assigned in `compute_input_stats` (line 909) and is never initialized in `__init__`. If `get_stats` is called before `compute_input_stats`, the check `if self.stats is None:` on line 923 will raise `AttributeError` instead of the intended `RuntimeError`.

Proposed fix: add `self.stats = None` in `DescrptBlockSeAtten.__init__` (after line 845):

```diff
 self.orig_sel = self.sel
+self.stats = None
```
🧹 Nitpick comments (9)
source/tests/pt/test_env_threads.py (1)
21-28: Consider using pytest's `caplog` fixture instead of manually patching `logging.Logger.warning`. The manual monkey-patch of `logging.Logger.warning` works but is fragile and non-idiomatic. pytest's built-in `caplog` fixture is designed for this:

```python
def test_env_threads_guard_handles_runtimeerror(monkeypatch, caplog) -> None:
    ...
    with caplog.at_level(logging.WARNING):
        importlib.reload(env)
    assert any("Could not set torch interop threads" in r.message for r in caplog.records)
    assert any("Could not set torch intra threads" in r.message for r in caplog.records)
```

deepmd/dpmodel/utils/network.py (1)
816-826: Mutable default argument for `neuron`. Both `EmbeddingNet.__init__` (line 819) and `FittingNet.__init__` (line 1042) use a mutable list as a default argument (`neuron: list[int] = [24, 48, 96]`). This is a well-known Python gotcha (Ruff B006). It's a pre-existing pattern inherited from the factory functions (lines 714, 930), so this may be deferred, but worth noting.

♻️ Idiomatic fix (applied to both classes)

```diff
 def __init__(
     self,
     in_dim: int,
-    neuron: list[int] = [24, 48, 96],
+    neuron: list[int] | None = None,
     ...
 ) -> None:
+    if neuron is None:
+        neuron = [24, 48, 96]
```

Also applies to: 1038-1048
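The B006 pitfall the nitpick describes can be demonstrated with toy functions (these are illustrations, not the deepmd code):

```python
def make_neuron_bad(neuron=[24, 48, 96]):
    # the default list object is created once and shared across calls
    neuron.append(0)
    return neuron

def make_neuron_good(neuron=None):
    if neuron is None:
        neuron = [24, 48, 96]  # fresh list on every call
    neuron.append(0)
    return neuron

assert make_neuron_bad() == [24, 48, 96, 0]
assert make_neuron_bad() == [24, 48, 96, 0, 0]  # mutation leaked into the default
assert make_neuron_good() == [24, 48, 96, 0]
assert make_neuron_good() == [24, 48, 96, 0]  # unaffected by previous calls
```

The deepmd classes never mutate `neuron`, so the pattern is latent rather than an active bug, which is why the nitpick marks it as deferrable.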
deepmd/pt_expt/descriptor/__init__.py (1)
3-3: Unused `noqa` directive (Ruff RUF100). Ruff reports that F401 is not enabled, so the `# noqa: F401` directive is unnecessary. That said, it does document the intent that this is a side-effect import. You could remove it to silence the linter or suppress RUF100 instead; either way is fine.

deepmd/pt_expt/utils/type_embed.py (1)
14-15: Side-effect import is necessary for registration ordering — good. The comment clearly explains why `network` must be imported before `TypeEmbedNet` is used. Same Ruff RUF100 `noqa` nit as noted in `__init__.py`: the directive is technically unnecessary if F401 isn't enabled, but it documents intent.

source/tests/pt_expt/descriptor/test_se_t.py (1)
65-65: Nit: prefix unused unpacked variables with `_`. Static analysis flags `gr1` (line 65) and `gr2` (line 85) as unused. Since `se_t` returns `None` for these, prefixing with `_` silences the warning.

```diff
-        rd1, gr1, _, _, sw1 = dd1(
+        rd1, _gr1, _, _, sw1 = dd1(
```

```diff
-        rd2, gr2, _, _, sw2 = dd2.call(
+        rd2, _gr2, _, _, sw2 = dd2.call(
```

source/tests/pt_expt/utils/test_network.py (1)
286-404: `TestFittingNetRefactor` has less coverage than `TestEmbeddingNetRefactor`. `TestEmbeddingNetRefactor` includes `test_cross_backend_consistency`, `test_deserialize_preserves_layer_type`, `test_auto_conversion_in_setattr`, and `test_trainable_parameter_handling`, none of which have counterparts here. Consider adding at least `test_cross_backend_consistency` and `test_trainable_parameter_handling` for FittingNet to ensure parity.

Also, `FittingNet` is imported locally in each method while `EmbeddingNet` is imported at the top of the file; consider hoisting the `FittingNet` import to the top for consistency.

deepmd/pt_expt/utils/env.py (1)
19-20: Minor: `import torch` placed after logger setup. The `torch` import on line 20 sits between the logger initialization and the code that uses it (line 22 onwards). Consider moving it up to the import block (after line 7) for standard import ordering.

deepmd/pt_expt/common.py (1)
45-82: No duplicate-registration guard. `register_dpmodel_mapping` silently overwrites if called with the same `dpmodel_cls` twice. This is probably fine for the current use case (registrations happen at module import time), but a debug-level log or assertion on duplicate keys could help catch accidental re-registrations during development.

deepmd/pt_expt/utils/network.py (1)
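A minimal sketch of such a type-keyed conversion registry with a duplicate-registration log: the names `register_dpmodel_mapping` and `try_convert_module` come from this PR, but the signatures and the debug log here are assumptions, not the actual implementation.

```python
import logging

_MAPPING: dict = {}  # maps dpmodel class -> converter callable

def register_dpmodel_mapping(dp_cls, converter):
    if dp_cls in _MAPPING:
        # hypothetical guard the nitpick suggests: surface re-registrations
        logging.getLogger(__name__).debug("re-registering %s", dp_cls.__name__)
    _MAPPING[dp_cls] = converter

def try_convert_module(obj):
    # exact-type lookup: unregistered objects pass through unchanged
    converter = _MAPPING.get(type(obj))
    return converter(obj) if converter is not None else obj

class DPNet: ...
class PTNet(DPNet): ...

register_dpmodel_mapping(DPNet, lambda dp: PTNet())
assert isinstance(try_convert_module(DPNet()), PTNet)
assert try_convert_module("not registered") == "not registered"
```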
27-37: Remove unused `noqa` directive. Ruff reports PYI034 is not enabled, so the `# noqa: PYI034` suppression on line 28 is unnecessary.

🧹 Proposed fix

```diff
-    def __new__(  # noqa: PYI034
+    def __new__(
         cls, data: Any = None, requires_grad: bool = True
     ) -> "TorchArrayParam":
```
```python
features: ClassVar[Backend.Feature] = (
    Backend.Feature.ENTRY_POINT
    | Backend.Feature.DEEP_EVAL
    | Backend.Feature.NEIGHBOR_STAT
    | Backend.Feature.IO
)
```
Remove unimplemented feature declarations or implement the methods.
The pt-expt backend declares DEEP_EVAL, NEIGHBOR_STAT, and IO features, but the corresponding properties (deep_eval, neighbor_stat, serialize_hook, deserialize_hook) all raise NotImplementedError. This creates real runtime failures:
- The `neighbor_stat` entrypoint (deepmd/entrypoints/neighbor_stat.py:75-80) calls `get_backends_by_feature(Backend.Feature.NEIGHBOR_STAT)`, which includes pt-expt. If a user specifies `--backend pt-expt`, the call to `backend_obj.neighbor_stat` will crash.
- The `convert_backend` entrypoint (deepmd/entrypoints/convert_backend.py:28-29) detects backends by file suffix. Since pt-expt has suffix `.pte`, loading or saving `.pte` files will crash when accessing `serialize_hook` or `deserialize_hook`.
Either remove these feature flags until the methods are implemented, or provide implementations.
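The failure mode is the standard flag-advertising pattern: dispatchers select backends by flags alone and trust the methods to exist. A toy illustration (the `Feature` enum and `FakeBackend` here are made up and only mirror the `enum.Flag` pattern discussed above):

```python
import enum

class Feature(enum.Flag):
    ENTRY_POINT = enum.auto()
    DEEP_EVAL = enum.auto()
    NEIGHBOR_STAT = enum.auto()
    IO = enum.auto()

class FakeBackend:
    # advertising only what is actually implemented keeps dispatchers honest:
    # a backend not listing NEIGHBOR_STAT is simply never selected for it
    features = Feature.ENTRY_POINT

def get_backends_by_feature(backends, feature):
    return [b for b in backends if b.features & feature]

assert get_backends_by_feature([FakeBackend], Feature.NEIGHBOR_STAT) == []
assert get_backends_by_feature([FakeBackend], Feature.ENTRY_POINT) == [FakeBackend]
```

Dropping the three flags turns a runtime crash into a clean "no backend supports this" path, which is why removal is the lighter of the two fixes.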
```python
obj = cls(**data)
# Reinitialize layers from serialized data, using the same layer type
# that __init__ created (respects subclass overrides via MRO).
layer_type = type(obj.layers[0])
obj.layers = type(obj.layers)(
    [layer_type.deserialize(layer) for layer in layers]
)
return obj
```
IndexError when neuron is empty in EmbeddingNet.deserialize.
If `neuron=[]`, `obj.layers` will be an empty list after `cls(**data)`, so `type(obj.layers[0])` on line 892 raises `IndexError`. The same pattern in `FittingNet.deserialize` is safe because the output layer always exists.
🐛 Proposed fix: guard against empty layers
```diff
 obj = cls(**data)
 # Reinitialize layers from serialized data, using the same layer type
 # that __init__ created (respects subclass overrides via MRO).
-layer_type = type(obj.layers[0])
-obj.layers = type(obj.layers)(
-    [layer_type.deserialize(layer) for layer in layers]
-)
+if obj.layers:
+    layer_type = type(obj.layers[0])
+    obj.layers = type(obj.layers)(
+        [layer_type.deserialize(layer) for layer in layers]
+    )
 return obj
```
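The guarded pattern can be exercised standalone; `Layer` and `Net` below are stand-ins for the actual `NativeLayer`/`EmbeddingNet` classes, not the deepmd code:

```python
class Layer:
    @classmethod
    def deserialize(cls, data):
        obj = cls()
        obj.data = data
        return obj

class Net:
    def __init__(self, neuron=()):
        # neuron=[] legitimately produces an empty layer list
        self.layers = [Layer() for _ in neuron]

    @classmethod
    def deserialize(cls, neuron, layers):
        obj = cls(neuron)
        if obj.layers:  # guard: with neuron=[] there is nothing to index
            layer_type = type(obj.layers[0])
            obj.layers = [layer_type.deserialize(d) for d in layers]
        return obj

assert Net.deserialize([], []).layers == []          # empty round trip is safe
assert len(Net.deserialize([8, 8], [{}, {}]).layers) == 2
```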
```python
def eval_pt_expt(self, pt_expt_obj: Any) -> Any:
    pt_expt_obj.compute_input_stats(
        [
            {
                "r0": None,
                "coord": torch.from_numpy(self.coords)
                .reshape(-1, self.natoms[0], 3)
                .to(env.DEVICE),
                "atype": torch.from_numpy(self.atype.reshape(1, -1)).to(env.DEVICE),
                "box": torch.from_numpy(self.box.reshape(1, 3, 3)).to(env.DEVICE),
                "natoms": self.natoms[0],
            }
        ]
    )
    return self.eval_pt_expt_descriptor(
        pt_expt_obj,
        self.natoms,
        self.coords,
        self.atype,
        self.box,
    )
```
eval_pt_expt in TestSeAStat depends on env.DEVICE which is only imported under INSTALLED_PT.
env is imported at line 33 inside if INSTALLED_PT:, but eval_pt_expt (line 553) references env.DEVICE. If INSTALLED_PT_EXPT is True while INSTALLED_PT is False, this will raise a NameError. In practice this is unlikely since both backends require torch, but it's a latent coupling.
Consider importing PT_DEVICE from the common descriptor test helper (which guards on INSTALLED_PT or INSTALLED_PT_EXPT) or adding a local fallback:
```diff
 if INSTALLED_PT_EXPT:
     from deepmd.pt_expt.descriptor.se_e2_a import DescrptSeA as DescrptSeAPTExpt
+    if not INSTALLED_PT:
+        from deepmd.pt.utils import env
 else:
     DescrptSeAPTExpt = None
```
Actionable comments posted: 1
🧹 Nitpick comments (2)
deepmd/dpmodel/utils/network.py (2)
813-850: Mutable default argument for `neuron`. Ruff B006 flags `neuron: list[int] = [24, 48, 96]`. While the list isn't mutated here (a new list is built each time), the same pattern exists in the old factory function, so this is pre-existing. Consider switching to `None` with an internal default if you want to clean it up.

♻️ Suggested fix

```diff
 def __init__(
     self,
     in_dim: int,
-    neuron: list[int] = [24, 48, 96],
+    neuron: list[int] | None = None,
     activation_function: str = "tanh",
     resnet_dt: bool = False,
     precision: str = DEFAULT_PRECISION,
     seed: int | list[int] | None = None,
     bias: bool = True,
     trainable: bool | list[bool] = True,
 ) -> None:
+    if neuron is None:
+        neuron = [24, 48, 96]
     layers = []
```
1035-1078: Same mutable default `neuron` (B006); the same optional cleanup applies. Same Ruff B006 as `EmbeddingNet`. The `else: pass` on lines 1051-1052 is a no-op and could be dropped for clarity, but this mirrors the existing factory code.
```diff
 # Create random params with same dtype and device as first_layer_matrix
 extend_type_params = np.random.default_rng().random(
     [len(type_map), first_layer_matrix.shape[-1]],
     dtype=PRECISION_DICT[self.precision],
 )
 extend_type_params = xp.asarray(
     extend_type_params,
     dtype=first_layer_matrix.dtype,
     device=array_api_compat.device(first_layer_matrix),
 )
-first_layer_matrix = np.concatenate(
+first_layer_matrix = xp.concat(
     [first_layer_matrix, extend_type_params], axis=0
 )
```
Fix dtype handling in random generation to avoid runtime errors with float16 precision.
- `np.random.default_rng().random(..., dtype=...)` only supports `float32` and `float64`. Passing `PRECISION_DICT[self.precision]` will raise a `TypeError` if precision is `"float16"` or `"half"`, since those map to `np.float16`, which is unsupported. Generate as `float32`/`float64` and cast if needed, or use `first_layer_matrix.dtype` directly (which the code already converts to afterward).
- `np.random.default_rng()` creates an unseeded RNG on every call. If reproducibility is needed during type-map changes, consider passing a seed or using a seeded generator instance.
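The workaround is a generate-then-cast, sketched below (the seed value is arbitrary and only illustrates the reproducibility point):

```python
import numpy as np

# Generator.random only accepts float32/float64, so generate in a supported
# dtype and cast afterwards instead of passing np.float16 directly.
rng = np.random.default_rng(12345)  # seeded, unlike the inline default_rng() call
x = rng.random((4, 8), dtype=np.float32).astype(np.float16)

assert x.dtype == np.float16
assert x.shape == (4, 8)
assert bool(((x >= 0.0) & (x < 1.0)).all())  # still uniform in [0, 1)
```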
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #5207      +/-   ##
==========================================
+ Coverage   81.99%   82.03%   +0.03%
==========================================
  Files         724      728       +4
  Lines       73807    73943     +136
  Branches     3616     3615       -1
==========================================
+ Hits        60519    60659     +140
+ Misses      12124    12121       -3
+ Partials     1164     1163       -1
```

☔ View full report in Codecov by Sentry.
FittingNet Refactoring: Factory Function to Concrete Class
Summary
This refactoring converts `FittingNet` from a factory-generated dynamic class to a concrete class in the dpmodel backend, following the same pattern as the EmbeddingNet refactoring. This enables the auto-detection registry mechanism in pt_expt to work seamlessly with FittingNet.

This PR is considered after #5194 and #5204.
Motivation
Before: `FittingNet` was created by a factory function `make_fitting_network(EmbeddingNet, NativeNet, NativeLayer)`, producing a dynamically-typed class. This caused a problem: each call to `make_fitting_network` creates a new class type, so registry lookup by type fails.

After: `FittingNet` is now a concrete class that can be registered in the pt_expt auto-conversion registry.
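The class-identity problem behind this motivation is easy to demonstrate; the snippet below is an illustration, not the deepmd code:

```python
def make_fitting_class():
    class Fitting:  # a brand-new class object is created on every call
        pass
    return Fitting

A = make_fitting_class()
B = make_fitting_class()
assert A is not B                 # two distinct types despite identical bodies

registry = {A: "converter"}
assert type(B()) not in registry  # lookup by the "same" factory type misses

class ConcreteFitting:            # one module-level identity
    pass

registry = {ConcreteFitting: "converter"}
assert type(ConcreteFitting()) in registry
```

A concrete, module-level class has a single stable identity, so a type-keyed registry can always find it.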
1. dpmodel: Concrete
FittingNetclassFile:
deepmd/dpmodel/utils/network.pyFittingNet(EmbeddingNet)class__init__deserializeto usetype(obj.layers[0])instead of hardcodingT_Network.__init__(obj, layers), allowing pt_expt subclass to preserve its converted torch layersmake_fitting_networkfactory for backwards compatibility (for pt/pd backends)2. pt_expt: Wrapper and registration
File:
deepmd/pt_expt/utils/network.pyfrom deepmd.dpmodel.utils.network import FittingNet as FittingNetDPFittingNet(FittingNetDP, torch.nn.Module)wrapperNativeLayer(torch modules) in__init__Tests
dpmodel tests
File:
source/tests/common/dpmodel/test_network.pyAdded to
TestFittingNetclass:test_fitting_net: Original roundtrip serialization test (already existed)test_is_concrete_class: VerifiesFittingNetis now a concrete class, not factory outputtest_forward_pass: Tests dpmodel forward pass produces correct output shapes (single and batch)test_trainable_parameter_variants: Tests different trainable configurations (all trainable, all frozen, mixed)pt_expt integration tests
File:
source/tests/pt_expt/utils/test_network.pyCreated
TestFittingNetRefactortest suite with 4 tests:test_pt_expt_fitting_net_wraps_dpmodel: Verifies pt_expt wrapper inherits correctly and converts layerstest_pt_expt_fitting_net_forward: Tests pt_expt forward pass returns torch.Tensor with correct shapetest_serialization_round_trip_pt_expt: Tests pt_expt serialize/deserialize round-triptest_registry_converts_dpmodel_to_pt_expt: Teststry_convert_moduleauto-converts dpmodel to pt_exptVerification
All tests pass:
Benefits
Backward Compatibility
make_fitting_networkfactory kept for pt/pd backendsFiles Changed
Modified
deepmd/dpmodel/utils/network.py: Concrete FittingNet class + deserialize fixdeepmd/pt_expt/utils/network.py: FittingNet wrapper + registrationsource/tests/common/dpmodel/test_network.py: Added dpmodel FittingNet tests (3 new tests)source/tests/pt_expt/utils/test_network.py: Added pt_expt integration tests (4 new tests)Pattern
This refactoring follows the exact same pattern as
EMBEDDING_NET_REFACTOR.md:deserializeto usetype(obj.layers[0])__init__register_dpmodel_mappingSummary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests