Add some preliminary guesser docs #409
Merged
+273 −72

9 commits:

- a44e6cd add some preliminary guesser docs (lilyminium)
- 9359bd9 fix ipython blocks (lilyminium)
- 2ccfa87 Apply suggestions from code review (orbeckst)
- d08031e link to Guesser API (orbeckst)
- 699421e Update guessing.rst (orbeckst)
- 55248e7 Apply suggestions from code review (orbeckst)
- af32219 fix ipython block (lilyminium)
- 625b3f1 add changes from review (lilyminium)
- 2301819 add link to changelog (lilyminium)
@@ -0,0 +1,115 @@

.. _default-guesser:

==============
DefaultGuesser
==============

.. warning::

   The default guesser has been created to reproduce pre-v2.8.0 MDAnalysis guessing behaviour as much as possible. However, minor changes were unavoidable and have been detailed in the `2.8.0 CHANGELOG notes <https://github.com/MDAnalysis/mdanalysis/blob/develop/package/CHANGELOG>`_. **Default behaviours will change in MDAnalysis version 3**, as detailed in deprecation warnings.

The :class:`~MDAnalysis.guesser.default_guesser.DefaultGuesser` is the default guessing context for MDAnalysis. For historical reasons, the ``DefaultGuesser`` largely follows biological conventions; for example, an atom named CA is assumed to be a carbon rather than a calcium atom.

Attributes guessed
==================

The topology attributes guessed by the default guesser are listed below, along with their dependencies and the broad assumptions made.
Please see the `Guesser API documentation`_ for more details.

.. _`Guesser API documentation`: https://docs.mdanalysis.org/stable/documentation_pages/guesser_modules/default_guesser.html

.. _default-guesser-types:

------------------
Elements and types
------------------

The default guesser guesses atom ``element``\ s and ``type``\ s using the same pathway; when atom ``type``\ s are guessed, they represent the atom ``element``. Atom elements are guessed from the atom name. The default guesser follows biological naming conventions, where atoms named "CA" are much more likely to represent an alpha-carbon than a calcium atom. Guessing is therefore relatively fragile for atom names that do not follow biological naming conventions.

The :meth:`~MDAnalysis.guesser.default_guesser.DefaultGuesser.guess_atom_element` method guesses atom elements or types by stripping numbers, symbols, and some letters from the atom name and checking the result against a look-up table, as detailed in the `Guesser API documentation`_. For example, this method guesses "AO5*" as "O" and "3hg2" as "H".

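To make the strip-and-look-up idea concrete, here is a toy Python function that reproduces the two examples above. It is a minimal sketch, not the implementation of ``guess_atom_element``: the tables ``BIO_NAMES`` and ``ELEMENTS`` are tiny hypothetical stand-ins for MDAnalysis's own look-up tables, and the exact stripping and fallback rules are assumptions. ``BIO_NAMES`` maps "HG" to hydrogen because, under biological naming conventions, HG usually names a gamma hydrogen rather than mercury.

```python
import re

# Hypothetical, heavily truncated look-up tables (illustration only).
# Biological atom names that would otherwise be misread as other elements:
BIO_NAMES = {"CA": "C", "HG": "H"}
# A few element symbols the toy guesser recognises:
ELEMENTS = {"H", "C", "N", "O", "P", "S"}


def toy_guess_element(atom_name: str) -> str:
    """Guess an element symbol from an atom name (toy sketch)."""
    # Strip digits and symbols such as '*' or "'", then upper-case.
    name = re.sub(r"[^A-Za-z]", "", atom_name).upper()
    if name in BIO_NAMES:
        return BIO_NAMES[name]
    # Drop letters from the right until the name, or the name without
    # its first letter, matches a known element.
    while name:
        if name in ELEMENTS:
            return name
        if name[1:] in ELEMENTS:
            return name[1:]
        name = name[:-1]
    return ""  # nothing recognisable: the caller decides what to do


for atom_name in ["AO5*", "3hg2", "CA", "NZ"]:
    print(atom_name, "->", toy_guess_element(atom_name))
```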
------
Masses
------

Masses are guessed from the atom's ``element`` attribute using a look-up table. If ``element``\ s are not available, the atom's ``type`` is used in place of the element. If the ``type`` is not available either, it is
:ref:`guessed first <default-guesser-types>`.

.. warning::

   When an atom mass cannot be guessed from the atom ``type`` or ``name``, the atom is currently assigned a mass of 0.0.

   Masses are guessed atom-by-atom, so even if most atoms have been guessed correctly, some may have been given masses of 0. It is important to check for non-zero masses before using methods that rely on them, such as :meth:`AtomGroup.center_of_mass`.

.. important::

   ``np.nan`` will be used as the default or "missing" value
   in place of 0.0 for atom masses in version 3.0 of MDAnalysis.

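The look-up with a 0.0 fallback, and the check recommended in the warning, can be sketched in plain Python as follows. The ``MASSES`` table is a small hypothetical subset (standard atomic weights in u), and ``toy_guess_masses`` is an illustrative helper, not an MDAnalysis function.

```python
# Hypothetical subset of an element -> mass table (standard atomic
# weights in u); MDAnalysis ships its own, much larger table.
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def toy_guess_masses(elements):
    """Look up masses atom by atom, falling back to 0.0 (toy sketch)."""
    return [MASSES.get(element.upper(), 0.0) for element in elements]


masses = toy_guess_masses(["C", "H", "XX", "O"])
# Check for unguessed (zero) masses before relying on them, e.g. for a
# centre of mass calculation.
unguessed = [i for i, mass in enumerate(masses) if mass == 0.0]
if unguessed:
    print(f"atoms {unguessed} have no guessed mass")
```

With a real Universe, an analogous check is ``(u.atoms.masses == 0).any()``.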
-------------
Aromaticities
-------------

Aromaticities are guessed with the :ref:`RDKit <RDKit-format>` converter, using RDKit's ``GetIsAromatic`` method.

.. note::
   RDKit must be installed for aromaticity guessing to be available.
   RDKit is always installed when MDAnalysis is installed from conda-forge packages,
   but this may not be the case with other installation methods.

-----------------------------------
Bonds, Angles, Dihedrals, Impropers
-----------------------------------

MDAnalysis can guess whether a bond exists between two atoms based on the distance between them. A bond is created if the distance :math:`d` between the two atoms satisfies

.. math::

   d < f \cdot (R_1 + R_2)

where :math:`R_1` and :math:`R_2` are the van der Waals radii
of the atoms and :math:`f` is an ad-hoc *fudge factor*. This is
the `same algorithm that VMD uses`_.

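The distance criterion can be sketched as a brute-force loop over all atom pairs. This is a minimal illustration, not MDAnalysis's implementation: the radii in ``VDW_RADII`` are assumed values, ``0.55`` is an illustrative fudge factor, and the ``lower_bound`` parameter (pairs closer than it are ignored) is an assumption modelled on the keyword of the same name. A real implementation would use a neighbour search and handle periodic boundaries.

```python
import itertools
import math

# Illustrative van der Waals radii in Angstrom; assumed values, not
# necessarily those in MDAnalysis's tables.
VDW_RADII = {"H": 1.1, "C": 1.7, "O": 1.52}


def toy_guess_bonds(types, coords, fudge_factor=0.55, lower_bound=0.1):
    """Return index pairs (i, j) with lower_bound < d < f * (R_i + R_j).

    Brute-force O(N^2) sketch of the distance criterion; no periodic
    boundary handling and no spatial search structure.
    """
    bonds = []
    for i, j in itertools.combinations(range(len(types)), 2):
        d = math.dist(coords[i], coords[j])
        cutoff = fudge_factor * (VDW_RADII[types[i]] + VDW_RADII[types[j]])
        if lower_bound < d < cutoff:
            bonds.append((i, j))
    return bonds


# A water-like geometry: O at the origin, two hydrogens ~0.96 A away.
types = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
print(toy_guess_bonds(types, coords))                    # O-H bonds only
print(toy_guess_bonds(types, coords, fudge_factor=0.7))  # also H-H
```

Raising the fudge factor loosens the criterion, which is why the second call also bonds the two hydrogens.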
.. note::

   Previously, guessing bonds would also guess angles and dihedrals. This is no longer the case. Angles, dihedrals, and impropers are not guessed by the default guesser unless
   explicitly requested by the user (for example, by including ``"angles"`` in ``to_guess``).

Angles can be guessed from the bond connectivity. MDAnalysis assumes that if atoms 1 & 2 are bonded, and 2 & 3 are bonded, then (1,2,3) must be an angle.

::

    1
     \
      2 -- 3

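A minimal sketch of this rule in plain Python, assuming bonds are given as pairs of atom indices (``toy_guess_angles`` is a hypothetical helper, not an MDAnalysis function): every pair of atoms bonded to a common central atom yields one angle.

```python
import itertools
from collections import defaultdict


def toy_guess_angles(bonds):
    """Return angles (i, j, k) for every atom j bonded to both i and k."""
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    angles = set()
    for centre, partners in neighbours.items():
        # Each unordered pair of partners gives one angle; writing the
        # smaller index first avoids listing both (1, 2, 3) and (3, 2, 1).
        for i, k in itertools.combinations(sorted(partners), 2):
            angles.add((i, centre, k))
    return sorted(angles)


print(toy_guess_angles([(1, 2), (2, 3)]))  # [(1, 2, 3)]
```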
Dihedral angles and improper dihedrals can both be guessed from angles. Proper dihedrals are guessed by assuming that if (1,2,3) is an angle, and 3 & 4 are bonded, then (1,2,3,4) must be a dihedral.

::

    1        4
     \      /
      2 -- 3

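The dihedral rule can be sketched the same way. This hypothetical helper takes angles as index triples and bonds as index pairs, extends each angle from either terminal atom, and stores each dihedral in a single canonical orientation; that ordering convention is a choice made for the sketch, not necessarily MDAnalysis's.

```python
from collections import defaultdict


def toy_guess_dihedrals(angles, bonds):
    """Extend each angle (a, b, c) by a bond c-d into a dihedral (a, b, c, d)."""
    neighbours = defaultdict(set)
    for x, y in bonds:
        neighbours[x].add(y)
        neighbours[y].add(x)
    dihedrals = set()
    for a, b, c in angles:
        # The angle (a, b, c) is the same as (c, b, a), so extend from
        # both ends.
        for first, last in ((a, c), (c, a)):
            for d in neighbours[last]:
                if d in (a, b, c):
                    continue  # would revisit an atom of the angle
                dihedral = (first, b, last, d)
                # (1, 2, 3, 4) and (4, 3, 2, 1) describe the same dihedral;
                # keep the orientation whose first index is smaller.
                if dihedral[0] > dihedral[-1]:
                    dihedral = dihedral[::-1]
                dihedrals.add(dihedral)
    return sorted(dihedrals)


bonds = [(1, 2), (2, 3), (3, 4)]
angles = [(1, 2, 3), (2, 3, 4)]
print(toy_guess_dihedrals(angles, bonds))  # [(1, 2, 3, 4)]
```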
Likewise, if (1,2,3) is an angle, and 2 & 4 are bonded, then (2, 1, 3, 4) must be an improper dihedral (i.e. the improper dihedral is the angle between the planes formed by (1, 2, 3) and (1, 3, 4)).

::

    1
     \
      2 -- 3
     /
    4

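Likewise, a sketch of the improper rule, following the ``(2, 1, 3, 4)`` ordering described above (the helper is hypothetical and works on index tuples only):

```python
from collections import defaultdict


def toy_guess_impropers(angles, bonds):
    """Angle (a, b, c) plus a bond b-d gives the improper (b, a, c, d)."""
    neighbours = defaultdict(set)
    for x, y in bonds:
        neighbours[x].add(y)
        neighbours[y].add(x)
    impropers = set()
    for a, b, c in angles:
        for d in neighbours[b]:  # atoms bonded to the central atom b
            if d not in (a, c):
                impropers.add((b, a, c, d))
    return sorted(impropers)


bonds = [(1, 2), (2, 3), (2, 4)]
print(toy_guess_impropers([(1, 2, 3)], bonds))  # [(2, 1, 3, 4)]
```

If all three angles around atom 2 are passed in, this sketch returns three different orderings of the same four atoms; it makes no attempt to deduplicate them.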

.. _`same algorithm that VMD uses`:
   http://www.ks.uiuc.edu/Research/vmd/vmd-1.9.1/ug/node26.html

This file was deleted.
@@ -0,0 +1,157 @@

.. -*- coding: utf-8 -*-
.. _guessing:

============================
Guessing Topology Attributes
============================

Version 2.8.0 of MDAnalysis introduced a new, context-dependent guessing API for topology attributes that are not read from the file. It allows topology attributes such as masses, charges, and atom types to be guessed from existing information according to a context (e.g. biological naming conventions), rather than according to the file format, as was previously done.

.. list-table:: Supported guesser contexts
   :widths: 25 25 50
   :header-rows: 1

   * - Guesser
     - Context name
     - Topology attributes guessed
   * - :ref:`DefaultGuesser <default-guesser>`
     - "default"
     - elements, types, masses, bonds, angles, dihedrals, impropers, aromaticities

Guessing at Universe creation
=============================

Topology attributes can be guessed at Universe creation by passing the names of the attributes to guess to the ``to_guess`` keyword. As of version 2.8.0, the default guesser is used to guess ``types`` and ``masses`` by default.

.. ipython:: python

    import MDAnalysis as mda
    from MDAnalysis.tests.datafiles import PRM12
    u = mda.Universe(PRM12, context="default", to_guess=["types", "masses", "bonds"])
    u.atoms.bonds

In general, guessing at Universe creation works very similarly to guessing with the :ref:`guess_TopologyAttrs method <guess-topologyAttrs>` interface documented below. The main difference is that passing guesser-specific keyword arguments such as ``fudge_factor`` and ``vdwradii`` into Universe creation is **now deprecated and will be removed in version 3.0**. Instead, we recommend specifying these arguments through an explicit call to the :meth:`~MDAnalysis.core.universe.Universe.guess_TopologyAttrs` method.

.. _guess-topologyAttrs:

Guessing using the ``guess_TopologyAttrs()`` interface
======================================================

Topology attributes can also be guessed after :class:`~MDAnalysis.core.universe.Universe` creation using the :meth:`~MDAnalysis.core.universe.Universe.guess_TopologyAttrs` method. The ``to_guess``, ``force_guess``, and ``context`` keywords specify which attributes to guess, which attributes to forcibly re-guess, and which guesser to use, respectively. These three keywords behave the same way here as they do in Universe creation.

As with :class:`Universe` creation, the :ref:`DefaultGuesser <default-guesser>` is used as the default ``context``. The following example demonstrates how to guess atom types and masses after Universe creation.

.. ipython:: python

    u = mda.Universe(PRM12, to_guess=[])  # to_guess=[] turns off the default (v2.8.0) guessing of types and masses
    u.guess_TopologyAttrs(to_guess=["types", "masses"])
    u.atoms.types

The context can be specified either as a string (e.g., ``"default"``) or as an already-created *Guesser* object (an instance of a subclass of :class:`~MDAnalysis.guesser.base.GuesserBase`). Passing in an existing *Guesser* object (such as a :class:`~MDAnalysis.guesser.default_guesser.DefaultGuesser`) can be convenient if there are particular keywords you want to use in guessing methods, such as the ``fudge_factor``, ``vdwradii`` or ``lower_bound`` keywords for controlling bond guessing. However, any additional keyword arguments passed to :meth:`~MDAnalysis.core.universe.Universe.guess_TopologyAttrs` will **replace** the existing arguments inside the guesser.

.. ipython:: python

    from MDAnalysis.guesser import DefaultGuesser
    from MDAnalysis.tests.datafiles import CONECT  # example data file

    u = mda.Universe(CONECT)
    guesser = DefaultGuesser(u, fudge_factor=1.2)
    u.guess_TopologyAttrs(to_guess=["bonds"], context=guesser, fudge_factor=0.5)
    guesser._kwargs["fudge_factor"]  # the 0.5 passed above has replaced 1.2

--------------------
Forcibly re-guessing
--------------------

MDAnalysis preferentially reads topology attributes from the file instead of guessing them, even if the attribute is passed to ``to_guess``. For example, below, the ``types`` attribute holds the actual atom types from the file.

.. ipython:: python

    u = mda.Universe(PRM12, to_guess=["types", "masses"])
    u.atoms.types

.. note::

   In cases where the attribute is only present for *some* atoms in the file (e.g. a patchy element column in a PDB file), MDAnalysis will only guess the attribute for the atoms where it is missing from the file.

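The per-atom behaviour described in the note can be modelled as a simple merge. In this sketch, missing file values are represented as empty strings and the guesses come from a hypothetical one-letter guesser; both are assumptions for illustration, and ``toy_fill_missing`` is not an MDAnalysis function.

```python
def toy_fill_missing(file_values, guesses):
    """Keep values read from the file; use the guess only where one is missing.

    A missing file value is modelled as an empty string (an assumption
    made for this sketch).
    """
    return [value if value else guess
            for value, guess in zip(file_values, guesses)]


names = ["CA", "HA", "O", "NZ"]
elements_in_file = ["C", "", "O", ""]  # a patchy element column
# Hypothetical one-letter guesser standing in for a real guessing method.
guessed = [name[0] for name in names]
print(toy_fill_missing(elements_in_file, guessed))  # ['C', 'H', 'O', 'N']
```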
To force MDAnalysis to re-guess a topology attribute, pass the attribute to the ``force_guess`` keyword. MDAnalysis then guesses the attribute even if it is present in the file.

.. ipython:: python

    u.guess_TopologyAttrs(to_guess=["types"], force_guess=["types"])
    u.atoms.types

------------------------------------
Guessing bonds, angles, and torsions
------------------------------------

Whereas most attributes are guessed at the atom, residue, or segment level, topology objects such as bonds, angles, dihedrals and impropers are guessed somewhat differently, and they interact with the ``force_guess`` keyword in a special way.

Specifically, these connectivity attributes are guessed **additively** by default: if bonds and other objects are guessed twice, **the bonds from the second guess are added to the existing ones.** Below, the number of bonds increases when they are guessed again with a looser criterion.

.. ipython:: python

    from MDAnalysis.tests.datafiles import CONECT

    u = mda.Universe(CONECT, to_guess=["bonds"])
    print(len(u.bonds))
    u.guess_TopologyAttrs(to_guess=["bonds"], fudge_factor=1.2)  # looser
    print(len(u.bonds))

However, the **number of bonds does not change when the bonds are guessed again with a stricter criterion**: no new bonds are found, and no existing bonds are removed, even if they do not meet the new criterion:

.. ipython:: python

    u.guess_TopologyAttrs(to_guess=["bonds"], fudge_factor=0.5)  # stricter
    print(len(u.bonds))

Moreover, bonds are unique, so guessing the bonds again with the same criterion leaves them unchanged:

.. ipython:: python

    u.guess_TopologyAttrs(to_guess=["bonds"], fudge_factor=0.5)  # same
    print(len(u.bonds))

However, if you want to overwrite all existing bonds, angles, dihedrals or impropers, pass the object type to the ``force_guess`` keyword. This **removes all existing objects of that type before guessing.** Below, the number of bonds shrinks when they are guessed with the stricter criterion:

.. ipython:: python

    u.guess_TopologyAttrs(to_guess=["bonds"], force_guess=["bonds"], fudge_factor=0.5)
    print(len(u.bonds))

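The difference between additive and forced guessing resembles set union versus replacement. The toy model below, which is not MDAnalysis code, treats bonds as unordered index pairs in a set and replays the sequence of guesses above with made-up bond lists.

```python
def toy_update_bonds(existing, guessed, force=False):
    """Toy model of additive versus forced bond guessing.

    Bonds are unordered pairs, so (j, i) is normalised to (i, j) and a
    set keeps every bond unique.
    """
    new = {tuple(sorted(bond)) for bond in guessed}
    if force:
        return new  # like force_guess: existing bonds are discarded first
    return {tuple(sorted(bond)) for bond in existing} | new  # additive


bonds = {(0, 1), (1, 2)}
bonds = toy_update_bonds(bonds, [(0, 1), (1, 2), (3, 2)])  # looser: grows
print(len(bonds))  # 3
bonds = toy_update_bonds(bonds, [(0, 1)])  # stricter: nothing removed
print(len(bonds))  # 3
bonds = toy_update_bonds(bonds, [(0, 1)], force=True)  # forced: replaced
print(len(bonds))  # 1
```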
-----------------
Order of guessing
-----------------

The order in which attributes are guessed can matter. For example, bond guessing with the :class:`~MDAnalysis.guesser.default_guesser.DefaultGuesser` looks up the vdW radii of the atoms involved by their atom ``type``. For file formats where the atom ``type`` is not a valid element, the ``type`` must therefore be forcibly re-guessed before bond guessing can work.

.. note::

   The behaviour of looking up radii by *type* will likely change to looking up by *element* in version 3.0.

Therefore, the following does not work (in MDAnalysis < 3.0) because of the atom types encoded in the PSF file:

.. ipython:: python
    :okexcept:

    from MDAnalysis.tests.datafiles import PSF, DCD
    u = mda.Universe(PSF, DCD)
    print(u.atoms.types)
    u.guess_TopologyAttrs(to_guess=["bonds"])

However, the snippet below re-guesses the types as elements, which have vdW radii defined, so bond guessing now works:

.. ipython:: python

    u.guess_TopologyAttrs(to_guess=["types", "bonds"], force_guess=["types"])
    print(u.atoms.types)

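
Why the order matters can be sketched with a toy radius look-up keyed by element symbol. The radii, the type names, and the ``ValueError`` raised here are all assumptions of the sketch rather than MDAnalysis behaviour; the point is only that force-field types have no entry in an element-keyed table, whereas re-guessed element types do.

```python
# Illustrative vdW radii keyed by element symbol (assumed values).
VDW_RADII = {"H": 1.1, "C": 1.7, "N": 1.55, "O": 1.52}


def toy_lookup_radii(types):
    """Look up a radius per atom type, failing loudly on unknown types."""
    unknown = sorted({t for t in types if t not in VDW_RADII})
    if unknown:
        raise ValueError(f"no vdW radius defined for types {unknown}")
    return [VDW_RADII[t] for t in types]


forcefield_types = ["NH3", "HC", "CT1"]  # hypothetical force-field types
element_types = ["N", "H", "C"]          # the same atoms after re-guessing

try:
    toy_lookup_radii(forcefield_types)
except ValueError as err:
    print(err)
print(toy_lookup_radii(element_types))  # [1.55, 1.1, 1.7]
```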