5.11. PDB structure files in MDAnalysis — MDAnalysis.coordinates.PDB
¶
MDAnalysis reads coordinates from PDB files and additional optional data such as B-factors. It is also possible to substitute a PDB file instead of PSF file in order to define the list of atoms (but no connectivity information will be available in this case).
PDB files contain both coordinate and atom information. It is also possible to write trajectories as multi-frame (or multi-model) PDB files. This is not very space efficient but is sometimes the lowest common denominator for exchanging trajectories. Single frame and multi-frame PDB files are automatically recognized; the only difference is thath the single-frame PDB is represented as a trajectory with only one frame.
In order to write a selection to a PDB file one typically uses the
MDAnalysis.core.AtomGroup.AtomGroup.write()
method of the selection:
calphas = universe.select_atoms("name CA")
calphas.write("calpha_only.pdb")
This uses the coordinates from the current timestep of the trajectory.
In order to write a PDB trajectory one needs to first obtain a multi-frame
writer (keyword multiframe = True
) and then iterate through the
trajectory, while writing each frame:
calphas = universe.select_atoms("name CA")
W = MDAnalysis.Writer("calpha_traj.pdb", multiframe=True)
for ts in u.trajectory:
W.write(calphas)
W.close()
It is important to always close the trajectory when done because only at this step is the final END record written, which is required by the PDB standard.
5.11.1. Implementations¶
Two different implementations of PDB I/O are available: the
“permissive” and the “strict” Reader/Writers.
The default are the “permissive” ones but this can be changed by setting the
flag “permissive_pdb_reader” in MDAnalysis.core.flags
(see
Flags) to False
:
MDAnalysis.core.flags["permissive_pdb_reader"] = False
The default for MDAnalysis is to use the
“permissive” PrimitivePDBReader
and
PrimitivePDBWriter
, corresponding to
MDAnalysis.core.flags["permissive_pdb_reader"] = True
On a case-by-case basis one kind of reader can be selected with the
permissive keyword to Universe
, e.g.
u = MDAnalysis.Universe(PDB, permissive=False)
would select PDBReader
instead of the default
PrimitivePDBReader
.
5.11.2. Simple (permissive) PDB Reader and Writer¶
A pure-Python implementation for PDB files commonly encountered in MD
simulations comes under the names PrimitivePDBReader
and
PrimitivePDBWriter
. It only implements a subset of the PDB standard
(for instance, it does not deal with insertion codes) and also allows some
typical enhancements such as 4-letter resids (introduced by CHARMM/NAMD). The
“primitive PDB Reader/Writer” are the default in MDAnalysis (equivalent to
supplying the permissive = True
keyword to
Universe
).
The PrimitivePDBReader
can read multi-frame PDB files and represents
them as a trajectory. The PrimitivePDBWriter
can write single and
multi-frame PDB files as specified by the multiframe keyword. By default, it
writes single frames. On the other hand, the MultiPDBWriter
is set up
to write a PDB trajectory by default (equivalent to using multiframe =
True
).
Examples
In order to write a multi-frame PDB trajectory from a universe u one can do the following:
pdb = MDAnalysis.Writer("all.pdb", multiframe=True)
for ts in u.trajectory:
pdb.write(u)
pdb.close()
Similarly, writing only a protein:
pdb = MDAnalysis.Writer("protein.pdb", multiframe=True)
protein = u.select_atoms("protein")
for ts in u.trajectory:
pdb.write(protein)
pdb.close()
A single frame can be written with the
write()
method of any
AtomGroup
:
protein = u.select_atoms("protein")
protein.write("protein.pdb")
Alternatively, get the single frame writer and supply the
AtomGroup
:
pdb = MDAnalysis.Writer("protein.pdb")
protein = u.select_atoms("protein")
pdb.write(protein)
pdb.close()
5.11.2.1. Classes¶
-
class
MDAnalysis.coordinates.PDB.
PrimitivePDBReader
(filename, **kwargs)[source]¶ PDBReader that reads a PDB-formatted file, no frills.
The following PDB records are parsed (see PDB coordinate section for details):
- CRYST1 for unitcell A,B,C, alpha,beta,gamma
- ATOM or HETATM for serial,name,resName,chainID,resSeq,x,y,z,occupancy,tempFactor
- HEADER (
header
), TITLE (title
), COMPND (compound
), REMARK (remarks
) - all other lines are ignored
Reads multi-MODEL PDB files as trajectories.
COLUMNS DATA TYPE FIELD DEFINITION 1 - 6 Record name “CRYST1” 7 - 15 Real(9.3) a a (Angstroms). 16 - 24 Real(9.3) b b (Angstroms). 25 - 33 Real(9.3) c c (Angstroms). 34 - 40 Real(7.2) alpha alpha (degrees). 41 - 47 Real(7.2) beta beta (degrees). 48 - 54 Real(7.2) gamma gamma (degrees). 1 - 6 Record name “ATOM “ 7 - 11 Integer serial Atom serial number. 13 - 16 Atom name Atom name. 17 Character altLoc Alternate location indicator. IGNORED 18 - 21 Residue name resName Residue name. 22 Character chainID Chain identifier. 23 - 26 Integer resSeq Residue sequence number. 27 AChar iCode Code for insertion of residues. IGNORED 31 - 38 Real(8.3) x Orthogonal coordinates for X in Angstroms. 39 - 46 Real(8.3) y Orthogonal coordinates for Y in Angstroms. 47 - 54 Real(8.3) z Orthogonal coordinates for Z in Angstroms. 55 - 60 Real(6.2) occupancy Occupancy. 61 - 66 Real(6.2) tempFactor Temperature factor. 67 - 76 String segID (unofficial CHARMM extension ?) 77 - 78 LString(2) element Element symbol, right-justified. IGNORED 79 - 80 LString(2) charge Charge on the atom. IGNORED See also
PrimitivePDBWriter
;PDBReader
implements a larger subset of the header records, which are accessible asPDBReader.metadata
.Changed in version 0.11.0: * Frames now 0-based instead of 1-based * New
title
(list with all TITLE lines).Read coordinates from filename.
filename can be a gzipped or bzip2ed compressed PDB file.
If the pdb file contains multiple MODEL records then it is read as a trajectory where the MODEL numbers correspond to frame numbers. Therefore, the MODEL numbers must be a sequence of integers (typically starting at 1 or 0).
-
Writer
(filename, **kwargs)[source]¶ Returns a permissive (simple) PDBWriter for filename.
Parameters: *filename* – filename of the output PDB file Returns: PrimitivePDBWriter
-
class
MDAnalysis.coordinates.PDB.
PrimitivePDBWriter
(filename, bonds='conect', n_atoms=None, start=0, step=1, remarks='Created by PrimitivePDBWriter', convert_units=None, multiframe=None)[source]¶ PDB writer that implements a subset of the PDB 3.2 standard .
PDB format as used by NAMD/CHARMM: 4-letter resnames and segID are allowed, altLoc is written.
The
PrimitivePDBWriter
can be used to either a dump a coordinate set to a PDB file (operating as a “single frame writer”, selected with the constructor keyword multiframe =False
, the default) or by writing a PDB “movie” (multi frame mode, multiframe =True
), consisting of multiple models (using the MODEL and ENDMDL records).See also
This class is identical to
MultiPDBWriter
with the one exception that it defaults to writing single-frame PDB files as if multiframe =False
was selected.Changed in version 0.7.5: Initial support for multi-frame PDB files.
Changed in version 0.7.6: The multiframe keyword was added to select the writing mode. The writing of bond information (CONECT records) is now disabled by default but can be enabled with the bonds keyword.
Changed in version 0.11.0: Frames now 0-based instead of 1-based
Changed in version 0.14.0: PDB doesn’t save charge information
Create a new PDBWriter
Parameters: - *filename* – name of output file
- *start* – starting timestep
- *step* – skip between subsequent timesteps
- *remarks* – comments to annotate pdb file (added to the TITLE record); note that
any remarks from the trajectory that serves as input are
written to REMARK records with lines longer than
remark_max_length
(66 characters) being wrapped. - *convert_units* – units are converted to the MDAnalysis base format;
None
selects the value ofMDAnalysis.core.flags
[‘convert_lengths’] - *bonds* –
write bonds to the PDB file as CONECT records [
False
]Note
Currently only works when writing a whole
Universe
and if bond information is available in the topology. (For selections smaller than the wholeUniverse
, the atom numbering in the CONECT records would not match the numbering of the atoms in the new PDB file and therefore aNotImplementedError
is raised.) - *multiframe* –
False
: write a single frame to the file;True
: create a multi frame PDB file in which frames are written as MODEL ... ENDMDL records. IfNone
, then the class default is chosen. [None
]
-
_check_pdb_coordinates
()[source]¶ Check if the coordinate values fall within the range allowed for PDB files.
Deletes the output file if this is the first frame or if frames have already been written (in multi-frame mode) adds a REMARK instead of the coordinates and closes the file.
Raises
ValueError
if the coordinates fail the check.
-
_write_pdb_bonds
()[source]¶ Writes out all the bond records; works only for Universe objects.
Warning
All bonds are written out, using the old atom numbers - this is incorrect. Once a selection is made, the atom numbers have to be updated (currently they are unmodified) and bonds have to be selected for, only if all the atoms for a bond are still present.
Therefore, this method raises a
NotImplementedError
if CONECT records for anything smaller than theUniverse
are written.Changed in version 0.7.6: Only write CONECT records if
PrimitivePDBWriter.bonds
== True
. RaisesNotImplementedError
if it would produce wrong output.
-
_update_frame
(obj)[source]¶ Method to initialize important attributes in writer from a AtomGroup or Universe obj.
Attributes initialized/updated:
PrimitivePDBWriter.obj
(the entity that provides topology information and coordinates, either aAtomGroup
or a wholeUniverse
)PrimitivePDBWriter.trajectory
(the underlying trajectoryReader
)PrimitivePDBWriter.timestep
(the underlying trajectoryTimestep
)
Before calling
write_next_timestep()
this method must be called at least once to enable extracting topology information from the current frame.
-
_write_timestep
(ts, multiframe=False)[source]¶ Write a new timestep ts to file
Does unit conversion if necessary.
By setting multiframe =
True
, MODEL ... ENDMDL records are written to represent trajectory frames in a multi-model PDB file. (At the moment we do not write the NUMMDL record.)The multiframe =
False
keyword signals that thePrimitivePDBWriter
is in single frame mode and no MODEL records are written.Changed in version 0.7.6: The multiframe keyword was added, which completely determines if MODEL records are written. (Previously, this was decided based on the underlying trajectory and only if
len(traj) > 1
would MODEL records have been written.)
-
END
()[source]¶ Write END record.
Only a single END record is written. Calling END multiple times has no effect. Because
close()
also calls this method right before closing the file it is recommended to not callEND()
explicitly.
-
REMARK
(*remarks)[source]¶ Write generic REMARK record (without number).
Each string provided in remarks is written as a separate REMARK record.
See also REMARK (update).
-
remark_max_length
= 66¶ wrap comments into REMARK records that are not longer than
-
write
(obj)[source]¶ Write object obj at current trajectory frame to file.
obj can be a selection (i.e. a
AtomGroup
) or a wholeUniverse
.The last letter of the
segid
is used as the PDB chainID (but seeATOM()
for details).Parameters: *obj* – AtomGroup
orUniverse
-
write_all_timesteps
(obj)[source]¶ Write all timesteps associated with obj to the PDB file.
obj can be a
AtomGroup
or aUniverse
.The method writes the frames from the one specified as start until the end, using a step of skip (start and skip are set in the constructor). Thus, if u is a Universe then
u.trajectory[-2] pdb = PrimitivePDBWriter("out.pdb", u.atoms.n_atoms) pdb.write_all_timesteps(u)
will write a PDB trajectory containing the last 2 frames and
pdb = PrimitivePDBWriter("out.pdb", u.atoms.n_atoms, start=12, skip=2) pdb.write_all_timesteps(u)
will be writing frames 12, 14, 16, ...
Changed in version 0.11.0: Frames now 0-based instead of 1-based
-
write_next_timestep
(ts=None, **kwargs)[source]¶ write a new timestep to the PDB file
Keywords: - ts
base.Timestep
object containing coordinates to be written to trajectory file; ifNone
thenPrimitivePDBWriter.ts`
is tried.- multiframe
False
: write a single frame (default);True
behave as a trajectory writer
Note
Before using this method with another
base.Timestep
in the ts argument,PrimitivePDBWriter._update_frame()
must be called with theUniverse
as its argument so that topology information can be gathered.
-
class
MDAnalysis.coordinates.PDB.
MultiPDBWriter
(filename, bonds='conect', n_atoms=None, start=0, step=1, remarks='Created by PrimitivePDBWriter', convert_units=None, multiframe=None)[source]¶ PDB writer that implements a subset of the PDB 3.2 standard .
PDB format as used by NAMD/CHARMM: 4-letter resnames and segID, altLoc is written.
By default,
MultiPDBWriter
writes a PDB “movie” (multi frame mode, multiframe =True
), consisting of multiple models (using the MODEL and ENDMDL records).See also
This class is identical to
PrimitivePDBWriter
with the one exception that it defaults to writing multi-frame PDB files instead of single frames.New in version 0.7.6.
Create a new PDBWriter
Parameters: - *filename* – name of output file
- *start* – starting timestep
- *step* – skip between subsequent timesteps
- *remarks* – comments to annotate pdb file (added to the TITLE record); note that
any remarks from the trajectory that serves as input are
written to REMARK records with lines longer than
remark_max_length
(66 characters) being wrapped. - *convert_units* – units are converted to the MDAnalysis base format;
None
selects the value ofMDAnalysis.core.flags
[‘convert_lengths’] - *bonds* –
write bonds to the PDB file as CONECT records [
False
]Note
Currently only works when writing a whole
Universe
and if bond information is available in the topology. (For selections smaller than the wholeUniverse
, the atom numbering in the CONECT records would not match the numbering of the atoms in the new PDB file and therefore aNotImplementedError
is raised.) - *multiframe* –
False
: write a single frame to the file;True
: create a multi frame PDB file in which frames are written as MODEL ... ENDMDL records. IfNone
, then the class default is chosen. [None
]
5.11.3. Biopython (strict) PDB Reader and Writer¶
The PDB
module can make use of Biopython’s Bio.PDB
[Hamelryck2003] but replaces the standard PDB file parser with one that uses
the MDAnalysis.coordinates.pdb.extensions.SloppyStructureBuilder
to
cope with very large pdb files as commonly encountered in MD simulations. The
Biopython-based PDBReader
has the advantage that it implements the
PDB standard rigorously but this comes at the cost of flexibility and
performance. It is also difficult to write out selections using this
implementation (PDBWriter
) and multi frame PDB files are not
implemented. The Biopython Reader/Writer can be selected when loading data into
a Universe
by providing the keyword
permissive = False
.
The Biopython PDB parser Bio.PDB.PDBParser
is fairly strict and even
in its own permissive mode (which MDAnalysis employs) typically warns about
missing element names with a
Bio.PDB.PDBExceptions.PDBConstructionWarning
. Such warnings, however,
are generally harmless and therefore are filtered (and ignored) by MDAnalysis
with the help of warnings.filterwarnings()
.
5.11.3.1. Classes¶
-
class
MDAnalysis.coordinates.PDB.
PDBReader
(filename, convert_units=None, **kwargs)[source]¶ Read a pdb file into a
BioPython.PDB
structure.The coordinates are also supplied as one numpy array and wrapped into a Timestep object.
Note
The Biopython.PDB reader does not parse the
CRYST1
record and hence the unitcell dimensions are not set. Use thePrimitivePDBReader
instead (i.e. use theprimitive=True
keyword forUniverse
).Changed in version 0.11.0: * Frames now 0-based instead of 1-based. * All PDB header metadata parsed by the reader is available in the dict
metadata
.
-
class
MDAnalysis.coordinates.PDB.
PDBWriter
(pdbfilename, universe=None, multi=False, **kwargs)[source]¶ Write out the current time step as a pdb file.
This is not cleanly implemented at the moment. One must supply a universe, even though this is nominally an optional argument. The class behaves slightly differently depending on if the structure was loaded from a PDB (then the full-fledged
Bio.PDB
writer is used) or if this is really only an atom selection (then a less sophistiocated writer is employed).Note
The standard PDBWriter can only write the whole system. In order to write a selection, use the
PrimitivePDBWriter
, which happens automatically when thewrite()
method of aAtomGroup
instance is used.pdbwriter = PDBWriter(<pdbfilename>,universe=universe,**kwargs)
Parameters: - filename; if multi=True, embed a %%d formatstring (pdbfilename) – so that write_next_timestep() can insert the frame number
- supply a universe [really REQUIRED; optional only for compatibility] (universe) –
- False (multi) – write a single structure to a single pdb True: write all frames to multiple pdb files
References
[Hamelryck2003] | Hamelryck, T., Manderick, B. (2003) PDB parser and structure class implemented in Python. Bioinformatics, 19, 2308-2310. http://biopython.org |