pymacs:
IntroductionDownload and Installation
Documentation
Scripts
Introduction:
pymacs is a python module for dealing with structure files and trajectory data from the GROMACS molecular dynamics package. It has interfaces to some gromacs functions and uses gromacs routines for command line parsing, reading and writing of structure files (pdb,gro,...) and for reading trajectory data (only xtc at the moment). It is quite useful to write python scripts for simulation setup and analysis that can be combined with other powerful python packages like numpy, scipy or plotting libraries like pylab. It has an intuitive data structure (Model->Chain->Molecule->Atom) and allows modifications at all levels likeDownload and Installation:
MAC USERS: During installation, the compiler is somewhat unhappy about the gromacs libraries but it works anyway.
Documentation:
Online documentation generated with epydoc is available here.Basic Usage:
Every script that uses pymacs starts with importing pymacs:from pymacs import *Below you find examples how to generate scripts and analysis tools with pymacs.
Command line parsing
Reading/writing structure files
Data structures / Selections
Editing structure files
Reading xtc files
Building structures from sequence
Index Files
Command line parsing
pymacs provides an interface to the fancy gromacs command line parser "parse_common_args".It allows easy command line parsing, default setting and error handling as well as a manual/help output when starting the program with the option -h.
To use the command line parser we have to build three lists or tuples.
1) The (optional) program description, which is simply a tuple of text lines with a description of your program/script.
text = ("Here you can add some text",
"that explains what your",
"program does.")
2) A list of files you want to read or write. For gromacs C-programmers:
This corresponds the t_filenm structure in gromacs.
files= [
(efTPS,"-s", "protein.pdb", ffREAD ),
(efXTC, "-f", "traj.xtc", ffREAD ),
(efSTO, "-o", "out.pdb", ffWRITE ),
(efDAT, "-data", "data.dat", ffOPTWR )
]
Each tuple starts with the filetype definition (e.g. efTPS for tpr, pdb,
gro,....) followed by the argument (e.g. -s). The next entry contains a
default value and the last entry tells what we want to do with that file
(read, write,...).3) Finally we contruct a list of tuples with command line options (t_pargs in gromacs).
options=[
('-n',FALSE,etINT,'a number',42),
('-b',FALSE,etBOOL,'a bool',True),
('-r',FALSE,etREAL,'a float',3.1415),
('-ll',FALSE,etSTR,'a string','some string'),
('-tt',FALSE,etTIME,'time....', 2e-4),
('-vec',FALSE,etRVEC,'a vector',[1.,2.,3.])
]
Each tuple starts with the option (e.g. -n), a flag of which a forgot
what it does, but if you set it to FALSE it's fine, followed by a description of the
data type you want read (e.g. etINT for an integer value or etRVEC for a
vector), a text line that serves as a description of this option and
finally you can provide a default value for this option.Now the command line (in general sys.argv), the file list, the options and the text are passed to the "parse_common_args" function.
args=parse_common_args(sys.argv,files,options,desc=text)That's it for command line parsing. If you have the code above in a script called script1.py you can run it from the command line:
$ python script1.py -hThis will produce the following output:
>>> PYTHON featuring: <<<
:-) G R O M A C S (-:
Gnomes, ROck Monsters And Chili Sauce
:-) VERSION 3.3.3 (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-) script1.py (-:
DESCRIPTION
-----------
Here you can add some text that explains what your program does.
Option Filename Type Description
------------------------------------------------------------
-s protein.pdb Input Structure+mass(db): tpr tpb tpa gro g96 pdb
xml
-f traj.xtc Input Compressed trajectory (portable xdr format)
-o out.pdb Output Generic structure: gro g96 pdb xml
-data data.dat Output, Opt. Generic data file
Option Type Value Description
------------------------------------------------------
-[no]h bool yes Print help info and quit
-nice int 0 Set the nicelevel
-n int 42 a number
-[no]b bool yes a bool
-r real 3.1415 a float
-ll string some string a string
-tt time 0.0002 time....
-vec vector 1 2 3 a vector
'args' is a dictionary which is returned from "parse_common_args". It contains all information from the command line.
1) To get the filenames passed with a certain option (or the default name) simply use:
filename = args['-s']['fns']This returns "protein.pdb".
To access the values passed with the options (here -n) use:
n = args['-n']['value']
Reading/writing structure files
Once you got the command line, you can read structure files. At the moment gro, pdb, g96 are supported. You can also pass tpr or xml files but these are not fully converted into python data (only atom names, coordinates and stiff like that but not simulation parameters or force fields).pymacs uses a bunch of classes to store structure data in an easy-to-handle way. The picture below gives a description of the data organization.
To read a structure file and convert it into these classes use:
model = Model().read(args['-s'])
The model is an instance of the class "Model". It contains a list of
atoms, a list of residues, a list of chains and a chain dictionary.
To write the structure to a file simply use the "write" function.
model.write(args['-o'])
This writes a new coordinate file (either pdb, gro, g96) with the
filename passed with the option -o on the command line.Data structures / Selections
The data organization in a Model->Chain->Molecule->Atom hierarchy allows straightforward selection of subsets of structures and looping over chains, molecules or atoms. Here are some examples:
# looping over atoms
for atom in model.atoms:
print atom.name, atom.resnr, atom.bfac
# looping over residues and over atoms in each residue
for res in model.residues:
print res.id, res.resname
for atom in res.atoms:
print atom.x[0], atom.x[1], atom.x[2] # coordinates
# looping over chains
for chain in model.chains:
print chain.id, len(chain.atoms), len(chain.residues)
# chain identifier, number of atoms, number of residues
You can also select a subset of atoms using the "fetch_atoms" function:
backbone_atoms = model.fetch_atoms(['N','CA','C'])
for atom in backbone_atoms:
print atom # this prints the atom in pdb format
The "fetch_atoms" functions is inherited to the Chain and Molecule
classes. Hence, they work there as well.
res = model.chdic['A'].residues[-1] # the last residue from chain A
heavy_atoms = res.fetch_atoms('H',how='byelem',inv=True)
for atom in heavy_atoms:
print atom
"fetch_atoms" can also search by element. The "inv"-flag inverts the
search. Hence, all non-hydrogen atoms are returned.The "fetch_residues" function works similar.
cysl = model.fetch_residues('CYS') # returns all cysteine residues
chB = model.chdic['B'] # select chain B
ala_tyr_his = chB.fetch_residues(['ALA','TYR','HIS']) # returns all
# alanine, tyrosine and histidine residues from chain B
Editing structure files
pymacs offers several methods to edit structure files. On the "Atom"-level you can simply change all attributes.atom = model.atoms[0] # first atom from structure atom.name = 'DUM' # change atom name to DUM atom.x = [1,2,3] # assign new coordinates atom.bfac = 0. # set b-factor to 0 . . .On the residue level you can also delete and insert atoms:
res = model.residues[-1] # last residue
res.set_resname('C'+res.resname)
# change residue name e.g. from ALA to CALA.
# All atoms from this residue are renamed as well
del res['CA'] # delete CA atom
a = Atom(name = 'DUM',x = [1,2,3],bfac=25.) # create atom
res.insert_atom(0,a) # insert atom into residue at position 0
a2 = a.copy() # copy dummy atom
res.append(a2) # attach at the end of the residue
Working on the "Chain"-level works similarly.
ch = model.chains[0] # get the first chain
res = ch.residues[0].copy() # get a copy of the first residue
ch.insert_residue(5,res) # insert this residue at position 5
# NOTE: inserting a residue like that does not mean that it has
# been "modelled" into the structure and, hence, having any meaningful
# structure. All coordinates remain untouched!
res2 = res.copy() # copy this residue
ch.append(res2) # attach residue at the end of the chain
# You can also insert several residues
chC = model.chdic['C'] # select chain C
ch.insert_chain(10,chC) # insert chain C into ch at position 10
ch.write(args['-o']) # write new structure file
ch.write('chain.pdb') # the same but to chain.pdb
# also nice
seq = ch.get_sequence() # returns sequence in one-letter code
On the "Model"-level you can also do some nice things:
chA = model.chdic['A'].copy() # copy chain A
chA.set_chain_id('Z') # change chain identifier
model.append(chA) # append this chain
model.insert_chain(1,chA) # insert this chain at position 1
coords = matrix(model.coords()) # return coords as numpy matrix
# nice shortcuts
# all residue names
rnames = map(lambda r: r.resname, model.residues)
# all atom names
anames = map(lambda a: a.name, model.atoms)
# all c-terminal residues
cterms = map(lambda ch: ch.residues[-1], model.chains)
# or even better: rename all c-terminal residues (e.g, ALA -> CALA)
map(lambda ch:ch.residues[-1].set_resname('C'+ch.residues[-1].resname),\
model.chains) ;)
# the center of mass
Mtot = sum(map(lambda a: a.m, model.atoms))
x = sum(map(lambda a: a.x[0]*a.m, model.atoms))
y = sum(map(lambda a: a.x[1]*a.m, model.atoms))
z = sum(map(lambda a: a.x[2]*a.m, model.atoms))
com = array([x,y,z])/Mtot
# and so on.....
Reading XTC files
Trajectory analysis is also possible in python. At the moment pymacs can only read xtc files. It works like that:
fp = openXTC(args['-f']) # open xtc file
while 1: # loop over trajectory
frame = readXTCFrame(fp)
if not frame:
break # end of trajectory or error
model.update(frame) # update the model data with coords, box
print "time = %g step = %g coords = %g" % \
(frame['time'],frame['step],model.atoms[0].x[0])
# write time, step and x-coord of first atom
Building structures from sequence
pymacs has an additional package named builder that can build 3D-structures from sequence (this is not modelling or folding).
from pymacs.builder import *
chain = build_chain('ASDERFGTRE') # this builds an extended chain
chain.write('mysequence.pdb') # save as pdb
# additional features:
chain = build_chain('ASDERFGTRE', hydrogens = False) # heavy atoms only
chain = build_chain('ASDERFGTRE',ss='HHHHHHHHHH') # build helix
# or you provide explicit backbone dihedral angles
anti_beta = (-119., -113., 180.) # phi,psi,omega
seq = 'AAAAAAAAA' # sequence
for i in range(len(seq)):
dihedrals.append(anti_beta) # list of phi/psi/omega tuples
chain = build_chain(seq,dihedrals=dihedrals)
Index Files
pymacs also contains classes to read and write gromacs index files.There are basicall two classes, IndexGroup and IndexFile. Two use them in your script import the pymacs.ndx packages.
from pymacs.ndx import *
# Create an index group/file from a pymacs model
model = Model().read('myfile.pdb')
backbone = model.fetch_atoms('['N','CA','C'])
# now store all indices of ths group in an IndexGroup
idx = map(lambda a: a.id, backbone)
# this is the same as
idx = []
for atom in backbone:
idx.append(atom.id)
# now create an IndexGroup
grp = IndexGroup(ids = idx, name = 'backbone_atoms')
# write an index file which only contains this group
IndexFile(groups = [grp]).write('index.ndx')
# reading index files
ndxfile = IndexFile(fname = 'index.ndx')
# get a group
bb = ndxfile.dic['backbone_atoms'] # returns the group backbone_atoms
print bb.ids # print atom indices of this group
Scripts:
Command line parsing: cmdline.pyExamples script for command line parsing
Trajectory reading: read_xtc.py
Examples script for reading xtc files
Structure file editing: make_amber.py
Does some renaming of residues and atom names for using the ffamber port in gromacs
Structure file editing: amber2std.py
Converts residue and atom names from "amberized" pdb file back to standard.
Structure file editing: place_dummy.py
Place dummy atoms in pdb file.