Daniel Seeliger

Pic 1

News:

A new AutoDock/Vina PyMOL plugin is available here.

pymacs-0.4 is available for download here.

tCONCOORD-1.0 is available for download here.

more software

pymacs:

Introduction
Download and Installation
Documentation
Scripts

Introduction:

pymacs is a python module for dealing with structure files and trajectory data from the GROMACS molecular dynamics package. It has interfaces to some gromacs functions and uses gromacs routines for command line parsing, reading and writing of structure files (pdb,gro,...) and for reading trajectory data (only xtc at the moment). It is quite useful to write python scripts for simulation setup and analysis that can be combined with other powerful python packages like numpy, scipy or plotting libraries like pylab. It has an intuitive data structure (Model->Chain->Molecule->Atom) and allows modifications at all levels like

  • Changing of atom, residue and chain properties (name, coordinate, b-factor,...
  • Deleting and inserting atoms, residues, chains
  • Straightforward selection of structure subsets
  • Structure building from sequence
  • Handling gromacs index files



  • Download and Installation:


  • Download pymacs-0.4_linux32bit.tgz or
  • Download pymacs-0.4_linux64bit.tgz or
  • Download pymacs-0.4_mac.tgz (thanks to Michael Lerner and Martin Höfling)
  • $ tar -zxvf pymacs-0.4_?.tgz
  • $ cd pymacs-0.4_?
  • $ python setup.py install

  • MAC USERS: During installation, the compiler is somewhat unhappy about the gromacs libraries but it works anyway.

    Documentation:

    Online documentation generated with epydoc is available here.

    Basic Usage:

    Every script that uses pymacs starts with importing pymacs:
    from pymacs import *
    
    Below you find examples how to generate scripts and analysis tools with pymacs.

    Command line parsing
    Reading/writing structure files
    Data structures / Selections
    Editing structure files
    Reading xtc files
    Building structures from sequence
    Index Files

    Command line parsing

    pymacs provides an interface to the fancy gromacs command line parser "parse_common_args".
    It allows easy command line parsing, default setting and error handling as well as a manual/help output when starting the program with the option -h.
    To use the command line parser we have to build three lists or tuples.
    1) The (optional) program description, which is simply a tuple of text lines with a description of your program/script.
    
    text = ("Here you can add some text",
          "that explains what your",
          "program does.")
    
    2) A list of files you want to read or write. For gromacs C-programmers: This corresponds the t_filenm structure in gromacs.
    files= [
        (efTPS,"-s", "protein.pdb", ffREAD ),
        (efXTC, "-f", "traj.xtc", ffREAD ),
        (efSTO, "-o", "out.pdb", ffWRITE ),
        (efDAT, "-data", "data.dat", ffOPTWR )
        ]
    
    Each tuple starts with the filetype definition (e.g. efTPS for tpr, pdb, gro,....) followed by the argument (e.g. -s). The next entry contains a default value and the last entry tells what we want to do with that file (read, write,...).

    3) Finally we contruct a list of tuples with command line options (t_pargs in gromacs).
    options=[
        ('-n',FALSE,etINT,'a number',42),
        ('-b',FALSE,etBOOL,'a bool',True),
        ('-r',FALSE,etREAL,'a float',3.1415),
        ('-ll',FALSE,etSTR,'a string','some string'),
        ('-tt',FALSE,etTIME,'time....', 2e-4),
        ('-vec',FALSE,etRVEC,'a vector',[1.,2.,3.])
        ]
    
    Each tuple starts with the option (e.g. -n), a flag of which a forgot what it does, but if you set it to FALSE it's fine, followed by a description of the data type you want read (e.g. etINT for an integer value or etRVEC for a vector), a text line that serves as a description of this option and finally you can provide a default value for this option.

    Now the command line (in general sys.argv), the file list, the options and the text are passed to the "parse_common_args" function.
    
    args=parse_common_args(sys.argv,files,options,desc=text)
    
    That's it for command line parsing. If you have the code above in a script called script1.py you can run it from the command line:

    $ python script1.py -h
    
    This will produce the following output:
    
                                >>> PYTHON featuring: <<<
    
                             :-)  G  R  O  M  A  C  S  (-:
    
                         Gnomes, ROck Monsters And Chili Sauce
    
                                :-)  VERSION 3.3.3  (-:
    
    
          Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
           Copyright (c) 1991-2000, University of Groningen, The Netherlands.
                 Copyright (c) 2001-2008, The GROMACS development team,
                check out http://www.gromacs.org for more information.
    
             This program is free software; you can redistribute it and/or
              modify it under the terms of the GNU General Public License
             as published by the Free Software Foundation; either version 2
                 of the License, or (at your option) any later version.
    
                                  :-)  script1.py  (-:
    
    DESCRIPTION
    -----------
    Here you can add some text that explains what your program does.
    
    Option     Filename  Type         Description
    ------------------------------------------------------------
      -s    protein.pdb  Input        Structure+mass(db): tpr tpb tpa gro g96 pdb
                                       xml
      -f       traj.xtc  Input        Compressed trajectory (portable xdr format)
      -o        out.pdb  Output       Generic structure: gro g96 pdb xml
    -data      data.dat  Output, Opt. Generic data file
    
    Option       Type   Value   Description
    ------------------------------------------------------
    -[no]h       bool   yes     Print help info and quit
    -nice        int    0       Set the nicelevel
    -n           int    42      a number
    -[no]b       bool   yes     a bool
    -r           real   3.1415  a float
    -ll          string some string  a string
    -tt          time   0.0002  time....
    -vec         vector 1 2 3   a vector
    


    'args' is a dictionary which is returned from "parse_common_args". It contains all information from the command line.
    1) To get the filenames passed with a certain option (or the default name) simply use:
    filename = args['-s']['fns']
    
    This returns "protein.pdb".
    To access the values passed with the options (here -n) use:
    n = args['-n']['value']
    

    Reading/writing structure files

    Once you got the command line, you can read structure files. At the moment gro, pdb, g96 are supported. You can also pass tpr or xml files but these are not fully converted into python data (only atom names, coordinates and stiff like that but not simulation parameters or force fields).
    pymacs uses a bunch of classes to store structure data in an easy-to-handle way. The picture below gives a description of the data organization.



    To read a structure file and convert it into these classes use:
        model = Model().read(args['-s'])
    
    The model is an instance of the class "Model". It contains a list of atoms, a list of residues, a list of chains and a chain dictionary. To write the structure to a file simply use the "write" function.
        model.write(args['-o'])
    
    This writes a new coordinate file (either pdb, gro, g96) with the filename passed with the option -o on the command line.

    Data structures / Selections

    The data organization in a Model->Chain->Molecule->Atom hierarchy allows straightforward selection of subsets of structures and looping over chains, molecules or atoms. Here are some examples:
    # looping over atoms
    for atom in model.atoms:
        print atom.name, atom.resnr, atom.bfac
    
    # looping over residues and over atoms in each residue
    for res in model.residues:
        print res.id, res.resname
        for atom in res.atoms:
            print atom.x[0], atom.x[1], atom.x[2]   # coordinates
    
    # looping over chains
    for chain in model.chains:
        print chain.id, len(chain.atoms), len(chain.residues)
        # chain identifier, number of atoms, number of residues
    
    
    You can also select a subset of atoms using the "fetch_atoms" function:
    backbone_atoms = model.fetch_atoms(['N','CA','C'])
    for atom in backbone_atoms:
        print atom          # this prints the atom in pdb format
    
    The "fetch_atoms" functions is inherited to the Chain and Molecule classes. Hence, they work there as well.
    res = model.chdic['A'].residues[-1]   # the last residue from chain A
    heavy_atoms = res.fetch_atoms('H',how='byelem',inv=True)
    for atom in heavy_atoms:
        print atom
    
    "fetch_atoms" can also search by element. The "inv"-flag inverts the search. Hence, all non-hydrogen atoms are returned.
    The "fetch_residues" function works similar.
    cysl = model.fetch_residues('CYS')   # returns all cysteine residues
    chB = model.chdic['B']   # select chain B
    ala_tyr_his = chB.fetch_residues(['ALA','TYR','HIS']) # returns all
    # alanine, tyrosine and histidine residues from chain B
    

    Editing structure files

    pymacs offers several methods to edit structure files. On the "Atom"-level you can simply change all attributes.
    atom = model.atoms[0]   # first atom from structure
    atom.name = 'DUM'       # change atom name to DUM
    atom.x = [1,2,3]        # assign new coordinates
    atom.bfac = 0.          # set b-factor to 0
    .
    .
    .
    
    On the residue level you can also delete and insert atoms:
    res = model.residues[-1]    # last residue
    res.set_resname('C'+res.resname)
    # change residue name e.g. from ALA to CALA.
    # All atoms from this residue are renamed as well
    
    del res['CA']    # delete CA atom
    a = Atom(name = 'DUM',x = [1,2,3],bfac=25.)  # create atom
    res.insert_atom(0,a)    # insert atom into residue at position 0
    a2 = a.copy()           # copy dummy atom
    res.append(a2)          # attach at the end of the residue
    
    Working on the "Chain"-level works similarly.
    ch = model.chains[0]        # get the first chain
    res = ch.residues[0].copy() # get a copy of the first residue
    ch.insert_residue(5,res)    # insert this residue at position 5
    # NOTE: inserting a residue like that does not mean that it has
    # been "modelled" into the structure and, hence, having any meaningful
    # structure. All coordinates remain untouched!
    
    res2 = res.copy()    # copy this residue
    ch.append(res2)      # attach residue at the end of the chain
    
    # You can also insert several residues
    chC = model.chdic['C']    # select chain C
    ch.insert_chain(10,chC)   # insert chain C into ch at position 10
    ch.write(args['-o'])      # write new structure file
    ch.write('chain.pdb')     # the same but to chain.pdb
    
    # also nice
    seq = ch.get_sequence()   # returns sequence in one-letter code
    
    
    On the "Model"-level you can also do some nice things:
    chA = model.chdic['A'].copy()     # copy chain A
    chA.set_chain_id('Z')             # change chain identifier
    model.append(chA)                 # append this chain
    model.insert_chain(1,chA)         # insert this chain at position 1
    coords = matrix(model.coords())   # return coords as numpy matrix
    
    # nice shortcuts
    
    # all residue names
    
    rnames = map(lambda r: r.resname, model.residues)
    
    # all atom names
    
    anames = map(lambda a: a.name, model.atoms)
    
    # all c-terminal residues
    
    cterms = map(lambda ch: ch.residues[-1], model.chains)
    
    # or even better: rename all c-terminal residues (e.g, ALA -> CALA)
    map(lambda ch:ch.residues[-1].set_resname('C'+ch.residues[-1].resname),\
    model.chains) ;)
    # the center of mass
    Mtot = sum(map(lambda a: a.m, model.atoms))
    x = sum(map(lambda a: a.x[0]*a.m, model.atoms))
    y = sum(map(lambda a: a.x[1]*a.m, model.atoms))
    z = sum(map(lambda a: a.x[2]*a.m, model.atoms))
    com = array([x,y,z])/Mtot
    
    # and so on.....
    
    

    Reading XTC files

    Trajectory analysis is also possible in python. At the moment pymacs can only read xtc files. It works like that:
    
    fp = openXTC(args['-f'])     # open xtc file
    
    while 1:                     # loop over trajectory
    
        frame = readXTCFrame(fp)
        if not frame:
             break               # end of trajectory or error
        model.update(frame)      # update the model data with coords, box
        print "time = %g step = %g coords = %g" % \
              (frame['time'],frame['step],model.atoms[0].x[0])
        # write time, step and x-coord of first atom
    
    

    Building structures from sequence

    pymacs has an additional package named builder that can build 3D-structures from sequence (this is not modelling or folding).
    from pymacs.builder import *
    
    chain = build_chain('ASDERFGTRE')  # this builds an extended chain
    chain.write('mysequence.pdb')      # save as pdb
    
    # additional features:
    chain = build_chain('ASDERFGTRE', hydrogens = False) # heavy atoms only
    
    chain = build_chain('ASDERFGTRE',ss='HHHHHHHHHH')   # build helix
    
    # or you provide explicit backbone dihedral angles
    
    anti_beta = (-119., -113., 180.)     # phi,psi,omega
    seq  = 'AAAAAAAAA'                   # sequence
    
    for i in range(len(seq)):
        dihedrals.append(anti_beta)      # list of phi/psi/omega tuples
    
    chain = build_chain(seq,dihedrals=dihedrals)
    

    Index Files

    pymacs also contains classes to read and write gromacs index files.There are basicall two classes, IndexGroup and IndexFile. Two use them in your script import the pymacs.ndx packages.
    from pymacs.ndx import *
    
    # Create an index group/file from a pymacs model
    
    model = Model().read('myfile.pdb')
    backbone = model.fetch_atoms('['N','CA','C'])
    # now store all indices of ths group in an IndexGroup
    idx = map(lambda a: a.id, backbone)
    # this is the same as
    idx = []
    for atom in backbone:
        idx.append(atom.id)
    
    # now create an IndexGroup
    grp = IndexGroup(ids = idx, name = 'backbone_atoms')
    
    # write an index file which only contains this group
    IndexFile(groups = [grp]).write('index.ndx')
    
    # reading index files
    ndxfile = IndexFile(fname = 'index.ndx')
    # get a group
    bb = ndxfile.dic['backbone_atoms']  # returns the group backbone_atoms
    print bb.ids  # print atom indices of this group
    
    

    Scripts:

    Command line parsing: cmdline.py
    Examples script for command line parsing
    Trajectory reading: read_xtc.py
    Examples script for reading xtc files
    Structure file editing: make_amber.py
    Does some renaming of residues and atom names for using the ffamber port in gromacs
    Structure file editing: amber2std.py
    Converts residue and atom names from "amberized" pdb file back to standard.
    Structure file editing: place_dummy.py
    Place dummy atoms in pdb file.