docs/reference-manual/topologies/pdb2gmx-input-files.rst

   1 .. _pdb2gmxfiles:
   2
   3 :ref:`pdb2gmx <gmx pdb2gmx>` input files
   4 ----------------------------------------
   5
   6 The |Gromacs| program :ref:`pdb2gmx <gmx pdb2gmx>` generates a topology for the input
   7 coordinate file. Several formats are supported for that coordinate file,
   8 but :ref:`pdb` is the most commonly used format (hence the name :ref:`pdb2gmx <gmx pdb2gmx>`).
   9 :ref:`pdb2gmx <gmx pdb2gmx>` searches for force fields in sub-directories of the |Gromacs|
  10 ``share/top`` directory and your working directory. Force fields are
  11 recognized from the file ``forcefield.itp`` in a directory with the
  12 extension ``.ff``. The file ``forcefield.doc`` may be present, and if so, its
  13 first line will be used by :ref:`pdb2gmx <gmx pdb2gmx>` to present a short description to the
  14 user to help in choosing a force field. Alternatively, the user can choose a
  15 force field with the ``-ff xxx`` command-line argument to :ref:`pdb2gmx <gmx pdb2gmx>`, which
  16 indicates that a force field in a ``xxx.ff`` directory is desired. :ref:`pdb2gmx <gmx pdb2gmx>`
  17 will search first in the working directory, then in the |Gromacs|
  18 ``share/top`` directory, and use the first matching ``xxx.ff`` directory found.
  19
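The search order described above can be sketched in Python. This is only an
illustration under stated assumptions, not the actual lookup code: the function
names ``find_force_field`` and ``describe_force_field`` and the ``share_top``
argument are hypothetical.

.. code-block:: python

    from pathlib import Path
    from typing import Optional


    def find_force_field(name: str, workdir: Path, share_top: Path) -> Optional[Path]:
        """Return the first ``<name>.ff`` directory that holds forcefield.itp.

        The working directory is searched before the share/top directory.
        """
        for base in (workdir, share_top):
            candidate = base / f"{name}.ff"
            if (candidate / "forcefield.itp").is_file():
                return candidate
        return None


    def describe_force_field(ff_dir: Path) -> str:
        """Return the first line of forcefield.doc, or '' if the file is absent."""
        doc = ff_dir / "forcefield.doc"
        if not doc.is_file():
            return ""
        lines = doc.read_text().splitlines()
        return lines[0] if lines else ""

With this sketch, a local ``xxx.ff`` directory shadows one of the same name in
``share/top``, which mirrors the "first match wins" rule in the text.
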
  20 Two general files are read by :ref:`pdb2gmx <gmx pdb2gmx>`: an atom type file (extension
  21 :ref:`atp`, see :ref:`atomtype`) from the force-field directory, and a file
  22 called ``residuetypes.dat`` from either the working directory, or the
  23 |Gromacs| ``share/top`` directory. ``residuetypes.dat`` determines which residue
  24 names are considered protein, DNA, RNA, water, and ions.
  25
  26 :ref:`pdb2gmx <gmx pdb2gmx>` can read one or multiple databases with topological information
  27 for different types of molecules. A set of files belonging to one
  28 database should have the same basename, preferably telling something
  29 about the type of molecules (*e.g.* aminoacids, rna, dna). The possible
  30 files are:
  31
  32 -  ``<basename>.rtp``
  33
  34 -  ``<basename>.r2b`` (optional)
  35
  36 -  ``<basename>.arn`` (optional)
  37
  38 -  ``<basename>.hdb`` (optional)
  39
  40 -  ``<basename>.n.tdb`` (optional)
  41
  42 -  ``<basename>.c.tdb`` (optional)
  43
  44 Only the :ref:`rtp` file, which contains the topologies of the building
  45 blocks, is mandatory. Information from other files will only be used for
  46 building blocks that come from an :ref:`rtp` file with the same base name. The
  47 user can add building blocks to a force field by having additional files
  48 with the same base name in their working directory. By default, only
  49 extra building blocks can be defined, but calling :ref:`pdb2gmx <gmx pdb2gmx>` with the ``-rtpo``
  50 option will allow building blocks in a local file to replace the default
  51 ones in the force field.
  52
  53 Residue database
  54 ~~~~~~~~~~~~~~~~
  55
  56 The files holding the residue databases have the extension :ref:`rtp`.
  57 Originally, this file contained building blocks (amino acids) for
  58 proteins; it is the |Gromacs| interpretation of the ``rt37c4.dat`` file of
  59 GROMOS. The residue database file therefore contains information (bonds,
  60 charges, charge groups, and improper dihedrals) for frequently used
  61 building blocks. It is better *not* to change this file because it is
  62 standard input for :ref:`pdb2gmx <gmx pdb2gmx>`; if changes are needed, make them in the
  63 :ref:`top` file (see :ref:`topfile`), or in an :ref:`rtp` file in the working
  64 directory as explained in sec. :ref:`pdb2gmxfiles`. Defining topologies
  65 of new small molecules is probably easier by writing an include topology
  66 file :ref:`itp` directly. This will be discussed in section :ref:`molitp`.
  67 When adding a new protein residue to the database, don’t forget to add
  68 the residue name to the ``residuetypes.dat`` file, so that :ref:`grompp <gmx grompp>`, :ref:`make_ndx <gmx make_ndx>`
  69 and analysis tools can recognize the residue as a protein residue (see
  70 :ref:`defaultgroups`).
  71
  72 The :ref:`rtp` files are only used by :ref:`pdb2gmx <gmx pdb2gmx>`. As mentioned before, the only
  73 extra information this program needs from the :ref:`rtp` database is bonds,
  74 charges of atoms, charge groups, and improper dihedrals, because the
  75 rest is read from the coordinate input file. Some proteins contain
  76 residues that are not standard, but are listed in the coordinate file.
  77 You have to construct a building block for this “strange” residue,
  78 otherwise you will not obtain a :ref:`top` file. This also holds for molecules
  79 in the coordinate file such as ligands, polyatomic ions, crystallization
  80 co-solvents, etc. The residue database is constructed in the following
  81 way:
  82
  83 ::
  84
  85     [ bondedtypes ]  ; mandatory
  86     ; bonds  angles  dihedrals  impropers
  87          1       1          1          2  ; mandatory
  88
  89     [ GLY ]  ; mandatory
  90
  91      [ atoms ]  ; mandatory
  92     ; name  type  charge  chargegroup
  93          N     N  -0.280     0
  94          H     H   0.280     0
  95         CA   CH2   0.000     1
  96          C     C   0.380     2
  97          O     O  -0.380     2
  98
  99      [ bonds ]  ; optional
 100     ;atom1 atom2      b0      kb
 101          N     H
 102          N    CA
 103         CA     C
 104          C     O
 105         -C     N
 106
 107      [ exclusions ]  ; optional
 108     ;atom1 atom2
 109
 110      [ angles ]  ; optional
 111     ;atom1 atom2 atom3    th0    cth
 112
 113      [ dihedrals ]  ; optional
 114     ;atom1 atom2 atom3 atom4   phi0     cp   mult
 115
 116      [ impropers ]  ; optional
 117     ;atom1 atom2 atom3 atom4     q0     cq
 118          N    -C    CA     H
 119         -C   -CA     N    -O
 120
 121     [ ZN ]
 122
 123      [ atoms ]
 124         ZN    ZN   2.000     0
 125
 126 The file is free format; the only restriction is that there can be at
 127 most one entry on a line. The first field in the file is the ``[ bondedtypes ]`` field,
 128 which is followed by four numbers, indicating the interaction type for
 129 bonds, angles, dihedrals, and improper dihedrals. The file contains
 130 residue entries, which consist of atoms and (optionally) bonds, angles,
 131 dihedrals, and impropers. The charge group codes denote the charge group
 132 numbers. Atoms in the same charge group should always be ordered
 133 consecutively. When using the hydrogen database with :ref:`pdb2gmx <gmx pdb2gmx>` for adding
 134 missing hydrogens (see :ref:`hdb`), the atom names defined in the :ref:`rtp`
 135 entry should correspond exactly to the naming convention used in the
 136 hydrogen database. The atom names in the bonded interaction can be
 137 preceded by a minus or a plus, indicating that the atom is in the
 138 preceding or following residue respectively. Explicit parameters added
 139 to bonds, angles, dihedrals, and impropers override the standard
 140 parameters in the :ref:`itp` files. This should only be used in special cases.
 141 Instead of parameters, a string can be added for each bonded
 142 interaction. This is used in GROMOS-96 :ref:`rtp` files. These strings are
 143 copied to the topology file and can be replaced by force-field
 144 parameters by the C-preprocessor in :ref:`grompp <gmx grompp>` using ``#define`` statements.
 145
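To make the layout concrete, the following Python sketch reads the
``[ bondedtypes ]`` field and the residue entries of a file laid out like the
example above. It is a minimal illustration, not the parser used by
:ref:`pdb2gmx <gmx pdb2gmx>`; the function name ``parse_rtp`` is hypothetical,
and the sketch assumes that only the sub-section names shown in the example
occur inside a residue entry.

.. code-block:: python

    from typing import Dict, List, Tuple

    # Sub-section names that can occur inside a residue entry (assumed complete
    # for this sketch; it covers the fields shown in the example above).
    SUBSECTIONS = {"atoms", "bonds", "exclusions", "angles", "dihedrals", "impropers"}

    Residues = Dict[str, Dict[str, List[List[str]]]]


    def parse_rtp(text: str) -> Tuple[List[int], Residues]:
        """Parse rtp-style text into (bondedtypes, residues).

        residues maps a residue name to {sub-section name: [fields per line]}.
        """
        bondedtypes: List[int] = []
        residues: Residues = {}
        residue = None   # name of the residue entry being read
        section = None   # current sub-section, or "bondedtypes"
        for raw in text.splitlines():
            line = raw.split(";", 1)[0].strip()   # drop comments
            if not line:
                continue
            if line.startswith("[") and line.endswith("]"):
                name = line[1:-1].strip()
                if name == "bondedtypes":
                    residue, section = None, name
                elif name in SUBSECTIONS and residue is not None:
                    section = name
                    residues[residue][section] = []
                else:                              # any other header opens a residue
                    residue, section = name, None
                    residues[residue] = {}
                continue
            fields = line.split()
            if section == "bondedtypes":
                bondedtypes = [int(f) for f in fields]
            elif residue is not None and section is not None:
                residues[residue][section].append(fields)
        return bondedtypes, residues

Atom names such as ``-C`` are kept verbatim, so a caller can still tell from
the leading minus or plus that the atom belongs to a neighboring residue.
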
 146 :ref:`pdb2gmx <gmx pdb2gmx>` automatically generates all angles. This means
 147 that for most force fields the ``[ angles ]`` field is only
 148 useful for overriding :ref:`itp` parameters. For the GROMOS-96
 149 force field the interaction number of all angles needs to be specified.
 150
 151 :ref:`pdb2gmx <gmx pdb2gmx>` automatically generates one proper dihedral for every rotatable
 152 bond, preferably on heavy atoms. When the ``[ dihedrals ]`` field is used, no other
 153 dihedrals will be generated for the bonds corresponding to the specified
 154 dihedrals. It is possible to put more than one dihedral function on a
 155 rotatable bond. For the CHARMM27 force field, :ref:`pdb2gmx <gmx pdb2gmx>` can add correction
 156 maps to the dihedrals using the default ``-cmap`` option. Please refer to
 157 :ref:`charmmff` for more information.
 158
 159 :ref:`pdb2gmx <gmx pdb2gmx>` sets the number of exclusions to 3, which means
 160 that interactions between atoms connected by at most 3 bonds are
 161 excluded. Pair interactions are generated for all pairs of atoms that
 162 are separated by 3 bonds (except pairs of hydrogens). When more
 163 interactions need to be excluded, or some pair interactions should not
 164 be generated, an ``[ exclusions ]`` field can be added,
 165 followed by pairs of atom names on separate lines. All non-bonded and
 166 pair interactions between these atoms will be excluded.
 167
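The exclusion and pair rules can be illustrated with a breadth-first search
over the bond graph. The Python sketch below is only an illustration of the
rule as stated here, not the algorithm used in the code; the function names
and the ``hydrogens`` argument (the set of atom names to treat as hydrogens)
are hypothetical.

.. code-block:: python

    from collections import deque
    from typing import Dict, List, Set, Tuple

    Bond = Tuple[str, str]


    def bond_distances(bonds: List[Bond], start: str) -> Dict[str, int]:
        """Breadth-first search: number of bonds from ``start`` to each atom."""
        neighbours: Dict[str, Set[str]] = {}
        for a, b in bonds:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
        distance = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in neighbours.get(atom, ()):
                if nxt not in distance:
                    distance[nxt] = distance[atom] + 1
                    queue.append(nxt)
        return distance


    def exclusions_and_pairs(bonds: List[Bond], hydrogens=frozenset(), nrexcl: int = 3):
        """Return (excluded, pairs) as sets of name tuples (a, b) with a < b."""
        excluded: Set[Bond] = set()
        pairs: Set[Bond] = set()
        atoms = sorted({atom for bond in bonds for atom in bond})
        for a in atoms:
            for b, d in bond_distances(bonds, a).items():
                if not a < b:
                    continue                # visit every unordered pair once
                if d <= nrexcl:
                    excluded.add((a, b))    # at most nrexcl bonds apart
                if d == 3 and not (a in hydrogens and b in hydrogens):
                    pairs.add((a, b))       # exactly 3 bonds apart, not H-H
        return excluded, pairs

For a linear chain of five atoms C1 to C5, this yields pair interactions for
C1-C4 and C2-C5 only, while C1-C5 (four bonds apart) is neither excluded nor
given a pair interaction.
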
 168 Residue to building block database
 169 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 170
 171 Each force field has its own naming convention for residues. Most
 172 residues have consistent naming, but some, especially those with
 173 different protonation states, can have many different names. The
 174 :ref:`r2b` files are used to convert standard residue names to
 175 the force-field building block names. If no :ref:`r2b` file is present
 176 in the force-field directory or a residue is not listed, the building
 177 block name is assumed to be identical to the residue name. The
 178 :ref:`r2b` file can contain 2 or 5 columns. The 2-column format has
 179 the residue name in the first column and the building block name in the
 180 second. The 5-column format has 3 additional columns with the building
 181 block for the residue occurring in the N-terminus, C-terminus and both
 182 termini at the same time (single residue molecule). This is useful,
 183 for instance, for the AMBER force fields. If one or more of the terminal
 184 versions are not present, a dash should be entered in the corresponding
 185 column.
 186
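The lookup described above can be sketched as follows. This is a minimal
Python illustration with hypothetical function names. It makes two assumptions
that go beyond the text: a 2-column entry reuses the main building block name
at the termini, and a dash is reported as an error when that terminal variant
is requested.

.. code-block:: python

    from typing import Dict, List, Optional

    # Column order after the residue name: main, N-terminus, C-terminus, both.
    R2bTable = Dict[str, List[Optional[str]]]


    def parse_r2b(text: str) -> R2bTable:
        """Read r2b-style text with 2 or 5 columns per line."""
        table: R2bTable = {}
        for raw in text.splitlines():
            fields = raw.split(";", 1)[0].split()   # drop comments
            if not fields:
                continue
            if len(fields) not in (2, 5):
                raise ValueError(f"expected 2 or 5 columns: {raw!r}")
            # Assumption: without terminal columns, the main name is reused.
            blocks = fields[1:] if len(fields) == 5 else [fields[1]] * 4
            table[fields[0]] = [None if b == "-" else b for b in blocks]
        return table


    def building_block(table: R2bTable, residue: str,
                       n_term: bool = False, c_term: bool = False) -> str:
        """Return the building block name; unlisted residues keep their name."""
        if residue not in table:
            return residue
        index = (1 if n_term else 0) + (2 if c_term else 0)
        block = table[residue][index]
        if block is None:                           # a dash in the file
            raise KeyError(f"no building block for {residue} at this terminus")
        return block
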
 187 There is a |Gromacs| naming convention for residues that, apart from the
 188 :ref:`pdb2gmx <gmx pdb2gmx>` code itself, is only apparent through the
 189 :ref:`r2b` and ``specbond.dat`` files. This
 190 convention is only of importance when you are adding residue types to an
 191 :ref:`rtp` file. The convention is listed in :numref:`Table %s <tab-r2b>`.
 192 For special bonds with, for instance,
 193 a heme group, the |Gromacs| naming convention is introduced through
 194 ``specbond.dat`` (see :ref:`specbond`),
 195 which can subsequently be translated by the :ref:`r2b` file,
 196 if required.
 197
 198 .. |NDEL| replace:: N\ :math:`_\delta`
 199 .. |NEPS| replace:: N\ :math:`_\epsilon`
 200
 201 .. _tab-r2b:
 202
 203 .. table:: Internal |Gromacs| residue naming convention.
 204
 205            +--------------+-----------------------------------------------------------+
 206            | |Gromacs| ID | Residue                                                   |
 207            +==============+===========================================================+
 208            | ARG          | protonated arginine                                       |
 209            +--------------+-----------------------------------------------------------+
 210            | ARGN         | neutral arginine                                          |
 211            +--------------+-----------------------------------------------------------+
 212            | ASP          | negatively charged aspartic acid                          |
 213            +--------------+-----------------------------------------------------------+
 214            | ASPH         | neutral aspartic acid                                     |
 215            +--------------+-----------------------------------------------------------+
 216            | CYS          | neutral cysteine                                          |
 217            +--------------+-----------------------------------------------------------+
 218            | CYS2         | cysteine with sulfur bound to another cysteine or a heme  |
 219            +--------------+-----------------------------------------------------------+
 220            | GLU          |  negatively charged glutamic acid                         |
 221            +--------------+-----------------------------------------------------------+
 222            | GLUH         |  neutral glutamic acid                                    |
 223            +--------------+-----------------------------------------------------------+
 224            | HISD         | neutral histidine with |NDEL| protonated                  |
 225            +--------------+-----------------------------------------------------------+
 226            | HISE         | neutral histidine with |NEPS| protonated                  |
 227            +--------------+-----------------------------------------------------------+
 228            | HISH         | positive histidine with both |NDEL| and |NEPS| protonated |
 229            +--------------+-----------------------------------------------------------+
 230            | HIS1         | histidine bound to a heme                                 |
 231            +--------------+-----------------------------------------------------------+
 232            | LYSN         | neutral lysine                                            |
 233            +--------------+-----------------------------------------------------------+
 234            | LYS          | protonated lysine                                         |
 235            +--------------+-----------------------------------------------------------+
 236            | HEME         | heme                                                      |
 237            +--------------+-----------------------------------------------------------+
 238
 239
 240 Atom renaming database
 241 ~~~~~~~~~~~~~~~~~~~~~~
 242
 243 Force fields often use atom names that do not follow IUPAC or PDB
 244 convention. The :ref:`arn` database is used to translate the
 245 atom names in the coordinate file to the force-field names. Atoms that
 246 are not listed keep their names. The file has three columns: the
 247 building block name, the old atom name, and the new atom name,
 248 respectively. The residue name supports question-mark wildcards that
 249 match a single character.
 250
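The three-column lookup with question-mark wildcards can be sketched in
Python as follows. The function names are hypothetical, and the sketch assumes
that the first matching line wins, which the text above does not specify.

.. code-block:: python

    from typing import List, Tuple

    Rule = Tuple[str, str, str]   # (building block pattern, old name, new name)


    def block_matches(pattern: str, name: str) -> bool:
        """True if ``name`` matches ``pattern``; '?' matches one character."""
        return len(pattern) == len(name) and all(
            p == "?" or p == c for p, c in zip(pattern, name)
        )


    def parse_arn(text: str) -> List[Rule]:
        """Read arn-style text: one three-column rule per line."""
        rules: List[Rule] = []
        for raw in text.splitlines():
            fields = raw.split(";", 1)[0].split()   # drop comments
            if len(fields) == 3:
                rules.append((fields[0], fields[1], fields[2]))
        return rules


    def rename_atom(rules: List[Rule], block: str, atom: str) -> str:
        """Return the force-field atom name; unlisted atoms keep their name."""
        for pattern, old, new in rules:
            if old == atom and block_matches(pattern, block):
                return new          # assumption: the first matching rule wins
        return atom

Because ``?`` matches exactly one character, a pattern such as ``HIS?``
matches ``HISD`` and ``HISE`` but not ``HIS``.
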
 251 An additional general atom renaming file called
 252 ``xlateat.dat`` is present in the ``share/top``
 253 directory, which translates common non-standard atom names in the
 254 coordinate file to IUPAC/PDB convention. Thus, when writing force-field
 255 files, you can assume standard atom names and no further atom name
 256 translation is required, except for translating from standard atom names
 257 to the force-field ones.
 258
 259 Hydrogen database
 260 ~~~~~~~~~~~~~~~~~
 261
 262 The hydrogen database is stored in :ref:`hdb` files. It contains information
 263 for the :ref:`pdb2gmx <gmx pdb2gmx>` program on how to connect hydrogen atoms to existing
 264 atoms. In versions of the database before |Gromacs| 3.3, hydrogen atoms
 265 were named after the atom they are connected to: the first letter of the
 266 atom name was replaced by an ‘H’. In the versions from 3.3 onwards, the
 267 H atom has to be listed explicitly, because the old behavior was
 268 protein-specific and hence could not be generalized to other molecules.
 269 If more than one hydrogen atom is connected to the same atom, a number
 270 will be added to the end of the hydrogen atom name. For example, adding
 271 two hydrogen atoms to ``ND2`` (in asparagine), the hydrogen atoms will
 272 be named ``HD21`` and ``HD22``. This is important since atom naming in
 273 the :ref:`rtp` file (see :ref:`rtp`) must be the same. The format of the
 274 hydrogen database is as follows:
 275
 276 ::
 277
 278     ; res   # additions
 279             # H add type    H       i       j       k
 280     ALA     1
 281             1       1       H       N       -C      CA
 282     ARG     4
 283             1       2       H       N       CA      C
 284             1       1       HE      NE      CD      CZ
 285             2       3       HH1     NH1     CZ      NE
 286             2       3       HH2     NH2     CZ      NE
 287
 288 On the first line we see the residue name (ALA or ARG) and the number of
 289 kinds of hydrogen atoms that may be added to this residue by the
 290 hydrogen database. After that follows one line for each addition, on
 291 which we see:
 292
 293 -  The number of H atoms added
 294
 295 -  The method for adding H atoms, which can be any of:
 296
 297    #. | *one planar hydrogen*, *e.g.* *rings or peptide bond*
 298       | One hydrogen atom (n) is generated, lying in the plane of atoms
 299         (i,j,k) on the plane bisecting angle (j-i-k) at a distance of
 300         0.1 nm from atom i, such that the angles (n-i-j) and (n-i-k) are
 301         :math:`>` 90\ :math:`^{\rm o}`.
 302
 303    #. | *one single hydrogen*, *e.g.* *hydroxyl*
 304       | One hydrogen atom (n) is generated at a distance of 0.1 nm from
 305         atom i, such that angle (n-i-j)=109.5 degrees and dihedral
 306         (n-i-j-k)=trans.
 307
 308    #. | *two planar hydrogens*, *e.g.* *ethylene -C=CH*:math:`_2`, *or amide
 309         -C(=O)NH*:math:`_2`
 310       | Two hydrogens (n1,n2) are generated at a distance of 0.1 nm from
 311         atom i, such that angle (n1-i-j)=(n2-i-j)=120 degrees and
 312         dihedral (n1-i-j-k)=cis and (n2-i-j-k)=trans, such that names
 313         are according to IUPAC standards \ :ref:`129 <refiupac70>`.
 314
 315    #. | *two or three tetrahedral hydrogens*, *e.g.* *-CH*:math:`_3`
 316       | Three (n1,n2,n3) or two (n1,n2) hydrogens are generated at a
 317         distance of 0.1 nm from atom i, such that angle
 318         (n1-i-j)=(n2-i-j)=(n3-i-j)=109.47\ :math:`^{\rm o}`, dihedral
 319         (n1-i-j-k)=trans, (n2-i-j-k)=trans+120 and
 320         (n3-i-j-k)=trans+240\ :math:`^{\rm o}`.
 321
 322    #. | *one tetrahedral hydrogen*, *e.g.* *C*\ :math:`_3`\ *CH*
 323       | One hydrogen atom (n\ :math:`^\prime`) is generated at a distance
 324         of 0.1 nm from atom i in tetrahedral conformation such that
 325         angle
 326         (n\ :math:`^\prime`-i-j)=(n\ :math:`^\prime`-i-k)=(n\ :math:`^\prime`-i-l)=109.47\ :math:`^{\rm o}`.
 327
 328    #. | *two tetrahedral hydrogens*, *e.g.* *C-CH*\ :math:`_2`\ *-C*
 329       | Two hydrogen atoms (n1,n2) are generated at a distance of 0.1 nm
 330         from atom i in tetrahedral conformation on the plane bisecting
 331         angle j-i-k with angle
 332         (n1-i-n2)=(n1-i-j)=(n1-i-k)=109.47\ :math:`^{\rm o}`.
 333
 334    #. | *two water hydrogens*
 335       | Two hydrogens are generated around atom i according to
 336         SPC \ :ref:`80 <refBerendsen81>` water geometry. The symmetry
 337         axis will alternate between three coordinate axes in both
 338         directions.
 339
 340    10. | *three water “hydrogens”*
 341        | Two hydrogens are generated around atom i according to
 342          SPC \ :ref:`80 <refBerendsen81>` water geometry. The symmetry
 343          axis will alternate between three coordinate axes in both
 344          directions. In addition, an extra particle is generated on the
 345          position of the oxygen with the first letter of the name
 346          replaced by ‘M’. This is for use with four-atom water models
 347          such as TIP4P \ :ref:`128 <refJorgensen83>`.
 348
 349    11. | *four water “hydrogens”*
 350        | Same as above, except that two additional particles are
 351          generated on the position of the oxygen, with names ‘LP1’ and
 352          ‘LP2’. This is for use with five-atom water models such as
 353          TIP5P \ :ref:`130 <refMahoney2000a>`.
 354
 355 -  The name of the new H atom (or its prefix, *e.g.* ``HD2``
 356    for the asparagine example given earlier).
 357
 358 -  Three or four control atoms (i,j,k,l), where the first always is the
 359    atom to which the H atoms are connected. The other two or three
 360    depend on the code selected. For water, there is only one control
 361    atom.
 362
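The record layout described above can be read with a few lines of Python. The
sketch below is an illustration only, with hypothetical names (``HAddition``,
``parse_hdb``). It assumes that comment lines start with ``;`` and that the
input holds only residue lines and addition lines.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class HAddition:
        count: int            # number of H atoms added
        method: int           # addition method from the list above
        name: str             # H atom name, or its prefix when count > 1
        control: List[str]    # control atoms; the first one carries the H atoms

        def atom_names(self) -> List[str]:
            """Append 1, 2, ... to the name when more than one H is added."""
            if self.count == 1:
                return [self.name]
            return [f"{self.name}{i}" for i in range(1, self.count + 1)]


    def parse_hdb(text: str) -> Dict[str, List[HAddition]]:
        """Map each residue name to its list of hydrogen additions."""
        rows = [line.split(";", 1)[0].split() for line in text.splitlines()]
        rows = [fields for fields in rows if fields]
        database: Dict[str, List[HAddition]] = {}
        pos = 0
        while pos < len(rows):
            residue, n_additions = rows[pos][0], int(rows[pos][1])
            database[residue] = [
                HAddition(int(f[0]), int(f[1]), f[2], f[3:])
                for f in rows[pos + 1 : pos + 1 + n_additions]
            ]
            pos += 1 + n_additions
        return database

Applied to the ARG entry shown earlier, the addition ``2 3 HH1 NH1 CZ NE``
expands to the atom names ``HH11`` and ``HH12``, which are the names the
:ref:`rtp` entry has to use.
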
 363 Some more exotic cases can be approximately constructed from the above
 364 tools, and with suitable use of energy minimization are good enough for
 365 beginning MD simulations. For example, secondary amine hydrogen, nitrenyl
 366 hydrogen (:math:`\mathrm{C}=\mathrm{NH}`)
 367 and even ethynyl hydrogen could be approximately constructed using
 368 method 2 above for hydroxyl hydrogen.
 369
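As an illustration of method 2, the following Python sketch places one atom
from a bond length, an angle, and a dihedral relative to the control atoms
(i,j,k). It is a generic internal-to-Cartesian construction written for this
example, not the routine used by :ref:`pdb2gmx <gmx pdb2gmx>`; all function
names are hypothetical, and only the trans case (180 degrees) is relied upon
here, so the sign convention for other dihedral values is an assumption.

.. code-block:: python

    import math
    from typing import Tuple

    Vec = Tuple[float, float, float]


    def sub(a: Vec, b: Vec) -> Vec:
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


    def dot(a: Vec, b: Vec) -> float:
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


    def cross(a: Vec, b: Vec) -> Vec:
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])


    def unit(a: Vec) -> Vec:
        length = math.sqrt(dot(a, a))
        return (a[0] / length, a[1] / length, a[2] / length)


    def place_single_hydrogen(i: Vec, j: Vec, k: Vec, bond: float = 0.1,
                              angle_deg: float = 109.5,
                              dihedral_deg: float = 180.0) -> Vec:
        """Return n with |n-i| = bond, angle (n-i-j) and dihedral (n-i-j-k)."""
        ij = unit(sub(i, j))                   # direction from j to i
        normal = unit(cross(sub(j, k), ij))    # normal of the (k, j, i) plane
        in_plane = cross(normal, ij)           # completes the orthonormal frame
        theta = math.radians(angle_deg)
        phi = math.radians(dihedral_deg)
        x = -bond * math.cos(theta)
        y = bond * math.sin(theta) * math.cos(phi)
        z = bond * math.sin(theta) * math.sin(phi)
        return (i[0] + x * ij[0] + y * in_plane[0] + z * normal[0],
                i[1] + x * ij[1] + y * in_plane[1] + z * normal[1],
                i[2] + x * ij[2] + y * in_plane[2] + z * normal[2])

Coordinates are in nm, so the default ``bond`` of 0.1 matches the distance
quoted for method 2. The control atoms must not be collinear, because the
plane normal is otherwise undefined.
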
 370 Termini database
 371 ~~~~~~~~~~~~~~~~
 372
 373 The termini
 374 databases
 375 are stored in ``aminoacids.n.tdb`` and
 376 ``aminoacids.c.tdb`` for the N- and C-termini respectively.
 377 They contain information for the :ref:`pdb2gmx <gmx pdb2gmx>` program on how
 378 to connect new atoms to existing ones, which atoms should be removed or
 379 changed, and which bonded interactions should be added. Their format is
 380 as follows (from ``gromos43a1.ff/aminoacids.c.tdb``):
 381
 382 ::
 383
 384     [ None ]
 385
 386     [ COO- ]
 387     [ replace ]
 388     C   C       C       12.011  0.27
 389     O   O1      OM      15.9994 -0.635
 390     OXT O2      OM      15.9994 -0.635
 391     [ add ]
 392     2   8       O       C       CA      N
 393         OM      15.9994 -0.635
 394     [ bonds ]
 395     C   O1      gb_5
 396     C   O2      gb_5
 397     [ angles ]
 398     O1  C       O2      ga_37
 399     CA  C       O1      ga_21
 400     CA  C       O2      ga_21
 401     [ dihedrals ]
 402     N   CA      C       O2      gd_20
 403     [ impropers ]
 404     C   CA      O2      O1      gi_1
 405
 406 The file is organized in blocks, each with a header specifying the name
 407 of the block. These blocks correspond to different types of termini that
 408 can be added to a molecule. In this example ``[ None ]`` is
 409 the first block, corresponding to a terminus that leaves the molecule as
 410 it is. ``[ COO- ]`` is the second terminus type, corresponding to
 411 changing the terminal carbon atom into a deprotonated carboxyl group.
 412 Block names cannot be any of the following:
 413 ``replace``, ``add``, ``delete``,
 414 ``bonds``, ``angles``,
 415 ``dihedrals``, ``impropers``. Doing so would
 416 interfere with the parameters of the block, and would probably also be
 417 very confusing to human readers.
 418
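The restriction on block names follows from the way the headers have to be
told apart. The Python sketch below is a hypothetical reader (``parse_tdb`` is
not a |Gromacs| function) that treats any header with a reserved name as a
section of the current block and every other header as a new block.

.. code-block:: python

    from typing import Dict, List

    # Header names that denote sections inside a block rather than blocks.
    RESERVED = {"replace", "add", "delete", "bonds", "angles", "dihedrals", "impropers"}

    Blocks = Dict[str, Dict[str, List[List[str]]]]


    def parse_tdb(text: str) -> Blocks:
        """Return {block name: {section name: [fields per line]}}."""
        blocks: Blocks = {}
        block = section = None
        for raw in text.splitlines():
            line = raw.split(";", 1)[0].strip()   # drop comments
            if not line:
                continue
            if line.startswith("[") and line.endswith("]"):
                name = line[1:-1].strip()
                if name in RESERVED:
                    if block is None:
                        raise ValueError(f"section [ {name} ] outside of a block")
                    section = name
                    blocks[block].setdefault(section, [])
                else:                              # any other name opens a block
                    block, section = name, None
                    blocks[block] = {}
                continue
            if block is None or section is None:
                raise ValueError(f"data line outside of a section: {raw!r}")
            blocks[block][section].append(line.split())
        return blocks

A block called ``[ add ]`` could never be recognized by such a reader, since
the header would always be taken as a section of the preceding block.
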
 419 For each block the following options are present:
 420
 421 -  | ``[ replace ]``
 422    | Replace an existing atom by one with a different atom type, atom
 423      name, charge, and/or mass. This entry can be used to replace an
 424      atom that is present both in the input coordinates and in the
 425      :ref:`rtp` database, but also to only rename an atom in
 426      the input coordinates such that it matches the name in the force
 427      field. In the latter case, there should also be a corresponding
 428      ``[ add ]`` section present that gives instructions to
 429      add the same atom, such that the position in the sequence and the
 430      bonding is known. Such an atom can be present in the input
 431      coordinates and kept, or not present and constructed by
 432      :ref:`pdb2gmx <gmx pdb2gmx>`. For each atom to be replaced, one line
 433      should be entered with the following fields:
 434
 435    -  name of the atom to be replaced
 436
 437    -  new atom name (optional)
 438
 439    -  new atom type
 440
 441    -  new mass
 442
 443    -  new charge
 444
 445 -  | ``[ add ]``
 446    | Add new atoms. For each (group of) added atom(s), a two-line entry
 447      is necessary. The first line contains the same fields as an entry
 448      in the hydrogen database (number of atoms, type of addition,
 449      name of the new atom, control atoms, see :ref:`hdb`), but the
 450      possible types of addition are extended by two more, specifically
 451      for C-terminal additions:
 452
 453    8. | *two carboxyl oxygens, -COO*:math:`^-`
 454       | Two oxygens (n1,n2) are generated according to rule 3, at a
 455         distance of 0.136 nm from atom i and an angle
 456         (n1-i-j)=(n2-i-j)=117 degrees.
 457
 458    9. | *carboxyl oxygens and hydrogen, -COOH*
 459       | Two oxygens (n1,n2) are generated according to rule 3, at
 460         distances of 0.123 nm and 0.125 nm from atom i for n1 and n2,
 461         respectively, and angles (n1-i-j)=121 and (n2-i-j)=115 degrees.
 462         One hydrogen (n\ :math:`^\prime`) is generated around n2 according
 463         to rule 2, where n-i-j and n-i-j-k should be read as
 464         n\ :math:`^\prime`-n2-i and n\ :math:`^\prime`-n2-i-j,
 465         respectively.
 466
 467    After this line, another line follows that specifies the details of
 468    the added atom(s), in the same way as for replacing atoms, *i.e.*:
 469
 470    -  atom type
 471
 472    -  mass
 473
 474    -  charge
 475
 476    -  charge group (optional)
 477
 478    As in the hydrogen database (see :ref:`hdb`), when more than one
 479    atom is connected to an existing one, a number will be appended to
 480    the end of the atom name. **Note** that, like in the hydrogen
 481    database, the atom name is now on the same line as the control atoms,
 482    whereas it was at the beginning of the second line prior to |Gromacs|
 483    version 3.3. When the charge group field is left out, the added atom
 484    will have the same charge group number as the atom that it is bonded
 485    to.
 486
 487 -  | ``[ delete ]``
 488    | Delete existing atoms. One atom name per line.
 489
 490 -  | ``[ bonds ]``, ``[ angles ]``,
 491      ``[ dihedrals ]`` and ``[ impropers ]``
 492    | Add bonded parameters. The format is identical to that
 493      used in the :ref:`rtp` file, see :ref:`rtp`.
 494
 495 Virtual site database
 496 ~~~~~~~~~~~~~~~~~~~~~
 497
 498 Since we cannot rely on the positions of hydrogens in input files, we
 499 need a special input file to decide the geometries and parameters with
 500 which to add virtual site hydrogens. For more complex virtual site
 501 constructs (*e.g.* when entire aromatic side chains are made rigid) we
 502 also need information about the equilibrium bond lengths and angles for
 503 all atoms in the side chain. This information is specified in the
 504 :ref:`vsd` file for each force field. Just as for the termini,
 505 there is one such file for each class of residues in the
 506 :ref:`rtp` file.
 507
 508 The virtual site database is not a simple list of
 509 information. The first couple of sections specify which mass centers
 510 (typically called MCH\ :math:`_3`/MNH\ :math:`_3`) to use for
 511 CH\ :math:`_3`, NH\ :math:`_3`, and NH\ :math:`_2` groups. Depending on
 512 the equilibrium bond lengths and angles between the hydrogens and heavy
 513 atoms we need to apply slightly different constraint distances between
 514 these mass centers. **Note** that we do *not* have to specify the actual
 515 parameters (that is automatic), just the type of mass center to use. To
 516 accomplish this, there are three sections named ``[ CH3 ]``,
 517 ``[ NH3 ]``, and ``[ NH2 ]``. For each of these we expect three columns.
 518 The first column is the atom type bound to the 2/3 hydrogens, the second
 519 column is the next heavy atom type to which this is bound, and the third
 520 column is the type of mass center to use. As a special case, in the
 521 ``[ NH2 ]`` section it is also possible to specify ``planar`` in the
 522 second column, which will use a different construction without mass
 523 center. Force fields currently differ on
 524 whether an NH\ :math:`_2` group should be planar or not, but we try hard
 525 to stick to the default equilibrium parameters of the force field.
 526
 527 The second part of the virtual site database contains explicit
 528 equilibrium bond lengths and angles for pairs/triplets of atoms in
 529 aromatic side chains. These entries are currently read by specific
 530 routines in the virtual site generation code, so if you would like to
 531 extend it *e.g.* to nucleic acids you would also need to write new code
 532 there. These sections are named after the short amino acid names
 533 (``[ PHE ]``, ``[ TYR ]``, ``[ TRP ]``, ``[ HID ]``, ``[ HIE ]``,
 534 ``[ HIP ]``), and simply contain 2 or 3 columns with atom names,
 535 followed by a number specifying the bond length (in nm) or angle (in
 536 degrees). **Note** that these are approximations of the equilibrated
 537 geometry for the entire molecule, which might not be identical to the
 538 equilibrium value for a single bond/angle if the molecule is strained.
 539
 540 .. _specbond:
 541
 542 Special bonds
 543 ~~~~~~~~~~~~~
 544
 545 The primary mechanism used by
 546 :ref:`pdb2gmx <gmx pdb2gmx>` to generate
 547 inter-residue bonds relies on head-to-tail linking of backbone atoms in
 548 different residues to build a macromolecule. In some cases (*e.g.*
 549 disulfide bonds, a heme
 550 group, branched
 551 polymers), it is necessary to
 552 create inter-residue bonds that do not lie on the backbone. The file
 553 ``specbond.dat`` takes
 554 care of this function. It is necessary that the residues belong to the
 555 same ``[ moleculetype ]``. The ``-merge`` and
 556 ``-chainsep`` functions of :ref:`pdb2gmx <gmx pdb2gmx>` can be
 557 useful when managing special inter-residue bonds between different
 558 chains.
 559
 560 The first line of ``specbond.dat`` indicates the number of
 561 entries that are in the file. If you add a new entry, be sure to
 562 increment this number. The remaining lines in the file provide the
 563 specifications for creating bonds. The format of the lines is as
 564 follows:
 565
 566 ``resA atomA nbondsA resB atomB nbondsB length newresA
 567 newresB``
 568
 569 The columns indicate:
 570
 571 #. ``resA`` The name of residue A that participates in the
 572    bond.
 573
 574 #. ``atomA`` The name of the atom in residue A that forms
 575    the bond.
 576
 577 #. ``nbondsA`` The total number of bonds
 578    ``atomA`` can form.
 579
 580 #. ``resB`` The name of residue B that participates in the
 581    bond.
 582
 583 #. ``atomB`` The name of the atom in residue B that forms
 584    the bond.
 585
 586 #. ``nbondsB`` The total number of bonds
 587    ``atomB`` can form.
 588
 589 #. ``length`` The reference length for the bond. If
 590    ``atomA`` and ``atomB`` are not within
 591    ``length`` :math:`\pm` 10% in the coordinate file
 592    supplied to :ref:`pdb2gmx <gmx pdb2gmx>`, no bond will be formed.
 593
 594 #. ``newresA`` The new name of residue A, if necessary. Some
 595    force fields use *e.g.* CYS2 for a cysteine in a disulfide or heme
 596    linkage.
 597
 598 #. ``newresB`` The new name of residue B, likewise.
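
The column layout and the 10% tolerance can be sketched in Python as follows.
The names ``SpecialBond`` and ``parse_specbond`` are hypothetical, the sample
entry in the usage note is illustrative, and the strict check of the entry
count is a choice made for this sketch rather than documented behavior. The
reference length is assumed to be in nm, like other |Gromacs| lengths.

.. code-block:: python

    import math
    from dataclasses import dataclass
    from typing import List, Sequence


    @dataclass
    class SpecialBond:
        res_a: str
        atom_a: str
        nbonds_a: int
        res_b: str
        atom_b: str
        nbonds_b: int
        length: float      # reference bond length (assumed to be in nm)
        new_res_a: str
        new_res_b: str

        def within_tolerance(self, xa: Sequence[float], xb: Sequence[float],
                             tolerance: float = 0.10) -> bool:
            """True if the atom distance is within length +/- 10%."""
            distance = math.dist(xa, xb)
            return abs(distance - self.length) <= tolerance * self.length


    def parse_specbond(text: str) -> List[SpecialBond]:
        """Read the entry count on the first line, then one entry per line."""
        rows = [line.split() for line in text.splitlines() if line.strip()]
        count = int(rows[0][0])
        entries = rows[1:]
        if len(entries) != count:
            # Sketch-only check: flags a forgotten update of the first line.
            raise ValueError(f"header says {count} entries, found {len(entries)}")
        return [
            SpecialBond(f[0], f[1], int(f[2]), f[3], f[4], int(f[5]),
                        float(f[6]), f[7], f[8])
            for f in entries
        ]

For an entry such as ``CYS SG 1 CYS SG 1 0.2 CYS2 CYS2``, two sulfur atoms
0.205 nm apart would be bonded and both residues renamed to CYS2, whereas a
distance of 0.25 nm falls outside the window and no bond is formed.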