docs/gmxapi/userguide/usage.rst

========================
Using the Python package
========================

After installing GROMACS, sourcing the "GMXRC" (see GROMACS docs), and installing
the gmxapi Python package (see :doc:`install`), import the package in a Python
script or interactive interpreter. This documentation assumes a convenient alias
of ``gmx`` to refer to the ``gmxapi`` Python package.

::

    import gmxapi as gmx

For full documentation of the Python-level interface and API, use the ``pydoc``
command line tool or the :py:func:`help` interactive Python function, or refer to
the :doc:`pythonreference`.

Any Python *exception* raised by gmxapi
should be descended from (and catchable as) :class:`gmxapi.exceptions.Error`.
Additional status messages can be acquired through the :ref:`gmxapi logging`
facility.
Unfortunately, some errors occurring in the GROMACS library are not yet
recoverable at the Python level, and much of the standard GROMACS terminal
output is not yet accessible through Python.
If you find a particularly problematic scenario, please file a GROMACS bug report.

During installation, the *gmxapi* Python package becomes tied to a specific
GROMACS installation.
If you would like to access multiple GROMACS installations
from Python, build and install *gmxapi* in separate
:ref:`virtual environments <gmxapi venv>`.

.. _parallelism:

Notes on parallelism and MPI
============================

When launching a *gmxapi* script in an MPI environment,
such as with :command:`mpiexec` or :command:`srun`,
you must help *gmxapi* detect the MPI environment by ensuring that :py:mod:`mpi4py`
is loaded.
Refer to :ref:`mpi_requirements` for more on installing :py:mod:`mpi4py`.

Assuming you use :command:`mpiexec` to launch MPI jobs in your environment,
run a *gmxapi* script on two ranks with something like the following.
Note that it can be helpful to provide :command:`mpiexec` with the full path to
the intended Python interpreter since new process environments are being created.

::

    mpiexec -n 2 `which python` -m mpi4py myscript.py

*gmxapi* 0.1 has limited parallelism, but future versions will include seamless
acceleration as integration improves with the GROMACS library and computing
environment runtime resources.
Currently, *gmxapi* and the GROMACS library do not have an effective way to
share an MPI environment.
Therefore, if you intend to run more than one simulation at a time, in parallel,
in a *gmxapi* script, you should build GROMACS with *thread-MPI* instead of a
standard MPI library.
That is, configure GROMACS with the CMake flag ``-DGMX_THREAD_MPI=ON``.
Then, launch your *gmxapi* script with one MPI rank per node, and *gmxapi* will
assign each (non-MPI) simulation to its own node, while keeping the full MPI
environment available for use via :py:mod:`mpi4py`.

Running simple simulations
==========================

Once the ``gmxapi`` package is installed, running simulations is easy with
:py:func:`gmxapi.read_tpr` and :py:func:`gmxapi.mdrun`.

::

    import gmxapi as gmx
    # tpr_filename is the path to an existing run input (TPR) file.
    simulation_input = gmx.read_tpr(tpr_filename)
    md = gmx.mdrun(simulation_input)

Note that this sets up the work you want to perform, but does not immediately
trigger execution. You can explicitly trigger execution with::

    md.run()

or you can let gmxapi automatically launch work in response to the data you
request.

The :py:func:`gmxapi.mdrun` operation produces a simulation trajectory output.
You can use ``md.output.trajectory`` as input to other operations,
or you can get the output directly by calling ``md.output.trajectory.result()``.
If the simulation has not been run yet when ``result()`` is called,
the simulation will be run before the function returns.
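
The deferred execution described above can be illustrated without GROMACS.
The following is a minimal sketch in plain Python, not the *gmxapi*
implementation: ``LazyOperation``, ``fake_mdrun``, and the file name
``topol.tpr`` are hypothetical and exist only for this illustration.
It shows work that is declared first and executed at most once, either by an
explicit ``run()`` or on the first call to ``result()``::

    class LazyOperation:
        """Hypothetical stand-in for an operation handle (not part of gmxapi)."""

        def __init__(self, func, *args):
            self._func = func
            self._args = args
            self._done = False
            self._value = None

        def run(self):
            # Execute the deferred work only if it has not run yet.
            if not self._done:
                self._value = self._func(*self._args)
                self._done = True

        def result(self):
            # Requesting data triggers execution on demand.
            self.run()
            return self._value


    calls = []


    def fake_mdrun(tpr_filename):
        # Record the call so that we can see when the work really happens.
        calls.append(tpr_filename)
        return tpr_filename.replace('.tpr', '.trr')


    md = LazyOperation(fake_mdrun, 'topol.tpr')
    assert calls == []          # Nothing has executed yet.
    trajectory = md.result()    # Triggers execution.
    md.run()                    # No effect: the work already ran.
    assert calls == ['topol.tpr']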

Running ensemble simulations
============================

To run a batch of simulations, pass an array of inputs::

    simulation_input = gmx.read_tpr([tpr_filename1, tpr_filename2, ...])
    md = gmx.mdrun(simulation_input)
    md.run()

Make sure to launch the script in an MPI environment with a sufficient number
of ranks to allow one rank per simulation.
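
A script can check the one-rank-per-simulation requirement before it builds any
work. This sketch shows one way you might structure such a check and is not a
*gmxapi* feature: ``check_ensemble_size`` is a hypothetical helper and the file
names are placeholders. The rank count is passed in explicitly so that the
example runs without MPI. In a real script you could obtain it from
``mpi4py.MPI.COMM_WORLD.Get_size()``::

    def check_ensemble_size(n_simulations, n_ranks):
        """Raise ValueError unless there is at least one rank per simulation."""
        if n_simulations < 1:
            raise ValueError('Need at least one simulation input.')
        if n_ranks < n_simulations:
            raise ValueError(
                'Ensemble of {} simulations needs at least {} ranks, but only {} '
                'are available.'.format(n_simulations, n_simulations, n_ranks))
        return True


    tpr_files = ['member0.tpr', 'member1.tpr']   # Hypothetical file names.
    ok = check_ensemble_size(len(tpr_files), n_ranks=2)

    try:
        check_ensemble_size(len(tpr_files), n_ranks=1)
    except ValueError as e:
        message = str(e)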

For *gmxapi* 0.1, we recommend configuring the GROMACS build with
``GMX_THREAD_MPI=ON`` and allowing one rank per node in order to allow each
simulation ensemble member to run on a separate node.

.. seealso:: :ref:`parallelism`

.. _commandline:

Accessing command line tools
============================

In *gmxapi* 0.1, most GROMACS tools are not yet exposed as *gmxapi* Python operations.
:py:func:`gmxapi.commandline_operation` provides a way to convert a :command:`gmx`
(or other) command line tool into an operation that can be used in a *gmxapi*
script.

In order to establish data dependencies, input and output files need to be
indicated with the ``input_files`` and ``output_files`` parameters.
The ``input_files`` and ``output_files`` keyword arguments are dictionaries
consisting of files keyed by command line flags.

For example, you might create a :command:`gmx solvate` operation as::

    solvate = gmx.commandline_operation('gmx',
                                        arguments=['solvate', '-box', '5', '5', '5'],
                                        input_files={'-cs': structurefile},
                                        output_files={'-p': topfile,
                                                      '-o': structurefile,
                                                      })

To check the status or error output of a command line operation, refer to the
``returncode`` and ``stderr`` outputs.
To access the results from the output file arguments, use the command line flags
as keys in the ``file`` dictionary output.

Example::

    structurefile = solvate.output.file['-o'].result()
    if solvate.output.returncode.result() != 0:
        print(solvate.output.stderr.result())
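
It can help to picture how the flag-keyed dictionaries relate to an ordinary
command line. The following sketch assumes that each flag is simply followed by
its file name and appended after ``arguments``. ``build_command`` is a
hypothetical helper written for this illustration, the file names are
placeholders, and the ordering that *gmxapi* actually uses may differ::

    def build_command(executable, arguments=(), input_files=None, output_files=None):
        """Flatten flag-keyed file dictionaries into an argument list."""
        command = [executable]
        command.extend(arguments)
        for files in (input_files or {}, output_files or {}):
            for flag, filename in files.items():
                command.extend([flag, filename])
        return command


    command = build_command(
        'gmx',
        arguments=['solvate', '-box', '5', '5', '5'],
        input_files={'-cs': 'spc216.gro'},   # Hypothetical file names.
        output_files={'-p': 'topol.top', '-o': 'solvated.gro'})
    print(' '.join(command))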

Preparing simulations
=====================

Continuing the previous example, the output of ``solvate`` may be used as the
input for ``grompp``::

    grompp = gmx.commandline_operation('gmx', 'grompp',
                                       input_files={
                                           '-f': mdpfile,
                                           '-p': solvate.output.file['-p'],
                                           '-c': solvate.output.file['-o'],
                                           '-po': mdout_mdp,
                                       },
                                       output_files={'-o': tprfile})

Then, ``grompp.output.file['-o']`` can be used as the input for :py:func:`gmxapi.read_tpr`.

Simulation input can be modified with the :py:func:`gmxapi.modify_input` operation
before being passed to :py:func:`gmxapi.mdrun`.
For *gmxapi* 0.1, a subset of MDP parameters may be overridden using the
dictionary passed with the ``parameters`` keyword argument.

Example::

    simulation_input = gmx.read_tpr(grompp.output.file['-o'])
    modified_input = gmx.modify_input(input=simulation_input, parameters={'nsteps': 1000})
    md = gmx.mdrun(input=modified_input)
    md.run()

Using arbitrary Python functions
================================

Generally, a function in the *gmxapi* package returns an object that references
a node in a work graph,
representing an operation that will be run when the graph executes.
The object has an ``output`` attribute providing access to data Futures that
can be provided as inputs to other operations before computation has actually
been performed.

You can also provide native Python data as input to operations,
or you can operate on native results retrieved from a Future's ``result()``
method.
Alternatively, most Python functions are easy to convert into *gmxapi* compatible
operations with :py:func:`gmxapi.function_wrapper`.
All function inputs and outputs must have a name and type.
Additionally, functions should be stateless and importable
(e.g. via Python ``from some.module import myfunction``)
for future compatibility.

Simple functions can just use ``return`` to publish their output,
as long as they are defined with a return value type annotation.
Functions with multiple outputs can accept an ``output`` keyword argument and
assign values to named attributes on the received argument.

Examples::

    from gmxapi import function_wrapper

    @function_wrapper(output={'data': float})
    def add_float(a: float, b: float) -> float:
        return a + b


    @function_wrapper(output={'data': bool})
    def less_than(lhs: float, rhs: float, output=None):
        output.data = lhs < rhs

.. seealso::

    For more on Python type hinting with function annotations,
    check out :pep:`3107`.
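
The name and type of each input come from the function signature.
As a sketch of what the annotations make available (using only the standard
library, and making no claim about how *gmxapi* inspects functions internally),
:py:func:`typing.get_type_hints` reports the annotated names and types of a
plain, undecorated function::

    import typing


    def add_float(a: float, b: float) -> float:
        return a + b


    hints = typing.get_type_hints(add_float)
    # Every input has a name and a type; the return type is stored under 'return'.
    inputs = {name: tp for name, tp in hints.items() if name != 'return'}
    assert inputs == {'a': float, 'b': float}
    assert hints['return'] is float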

Subgraphs
=========

Basic *gmxapi* work consists of a flow of data from operation outputs to
operation inputs, forming a directed acyclic graph (DAG).
In many cases, it can be useful to repeat execution of a subgraph with updated
inputs.
You may want a data reference that is not tied to the immutable result
of a single node in the work graph, but which instead refers to the most recent
result of a repeated operation.

One or more operations can be staged in a :py:class:`gmxapi.operation.Subgraph`,
a sort of meta-operation factory that can store input binding behavior so that
instances can be created without providing input arguments.

The subgraph *variables* serve as input, output, and mutable internal data
references which can be updated by operations in the subgraph.
Variables also allow state to be propagated between iterations when a subgraph
is used in a *while* loop.

Use :py:func:`gmxapi.subgraph` to create a new empty subgraph.
The ``variables`` argument declares data handles that define the state of the
subgraph when it is run.
To initialize input to the subgraph, give each variable a name and a value.

To populate a subgraph, enter a SubgraphContext by using a ``with`` statement.
Operations created in the *with* block will be captured by the SubgraphContext.
Define the subgraph outputs by assigning operation outputs to subgraph variables
within the *with* block.

After exiting the *with* block, the subgraph may be used to create operation
instances or may be executed repeatedly in a *while* loop.

.. note::

    The object returned by :py:func:`gmxapi.subgraph` is atypical of *gmxapi*
    operations, and has some special behaviors. When used as a Python
    `context manager <https://docs.python.org/3/reference/datamodel.html#context-managers>`__,
    it enters a "builder" state that changes the behavior of its attribute
    variables and of operation instantiation. After exiting the ``with``
    block, the subgraph variables are no longer assignable, and operation
    references obtained within the block are no longer valid.

Looping
=======

An operation can be executed an arbitrary number of times with a
:py:func:`gmxapi.while_loop` by providing a factory function as the
*operation* argument.
When the loop operation is run, the *operation* is instantiated and run repeatedly
until *condition* evaluates ``False``.

:py:func:`gmxapi.while_loop` does not provide a direct way to provide *operation*
arguments. Use a *subgraph* to define the data flow for iterative operations.

When a *condition* is a subgraph variable, the variable is evaluated in the
running subgraph instance at the beginning of an iteration.

Example::

    subgraph = gmx.subgraph(variables={'float_with_default': 1.0, 'bool_data': True})
    with subgraph:
        # Define the update for float_with_default to come from an add_float operation.
        subgraph.float_with_default = add_float(subgraph.float_with_default, 1.).output.data
        subgraph.bool_data = less_than(lhs=subgraph.float_with_default, rhs=6.).output.data
    operation_instance = subgraph()
    operation_instance.run()
    assert operation_instance.values['float_with_default'] == 2.

    loop = gmx.while_loop(operation=subgraph, condition=subgraph.bool_data)
    handle = loop()
    assert handle.output.float_with_default.result() == 6
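
To see why the loop output is ``6``, it may help to trace the same update rule
in ordinary Python. This is only an analogy for the data flow, with no *gmxapi*
objects involved. It assumes, as the final assertion in the example implies,
that the loop repeats for as long as ``bool_data`` is true::

    def add_float(a, b):
        return a + b


    def less_than(lhs, rhs):
        return lhs < rhs


    # Initial values of the subgraph variables.
    float_with_default = 1.0
    bool_data = True

    iterations = 0
    while bool_data:
        # One pass through the subgraph body.
        float_with_default = add_float(float_with_default, 1.)
        bool_data = less_than(lhs=float_with_default, rhs=6.)
        iterations += 1

    assert float_with_default == 6.0
    assert iterations == 5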

.. _gmxapi logging:

Logging
=======

*gmxapi* uses the Python :py:mod:`logging` module to provide hierarchical
logging, organized by submodule.
You can access the logger at ``gmxapi.logger`` or, after importing *gmxapi*,
through the Python logging framework::

    import gmxapi as gmx
    import logging

    # Get the root gmxapi logger.
    gmx_logger = logging.getLogger('gmxapi')
    # Set a low default logging level
    gmx_logger.setLevel(logging.WARNING)
    # Make some tools very verbose
    #  by descending the hierarchy
    gmx_logger.getChild('commandline').setLevel(logging.DEBUG)
    #  or by direct reference
    logging.getLogger('gmxapi.mdrun').setLevel(logging.DEBUG)
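
Because logger names are hierarchical, a level set on the ``gmxapi`` logger
applies to every descendant that has no level of its own. The following sketch
demonstrates this with the standard library only, so it runs without *gmxapi*
installed. The name ``gmxapi.operation`` is used here purely as an example of a
descendant logger name::

    import logging

    gmx_logger = logging.getLogger('gmxapi')
    gmx_logger.setLevel(logging.WARNING)
    gmx_logger.getChild('commandline').setLevel(logging.DEBUG)

    # A descendant without its own level inherits from the nearest ancestor.
    inherited = logging.getLogger('gmxapi.operation').getEffectiveLevel()
    # A descendant with an explicit level keeps it.
    explicit = logging.getLogger('gmxapi.commandline').getEffectiveLevel()

    assert inherited == logging.WARNING
    assert explicit == logging.DEBUG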

You may prefer to adjust the log format or manipulate the log handlers.
For example, tag the log output with MPI rank::

    try:
        from mpi4py import MPI
        rank_number = MPI.COMM_WORLD.Get_rank()
    except ImportError:
        rank_number = 0
        rank_tag = ''
        MPI = None
    else:
        rank_tag = 'rank{}:'.format(rank_number)

    formatter = logging.Formatter(rank_tag + '%(name)s:%(levelname)s: %(message)s')

    # For additional console logging, create and attach a stream handler.
    ch = logging.StreamHandler()
    ch.setFormatter(formatter)
    logging.getLogger().addHandler(ch)
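
To preview what a format string produces before attaching it to the console,
you can direct a handler at an in-memory buffer. This sketch uses only the
standard library. The logger name ``gmxapi.example`` and the fixed ``rank_tag``
are placeholders chosen for the illustration::

    import io
    import logging

    rank_tag = 'rank0:'   # Placeholder; normally derived from the MPI rank.
    formatter = logging.Formatter(rank_tag + '%(name)s:%(levelname)s: %(message)s')

    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(formatter)

    logger = logging.getLogger('gmxapi.example')
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False   # Keep the demonstration out of other handlers.

    logger.info('starting work')
    logger.removeHandler(handler)

    line = buffer.getvalue()
    assert line == 'rank0:gmxapi.example:INFO: starting work\n'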

For more information, refer to the Python `logging documentation <https://docs.python.org/3/library/logging.html>`__.

More
====

Refer to the :doc:`pythonreference` for complete and granular documentation.

For more information on writing or using pluggable simulation extension code,
refer to https://gitlab.com/gromacs/gromacs/-/issues/3133.
(For gmxapi 0.0.7 and GROMACS 2019, see https://github.com/kassonlab/sample_restraint)

.. todo:: :issue:`3133`: Replace these links as resources for pluggable extension code become available.
 342
 343 .. todo:: :issue:`3133`: Replace these links as resources for pluggable extension code become available.