--- /dev/null
+Analysis output data handling {#page_analysisdata}
+=============================
+
+The \ref module_analysisdata module provides support for common data analysis
+tasks within the \ref page_analysisframework. The basic approach used in the
+module is visualized below:
+
+\dot
+ digraph analysisdata_overview {
+ rankdir = BT
+ dataobject [label="data object\n(subclass of gmx::AbstractAnalysisData)"]
+ datamodule1 [label="data module\n(implements gmx::AnalysisDataModuleInterface)"]
+ datamodule2 [label="data module\nthat also provides data"]
+ datamodule3 [label="data module"]
+ datamodule1 -> dataobject
+ datamodule2 -> dataobject
+ datamodule3 -> datamodule2
+ }
+\enddot
+
+Typically, an analysis tool provides its raw data output through one or more
+gmx::AnalysisData objects (the root _data object_ in the diagram above).
+This object provides only storage for the data.
+
+To perform operations on the data, one or more _data modules_ can be attached
+to the data object. Examples of such operations are averaging, histogramming,
+and plotting the data into a file. Some data modules are provided by the \ref
+module_analysisdata module. To implement new ones, it is necessary to create a
+class that implements gmx::AnalysisDataModuleInterface.
+
+In many cases, such data modules also provide data that can be processed
+further, acting as data objects themselves. This makes it possible to attach
+further data modules to form a processing chain. In simple cases, such a chain
+ends in a module that writes the data into a file, but it is also possible to
+access the data in a data object (whether a plain data object or a data module)
+programmatically to do further computation or post-processing outside the
+framework. To do this, the data object typically needs to be told in advance,
+so that it knows to store the data permanently even if the attached modules do
+not require it.
+
+The modules can do their processing online, i.e., as the data is produced.
+If all the attached modules support this, it is not necessary to store all the
+raw data in memory. The module design also supports processing frames in
+parallel: in such cases, the data may become available out of order. In
+particular for writing the per-frame data into a file, but also for other types
+of post-processing, it is necessary to reorder the data sequentially. This is
+implemented once in the framework, and analysis tools do not need to handle it
+themselves beyond using the provided API.
+
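The chaining described above can be illustrated with a small standalone sketch. It does not use the actual classes from the \ref module_analysisdata module: all names (`SketchData`, `SketchModule`, `SketchFrameAverage`, `SketchRunningAverage`) are hypothetical, and the notification interface is reduced to a single per-frame callback. It only shows how a module that also provides data can sit between a root data object and a further module, with all processing done online.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data module: it is notified once per frame.
class SketchModule
{
public:
    virtual ~SketchModule() = default;
    virtual void frameFinished(const std::vector<double>& columns) = 0;
};

// Hypothetical stand-in for a data object: it forwards each finished frame
// to the attached modules and stores nothing itself.
class SketchData
{
public:
    void addModule(std::shared_ptr<SketchModule> attached)
    {
        modules_.push_back(std::move(attached));
    }
    void addFrame(const std::vector<double>& columns)
    {
        for (const auto& attached : modules_)
        {
            attached->frameFinished(columns);
        }
    }

private:
    std::vector<std::shared_ptr<SketchModule>> modules_;
};

// A module that also provides data: it averages the columns of each frame
// and passes the one-column result on to its own attached modules.
class SketchFrameAverage : public SketchModule, public SketchData
{
public:
    void frameFinished(const std::vector<double>& columns) override
    {
        assert(!columns.empty());
        double sum = 0.0;
        for (double value : columns)
        {
            sum += value;
        }
        addFrame({ sum / static_cast<double>(columns.size()) });
    }
};

// A terminal module that keeps a running average over frames.
class SketchRunningAverage : public SketchModule
{
public:
    void frameFinished(const std::vector<double>& columns) override
    {
        sum_ += columns.at(0);
        ++count_;
    }
    double average() const { return count_ > 0 ? sum_ / count_ : 0.0; }

private:
    double sum_   = 0.0;
    int    count_ = 0;
};
```

In this sketch, attaching a `SketchRunningAverage` to a `SketchFrameAverage`, and that in turn to a `SketchData`, forms a two-step chain in which no frame is ever stored.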
+
+Structure of data
+=================
+
+At the highest level, data can be structured into separate
+gmx::AbstractAnalysisData objects that operate independently. Each such object
+has an independent set of post-processing modules.
+
+Within a gmx::AbstractAnalysisData object, data is structured along three
+"dimensions":
+
+ - _frames_: There is one or more frames in each data object. For raw data
+ produced by an analysis tool, these typically correspond to input trajectory
+ frames. For other data sets, the frames can be viewed as the X axis of a graph.
+ - _data sets_: There is one or more data sets in each data object. For most
+ purposes, data sets work independently (i.e., the post-processing modules
+ operate on each data set separately), but some modules reduce the data sets
+ into single columns in the output. The main purpose for using multiple data
+ sets is to share the same post-processing chain for multiple sets of data
+ (e.g., multiple RDFs computed by the same tool in one pass), in particular
+ for cases where the number of data sets is not known at compile time.
+ Note that each data set contains the same number of frames.
+ - _columns_: There is one or more columns in each data set. Different data
+ sets can contain a different number of columns. Each column in a frame can
+ contain a single value (see below for supported values).
+
+Programmatically the data within each frame is organized into _point sets_.
+Each point set consists of a continuous range of columns from a single data
+set. There are two types of data:
+
+ - _simple_: For each frame, there is exactly one point set for each data set,
+ and that point set spans all columns in that data set.
+ - _multipoint_: For each frame, there can be any number of point sets, and
+ they may span arbitrary columns. It is allowed that point sets overlap,
+ i.e., that multiple point sets specify a value for the same column.
+
+The main purpose of multipoint data is to support cases where it is not known
+in advance how many values there will be for each frame, or where that number
+is impractically large. The need to do this is mainly a matter of
+performance/implementation complexity tradeoff: with a more complex internal
+implementation, it would be possible to support larger data sets without the
+performance/memory impact they currently impose. The current implementation
+places the burden of deciding on the appropriate usage pattern on the user
+code, allowing for much simpler internal implementation.
+
+An individual value (identified by frame, data set, and column) consists of a
+single value of type `real`, an optional error value, and some flags.
+The flags identify what parts of the value are really available. The following
+states are possible:
+ - _present_: The value is set.
+ - _missing_: The value is marked as missing by the data source. In this
+ state, the value can still be accessed, and the returned `real` value has
+ some meaning. Different data modules handle these cases differently.
+ - _unset_: The value is not set. The value must not be accessed other than
+ to query its state. Data modules that ignore missing values
+ (by skipping all values not _present_) can also handle unset values.
+ Other data modules typically do not allow unset values.
+
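As an illustration of the structure described above, the following standalone sketch models point sets and value states with plain structs. The types `SketchValue` and `SketchPointSet` and the function `averagePresent()` are hypothetical and are not the types used by the module. The sketch assumes multipoint data, where point sets within a frame may overlap, and it shows a consumer that skips every value that is not _present_.

```cpp
#include <cassert>
#include <vector>

// Hypothetical value type: a number, an optional error, and a state.
struct SketchValue
{
    enum class State
    {
        Unset,
        Missing,
        Present
    };
    double value    = 0.0;
    double error    = 0.0;
    bool   hasError = false;
    State  state    = State::Unset;
};

// Hypothetical point set: a contiguous range of columns in one data set.
struct SketchPointSet
{
    int                      dataSetIndex = 0;
    int                      firstColumn  = 0;
    std::vector<SketchValue> values;
};

// Convenience helper that creates a value in the "present" state.
SketchValue presentValue(double value)
{
    SketchValue result;
    result.value = value;
    result.state = SketchValue::State::Present;
    return result;
}

// Averages one column of one data set over all point sets of a frame.
// Missing and unset values are skipped; returns 0.0 if nothing is present.
double averagePresent(const std::vector<SketchPointSet>& pointSets, int dataSetIndex, int column)
{
    assert(column >= 0);
    double sum   = 0.0;
    int    count = 0;
    for (const SketchPointSet& pointSet : pointSets)
    {
        const int offset = column - pointSet.firstColumn;
        if (pointSet.dataSetIndex != dataSetIndex || offset < 0
            || offset >= static_cast<int>(pointSet.values.size()))
        {
            continue;
        }
        const SketchValue& candidate = pointSet.values[offset];
        if (candidate.state == SketchValue::State::Present)
        {
            sum += candidate.value;
            ++count;
        }
    }
    return count > 0 ? sum / count : 0.0;
}
```

Because the consumer only ever reads values in the _present_ state, it never touches the numeric content of an unset value, which is the access rule stated above.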
+
+Data provider classes
+=====================
+
+The base class for all data objects (including data modules that provide data)
+is gmx::AbstractAnalysisData. This class provides facilities for attaching
+data modules to the data, and to query the data. It does not provide any
+methods to alter the data; all logic for managing the actual data is in derived
+classes.
+
+The main root (non-module) data object class for use in analysis tools is
+gmx::AnalysisData. This class provides methods to set properties of the data,
+and to add frames to it. The interface is frame-based: you construct one frame
+at a time, and after it is finished, you move to the next frame. The frames
+are not constructed directly using gmx::AnalysisData, but instead separate
+_data handles_ are used. This is explained in more detail below under
+\ref section_analysisdata_parallelization.
+
+For simple needs and small amounts of data, gmx::AnalysisArrayData is also
+provided. This class allows for all the data to be prepared in memory as a
+single big array, and allows random access to the data while setting the
+values. When all the values are set to their final values, it then notifies
+the attached data modules by looping over the array.
+
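The array-based approach can be sketched without the real class as follows. `SketchArrayData` and its methods are hypothetical names chosen for illustration, and the attached data modules are reduced to a single callback. The point is only the order of events: values are set with random access first, and the notification loop runs afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical array-backed data object: rows play the role of frames.
class SketchArrayData
{
public:
    SketchArrayData(int rowCount, int columnCount) :
        columnCount_(columnCount),
        values_(static_cast<std::size_t>(rowCount) * columnCount, 0.0)
    {
        assert(rowCount > 0 && columnCount > 0);
    }

    // Random access while the data is being prepared.
    void setValue(int row, int column, double value)
    {
        values_.at(static_cast<std::size_t>(row) * columnCount_ + column) = value;
    }

    // Called once all values are final: loops over the rows in order and
    // hands each one to the callback as a finished frame.
    void valuesReady(const std::function<void(int, const std::vector<double>&)>& notify) const
    {
        const int rowCount = static_cast<int>(values_.size()) / columnCount_;
        for (int row = 0; row < rowCount; ++row)
        {
            const auto first = values_.begin() + static_cast<std::ptrdiff_t>(row) * columnCount_;
            notify(row, std::vector<double>(first, first + columnCount_));
        }
    }

private:
    int                 columnCount_;
    std::vector<double> values_;
};
```

The trade-off is the one stated above: everything is held in memory at once, in exchange for not having to produce the data frame by frame.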
+
+Parallelization {#section_analysisdata_parallelization}
+===============
+
+One major driver for the design of the analysis data module has been to provide
+support for transparently processing multiple frames in parallel. In such
+cases, output data for multiple frames may be constructed simultaneously, and
+must be ordered correctly for some data modules, such as writing it into a
+file. This ordering is taken care of by the framework, allowing the analysis
+tool writer to concentrate on the actual analysis task.
+
+From a user's point of view, the main player in this respect is the
+gmx::AnalysisData object. If there are two threads doing the processing in
+parallel, it allows creating a separate gmx::AnalysisDataHandle for each
+thread. Each of these handles can be used independently to construct frames
+into the output data, and the gmx::AnalysisData object internally takes care of
+notifying the modules correctly. If necessary, it stores finished frames into
+a temporary buffer until all preceding frames have also been finished.
+
+For increased efficiency, some data modules are also parallelization-aware:
+they have the ability to process the data in any order, allowing
+gmx::AnalysisData to notify them as soon as a frame becomes available.
+If there are only parallel data modules attached, no frame reordering or
+temporary buffers are needed. If a non-parallel data module is attached to a
+parallel data module, then that parallel data module takes the responsibility
+of ordering its output frames. Ideally, such data modules produce
+significantly less data than what they take in, making it cheaper to do the
+ordering only at this point.
+
+Currently, no parallel runner has been implemented, but it is likely that
+applicable tools written to use the framework require minimal or no changes to
+take advantage of frame-level parallelism once such a runner materializes.
+
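The buffering of finished frames described above can be sketched as follows. This is not the implementation used by gmx::AnalysisData: `SketchFrameReorderer` is a hypothetical class, the attached modules are reduced to one callback, and the sketch is deliberately single-threaded (a real implementation would also need synchronization between the handles). It only demonstrates holding back a finished frame until all preceding frames have been delivered.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper that delivers frames to a consumer in sequential
// order, even when they are finished out of order.
class SketchFrameReorderer
{
public:
    using Frame    = std::vector<double>;
    using Consumer = std::function<void(int, const Frame&)>;

    explicit SketchFrameReorderer(Consumer consumer) : consumer_(std::move(consumer)) {}

    // Called when a frame has been finished, in any order.
    void frameFinished(int frameIndex, Frame frame)
    {
        assert(frameIndex >= nextFrame_);
        pending_.emplace(frameIndex, std::move(frame));
        // Deliver the run of consecutive frames that starts at nextFrame_.
        auto it = pending_.find(nextFrame_);
        while (it != pending_.end())
        {
            consumer_(it->first, it->second);
            pending_.erase(it);
            ++nextFrame_;
            it = pending_.find(nextFrame_);
        }
    }

    // Number of finished frames still waiting for a preceding frame.
    std::size_t bufferedFrameCount() const { return pending_.size(); }

private:
    Consumer             consumer_;
    std::map<int, Frame> pending_;
    int                  nextFrame_ = 0;
};
```

A parallelization-aware module would bypass such a buffer entirely, which is why attaching only parallel modules avoids the temporary storage.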
+
+Provided data processing modules
+================================
+
+Data modules provided by the \ref module_analysisdata module are listed below
+with a short description. See the documentation of the individual classes for
+more details.
+Note that this list is manually maintained, so it may not always be up to date.
+A comprehensive list can be found by looking at the inheritance graph of
+gmx::AnalysisDataModuleInterface, but the list here is more user-friendly.
+
+<dl>
+<dt>gmx::AnalysisDataAverageModule</dt>
+<dd>
+Computes averages and standard deviations for columns in input data.
+One output value for each input column.
+</dd>
+<dt>gmx::AnalysisDataFrameAverageModule</dt>
+<dd>
+Computes averages for each frame in input data.
+One output value for each input data set for each frame.
+</dd>
+<dt>gmx::AnalysisDataBinAverageModule</dt>
+<dd>
+Computes averages within bins. Input is pairs of values, where the first
+value defines the bin, and the second value sets the value to accumulate into
+the average within the bin.
+One output histogram for each input data set.
+</dd>
+<dt>gmx::AnalysisDataSimpleHistogramModule</dt>
+<dd>
+Computes histograms. All values within a data set are added into a histogram.
+One output histogram for each input data set.
+Provides the histogram for each input frame separately, and also the full
+histogram over frames (through an internal submodule).
+</dd>
+<dt>gmx::AnalysisDataWeightedHistogramModule</dt>
+<dd>
+Computes histograms. Input is pairs of values, where the first value defines
+the bin, and the second value sets the value to add into that bin.
+Output is the same as for gmx::AnalysisDataSimpleHistogramModule.
+</dd>
+<dt>gmx::AnalysisDataLifetimeModule</dt>
+<dd>
+Computes lifetime histograms. For each input column, determines the time
+intervals during which a value is continuously present/non-zero, and creates a
+histogram from the lengths of these intervals.
+One output histogram for each input data set.
+</dd>
+<dt>gmx::AnalysisDataPlotModule</dt>
+<dt>gmx::AnalysisDataVectorPlotModule</dt>
+<dd>
+Writes data into a file.
+</dd>
+</dl>
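
To make the lifetime histogram from the list above more concrete, here is a simplified standalone sketch for a single column. The function `lifetimeHistogram()` is hypothetical and is not the algorithm used by gmx::AnalysisDataLifetimeModule, which may treat intervals differently. The sketch assumes that interval lengths are counted in frames and that a value counts as present when it is non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a histogram in which element n counts the intervals during which
// the column was continuously non-zero for exactly n + 1 frames.
std::vector<int> lifetimeHistogram(const std::vector<double>& column)
{
    std::vector<int> histogram;
    std::size_t      currentLength = 0;

    // Records the interval that has just ended, if there was one.
    auto closeInterval = [&histogram, &currentLength]() {
        if (currentLength > 0)
        {
            if (histogram.size() < currentLength)
            {
                histogram.resize(currentLength, 0);
            }
            assert(currentLength <= histogram.size());
            ++histogram[currentLength - 1];
            currentLength = 0;
        }
    };

    for (double value : column)
    {
        if (value != 0.0)
        {
            ++currentLength;
        }
        else
        {
            closeInterval();
        }
    }
    closeInterval(); // An interval may still be open at the last frame.
    return histogram;
}
```

For a column that is non-zero for two frames, then one frame, then three frames (with gaps in between), this sketch produces one count in each of the first three bins.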
--- /dev/null
+Framework for trajectory analysis {#page_analysisframework}
+=================================
+
+\Gromacs provides a framework for implementing flexible trajectory analysis
+routines. It consists of a few components that can also be used individually,
+but in most cases it is desirable to use features from all of them to get the
+most out of the framework. The main features are:
+
+ - Support for flexible selections that can be used to provide the set of
+ coordinates to analyze. They can be dynamic, i.e., select different atoms
+ for different trajectory frames, and also support evaluation of
+ center-of-mass/center-of-geometry coordinates for a group of atoms.
+ The latter means that a tool written to use the framework can automatically
+ also analyze center-of-mass positions (or a mixture of center-of-mass and
+ atomic positions) in addition to real atomic positions.
+ - Support for per-frame parallelization. The framework is designed to
+ support running the analysis in parallel for multiple frames for cases where
+ different frames can be analyzed (mostly) independently. At this time, the
+ actual parallelization is not implemented, but tools written to use the
+ framework should be able to take advantage of it as soon as it materializes
+ with no or minimal changes.
+ - Access to a library of basic analysis routines. Things such as computing
+ averages and histogramming are provided as reusable modules.
+ - Tool code can focus on the actual analysis. Tools are implemented by
+ subclassing an abstract class and providing an implementation for selected
+ pure virtual methods. The framework takes care of initialization tasks,
+ loading the trajectory and looping over it, evaluating selections, and also
+ provides basic features like making molecules whole before passing the frame
+ to the analysis code.
+ This approach also makes it possible to reuse the same tool code from a
+ scripting language such as Python simply by implementing general support for
+ such language bindings in the framework (no such integration is implemented
+ at this time, though).
+
+For a crash course on how to implement an analysis tool using the framework, see
+\subpage page_analysistemplate.
+
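The division of labor between tool code and framework described in the feature list above can be sketched in a few lines of standalone C++. This is not the real interface: `SketchAnalysisModule`, `SketchFrame`, and `runSketchAnalysis()` are hypothetical, and although the method names mirror those of gmx::TrajectoryAnalysisModule discussed in this documentation, their signatures are simplified assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical, heavily simplified frame: a time and a list of coordinates.
struct SketchFrame
{
    double              time;
    std::vector<double> x;
};

// Stand-in for the abstract tool base class (simplified signatures).
class SketchAnalysisModule
{
public:
    virtual ~SketchAnalysisModule() = default;
    virtual void initAnalysis()                                         = 0;
    virtual void analyzeFrame(int frameIndex, const SketchFrame& frame) = 0;
    virtual void finishAnalysis(int frameCount)                         = 0;
    virtual void writeOutput()                                          = 0;
};

// Stand-in for the framework side: it owns the loop over the trajectory.
void runSketchAnalysis(SketchAnalysisModule& tool, const std::vector<SketchFrame>& trajectory)
{
    tool.initAnalysis();
    int frameCount = 0;
    for (const SketchFrame& frame : trajectory)
    {
        tool.analyzeFrame(frameCount, frame);
        ++frameCount;
    }
    tool.finishAnalysis(frameCount);
    tool.writeOutput();
}

// Example tool: the mean of all coordinates over all frames.
class SketchMeanCoordinate : public SketchAnalysisModule
{
public:
    void initAnalysis() override
    {
        sum_   = 0.0;
        count_ = 0;
        mean_  = 0.0;
    }
    void analyzeFrame(int /*frameIndex*/, const SketchFrame& frame) override
    {
        for (double value : frame.x)
        {
            sum_ += value;
            ++count_;
        }
    }
    void finishAnalysis(int frameCount) override
    {
        assert(frameCount >= 0);
        mean_ = count_ > 0 ? sum_ / count_ : 0.0;
    }
    void   writeOutput() override { std::printf("Mean coordinate: %g\n", mean_); }
    double mean() const { return mean_; }

private:
    double sum_   = 0.0;
    int    count_ = 0;
    double mean_  = 0.0;
};
```

Keeping all file and terminal output in `writeOutput()` is what would allow a different driver, such as a scripting-language binding, to run the same tool and read its results without producing that output.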
+
+High-level framework
+====================
+
+The \ref module_trajectoryanalysis module provides the high-level framework
+that integrates all the pieces together.
+It provides the abstract base class for analysis tool modules
+(gmx::TrajectoryAnalysisModule), and the code that runs such a module as a
+command-line tool (gmx::TrajectoryAnalysisCommandLineRunner).
+See the [analysis template](\ref page_analysistemplate) and the
+[trajectoryanalysis module documentation](\ref module_trajectoryanalysis) for
+more details.
+
+
+Selections
+==========
+
+The \ref module_selection module provides the support for selections.
+Most of the work of managing the selections is taken care of by the command-line
+runner and the framework, and the analysis tool code only sees two main
+classes:
+
+ - gmx::SelectionOption and associated classes are used to declare the
+ number and type of selections the tool accepts (see below for
+ [details of the option support](#section_analysisframework_options)).
+ - The tool receives a set of gmx::Selection objects as a value of the
+ selection option. These classes provide the evaluated values for the
+ selections during the analysis. The framework evaluates them for each
+ frame such that when the tool is called, it can access the selections for
+ the current frame in the gmx::Selection objects it owns.
+
+A conceptual overview of the selection engine is available on a separate page:
+\subpage page_selections. In the full internal documentation, this page
+also provides an overview of the implementation of the selections.
+
+More technical details of the selection engine are also available in the
+[selection module documentation](\ref module_selection).
+This is useful in particular for understanding how the selections work in
+detail, or if you want to use the selection code outside the trajectory
+analysis framework.
+
+The selection module also provides functionality to do neighborhood searching
+in analysis tools. For the most common case of full 3D periodic boundary
+conditions, grid-based searching is implemented. See gmx::AnalysisNeighborhood
+for more details. This class can be used independently of other selection
+functionality.
+
+
+Output data handling
+====================
+
+The \ref module_analysisdata module provides two things:
+
+ - Support for uniformly providing output data from analysis tools.
+ Tools compute their output values and place them into a
+ _data object_ for further processing. This allows two things:
+ - Reusable data modules can be applied across different tools to do common
+ post-processing.
+ - The data object provides parallelization support.
+ - A set of reusable data modules for post-processing the data. These include
+ functionality like averaging data, computing histograms, and plotting the
+ data into a file. Many of these modules also provide their output as a data
+ object, allowing further data modules to be attached to them.
+
+The general concept is explained in more detail on a separate page:
+\subpage page_analysisdata.
+The [analysisdata module documentation](\ref module_analysisdata) provides more
+technical details.
+
+
+Input options {#section_analysisframework_options}
+=============
+
+To declare input data for the tool (typically, command-line options, including
+input files and selections), the \ref module_options module is used.
+The analysis tool code receives a pre-initialized gmx::Options object in one of
+its initialization methods, and fills it with its input options.
+Basic options are declared in basicoptions.h, and also gmx::SelectionOption is
+used in the same manner. For each option, the tool declares a local variable
+that will receive the value for that option. After the options are parsed from
+the command line (by the framework), the tool code can read the values from
+these variables. The option declarations, and other information filled into
+the gmx::Options object, are also used to provide help to the user (also
+handled by the framework).
+See the documentation for gmx::TrajectoryAnalysisModule and the
+[options module documentation](\ref module_options) for more details.
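
The pattern of declaring an option together with the variable that receives its value can be illustrated with a standalone sketch. `SketchOptions` is a hypothetical class and does not reproduce the gmx::Options interface. It supports only options of type `double` given as `-name value` pairs, which is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <map>
#include <string>
#include <vector>

// Hypothetical options collection that stores parsed values directly into
// variables owned by the tool.
class SketchOptions
{
public:
    // Declares an option and the variable that receives its value.
    void addDoubleOption(const std::string& name, const std::string& description, double* store)
    {
        assert(store != nullptr);
        stores_[name] = store;
        help_.push_back("-" + name + ": " + description);
    }

    // Parses "-name value" pairs. Returns false for an unknown option, a
    // missing value, or a value that is not a number.
    bool parse(const std::vector<std::string>& args)
    {
        if (args.size() % 2 != 0)
        {
            return false;
        }
        for (std::size_t i = 0; i < args.size(); i += 2)
        {
            if (args[i].empty() || args[i][0] != '-')
            {
                return false;
            }
            const auto it = stores_.find(args[i].substr(1));
            if (it == stores_.end())
            {
                return false;
            }
            try
            {
                *it->second = std::stod(args[i + 1]);
            }
            catch (const std::exception&)
            {
                return false;
            }
        }
        return true;
    }

    // The declarations double as the source of the help text.
    const std::vector<std::string>& helpLines() const { return help_; }

private:
    std::map<std::string, double*> stores_;
    std::vector<std::string>       help_;
};
```

A value assigned to the variable before parsing acts as the default in this sketch, because it is simply left untouched when the user does not provide the option.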
--- /dev/null
+Example code for writing trajectory analysis tools {#page_analysistemplate}
+==================================================
+
+The \Gromacs installation includes a template for writing trajectory analysis
+tools using the \ref page_analysisframework.
+It can be found in `share/gromacs/template/` under the installation
+directory, and in `share/template/` in the source distribution.
+
+The full source code for the file is also included in this documentation:
+\ref template.cpp "template.cpp"
+The rest of this page walks through the code to explain the different parts.
+
+\dontinclude template.cpp
+
+Global definitions
+==================
+
+We start by including some generic C++ headers:
+\skip <string>
+\until <vector>
+and continue by including the header for the analysis library:
+\skipline <gromacs/trajectoryanalysis.h>
+This header includes other headers that together define all the basic data
+types needed for writing trajectory analysis tools.
+For convenience, we also import all names from the ::gmx namespace into the
+global scope to avoid repeating the name everywhere:
+\skipline using namespace
+
+
+Tool module class declaration
+=============================
+
+We then define a class that implements our analysis tool:
+\skip AnalysisTemplate
+\until };
+The analysis tool class inherits from gmx::TrajectoryAnalysisModule, which
+is an interface with a few convenience functions for easier interfacing
+with other code.
+Below, we walk through the different methods as implemented in the
+template (note that the template does not implement some of the virtual
+methods because they are less often needed), discussing some issues that can
+arise in more complex cases.
+See documentation of gmx::TrajectoryAnalysisModule for a full description of
+the available virtual methods and convenience functions.
+The first block of member variables is used to contain values provided to
+the different options. They will vary depending on the needs of the
+analysis tool.
+The AnalysisNeighborhood object provides neighborhood searching that is used
+in the analysis.
+The final block of variables is used to process output data.
+See initAnalysis() for details on how they are used.
+
+For the template, we do not need any custom frame-local data. If you think
+you need some for more complex analysis needs, see documentation of
+gmx::TrajectoryAnalysisModuleData for more details.
+If you do not care about parallelization, you do not need to consider this
+part. You can simply declare all variables in the module class itself,
+initialize them in gmx::TrajectoryAnalysisModule::initAnalysis(), and do any
+postprocessing in gmx::TrajectoryAnalysisModule::finishAnalysis().
+
+
+Construction
+============
+
+The constructor (and possible destructor) of the analysis module should be
+simple: the constructor should just initialize default values, and the
+destructor should free any memory managed by the module. For the template,
+we have no attributes in our class that need to be explicitly freed, so we
+declare only a constructor:
+\skip AnalysisTemplate
+\until }
+In addition to initializing local variables that don't have default
+constructors, we also provide a title and one-line description of our module
+to the \p options_ object. These values will only affect the help output.
+
+
+Input options
+=============
+
+Initialization of the module is split into a few methods, two of which are
+used in the template. gmx::TrajectoryAnalysisModule::initOptions() is used
+to set up options understood by the module, as well as for setting up
+different options through gmx::TrajectoryAnalysisSettings (see the
+documentation of that class for more details):
+\skip void
+\until settings->
+\until }
+For the template, we first set a description text for the tool (used for
+help text). Then we declare an option to specify the output file name,
+followed by options that are used to set selections, and finally an option
+to set a cutoff value. For the cutoff, the default value will be the one
+that was set in the constructor, but it would also be possible to explicitly
+set it here. The values provided by the user for the options will be stored
+in member variables. Finally, we indicate that the tool always requires
+topology information. This is done for demonstration purposes only; the
+code in the template works even without a topology.
+
+For additional documentation on how to define different kinds of options, see
+gmx::Options, basicoptions.h, and gmx::SelectionOption. You only need to
+define options that are specific to the analysis; common options, e.g., for
+specifying input topology and trajectories are added by the framework.
+
+To adjust settings or selection options (e.g., the number of accepted
+selections) based on option values, you need to override
+gmx::TrajectoryAnalysisModule::optionsFinished(). For simplicity,
+this is not done in the template.
+
+
+Analysis initialization
+=======================
+
+The actual analysis is initialized in
+gmx::TrajectoryAnalysisModule::initAnalysis():
+\skip void
+\until }
+\until }
+Information about the topology is passed as a parameter. The settings
+object can also be used to access information about user input.
+
+One of the main tasks of this method is to set up appropriate
+gmx::AnalysisData objects and modules for them (see
+gmx::TrajectoryAnalysisModule for the general approach).
+These objects will be used to process output from the tool. Their main
+purpose is to support parallelization, but even if you don't care about
+parallelism, they still provide convenient building blocks, e.g., for
+histogramming and file output.
+
+For the template, we first set the cutoff for the neighborhood search.
+
+Then, we create and register one gmx::AnalysisData object
+that will contain, for each frame, one column for each input selection.
+This will contain the main output from the tool: minimum distance between
+the reference selection and that particular selection.
+We then create and set up a module that will compute the average distance
+for each selection (see writeOutput() for how it is used).
+Finally, if an output file has been provided, we create and set up a module
+that will plot the per-frame distances to a file.
+
+If the analysis module needs some temporary storage during processing of a
+frame (i.e., it uses a custom class derived from
+gmx::TrajectoryAnalysisModuleData), this should be allocated in
+gmx::TrajectoryAnalysisModule::startFrames() (see below) if parallelization
+is to be supported.
+
+If you need to do initialization based on data from the first frame (most
+commonly, based on the box size), you need to override
+gmx::TrajectoryAnalysisModule::initAfterFirstFrame(), but this is not used
+in the template.
+
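The per-frame quantity described above, one column per input selection holding a minimum distance to the reference selection, can be sketched independently of the template code. The following is not the code in template.cpp: it is a brute-force calculation over all pairs, it ignores periodic boundary conditions and the cutoff, and the names `SketchPosition`, `sketchDistance()`, `minimumDistance()`, and `frameColumns()` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using SketchPosition = std::array<double, 3>;

// Euclidean distance between two positions (no periodic boundaries).
double sketchDistance(const SketchPosition& a, const SketchPosition& b)
{
    double sumOfSquares = 0.0;
    for (std::size_t d = 0; d < 3; ++d)
    {
        const double delta = a[d] - b[d];
        sumOfSquares += delta * delta;
    }
    return std::sqrt(sumOfSquares);
}

// Smallest distance between any reference position and any selection
// position; infinity if either set is empty.
double minimumDistance(const std::vector<SketchPosition>& reference,
                       const std::vector<SketchPosition>& selection)
{
    double minimum = std::numeric_limits<double>::infinity();
    for (const SketchPosition& a : reference)
    {
        for (const SketchPosition& b : selection)
        {
            minimum = std::min(minimum, sketchDistance(a, b));
        }
    }
    return minimum;
}

// Values for one frame: one column for each input selection.
std::vector<double> frameColumns(const std::vector<SketchPosition>&              reference,
                                 const std::vector<std::vector<SketchPosition>>& selections)
{
    assert(!reference.empty());
    std::vector<double> columns;
    for (const std::vector<SketchPosition>& selection : selections)
    {
        columns.push_back(minimumDistance(reference, selection));
    }
    return columns;
}
```

The sketch takes the sizes of the position lists from the vectors on every call, in line with the advice not to assume that a selection has a constant size across frames.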
+
+Analyzing the frames
+====================
+
+There is one more initialization method that needs to be overridden to
+support automatic parallelization: gmx::TrajectoryAnalysisModule::startFrames().
+If you do not need custom frame-local data (or parallelization at all), you
+can skip this method and ignore the last parameter to
+gmx::TrajectoryAnalysisModule::analyzeFrame() to make things simpler.
+In the template, this method is not necessary.
+
+The main part of the analysis is (in most analysis codes) done in the
+gmx::TrajectoryAnalysisModule::analyzeFrame() method, which is called once
+for each frame:
+\skip void
+\until {
+The \p frnr parameter gives a zero-based index of the current frame
+(mostly for use with gmx::AnalysisData), \p pbc contains the PBC
+information for the current frame for distance calculations with,
+e.g., pbc_dx(), and \p pdata points to a data structure created in
+startFrames().
+Although usually not necessary (except for the time field), raw frame
+data can be accessed through \p fr.
+In most cases, the analysis should be written such that it gets all
+position data through selections, and does not assume a constant size for
+them. This is all that is required to support the full flexibility of the
+selection engine.
+
+For the template, we first get data from our custom data structure for
+shorthand access (if you use a custom data object, you need a \c static_cast
+here):
+\skip AnalysisDataHandle
+\until parallelSelection
+
+We then do a simple calculation and use the AnalysisDataHandle class to set
+the per-frame output for the tool:
+\skip nb
+\until finishFrame()
+
+After all the frames have been processed,
+gmx::TrajectoryAnalysisModule::finishAnalysis() is called once. This is the
+place to do any custom postprocessing of the data. For the template, we do
+nothing, because all necessary processing is done in the data modules:
+\skip void
+\until }
+
+If the data structure created in gmx::TrajectoryAnalysisModule::startFrames()
+is used to aggregate data across frames, you need to override
+gmx::TrajectoryAnalysisModule::finishFrames() to combine the data from the
+data structures (see documentation of the method for details).
+This is not necessary for the template, because the ModuleData structure
+only contains data used during the analysis of a single frame.
+
+
+Output
+======
+
+Finally, most programs need to write out some values after the analysis is
+complete. In some cases, this can be achieved with proper chaining of data
+modules, but often it is necessary to do some custom processing.
+All such activities should be done in
+gmx::TrajectoryAnalysisModule::writeOutput(). This makes it easier to reuse
+analysis modules in, e.g., scripting languages, where output into files
+might not be desired. The template simply prints out the average distances
+for each analysis group:
+\skip void
+\until }
+\until }
+Here, we use the \c avem_ module, which we initialized in initAnalysis() to
+aggregate the averages of the computed distances.
+
+
+Definition of main()
+====================
+
+Now, the only thing remaining is to define the main() function.
+To implement a command-line tool, it should create a module and run it with
+gmx::TrajectoryAnalysisCommandLineRunner, using the boilerplate code below:
+\skip int
+\until }
\dir share
\brief Directory that contains installed data files.
*/
+/*!
+\dir share/template
+\brief Template code for writing analysis programs.
+ */
+
+/*!
+\file share/template/template.cpp
+\brief Template code for writing analysis programs.
+
+See \ref page_analysistemplate for more information.
+ */
+
+/*!
+\example template.cpp
+\brief Template code for writing analysis programs.
+
+See \ref page_analysistemplate for more information.
+ */
- \subpage page_codelayout <br/>
This is a good place to start to understand how to
navigate the code and the documentation.
+ - \subpage page_analysisframework <br/>
+ Provides an overview of the framework that the \Gromacs library provides for
+ writing (trajectory) analysis tools.
\if libapi
- \subpage thread_mpi <br/>
This code is used internally for threading support, and also provides a
\defgroup group_analysismodules Analysis Modules
\brief
Modules used in analysis tools.
+
+A separate page describes the responsibilities of these modules:
+\ref page_analysisframework
*/
/*!
--- /dev/null
+Dynamic selections {#page_selections}
+==================
+
+The \ref module_selection module provides a mechanism for specifying selections
+as text. The engine evaluates them to atoms, or more generally to
+a set of positions, for one or more sets of coordinates. The selected atoms
+can depend on the trajectory frame. This allows writing general-purpose
+analysis tools that only operate on positions, and get a lot of flexibility for
+free from the selection engine. For example, such tools can readily operate on
+centers of mass of groups in addition to individual atoms as long as they do
+not require access to atomic properties.
+
+For people familiar with VMD, the selection syntax will look familiar, but there
+are some differences. Not all the keywords supported by VMD are there, and
+there are some extensions that support evaluating selections to center-of-mass
+positions in addition to individual atoms.
+For old-time \Gromacs users, tools that support selections do not generally
+need `make_ndx`.
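
The two ideas from the introduction above, a selected set that depends on the frame and a position that is a center of mass rather than an atom, can be sketched in standalone C++. This has nothing to do with how the selection engine is implemented: `SketchAtom`, `selectBelowZ()`, and `centerOfMass()` are hypothetical, and the hard-coded criterion stands in for a selection that the real engine would parse from text.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical atom: a mass and a position in one trajectory frame.
struct SketchAtom
{
    double                mass;
    std::array<double, 3> x;
};

// A "dynamic" criterion: the indices of the atoms whose z coordinate is
// below zMax. The result has to be re-evaluated for every frame.
std::vector<std::size_t> selectBelowZ(const std::vector<SketchAtom>& frame, double zMax)
{
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < frame.size(); ++i)
    {
        if (frame[i].x[2] < zMax)
        {
            indices.push_back(i);
        }
    }
    return indices;
}

// Reduces a group of atoms to a single position, their center of mass.
std::array<double, 3> centerOfMass(const std::vector<SketchAtom>&  frame,
                                   const std::vector<std::size_t>& indices)
{
    std::array<double, 3> com       = { { 0.0, 0.0, 0.0 } };
    double                totalMass = 0.0;
    for (std::size_t index : indices)
    {
        const SketchAtom& atom = frame.at(index);
        totalMass += atom.mass;
        for (std::size_t d = 0; d < 3; ++d)
        {
            com[d] += atom.mass * atom.x[d];
        }
    }
    assert(totalMass > 0.0);
    for (std::size_t d = 0; d < 3; ++d)
    {
        com[d] /= totalMass;
    }
    return com;
}
```

A tool that only consumes the resulting positions does not need to know whether a position came from a single atom or from `centerOfMass()`, which is the flexibility that the text above refers to.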
+
+Structural overview
+===================
+
+Central concepts useful for understanding the selection engine are explained
+below. A graph represents the relations between the different parts, and a
+textual description of the user-visible components and other concepts follows.
+The graph also includes an overview of how the selection engine integrates into
+the \ref page_analysisframework. When using selections from the analysis
+framework, the parts in gray are managed by the framework.
+When using selections outside the framework, it is either possible to use only
+the core components (shown in the graph as a box), or to also use the selection
+option mechanisms. In both cases, the caller is responsible for managing all
+the objects owned by the framework in the graph.
+
+\dot
+ digraph selection_overview {
+ subgraph cluster_framework {
+ label = "analysis framework"
+ analysisframework [label="framework", fillcolor=grey75, style=filled]
+ options [label="options collection", fillcolor=grey75, style=filled]
+ analysistool [label="analysis tool"]
+ }
+ subgraph cluster_core {
+ label = "core engine"
+ labelloc = b
+ selectioncollection [label="selection collection", fillcolor=grey75, style=filled]
+ selectiondata [label="internal selection data", fillcolor=grey75, style=filled]
+ selection [label="selection object"]
+ }
+ selectionoption [label="selection option"]
+ selectionoptionmanager [label="selection option manager", fillcolor=grey75, style=filled]
+
+ selectioncollection -> selection [label="creates"]
+ selectioncollection -> selectiondata [label="owns and updates"]
+ selectionoption -> selection [label="returns"]
+ selection -> selectiondata [label="reads data", constraint=false]
+ selectionoptionmanager -> selectionoption [label="provides values to"]
+ selectionoptionmanager -> selectioncollection [label="gets selection objects"]
+ analysistool -> selectionoption [label="declares"]
+ analysistool -> selection [label="reads data from"]
+ analysistool -> options [label="declares options"]
+ analysisframework -> selectionoptionmanager [label="owns"]
+ analysisframework -> selectioncollection [label="owns"]
+ analysisframework -> options [label="owns"]
+ options -> selectionoption [label="contains"]
+ }
+\enddot
+
+ - _selection_: Evaluates to a single list of _selection positions_.
+ Note in particular that the output is positions, not a list of atoms.
+ A tool can accept one or more selections, and expect different semantics for
+ different selections.
+ - _dynamic selection_: A selection for which the set of returned positions,
+ and not only the coordinates of those positions, depends on the input
+ coordinates.
+ - _selection position_: A single coordinate as returned by a selection.
+ This can correspond to an individual atom, but also to a collective
+ coordinate such as a center of mass of a group of atoms.
+ In addition to the output coordinates, the position provides information
+ about the atoms that constitute it, and metadata that allows one to
+ associate positions between different frames if different positions
+ are returned at the same time.
+ - _selection collection_: Group of selections that are processed as a unit
+ against the same topology and sets of coordinates.
+ In the analysis framework, there is always a single selection collection
+ managed by the framework.
+ - _selection variable_: When providing selections through text, it is possible
+ to create variables and use them as part of selections. This makes it
+ easier to write repetitive selections by making complex common
+ subexpressions into variables. This also provides optimization
+ opportunities for the selection engine: the variable value is not repeatedly
+ evaluated. Variables always exist in the context of a selection collection.
+ - _selection object_: When a selection is _parsed_ (see below), the selection
+ collection gives a handle to the selection as a _selection object_. This
+ handle is valid for the lifetime of the selection collection, and can be
+ used to access information about the selection. Operations on the selection
+ collection (_compilation_ and _evaluation_, see below) alter the values
+ returned by the selection objects.
+ - _selection option_: A special type of command-line option that directly
+ returns selection objects. This higher-level construct is used by the
+ analysis framework to provide a convenient interface for analysis tools:
+ they can simply declare one or more selection options, and get a list of
+ _selection objects_ as a return value for each of these. Other parts of the
+ selection engine are managed by the framework.
+
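As an illustration of the difference between atoms and _selection positions_,
the sketch below models a position as an output coordinate plus the atoms that
constitute it. All names (`ToyPosition`, `atomPositions()`, `centerOfMass()`)
are hypothetical and exist only for this example; they are not part of the
selection API, and the real position objects carry more metadata.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy types; they are not part of the GROMACS API.
typedef std::array<double, 3> Vec3;

struct ToyPosition
{
    Vec3             x;     // output coordinate of the position
    std::vector<int> atoms; // indices of the atoms that constitute it
};

// One position per atom: each position refers to exactly one atom.
std::vector<ToyPosition> atomPositions(const std::vector<Vec3>& coords,
                                       const std::vector<int>&  atoms)
{
    std::vector<ToyPosition> result;
    for (int a : atoms)
    {
        result.push_back(ToyPosition{ coords[a], { a } });
    }
    return result;
}

// A single collective position: the mass-weighted center of the atoms.
ToyPosition centerOfMass(const std::vector<Vec3>&   coords,
                         const std::vector<double>& masses,
                         const std::vector<int>&    atoms)
{
    assert(!atoms.empty());
    ToyPosition pos{ Vec3{ { 0.0, 0.0, 0.0 } }, atoms };
    double      totalMass = 0.0;
    for (int a : atoms)
    {
        for (std::size_t d = 0; d < 3; ++d)
        {
            pos.x[d] += masses[a] * coords[a][d];
        }
        totalMass += masses[a];
    }
    for (std::size_t d = 0; d < 3; ++d)
    {
        pos.x[d] /= totalMass;
    }
    return pos;
}
```

In this model, a tool that loops over positions works unchanged whether the
user asked for individual atoms or for one center of mass per group, which is
the reason the selection output is defined in terms of positions.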
+Core selection engine
+=====================
+
+The core of the selection engine is the _selection collection_ object.
+The graph below shows how it handles selections. The operations that the
+collection object supports and their sequence are shown in the boxes in the
+middle. Inputs are shown at the top, and outputs at the bottom.
+
+\dot
+ digraph selection_process {
+ subgraph cluster_collection {
+ label = "selection collection"
+ subgraph actions {
+ rank = same
+ create [shape=box]
+ parse [shape=box]
+ compile [shape=box]
+ evaluate [shape=box]
+ evaluatefinal [label="finish evaluation",shape=box]
+
+ create -> parse
+ parse:ne -> parse:nw
+ parse -> compile
+ compile -> evaluate
+ evaluate:ne -> evaluate:nw
+ evaluate -> evaluatefinal
+ }
+ selectiondata [label="internal selection data"]
+ }
+
+ selectiontext [label="selection text"]
+ topology [label="topology/\natom count"]
+ frame [label="frame coordinates"]
+
+ selection [label="selection object"]
+
+ selectiontext -> parse
+ parse:s -> selection:nw [label="returns"]
+ parse -> selectiondata [label="creates"]
+ topology -> compile
+ compile -> selectiondata [label="initializes\npositions"]
+ frame -> evaluate
+ evaluate -> selectiondata [label="sets\npositions"]
+ evaluatefinal -> selectiondata [label="resets to\npost-compilation\nstate"]
+ selectiondata -> selection [label="reads data", dir=back]
+ }
+\enddot
+
+ - _parsing_: after creating an empty selection collection,
+ selections need to be parsed from text. As a result, the selection
+ collection initializes an internal data object to hold some basic
+ information about the selections, and returns _selection objects_ as a
+ handle to this data. It is possible to parse more than one set of
+ selections into the same collection by calling the parsing methods more than
+ once. The input string to parsing can also contain variable declarations,
+ which get added into the collection and can be used in later calls to the
+ parser.
+
+ - _compilation_: when all selections are parsed, the whole selection
+ collection is compiled. This analyzes the provided selections, and
+ evaluates all parts that do not depend on atom coordinates (e.g.,
+ (sub)selections based on atom or residue names). After compilation,
+ the coordinates in the output positions are not initialized, but all other
+ information is initialized as if all atoms satisfied any dynamic conditions.
+ This means that any subsequent evaluation will return a subset of the
+ positions returned at this point. The caller can use this information to
+ check the selections for validity and allocate memory for its own
+ processing.
+ Compilation also allocates all the memory necessary to do the evaluation.
+
+ In the figure, topology is shown as input to the compilation, but generally
+ it can be set at any point before the compilation. If the selection text
+ does not require any information from the topology for evaluation, it is
+ sufficient to set only the atom count.
+
+ - _evaluation_: after the selections are compiled, they can be evaluated for
+ one or more sets of atom coordinates. This updates the set of positions
+ accessible through the selection objects. For dynamic selections, the group
+ of positions can change; for static selections, only the coordinates of the
+ positions are updated.
+
+ - _final evaluation_: This returns the selections to the state they were in
+ after compilation, i.e., to the maximum possible set of positions. The
+ coordinates of the positions are again uninitialized, but other information
+ is available. The caller can use this information to do post-processing
+ and, e.g., produce labels in its output based on the selection positions.
+
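The four stages above can be sketched with a small stand-alone model. This is
only an illustration under simplifying assumptions: `ToySelectionCollection`,
`ToySelection`, and the two-word "grammar" (`all` is static, anything else is
treated as the dynamic condition `x < 0`) are hypothetical and do not
correspond to the real classes or the real selection syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names belong to the GROMACS API.
struct ToySelectionData
{
    std::string      text;    // selection text as given to parse()
    bool             dynamic; // true if the position set depends on coordinates
    std::vector<int> atoms;   // atom indices currently in the selection
};

// Handle returned by parse(); it only reads data owned by the collection.
class ToySelection
{
public:
    explicit ToySelection(const ToySelectionData* data) : data_(data) {}
    bool        isDynamic() const { return data_->dynamic; }
    std::size_t posCount() const { return data_->atoms.size(); }

private:
    const ToySelectionData* data_;
};

class ToySelectionCollection
{
public:
    // Parsing: store basic information and return a handle to it.
    ToySelection parse(const std::string& text)
    {
        data_.push_back(ToySelectionData{ text, text != "all", std::vector<int>() });
        return ToySelection(&data_.back());
    }
    // Compilation: initialize every selection to its maximal position set.
    void compile(int atomCount)
    {
        atomCount_ = atomCount;
        reset();
    }
    // Evaluation: dynamic selections keep only atoms with x < 0 in this frame.
    void evaluate(const std::vector<double>& x)
    {
        assert(x.size() >= static_cast<std::size_t>(atomCount_));
        for (ToySelectionData& d : data_)
        {
            if (!d.dynamic)
            {
                continue;
            }
            d.atoms.clear();
            for (int i = 0; i < atomCount_; ++i)
            {
                if (x[i] < 0.0)
                {
                    d.atoms.push_back(i);
                }
            }
        }
    }
    // Final evaluation: return to the post-compilation state.
    void evaluateFinal() { reset(); }

private:
    void reset()
    {
        for (ToySelectionData& d : data_)
        {
            d.atoms.clear();
            for (int i = 0; i < atomCount_; ++i)
            {
                d.atoms.push_back(i);
            }
        }
    }
    // std::deque keeps existing elements in place on push_back, so handles
    // returned earlier stay valid for the lifetime of the collection.
    std::deque<ToySelectionData> data_;
    int                          atomCount_ = 0;
};

// Usage sketch: position counts seen through the handle at each stage.
std::vector<std::size_t> demoCounts()
{
    ToySelectionCollection   collection;
    ToySelection             sel = collection.parse("x < 0");
    std::vector<std::size_t> counts;
    collection.compile(4);
    counts.push_back(sel.posCount()); // maximal set after compilation
    collection.evaluate({ -1.0, 2.0, -3.0, 4.0 });
    counts.push_back(sel.posCount()); // subset selected for this frame
    collection.evaluateFinal();
    counts.push_back(sel.posCount()); // back to the maximal set
    return counts;
}
```

The handle never changes, only the data behind it does. A caller can therefore
size its buffers from the count seen after compilation and rely on each
per-frame count being no larger.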
+\if internal
+Internal implementation
+=======================
+
+Implementation details of different parts of the module are discussed on
+separate pages.
+
+ - \subpage page_module_selection_custom
+ - \subpage page_module_selection_parser
+ - \subpage page_module_selection_compiler
+ - \subpage page_module_selection_insolidangle
+\endif
+++ /dev/null
-/*
- * This file is part of the GROMACS molecular simulation package.
- *
- * Copyright (c) 2011,2012,2013, by the GROMACS development team, led by
- * David van der Spoel, Berk Hess, Erik Lindahl, and including many
- * others, as listed in the AUTHORS file in the top-level source
- * directory and at http://www.gromacs.org.
- *
- * GROMACS is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public License
- * as published by the Free Software Foundation; either version 2.1
- * of the License, or (at your option) any later version.
- *
- * GROMACS is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with GROMACS; if not, see
- * http://www.gnu.org/licenses, or write to the Free Software Foundation,
- * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
- *
- * If you want to redistribute modifications to GROMACS, please
- * consider that scientific software is very special. Version
- * control is crucial - bugs must be traceable. We will be happy to
- * consider code for inclusion in the official distribution, but
- * derived work must not be called official GROMACS. Details are found
- * in the README & COPYING files - if they are missing, get the
- * official version at http://www.gromacs.org.
- *
- * To help us fund GROMACS development, we humbly ask that you cite
- * the research papers on the package. Check out http://www.gromacs.org.
- */
-/*! \dir share/template
- * \brief Template code for writing analysis programs.
- */
-/*! \example template.cpp
- * \brief Template code for writing analysis programs.
- *
- * See \ref share/template/template.cpp "documentation" of the template for
- * more information.
- */
-/*! \internal \file
- * \brief Doxygen documentation source for template.cpp.
- */
-/*! \file template.cpp
- * \brief Template code for writing analysis programs.
- *
- * The full source code for the file: \ref template.cpp "template.cpp"
- *
- * \dontinclude template.cpp
- *
- * \section template_global Global definitions
- *
- * We start by including some generic C++ headers:
- * \skip <string>
- * \until <vector>
- * and continue by including the header for the analysis library:
- * \until <gromacs/trajectoryanalysis.h>
- * This header includes other headers that together define all the basic data
- * types needed for writing trajectory analysis tools.
- * For convenience, we also import all names from the ::gmx namespace into the
- * global scope to avoid repeating the name everywhere:
- * \skipline using namespace
- *
- *
- * \section template_class Tool module class declaration
- *
- * We then define a class that implements our analysis tool:
- * \until };
- * The analysis tool class inherits from gmx::TrajectoryAnalysisModule, which
- * is an interface with a few convenience functions for easier interfacing
- * with other code.
- * Below, we walk through the different methods as implemented in the
- * template (note that the template does not implement some of the virtual
- * methods because they are less often needed), discussing some issues that can
- * arise in more complex cases.
- * See documentation of gmx::TrajectoryAnalysisModule for a full description of
- * the available virtual methods and convenience functions.
- * The first block of member variables are used to contain values provided to
- * the different options. They will vary depending on the needs of the
- * analysis tool.
- * The AnalysisNeighborhood object provides neighborhood searching that is used
- * in the analysis.
- * The final block of variables are used to process output data.
- * See initAnalysis() for details on how they are used.
- *
- * For the template, we do not need any custom frame-local data. If you think
- * you need some for more complex analysis needs, see documentation of
- * gmx::TrajectoryAnalysisModuleData for more details.
- * If you don't care about parallelization, you don't need to conside this
- * part. You can simply declare all variables in the module class itself,
- * initialize them in gmx::TrajectoryAnalysisModule::initAnalysis(), and do any
- * postprocessing in gmx::TrajectoryAnalysisModule::finishAnalysis()).
- *
- *
- * \section template_ctor Construction
- *
- * The constructor (and possible destructor) of the analysis module should be
- * simple: the constructor should just initialize default values, and the
- * destructor should free any memory managed by the module. For the template,
- * we have no attributes in our class that need to be explicitly freed, so we
- * declare only a constructor:
- * \skip AnalysisTemplate
- * \until }
- * In addition to initializing local variables that don't have default
- * constructors, we also provide a title and one-line description of our module
- * to the \p options_ object. These values will only affect the help output.
- *
- *
- * \section template_options Input options
- *
- * Initialization of the module is split into a few methods, two of which are
- * used in the template. gmx::TrajectoryAnalysisModule::initOptions() is used
- * to set up options understood by the module, as well as for setting up
- * different options through gmx::TrajectoryAnalysisSettings (see the
- * documentation of that class for more details):
- * \skip void
- * \until settings->
- * \until }
- * For the template, we first set a description text for the tool (used for
- * help text). Then we declare an option to specify the output file name,
- * followed by options that are used to set selections, and finally an option
- * to set a cutoff value. For the cutoff, the default value will be the one
- * that was set in the constructor, but it would also be possible to explicitly
- * set it here. The values provided by the user for the options will be stored
- * in member variables. Finally, we indicate that the tool always requires
- * topology information. This is done for demonstration purposes only; the
- * code in the template works even without a topology.
- *
- * For additional documentation on how to define different kinds of options, see
- * gmx::Options, basicoptions.h, and gmx::SelectionOption. You only need to
- * define options that are specific to the analysis; common options, e.g., for
- * specifying input topology and trajectories are added by the framework.
- *
- * To adjust settings or selection options (e.g., the number of accepted
- * selections) based on option values, you need to override
- * gmx::TrajectoryAnalysisModule::optionsFinished(). For simplicity,
- * this is not done in the template.
- *
- *
- * \section template_initialization Analysis initialization
- *
- * The actual analysis is initialized in
- * gmx::TrajectoryAnalysisModule::initAnalysis():
- * \skip void
- * \until }
- * \until }
- * Information about the topology is passed as a parameter. The settings
- * object can also be used to access information about user input.
- *
- * One of the main tasks of this method is to set up appropriate
- * gmx::AnalysisData objects and modules for them (see
- * gmx::TrajectoryAnalysisModule for the general approach).
- * These objects will be used to process output from the tool. Their main
- * purpose is to support parallelization, but even if you don't care about
- * parallelism, they still provide convenient building blocks, e.g., for
- * histogramming and file output.
- *
- * For the template, we first set the cutoff for the neighborhood search.
- *
- * Then, we create and register one gmx::AnalysisData object
- * that will contain, for each frame, one column for each input selection.
- * This will contain the main output from the tool: minimum distance between
- * the reference selection and that particular selection.
- * We then create and setup a module that will compute the average distance
- * for each selection (see writeOutput() for how it is used).
- * Finally, if an output file has been provided, we create and setup a module
- * that will plot the per-frame distances to a file.
- *
- * If the analysis module needs some temporary storage during processing of a
- * frame (i.e., it uses a custom class derived from
- * gmx::TrajectoryAnalysisModuleData), this should be allocated in
- * gmx::TrajectoryAnalysisModule::startFrames() (see below) if parallelization
- * is to be supported.
- *
- * If you need to do initialization based on data from the first frame (most
- * commonly, based on the box size), you need to override
- * gmx::TrajectoryAnalysisModule::initAfterFirstFrame(), but this is not used
- * in the template.
- *
- *
- * \section template_analysis Actual trajectory analysis
- *
- * There is one more initialization method that needs to be overridden to
- * support automatic parallelization: gmx::TrajectoryAnalysisModule::startFrames().
- * If you do not need custom frame-local data (or parallelization at all), you
- * can skip this method and ignore the last parameter to
- * gmx::TrajectoryAnalysisModule::analyzeFrame() to make things simpler.
- * In the template, this method is not necessary.
- *
- * The main part of the analysis is (in most analysis codes) done in the
- * gmx::TrajectoryAnalysisModule::analyzeFrame() method, which is called once
- * for each frame:
- * \skip void
- * \until {
- * The \p frnr parameter gives a zero-based index of the current frame
- * (mostly for use with gmx::AnalysisData), \p pbc contains the PBC
- * information for the current frame for distance calculations with,
- * e.g., pbc_dx(), and \p pdata points to a data structure created in
- * startFrames().
- * Although usually not necessary (except for the time field), raw frame
- * data can be accessed through \p fr.
- * In most cases, the analysis should be written such that it gets all
- * position data through selections, and does not assume a constant size for
- * them. This is all that is required to support the full flexibility of the
- * selection engine.
- *
- * For the template, we first get data from our custom data structure for
- * shorthand access (if you use a custom data object, you need a \c static_cast
- * here):
- * \skip AnalysisDataHandle
- * \until parallelSelection
- *
- * We then do a simple calculation and use the AnalysisDataHandle class to set
- * the per-frame output for the tool:
- * \skip nb
- * \until finishFrame()
- *
- * After all the frames have been processed,
- * gmx::TrajectoryAnalysisModule::finishAnalysis() is called once. This is the
- * place to do any custom postprocessing of the data. For the template, we do
- * nothing, because all necessary processing is done in the data modules:
- * \skip void
- * \until }
- *
- * If the data structure created in gmx::TrajectoryAnalysisModule::startFrames()
- * is used to aggregate data across frames, you need to override
- * gmx::TrajectoryAnalysisModule::finishFrames() to combine the data from the
- * data structures (see documentation of the method for details).
- * This is not necessary for the template, because the ModuleData structure
- * only contains data used during the analysis of a single frame.
- *
- *
- * \section template_output Output
- *
- * Finally, most programs need to write out some values after the analysis is
- * complete. In some cases, this can be achieved with proper chaining of data
- * modules, but often it is necessary to do some custom processing.
- * All such activities should be done in
- * gmx::TrajectoryAnalysisModule::writeOutput(). This makes it easier to reuse
- * analysis modules in, e.g., scripting languages, where output into files
- * might not be desired. The template simply prints out the average distances
- * for each analysis group:
- * \skip void
- * \until }
- * \until }
- * Here, we use the \c avem_ module, which we initialized in initAnalysis() to
- * aggregate the averages of the computed distances.
- *
- *
- * \section template_main Definition of main()
- *
- * Now, the only thing remaining is to define the main() function.
- * To implement a command-line tool, it should create a module and run it using
- * gmx::TrajectoryAnalysisCommandLineRunner using the boilerplate code below:
- * \skip int
- * \until }
- *
- *
- * \section template_references Where to go from here?
- *
- * For more information about the topics discussed here, see the following
- * pages:
- * - \ref module_trajectoryanalysis
- * - \ref module_selection
- * - \ref module_analysisdata
- */
#include "analysisdata/modules/average.h"
#include "analysisdata/modules/displacement.h"
#include "analysisdata/modules/histogram.h"
+#include "analysisdata/modules/lifetime.h"
#include "analysisdata/modules/plot.h"
#endif
average.h
displacement.h
histogram.h
+ lifetime.h
plot.h)
gmx_install_headers(analysisdata/modules ${ANALYSISDATA_MODULES_PUBLIC_HEADERS})
* It should be considered whether they should be moved somewhere else.
* \endif
*
- * \if internal
- * Implementation details of different parts of the module are discussed on
- * separate pages:
- * - \ref page_module_selection_custom
- * - \ref page_module_selection_parser
- * - \ref page_module_selection_compiler
- * - \ref page_module_selection_insolidangle
- * \endif
- *
* \author Teemu Murtola <teemu.murtola@gmail.com>
*/
/*! \file
/*
* This file is part of the GROMACS molecular simulation package.
*
- * Copyright (c) 2009,2010,2011,2012, by the GROMACS development team, led by
+ * Copyright (c) 2009,2010,2011,2012,2013, by the GROMACS development team, led by
* David van der Spoel, Berk Hess, Erik Lindahl, and including many
* others, as listed in the AUTHORS file in the top-level source
* directory and at http://www.gromacs.org.
* To help us fund GROMACS development, we humbly ask that you cite
* the research papers on the package. Check out http://www.gromacs.org.
*/
-/*! \page page_module_selection_custom Custom selection methods
+/*! \internal
+ * \page page_module_selection_custom Custom selection methods
*
* Custom selection methods are defined by creating a new instance of
* \c gmx_ana_selmethod_t and filling it with the necessary data for handling
* In both cases, gmx_ana_selmethod_register() does several checks on the
* structure and reports any errors or inconsistencies it finds.
*/
-/*! \file
+/*! \internal \file
* \brief API for handling selection methods.
*
* There should be no need to use the data structures or call the
const char **help;
} gmx_ana_selmethod_help_t;
-/*! \brief
+/*! \internal
+ * \brief
* Describes a selection method.
*
* Any of the function pointers except the update call can be NULL if the
/*
* This file is part of the GROMACS molecular simulation package.
*
- * Copyright (c) 2009,2010,2011, by the GROMACS development team, led by
+ * Copyright (c) 2009,2010,2011,2013, by the GROMACS development team, led by
* David van der Spoel, Berk Hess, Erik Lindahl, and including many
* others, as listed in the AUTHORS file in the top-level source
* directory and at http://www.gromacs.org.
* To help us fund GROMACS development, we humbly ask that you cite
* the research papers on the package. Check out http://www.gromacs.org.
*/
-/*! \file
+/*! \internal \file
* \brief API for handling parameters used in selections.
*
* There should be no need to use the data structures or call the
#define SPAR_ENUMVAL 128
/*@}*/
-/*! \brief
+/*! \internal \brief
* Describes a single parameter for a selection method.
*/
typedef struct gmx_ana_selparam_t
* To help us fund GROMACS development, we humbly ask that you cite
* the research papers on the package. Check out http://www.gromacs.org.
*/
-/*! \page page_module_selection_insolidangle Selection method: insolidangle
+/*! \internal
+ * \page page_module_selection_insolidangle Selection method: insolidangle
*
* This method selects a subset of particles that are located in a solid
* angle defined by a center and a set of points.
* point is in the solid angle if it lies within any of these cones.
* The width of the cones can be adjusted.
*
- * \internal
- *
* The method is implemented by partitioning the surface of the unit sphere
* into bins using the polar coordinates \f$(\theta, \phi)\f$.
* The partitioning is always uniform in the zenith angle \f$\theta\f$,