#
# This file is part of the GROMACS molecular simulation package.
#
# Copyright (c) 2019, by the GROMACS development team, led by
# Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
# and including many others, as listed in the AUTHORS file in the
# top-level source directory and at http://www.gromacs.org.
#
# GROMACS is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public License
# as published by the Free Software Foundation; either version 2.1
# of the License, or (at your option) any later version.
#
# GROMACS is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with GROMACS; if not, see
# http://www.gnu.org/licenses, or write to the Free Software Foundation,
# Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
#
# If you want to redistribute modifications to GROMACS, please
# consider that scientific software is very special. Version
# control is crucial - bugs must be traceable. We will be happy to
# consider code for inclusion in the official distribution, but
# derived work must not be called official GROMACS. Details are found
# in the README & COPYING files - if they are missing, get the
# official version at http://www.gromacs.org.
#
# To help us fund GROMACS development, we humbly ask that you cite
# the research papers on the package. Check out http://www.gromacs.org.
35 """Abstract base classes for gmxapi Python interfaces.
37 This module consolidates definitions of some basic interfaces in the gmxapi
38 Python package. These definitions are evolving and mostly for internal use, but
39 can be used to check compatibility with the gmxapi implementation details that
40 are not otherwise fully specified by the API.
42 Refer to [PEP 484](https://www.python.org/dev/peps/pep-0484) and to
43 [PEP 526](https://www.python.org/dev/peps/pep-0526) for Python background.
Type checking fails when accessing attributes of generic classes (see PEP 484).
This affects some scoping choices.

It is worth noting some details regarding type erasure with Python generics.
Generic type parameters (class subscripts) allow for annotation of dynamic
types that are bound when the generic is instantiated. The subscript is not
part of the run time class. This means that we can use subscripted generic
classes for static type checking, but use for any other purpose is discouraged.
Thus, type variable parameters to generics have less meaning than do template
parameters in C++, and are orthogonal to subclassing. Note, though, that a
subclass is not generic if the generic parameters are bound (explicitly with
type subscripts, or implicitly by omission, implying `typing.Any`).

The number of parameters (or types of parameters) to functions can be used to
dispatch a generic function to a specific function.

In other words: keep in mind the
orthogonality of generic classes and base classes, and recognize that
composed objects mimicking C++ template specializations are not distinguishable
at the class level.
Note: This module currently over-specifies the API.
As we clarify interactions, we should trim these specifications or migrate them
to gmxapi.typing, and focus on developing to functions and function interfaces
rather than classes or types. In the meantime, these types help to illustrate
the entities that can exist in the implementation. Just keep in mind that
these ABCs describe the existence of "protocols" more than the details of
the protocols, which may take longer to define.

.. todo:: Clarify protocol checks, ABCs, and mix-ins.
"""
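# The run time type erasure described above can be illustrated with a small,
# self-contained sketch. (This is illustrative only, not part of gmxapi; the
# names Holder and T are assumptions for the example.)

```python
import typing

T = typing.TypeVar('T')


class Holder(typing.Generic[T]):
    """Minimal generic class used only to illustrate type erasure."""
    def __init__(self, value: T):
        self.value = value


# The subscripted alias Holder[int] is meaningful for static type checking,
# but calling it produces an instance of the plain run time class Holder:
# the subscript is erased and is not part of the instance's type.
a = Holder[int](1)
b = Holder('x')
assert type(a) is Holder
assert type(b) is Holder
```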
# Note that the Python typing module defines generic classes in terms of abstract
# base classes defined in other modules (namely `collections.abc`), but without
# actual inheritance. The ABCs are not intended to be instantiated, and the
# generics are very mangled objects that cannot be instantiated. However,
# user-defined subclasses of either may be instantiated.
#
# This restriction is probably due somewhat to implementation constraints in the
# typing module, but it represents a Separation of Concerns that we should
# consider borrowing in our model. Practically, this can mean using an abstract
# base class for run time checking and a generic for static type checking.
#
# There may not be a compelling reason to
# rigorously separate ABC and generic type, or to disallow instantiating a
# generic unless it is also an abstract. Note the distinction, though, between
# abstract generics and fully implemented generics. NDArray
# and Future are likely to be examples of fully implemented generics, while
# Context and various interface types are likely to have abstract generics.
#
# Use abstract base classes to define interfaces and relationships between
# interfaces. Use `typing` module machinery for static type checking. Use the
# presence or absence of an expected interface, or exception handling, for run
# time type checking. Use `isinstance` and `issubclass` checking against
# non-generic abstract base classes when the valid interface is complicated or
# unknown in the caller's context.
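# The run time checking strategy described above can be sketched with standard
# library machinery alone. (Illustrative only; SupportsResult and Dummy are
# hypothetical names, not gmxapi classes.)

```python
import abc


class SupportsResult(abc.ABC):
    """Non-generic ABC usable with isinstance/issubclass at run time."""
    @classmethod
    def __subclasshook__(cls, subclass):
        # Duck-typed check: any type with a callable result() qualifies.
        if cls is SupportsResult:
            return callable(getattr(subclass, 'result', None))
        return NotImplemented


class Dummy:
    def result(self):
        return 42


# No inheritance relationship is declared, but the hook recognizes the
# expected interface at run time.
assert issubclass(Dummy, SupportsResult)
assert isinstance(Dummy(), SupportsResult)
assert not issubclass(int, SupportsResult)
```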
#
# Also, note that abstract base classes cannot assert that a simple data member
# is provided by a subclass, but class data members of Generic classes are
# interpreted by the type checker as instance members (unless annotated ClassVar).

import collections.abc
import typing
from abc import ABC, abstractmethod
from typing import Callable, Type


class EnsembleDataSource(ABC):
    """A single source of data with ensemble data flow annotations.

    Note that data sources may be Futures.

    Attributes:
        dtype (type): The underlying data type provided by this source.
        source: object or Future of type *dtype*.
        width: ensemble width of this data source handle.

    Note:
        This class should be subsumed into the core gmxapi data model. It is
        currently necessary for some type checking, but will probably disappear
        in future versions.
    """

    def __init__(self, source=None, width=1, dtype=None):
        self.source = source
        self.width = width
        self.dtype = dtype

    def member(self, member: int):
        """Extract a single ensemble member from the ensemble data source."""
        return self.source[member]

    def reset(self):
        """Reset the completion status of this data source.

        Deprecated. This is a workaround until the data subscription model is
        improved. We need to be able to fingerprint data sources robustly, and
        to acquire operation factories from operation handles. In other words,
        a Future will need both to convey its unique recreatable identity and
        to be able to rebind to its subscriber(s).

        Used internally to allow graph edges to be reused without rebinding
        operation inputs.
        """
        protocols = ('reset', '_reset')
        for protocol in protocols:
            if hasattr(self.source, protocol):
                getattr(self.source, protocol)()


class NDArray(collections.abc.Sequence, ABC):
    """N-dimensional data interface."""

    # TODO: Fix the data model and ABC vs. type-checking conflicts so that we can
    #  recognize NDArrays and NDArray futures as compatible data in data flow operations.
    #  We cannot do the following because of the forward reference to Future:
    #
    # @classmethod
    # def __subclasshook__(cls, subclass):
    #     """Determine whether gmxapi should consider the provided type to be consistent with an NDArray."""
    #     if subclass is cls:
    #         return True
    #     # For the purposes of gmxapi data flow, a Future[NDArray] is equivalent to an NDArray.
    #     if Future in subclass.__mro__:
    #         return issubclass(subclass.dtype, cls)
    #
    #     def is_compatible(candidate):
    #         if issubclass(candidate, collections.abc.Sequence) \
    #                 and not issubclass(candidate, (str, bytes)):
    #             return True
    #
    #     if any(is_compatible(base) for base in subclass.__mro__):
    #         return True
    #     return NotImplemented


# TODO: Define an enumeration.
SourceProtocol = typing.NewType('SourceProtocol', str)
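# The NewType above behaves as sketched below: an identity callable that exists
# for static type checkers and adds no run time wrapper around the underlying
# value. (Label is a hypothetical name used only for this illustration.)

```python
import typing

Label = typing.NewType('Label', str)

tag = Label('raw')
# At run time, the value is just the underlying str; the distinct Label type
# exists only for static analysis.
assert tag == 'raw'
assert isinstance(tag, str)
```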
193 """gmxapi resource object interface.
195 Resources are generally provided by Operation instances, but may be provided
196 by the framework, or abstractly as the outputs of other scripts.
198 A resource is owned in a specific Context. A resource handle may be created
199 in a different Context than that in which the resource is owned. The Context
200 instances negotiate the resource availability in the new Context.
202 # No public interface is yet defined for the Resource API.
204 # Resource instances should have an attribute serving as a sequence of available
205 # source protocols in decreasing order of preference.
206 # TODO: Clarify. Define an enumeration with the allowed values.
207 # TODO: Enforce. The existence of this attribute cannot be confirmed by the abc.ABC machinery.
208 # TODO: Consider leaving this off of the ABC specification and make it an implementation
209 # detail of a single_dispatch function.
210 # TODO: Similarly, find a way for the subscriber to have a chain of handlers.
211 _gmxapi_source_protocol = typing.Sequence[SourceProtocol]


class Future(Resource):
    """Data source that may represent Operation output that does not yet exist.

    Futures represent "immutable resources," or fixed points in the data flow.
    """

    @property
    @abstractmethod
    def dtype(self) -> type:
        ...

    @abstractmethod
    def result(self) -> typing.Any:
        ...

    # TODO: abstractmethod(subscribe)


class MutableResourceSubscriber(ABC):
    """Required interface for subscriber of a MutableResource.

    A MutableResource is bound to a collaborating object by passing a valid
    Subscriber to the resource's *subscribe()* method.
    """


class MutableResource(Resource):
    """An Operation interface that does not represent a fixed point in a data flow.

    Providers and consumers of mutable resources are more tightly coupled than
    data edge terminals and have additional machinery for binding at run time.
    Examples include the simulation plugin interface, binary payloads outside of
    the standard gmxapi types, and any operation interaction that the current
    context must defer to lower-level details.

    There is not yet a normative interface for MutableResources, but a consumer
    of MutableResources has a chance to bind directly to the provider of a
    MutableResource without the mediation of a DataEdge. Accordingly, providers
    and consumers of MutableResources must be able to be instantiated in the
    same Context.
    """
    def subscribe(self, subscriber: MutableResourceSubscriber):
        """Create a dependency on this resource.

        Allows a gmxapi compatible operation to bind to this resource.
        The subscribing object will be provided with a lower-level registration
        interface at run time, as computing elements are being initialized.
        The nature of this registration interface is particular to the type of
        resource and its participants. See, for instance, the MD plugin binding
        protocol.
        """


class OutputDataProxy(ABC):
    """A collection of Operation outputs.

    This abstract base class describes the interface to the output of an
    operation / work node.
    """
    # TODO: Specification.
    # Currently, the common aspect of OutputDataProxy is that a class has public
    # attributes that are exclusively OutputDescriptor instances, meaning that
    # getattr(instance, attr) returns a Future object. However, there are several
    # ways to implement getattr, and this sort of check in an ABC does not appear
    # to be practical.
    #
    # We might choose to assert that all public attributes must be compatible data
    # descriptors, in conjunction with defining a more specific OutputDataProxy
    # metaclass, but this makes for a dubiously growing chain of data descriptors
    # we use for the output access.
    #
    # The data model might be cleaner if we move to something more
    # like a Collection or Mapping with more conventional getters, but we would
    # lose the ability to check type on individual elements. (However, the
    # typing of return values is not normally the defining aspect of an ABC.)
    #
    # Another alternative is to collapse the contents of the `output` attribute
    # into the Operation handle type, strongly define all handle types (so that
    # the type checker can identify the presence of attributes), and rely only
    # on type checking at the level of the data descriptors. (Dynamically defined
    # OutputDataProxy classes are the exception, rather than the rule.)
    #
    # We will need to consider the details of type checkers and syntax inspection
    # tools, like Jedi and MyPy, to make design choices that maximize API usability
    # and discoverability.


class OperationReference(ABC):
    """Client interface to an element of computational work already configured.

    An "instance" of an operation is assumed to be a node in a computational
    work graph, owned and managed by a Context. This class describes the
    interface of the reference held by a client once the node exists.

    The convergence of OperationReferences with Nodes as the results of the action
    of a Director implies that a Python user should also be able to "subscribe"
    to an operation handle (or its member resources). This could be a handy feature
    with which a user could register a call-back. Note that we will want to provide
    an optional way for the call-back (as with any subscriber) to assert a chain
    of prioritized Contexts to find the optimal path of subscription.
    """

    @abstractmethod
    def run(self):
        """Assert execution of an operation.

        After calling run(), the operation results are guaranteed to be available
        in the local context.
        """

    @property
    @abstractmethod
    def output(self) -> OutputDataProxy:
        """Get a proxy collection to the output of the operation.

        Developer note: The 'output' property exists to isolate the namespace of
        output data from other operation handle attributes, and we should consider
        whether it is actually necessary or helpful. To facilitate its possible
        future removal, do not enrich its interface beyond that of a collection
        of OutputDescriptor attributes. The OutputDataProxy also serves as a Mapping,
        with keys matching the attributes. We may choose to keep only this aspect
        of the interface instead of trying to keep track of the set of attributes.
        """


class Fingerprint(ABC):
    """Unique global identifier for an Operation node.

    Represents the operation and operation inputs.
    """


class OutputDescription(ABC):
    """Describe the output of an operation.

    There may not be a single OutputDescription base class, since the requirements
    are related to the Context implementation.
    """
357 """Reference to the state and description of a data flow edge.
359 A DataEdge connects a data source collection to a data sink. A sink is an
360 input or collection of inputs of an operation (or fused operation). An operation's
361 inputs may be fed from multiple data source collections, but an operation
362 cannot be fully instantiated until all of its inputs are bound, so the DataEdge
363 is instantiated at the same time the operation is instantiated because the
364 required topology of a graph edge may be determined by the required topology
365 of another graph edge.
367 A data edge has a well-defined topology only when it is terminated by both
368 a source and sink. Creation requires that a source collection is compared to
371 Calling code initiates edge creation by passing well-described data sources
372 to an operation factory. The data sources may be annotated with explicit scatter
375 The resource manager for the new operation determines the
376 required shape of the sink to handle all of the offered input.
379 and transformations of the data sources are then determined and the edge is
382 At that point, the fingerprint of the input data at each operation
383 becomes available to the resource manager for the operation. The fingerprint
384 has sufficient information for the resource manager of the operation to
385 request and receive data through the execution context.
387 Instantiating operations and data edges implicitly involves collaboration with
388 a Context instance. The state of a given Context or the availability of a
389 default Context through a module function may affect the ability to instantiate
390 an operation or edge. In other words, behavior may be different for connections
391 being made in the scripting environment versus the running Session, and implementation
392 details can determine whether or not new operations or data flow can occur in
393 different code environments.
395 A concrete Edge is a likely related to the consuming Context, and a single
396 abstract base class may not be possible or appropriate. Possible Context-agnostic
397 use cases for a global abstract Edge (along with Node) include topological aspects of data
398 flow graphs or user-friendly inspection.
403 """Object oriented interface to nodes configured in a Context.
405 In gmxapi.operation Contexts, this functionality is implemented by subclasses
408 Likely additional interfaces for Node include subscribers(), label(), and
409 (weak) reference helpers like context() and identifier().
411 .. todo:: Converge. A Node is to a concrete Context what an operation handle is to the None (Python UI) Context.
414 def handle(self, context: 'Context') -> OperationReference:
415 """Get a reference to the Operation in the indicated Context.
417 This is equivalent to the reference obtained from the helper function
418 or factory that added the node if and only if the Contexts are the same.
419 Otherwise, a new node is created in *context* that subscribes to the
422 # Note that a member function like this is the same as dispatching a
423 # director that translates from the Node's Context to *context*
426 def output_description(self, context: 'Context') -> OutputDescription:
427 """Get a description of the output available from this node.
429 Returns a subset of the information available through handle(), but
430 without creating a subscription relationship. Allows data sources and
431 consumers to determine compatibility and requirements for connecting
436 def input(self) -> Edge:
437 """Describe the bound data sources.
439 The returned object represents the data edge in the Context managing
440 the node, though the data sources may be from other Contexts.
444 def fingerprint(self) -> Fingerprint:
445 """Uniquely identify this Node.
447 Used internally to manage resources, check-point recovery, and messaging
448 between Contexts. The fingerprint is dependent on the operation and the
449 operation inputs, and is independent of the Context.
452 Opaque identifier describing the unique output of this node.
456 def operation(self) -> 'OperationImplementation':
457 """Get a reference to the registered operation that produces nodes like this.
459 # Note that the uniqueness of a node is such that Node.operation() and
460 # Node.input() used to get an OperationReference in the same Context
461 # should result in a handle to the same Node.


class NodeBuilder(ABC):
    """Add an element of computational work to be managed by a gmxapi Context.

    A Node generally represents an instance of a registered Operation, but the
    only real requirement is that it contains sufficient information to run an
    operation and to direct the instantiation of an equivalent node in a
    different consumer Context.

    In the near future, Node and NodeBuilder will participate in check-pointing
    and in a serialization/deserialization scheme.

    .. todo:: As the NodeBuilder interface is minimized, we can look for a normative
              way to initialize a Generic NodeBuilder that supports the sorts of
              type inference and hinting we would like.
    """

    @abstractmethod
    def build(self) -> OperationReference:
        """Finalize the creation of the operation instance and get a reference."""

    @abstractmethod
    def set_input_description(self, input_description):
        """Add the details related to the operation input.

        Example: In gmxapi.operation, includes signature() and make_uid()

        .. todo:: This can probably be moved to an aspect of the resource factory.
        """

    @abstractmethod
    def set_output_factory(self, output_factory):
        """Set the factory that gives output description and resources for the Node.

        Output is not fully describable until the input is known and the Node is
        ready to be instantiated. This is the resource that can be used by the
        Context to finish completely describing the Node. The interface of the
        factory and of any object it produces is a lower level detail of the
        Context and Operation implementations in that Context.

        .. todo:: This can probably be moved to an aspect of the resource factory.
        """

    @abstractmethod
    def add_input(self, name: str, source):
        """Attach a client-provided data source to the named input.

        .. todo:: Generalize to add_resource().
        """

    @abstractmethod
    def set_handle(self, handle_builder):
        """Set the factory that gives a builder for a handle to the operation.

        .. todo:: Stabilize interface to handle_builder and move to an aspect of the
                  operation registrant.
        """

    @abstractmethod
    def set_runner_director(self, runner_builder):
        """Set the factory that gives a builder for the run-time callable.

        .. todo:: This should be a specialized Director obtained from the registrant
                  by the Context when translating a Node for execution.
        """

    @abstractmethod
    def set_resource_factory(self, factory: Callable):
        """Register a resource factory for the operation run-time resources.

        The factory will be called within the Context.

        .. todo:: Along with the merged set_runner/Director, the resource factory
                  is the other core aspect of an operation implementation registrant
                  that the Context should fetch rather than receiving through the
                  builder.
        """
        # The factory function takes input in the form the Context will provide it
        # and produces a resource object that will be passed to the callable that
        # implements the operation.
        assert callable(factory)


class Context(ABC):
    """API Context.

    A Context instance manages the details of the computing environment and
    provides for the allocation of resources.
    All gmxapi data and operations are owned by a Context instance.
    The Context manages the details of how work is run and how data is managed.

    Additionally, a concrete Context implementation determines some details of
    the interfaces used to manage operation execution and data flow. Thus, API
    calls may depend on multiple Contexts when, for instance, there is a source
    Context, a caller Context, and/or a destination Context. For Python data
    types and external interfaces to the gmxapi package (such as public function
    signatures), the relevant Context is *None*.

    This abstract base class (ABC) defines the required interface of a Context
    implementation. Client code should use this ABC for type hints. Concrete
    implementations may, *but are not required to*, subclass from this ABC to
    help enforce compatibility.
    """

    @abstractmethod
    def node_builder(self, *, operation, label=None) -> NodeBuilder:
        """Get a builder for a new work graph node.

        Nodes are elements of computational work, with resources and execution
        managed by the Context. The Context handles parallelism resources, data
        placement, work scheduling, and data flow / execution dependencies.

        This method is used by Operation director code and helper functions to
        add work to the graph.
        """

    @abstractmethod
    def node(self, node_id) -> Node:
        """Get the indicated node from the Context.

        *node_id* may be an opaque identifier or a label used when the node was
        added.
        """


class ResourceFactory(ABC):
    """Packager for run time resources for a particular Operation in a particular Context.

    .. todo:: Refine interface.
    """

    @abstractmethod
    def input_description(self, context: Context):
        """Get an input description in a form usable by the indicated Context."""


class OperationDirector(ABC):
    """Interface for Operation entry points.

    An operation director is instantiated for a specific operation and context
    (by a dispatching factory) to update the work managed by the context
    (add a computational element).
    """
    # TODO: How to handle subscriptions? Should the subscription itself be represented
    #  as a type of resource that is passed to __call__, or should the director be
    #  able to subscribe to resources as an alternative to passing with __call__?
    #  Probably the subscription is represented by a Future passed to __call__.

    # TODO: Annotate `resources`, whose validity is determined by both context and operation.
    @abstractmethod
    def __call__(self, resources, label: typing.Optional[str]):
        """Add an element of work (node) and return a handle to the client.

        Implements the client behavior in terms of the NodeBuilder interface
        for a NodeBuilder in the target Context. Return a handle to the resulting
        operation instance (node) that may be specialized to provide additional
        interface particular to the operation.
        """

    @abstractmethod
    def handle_type(self, context: Context) -> Type[OperationReference]:
        """Get the class used for operation references in this Context.

        Convenience function. May not be needed.
        """

    @abstractmethod
    def resource_factory(self,
                         source: typing.Union[Context, None],
                         target: typing.Optional[Context] = None) \
            -> typing.Union[ResourceFactory, typing.Callable]:
        """Get an appropriate resource factory.

        The ResourceFactory converts resources (in the form produced by the *source* Context)
        to the form consumed by the operation in the *target* Context.

        A *source* of None indicates that the source is an arbitrary Python function
        signature, or to try to detect appropriate dispatching. A *target* of
        None indicates that the Context of the Director instance is the target.

        As we merge the interface for a NodeBuilder and a RunnerBuilder, this
        will not need to be specified in multiple places, but it is not yet
        clear where it belongs.

        Generally, the client should not need to call the resource_factory directly.
        The director should dispatch an appropriate factory. C++ versions need
        the resource_factory dispatcher to be available for reasons of compilation
        dependencies. Clients may want to use a specific resource_factory to explicitly
        control when and where data flow is resolved. Also, the resource_factory
        for the None Context can be used to set the function signature of the Python
        package helper function. As such, it may be appropriate to make it a "static"
        member function in Python.
        """


class OperationImplementation(ABC):
    """Essential interface of an Operation implementation.

    Describe the essential features of an Operation that can be registered with
    gmxapi to support building and executing work graphs in gmxapi compatible
    Contexts.

    An Operation is usable in gmxapi when an OperationImplementation is registered
    with a valid identifier, consisting of a *namespace* and a *name*.

    Generally, the *namespace* is the module implementing the Operation
    and the *name* is a factory or helper importable from the same module that
    triggers the OperationDirector to configure a new instance of the Operation.

    The registered object must be able to describe its namespace and name, and
    to dispatch an OperationDirector appropriate for a given Context.
    """
    # TODO: Either OperationImplementations should be composed or subclasses should
    #  each be singletons. We can still consider that OperationReferences are instances
    #  of OperationImplementations or its subclasses, or that OperationReference classes
    #  have a class data member pointing to a single OperationImplementation instance
    #  or operation_registry value.

    # TODO: Consider a data descriptor and metaclass to validate the name and namespace.
    @abstractmethod
    def name(self) -> str:
        """The name of the operation.

        Generally, this corresponds to a callable attribute of a Python module
        (named by namespace()) that acts as a factory for new operation instances.
        It is also used by Context implementations to locate code supporting
        the operation.
        """

    # TODO: Consider a data descriptor and metaclass to validate the name and namespace.
    @abstractmethod
    def namespace(self) -> str:
        """The namespace of the operation.

        Generally, the namespace corresponds to a Python module importable in
        the execution environment.
        """

    # TODO: Allow this to be an instance method, and register instances.
    #  Consider not storing an actual OperationImplementation in the registry.
    #  Note, though, that if we want to automatically register on import via
    #  base class (meta-class), the functionality must be in the class definition.
    @classmethod
    @abstractmethod
    def director(cls, context: Context) -> OperationDirector:
        """Factory to get an OperationDirector appropriate for the context."""