GridMD is a C++ class library intended to help developers quickly build a simulation application and run it in a distributed environment. The abbreviation GridMD stands for Grid-enabled Molecular Dynamics; however, the part of the library responsible for workflow definitions may be used in all kinds of numerical simulation, not only the Molecular Dynamics (MD) method. A physicist using the library need not be aware of distributed computing: she or he just uses special GridMD function calls inside the application C++ code to perform parameter sweeps or other tasks that can be distributed at run time.
The job manager component, which submits jobs to a remote system, may be used as a standalone tool (see files gridmd-2.0-job_manager-*.zip at http://sourceforge.net/projects/gridmd/files). The job manager documentation is provided in a separate file, gridmd/doc/jobmngr_en.pdf.
GridMD has the usual notion of a workflow, which may be represented by a directed graph consisting of nodes and links (the edges of the workflow graph). The links represent dependencies between nodes; the nodes represent actions of the program. Unlike other workflow systems, GridMD does not require that all actions in the code be wrapped in workflow elements. Instead, only the parts of the code most important for distributed execution need to be specified as nodes.
There are several types of links in GridMD. A logical, or hard, link from node A to node B means that node B may be executed only by the same process as node A and only after node A. A data link between A and B means that either A and B must be executed consecutively by the same process, or B may be executed without executing A, provided that the result data of node A associated with this data link is accessible to the process executing B. A process link between A and B is a special case of a hard link, which assumes that calculations initialized at node A take place in node B. Although this is a limited subset of the node dependencies used by other workflow systems, it provides a sufficient basis for iterative simulation experiments and can be extended in the future. Data links are the main source of the distributed capabilities of the code. The node result data associated with a link is represented in C++ code as an object of some data type.
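The link rules above can be sketched as a small data structure. This is only an illustration, not the GridMD API: the names `LinkType`, `Link`, `Workflow`, `addLink` and `mustShareProcess` are hypothetical. The sketch assumes that hard and process links tie their endpoints to one process, while a data link leaves the target free to run elsewhere once the result data is accessible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; this is not the GridMD API.
enum class LinkType { Hard, Data, Process };

struct Link {
    std::size_t from;
    std::size_t to;
    LinkType type;
};

// Hard and process links bind both endpoints to the same process.
// A data link does not: its target may run in another process
// as long as the associated result data is accessible there.
inline bool requiresSameProcess(LinkType type) {
    return type == LinkType::Hard || type == LinkType::Process;
}

// Groups nodes that must be executed by one process, using a
// union-find structure merged along hard and process links.
class Workflow {
public:
    explicit Workflow(std::size_t nodeCount) : parent_(nodeCount) {
        for (std::size_t i = 0; i < nodeCount; ++i) parent_[i] = i;
    }

    void addLink(std::size_t from, std::size_t to, LinkType type) {
        assert(from < parent_.size() && to < parent_.size());
        links_.push_back(Link{from, to, type});
        if (requiresSameProcess(type)) parent_[find(from)] = find(to);
    }

    // True if a chain of hard/process links connects the two nodes.
    bool mustShareProcess(std::size_t a, std::size_t b) {
        return find(a) == find(b);
    }

private:
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }

    std::vector<std::size_t> parent_;
    std::vector<Link> links_;
};
```

In this sketch, the groups produced by `mustShareProcess` are the units that could be handed to separate processes, and the data links between groups mark where result data would have to be transferred.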
The workflow nodes are defined and executed by the same GridMD application, so the application may run in two execution modes. Construction, or manager, mode is used to assemble the workflow graph and analyze it. Worker mode is used to execute a subset (subgraph) of the nodes by invoking the same application with command line parameters that uniquely specify the subgraph to execute.
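The following minimal sketch shows how one executable might choose between the two modes. The text above does not give the actual command line syntax GridMD uses to select a subgraph, so the `--nodes=` flag here is purely an assumption, as are the names `Mode`, `RunPlan` and `planFromArgs`.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of GridMD.
enum class Mode { Manager, Worker };

struct RunPlan {
    Mode mode = Mode::Manager;  // default: build and analyze the graph
    std::set<int> nodes;        // node ids to execute in worker mode
};

// Assumed flag format "--nodes=1,4,7". GridMD's real parameters
// for naming a subgraph are not specified in this document.
RunPlan planFromArgs(const std::vector<std::string>& args) {
    const std::string prefix = "--nodes=";
    RunPlan plan;
    for (const std::string& arg : args) {
        if (arg.compare(0, prefix.size(), prefix) != 0) continue;
        plan.mode = Mode::Worker;
        std::istringstream list(arg.substr(prefix.size()));
        std::string item;
        while (std::getline(list, item, ',')) {
            if (!item.empty()) plan.nodes.insert(std::stoi(item));
        }
    }
    return plan;
}
```

Without the assumed flag the sketch stays in manager mode. With it, the same binary would execute only the listed nodes, which mirrors the idea that the worker is the manager application invoked with different arguments.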
Another basic idea distinguishes GridMD from other workflow systems: it uses the same code for designing and linking the graph elements, executing individual nodes, and managing the job submission process. This makes GridMD applications lightweight and compact, written in a single language (C++), and highly portable. Software requirements are also modest: GridMD depends on two free, portable C++ libraries, wxWidgets and the Boost Graph Library.
For remote cross-platform execution, the same application must be compiled for each of the available resources. To submit remote jobs when the managing GridMD application runs on a Windows system, an SSH client for Windows (typically PuTTY) is required to connect to the remote server. The remote compute resources may run a queueing system such as PBS or belong to a Globus-managed grid.
The GridMD application saves its current state at every important stage of its execution. The state is saved in an XML file named <experiment_name>.xml. The application may be interrupted at any moment of its execution, for example while it is waiting for a remote job to complete. It may then be restarted, retrieving all job information and workflow progress from the saved XML state file. The following command line options are available for restarting the GridMD application:
| -r[config.xml] | Continue execution from the state saved in 'config.xml'. If 'config.xml' is not specified, the default <experiment_name>.xml is used for restarting. Note that this file is then overwritten with the new state information as execution continues. |
| -R[config.xml] | Restart execution from the beginning, reusing available calculated data link information and already started jobs when possible. In this case all local nodes (i.e. the nodes hard-linked to 'finish' nodes) are re-executed. If 'config.xml' is not specified, the default <experiment_name>.xml is used for restarting. Note that this file is then overwritten with the new state information as execution continues. |
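As an illustration of the two restart options, the sketch below interprets a single argument. It is not GridMD's own parser: the names `Restart`, `RestartRequest` and `parseRestartOption` are hypothetical. It assumes that the optional file name follows the option letter directly, as the bracket notation suggests, and that the default state file is the experiment name with an `.xml` extension.

```cpp
#include <string>

// Hypothetical types for illustration; not GridMD's own parser.
enum class Restart { None, Continue, FromBeginning };

struct RestartRequest {
    Restart kind = Restart::None;
    std::string stateFile;  // XML state file to read, and later overwrite
};

// Assumes the optional file name is attached to the option letter,
// e.g. "-r" or "-rsaved.xml", and that the default state file is
// "<experiment name>.xml".
RestartRequest parseRestartOption(const std::string& arg,
                                  const std::string& experimentName) {
    RestartRequest request;
    if (arg.size() < 2 || arg[0] != '-') return request;

    if (arg[1] == 'r') {
        request.kind = Restart::Continue;       // resume from saved state
    } else if (arg[1] == 'R') {
        request.kind = Restart::FromBeginning;  // rerun, reusing data and jobs
    } else {
        return request;                         // not a restart option
    }

    request.stateFile =
        arg.size() > 2 ? arg.substr(2) : experimentName + ".xml";
    return request;
}
```

Since either option leads to the state file being overwritten once execution continues, a caller that wants to keep the earlier state would need to copy that file before restarting.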