In modern materials research, an accurate description of how data is generated and processed plays a crucial role, for example in measurements, simulations, or analyses of material properties. Many of these procedures are currently familiar only to experts and have often been tested only on special test datasets. Broader use in industry therefore requires suitable tools that let users apply these procedures in their own projects. The MaterialDigital platform (PMD) supports researchers and companies in using digital workflows and in connecting simulation, measurement, and analysis procedures with one another. This makes existing and future tools interoperable, so that they can be used together more easily. At the same time, the platform ensures that data acquisition and processing can be reliably repeated and that the resulting data is available for further applications.
What are Workflows?
What role do workflows play in PMD?
A workflow is a chain of well-documented process steps to generate or process data for a specific problem and deliver a specific set of results. A workflow in materials research can include measurements on material samples, data preparation, simulation of material behavior, analysis of results, and creation of visualizations. Each step is digitally captured and documented, so that the entire sequence is reproducible and the results can be used for further projects.
Within PMD, we digitize these workflows so that every process step, including steps that were previously manual, is fully represented in digital form and is machine-accessible, traceable, and storable.
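The idea of a workflow as a documented chain of steps can be illustrated with a minimal sketch. It is not the PMD, Pyiron, or SimStack implementation: the `Workflow` and `StepRecord` classes, the two example steps, and the tensile-test numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A step takes a dict of named values and returns a dict of named values.
Step = Callable[[dict[str, Any]], dict[str, Any]]


@dataclass
class StepRecord:
    """Documentation captured for one executed step (hypothetical schema)."""
    name: str
    inputs: dict[str, Any]
    outputs: dict[str, Any]


@dataclass
class Workflow:
    """An ordered chain of named steps that records every execution."""
    steps: list[tuple[str, Step]] = field(default_factory=list)
    history: list[StepRecord] = field(default_factory=list)

    def add(self, name: str, func: Step) -> "Workflow":
        self.steps.append((name, func))
        return self

    def run(self, data: dict[str, Any]) -> dict[str, Any]:
        for name, func in self.steps:
            result = func(data)
            # Keep inputs and outputs so the chain can be traced later.
            self.history.append(StepRecord(name, dict(data), dict(result)))
            data = result
        return data


def prepare(data: dict[str, Any]) -> dict[str, Any]:
    """Data preparation: convert force (N) and area (mm^2) to stress (MPa)."""
    stress = [force / data["area_mm2"] for force in data["force_n"]]
    return {"strain": data["strain"], "stress_mpa": stress}


def analyze(data: dict[str, Any]) -> dict[str, Any]:
    """Analysis: estimate the elastic slope from the first two points."""
    strain, stress = data["strain"], data["stress_mpa"]
    slope = (stress[1] - stress[0]) / (strain[1] - strain[0])
    return {"youngs_modulus_mpa": slope}


workflow = Workflow().add("prepare", prepare).add("analyze", analyze)
result = workflow.run(
    {"strain": [0.0, 0.001, 0.002], "force_n": [0.0, 2100.0, 4200.0], "area_mm2": 10.0}
)
```

After the run, `workflow.history` holds one record per step, which is the kind of information a workflow environment would persist alongside the final result.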
Advantages of workflow environments
PMD currently supports two different workflow environments: Pyiron and SimStack. Their use offers the following advantages:
- Engineers, data scientists, etc. are provided with a user-friendly interface to a variety of tools.
- Non-experts are enabled to use standardized calculation workflows based on complex connections of individual software tools.
- Capture of complex individual calculation workflows for documentation and distribution (e.g., for a paper, an IP application, a collaboration, etc.)
- Automated storage of final results as well as all relevant intermediate steps (e.g., in database systems, repositories)
- Integration and easy access to HPC resources
- Connection to community-wide semantics and knowledge graphs (ontologies) through the description of input/output of individual tools within a workflow chain
- Exchange of data without granting access to raw data
- Execution of data acquisition across multiple rolling directions/multiple institutes
- Modification of simulation steps without having to run the entire simulation again
Workflow Implementations
Within PMD, a distinction is made between four levels (A, B, C, D) of workflow implementation. Within a project, these levels are used step by step with different implementation effort, workflow control, and user support. However, it is possible to combine different implementation levels for the different steps of a single workflow.
A: Script-based Job
The user provides a script-based job with well-defined input and output parameters for each task of a workflow. Input parameters can be passed to the script from the results of other calculations, and its outputs can be processed in a subsequent calculation step. The parameters and the script are saved and documented to make the work step reproducible and to avoid recomputing results that already exist. The file formats that the script uses for input and output do not have to match the file format that the workflow environment (e.g., Pyiron) uses for storage.
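One way to picture level A is a small wrapper that stores each job's parameters and output and skips the calculation when the same parameters are seen again. The sketch below uses only the Python standard library. The function `run_script_job`, the JSON record layout, and the example script `cell_volume` are assumptions made for illustration and do not reflect how Pyiron or SimStack store jobs.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def run_script_job(script, params, store):
    """Run `script(**params)` once per parameter set and persist a record.

    Returns (output, was_cached). The job id is a hash of the script name
    and the sorted parameters; a real system would also track the script
    version, which this sketch leaves out.
    """
    key_source = json.dumps(
        {"script": script.__name__, "params": params}, sort_keys=True
    )
    job_id = hashlib.sha256(key_source.encode("utf-8")).hexdigest()[:16]
    record_file = store / f"{job_id}.json"

    if record_file.exists():
        # Already calculated: reuse the stored output.
        return json.loads(record_file.read_text())["output"], True

    output = script(**params)
    record = {"script": script.__name__, "params": params, "output": output}
    record_file.write_text(json.dumps(record, indent=2))
    return output, False


call_count = 0


def cell_volume(a, b, c):
    """Hypothetical script: volume of an orthorhombic cell."""
    global call_count
    call_count += 1
    return {"volume": a * b * c}


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    params = {"a": 2.0, "b": 3.0, "c": 4.0}
    first, first_cached = run_script_job(cell_volume, params, store)
    second, second_cached = run_script_job(cell_volume, params, store)
```

The second call returns the stored output without running `cell_volume` again, and the JSON file on disk documents which parameters produced which result.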
B: Predefined but extensible workflow classes / workflow components
A predefined class can be created for a simulation tool and integrated into the workflow system, in either Pyiron or SimStack. In Pyiron, this class defines and handles import/export as well as storage of input/output and serialization of job attributes for communication with HPC. In SimStack, this is accomplished through "Workflow Active Nodes" (WaNos). In this way, well-defined problems with a subset of parameters (compared to the full functionality of the tools) can be executed as a step in a workflow.
The advantage of this approach is that users who are not familiar with a particular software tool do not have to learn attributes that are not essential for the workflow at hand, because the relevant ones are exposed through a simplified, readable, standard Pyiron interface or a structured XML format (with well-documented and easy-to-learn terminology). At the same time, the predefined classes can easily be extended to new job types that enable additional or newly defined functionality.
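The pattern behind level B can be sketched in a few lines: a class exposes only the parameters a workflow needs, keeps expert settings fixed, and can be subclassed for new job types. Everything in the sketch is hypothetical, including the class names, the parameter names, and the `key = value` input format. It is neither Pyiron's job class API nor the WaNo XML schema.

```python
from dataclasses import dataclass


@dataclass
class SimpleMDJob:
    """Hypothetical predefined job class exposing only two parameters."""
    temperature_k: float = 300.0
    n_steps: int = 1000

    # Expert settings chosen by the class author and hidden from the user.
    _defaults = {"timestep_fs": 1.0, "thermostat": "nose-hoover"}

    def settings(self) -> dict:
        """Merge the hidden defaults with the user-facing parameters."""
        if self.temperature_k <= 0:
            raise ValueError("temperature_k must be positive")
        merged = dict(self._defaults)
        merged.update(temperature_k=self.temperature_k, n_steps=self.n_steps)
        return merged

    def to_input(self) -> str:
        """Render the settings as a simple 'key = value' input file."""
        return "\n".join(
            f"{key} = {value}" for key, value in sorted(self.settings().items())
        )


@dataclass
class PressureMDJob(SimpleMDJob):
    """Extension to a new job type: adds pressure control."""
    pressure_bar: float = 1.0

    def settings(self) -> dict:
        merged = super().settings()
        merged.update(pressure_bar=self.pressure_bar, barostat="berendsen")
        return merged


basic_job = SimpleMDJob(temperature_k=500.0)
extended_job = PressureMDJob(temperature_k=500.0, pressure_bar=2.0)
input_text = extended_job.to_input()
```

A user of `SimpleMDJob` only sets the temperature and the number of steps. The subclass adds functionality without changing how the base class is used.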
C: Graphical user interface for predefined workflows
Once a workflow is set up, less experienced users in particular prefer not to work on the command line. For this purpose, a graphical user interface can be provided that generates the output from a predefined, often limited set of input parameters.
D: Interoperable workflows (ensuring community standards)
The use of generic input and output parameters for predefined classes enables the description of (part of) a workflow in a notation that is generic, i.e., independent of the specific software tool. Generic parameters are also the key to enabling interoperability between software tools. Such a standard exists in Pyiron for atomistic simulations (ASE), but must be implemented by domain experts for other communities. For example, the VMAP standard can be used for FEM simulations.
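A minimal sketch of this idea: if every tool accepts the same generic structure description and returns the same generic quantity, a workflow step can be written once and the tool behind it can be swapped. `GenericStructure`, `ToolA`, `ToolB`, and their toy energy models are hypothetical stand-ins. They are not ASE, VMAP, or any real simulation code.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Protocol


@dataclass(frozen=True)
class GenericStructure:
    """Tool-independent description of an atomic structure."""
    symbols: tuple[str, ...]
    positions: tuple[tuple[float, float, float], ...]  # in angstrom


class EnergyTool(Protocol):
    """Generic interface that every tool adapter has to provide."""
    def energy_ev(self, structure: GenericStructure) -> float: ...


class ToolA:
    """Hypothetical tool: a constant energy of -1 eV per atom."""
    def energy_ev(self, structure: GenericStructure) -> float:
        return -1.0 * len(structure.symbols)


class ToolB:
    """Hypothetical tool: a toy pair term of -1/r summed over all pairs."""
    def energy_ev(self, structure: GenericStructure) -> float:
        return sum(
            -1.0 / math.dist(p, q)
            for p, q in combinations(structure.positions, 2)
        )


def energy_per_atom(tool: EnergyTool, structure: GenericStructure) -> float:
    """Workflow step written only against the generic interface."""
    return tool.energy_ev(structure) / len(structure.symbols)


dimer = GenericStructure(
    symbols=("H", "H"),
    positions=((0.0, 0.0, 0.0), (0.0, 0.0, 2.0)),
)
result_a = energy_per_atom(ToolA(), dimer)
result_b = energy_per_atom(ToolB(), dimer)
```

The step `energy_per_atom` never refers to a specific tool, so replacing `ToolA` with `ToolB` requires no change to the workflow description.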
Workflow Environments
Pyiron - a Python-based integrated development environment for computational materials science
To coordinate method development in computational materials science and integrate existing methods into a common platform, a Python-based framework called Pyiron is being developed. It provides all the necessary tools to interactively execute complex simulation protocols that can combine different computer codes and perform millions of separate calculations on high-performance computer clusters.
At the same time, Pyiron enables the interactive development, implementation, and testing of these simulation protocols, similar to an integrated development environment (IDE). Because structured and unstructured data, metadata, and workflows are integrated within the same platform, they are automatically stored in an efficient hierarchical database. This preserves the materials science expertise of both developers and users and makes it accessible in a standardized ontology.
The basic idea behind this framework is to provide a single tool with a uniform interface for a wide variety of simulation codes as well as analysis and visualization tools. The availability of this IDE allows the user to focus on science or product development instead of having to deal with technical details such as input/output formats of the codes and tools.
SimStack Workflow Environment
A central difficulty in integrating material simulations into the product design cycle is the need to integrate customized simulation workflows, which usually consist of several modules, for each application. In addition, the execution of available patchwork solutions requires specialized know-how both in methodology and in the operation of supercomputers.
The SimStack workflow environment enables the efficient design and adaptation of complex workflows ("rapid prototyping") with software modules from different providers via drag-and-drop, whereby only settings relevant to the respective use case are exposed. Together with the automated execution of workflows on supercomputers, this minimizes the complexity for the end user and the required expertise. This enables the transfer of complex, scientific multiscale methods into industry.
Workflows & Ontologies
The interaction of workflows and ontologies within PMD has different facets:
- With a workflow, data can be read from an ontology-based data store as input, modified according to the process chain, and the output can be fed back into the ontology.
- The information produced by a workflow developed in the environment (e.g., newly discovered dependencies that were not previously known) can be automatically exported to a material knowledge graph.
- If the functionality of a tool (including input and output) is described in the form of an ontology, it can be integrated into a workflow environment without requiring a tool-specific parser.
- If a workflow (including the input and output of each simulation module) is described generically (e.g., in the form of a standardized ontology), a tool-independent formulation is achieved. In this way, individual tools within a complex tool chain can be easily exchanged.
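The export of workflow information to a knowledge graph can be sketched with plain subject-predicate-object triples. The sketch borrows the term names `prov:Activity`, `prov:used`, and `prov:wasGeneratedBy` from the W3C PROV-O vocabulary purely for illustration. The vocabulary actually used within PMD is not specified here, and the `ex:` identifiers, the step names, and both helper functions are hypothetical.

```python
def workflow_to_triples(workflow_id, steps):
    """Describe each step and its inputs/outputs as (s, p, o) triples."""
    triples = []
    for step in steps:
        step_iri = f"ex:{workflow_id}/{step['name']}"
        triples.append((step_iri, "rdf:type", "prov:Activity"))
        for item in step["inputs"]:
            triples.append((step_iri, "prov:used", f"ex:{item}"))
        for item in step["outputs"]:
            triples.append((f"ex:{item}", "prov:wasGeneratedBy", step_iri))
    return triples


def upstream_of(triples, entity):
    """Return every entity the given one depends on, directly or not."""
    generated_by = {s: o for s, p, o in triples if p == "prov:wasGeneratedBy"}
    used = {}
    for s, p, o in triples:
        if p == "prov:used":
            used.setdefault(s, []).append(o)

    found, stack = set(), [entity]
    while stack:
        current = stack.pop()
        activity = generated_by.get(current)  # None for raw inputs
        for source in used.get(activity, []):
            if source not in found:
                found.add(source)
                stack.append(source)
    return found


steps = [
    {"name": "prepare", "inputs": ["raw_curve"], "outputs": ["stress_strain"]},
    {"name": "analyze", "inputs": ["stress_strain"], "outputs": ["youngs_modulus"]},
]
triples = workflow_to_triples("tensile_test", steps)
sources = upstream_of(triples, "ex:youngs_modulus")
```

Once the workflow is expressed this way, a question such as "which data does this result depend on?" becomes a graph query instead of a search through tool-specific files.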
Workflow Store & Community
PMD Workflow Store
The PMD Workflow Store is a central place where you can share your science with others! Share your workflows and workflow modules to make your research work reproducibly available to the community.
The Workflow Store enables:
- Exchange of workflows: Publish and share your scientific workflows with the community
- Reproducibility: Ensure that your research results can be reproduced by others
- Reusability: Use existing workflows as a starting point for your own projects
- Documentation: Professional documentation of your workflows with all necessary metadata
How can I participate?
PMD offers regular workflow meetings where you can learn how to:
- Develop and upload your own workflows
- Use and adapt existing workflows
- Implement best practices for workflow documentation
- Exchange ideas with other workflow developers
Contact us to learn more about participating in the workflow community!