Courtesy of Joe Glick, Chief Data Scientist, RYLTI
The Problem
Our business landscape is changing constantly, and our rapidly expanding scientific knowledge continues to expose new unknowns and uncertainties in energy transformation, biomedicine, and other technology-intensive industries. For example, we used to refer to DNA as a blueprint, but now we understand it to be a molecular ecosystem.
Artificial intelligence (AI) and other technologies, particularly Large Language Models (LLMs), hold promise for driving advances, but along with their benefits they carry limitations and risks. Much has been written about hallucinations and regulatory challenges, but the primary limitation of AI, machine learning (ML), and LLMs is their inability to model the combinatorial complexity of the real world. The problem is compounded by the fact that this complexity is captured in a variety of aggregated information sources that are disconnected and do not qualify as training data.
To address these issues and to guide the biomedical community, the National Academies of Sciences, Engineering, and Medicine (NAS), sponsored by the National Institutes of Health (NIH), the National Science Foundation (NSF), and the Department of Energy (DOE), began advocating research into the use of biomimetic digital twin technology to model multidimensional and multiscale biological complexity more effectively (download the complete report: http://nap.nationalacademies.org/26894).
At the multi-agency workshop on biomedical digital twins, Bissan Al-Lazikani of MD Anderson Cancer Center explained that bottlenecks in drug discovery arise from the challenges of multidisciplinary and multiscale data integration and multiparameter optimization. To alleviate the issues associated with integrating data from disparate disciplines that span scales, she suggested integrating how all of the data points interact with each other rather than integrating the data points themselves, essentially establishing edges that can be modeled as a graph. This approach, which is especially useful when data are sparse, has the advantage that different data are captured in the same logic. It is particularly promising for identifying drug-repurposing opportunities and novel therapeutics for cancers such as uveal melanoma.
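The edge-based integration described above can be illustrated with a small sketch. It is not the method presented at the workshop, only a minimal example under stated assumptions: the `Edge` record, the relation names ("inhibits", "implicated_in"), the source labels, and all entity names are hypothetical placeholders, and none of the edges represent real findings. The point is that interactions reported by different disciplines can share one structure, which can then be traversed to surface candidate drug-to-disease links.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str    # entity the interaction starts from
    relation: str  # kind of interaction (hypothetical labels)
    target: str    # entity the interaction points to
    origin: str    # dataset or discipline that reported the interaction


def build_graph(edges):
    """Index edges by source node so all origins share one structure."""
    graph = defaultdict(list)
    for edge in edges:
        graph[edge.source].append(edge)
    return graph


def repurposing_candidates(graph, drug):
    """Follow two-hop paths: drug -> intermediate entity -> disease."""
    found = set()
    for first in graph.get(drug, []):
        for second in graph.get(first.target, []):
            if second.relation == "implicated_in":
                found.add((second.target, first.target))
    return sorted(found)


# Placeholder edges from three hypothetical sources; not real data.
edges = [
    Edge("drug_A", "inhibits", "protein_X", "assay_data"),
    Edge("protein_X", "implicated_in", "disease_Y", "genomics_study"),
    Edge("protein_X", "implicated_in", "disease_Z", "clinical_notes"),
    Edge("drug_B", "inhibits", "protein_W", "assay_data"),
]

graph = build_graph(edges)
candidates = repurposing_candidates(graph, "drug_A")
print(candidates)
```

Because each edge carries its origin, sparse sources can still contribute: a single reported interaction is enough to connect two otherwise separate datasets.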
Key Challenges from NAS Digital Twin Ecosystem Report
- A digital twin is more than just simulation and modeling. Digital twins have been the subject of widespread interest and enthusiasm; it is challenging to separate what is true from what is merely aspirational, due to a lack of agreement across domains and sectors as well as misinformation.
- Many of the potential uses of digital twins are currently intractable to realize with existing computational resources.
- Hybrid modeling approaches are a productive path forward for meeting the modeling needs of digital twins, but their effectiveness and practical use are limited by key gaps in theory and methods.
- Integration of component/subsystem digital twins is a pacing item for the digital twin representation of a complex system, especially if different fidelity models are used in the digital twin representation of its components/subsystems.
- Digital twins will typically entail high-dimensional parameter spaces. This poses a significant challenge to state-of-the-art surrogate modeling methods.
Our Theory and Methods
We believe that the fundamental theory required to begin addressing these challenges was identified in a Physics of Life report published by the NAS, which concluded: “An important lesson from the long and complex history of neural networks and artificial intelligence is that revolutionary technology can be based on ideas and principles drawn from an understanding of life, rather than on direct harnessing of life’s mechanisms or hardware.” This indicates the need for a biomimetic engineering approach to building and maintaining the digital twin ecosystem. (http://nap.nationalacademies.org/26894)
We developed the methods below through real-world work, delivering solutions that explore highly complex, multidimensional, and multiscale problem domains for global organizations. We regard the NAS Digital Twins report as authoritative validation of our approach and innovations.
Method categories:
- Real-world complexity modeling methods
- Real-world reasoning methods
- Real-world learning and adaptation methods
To support genomic research, we built a biomimetic digital twin ecosystem composed of four twin classes: patient profile, phenotype, gene variant, and protein variant. We incorporated the ecosystem into a leading geneticist’s advanced genomics experimental protocol. To our knowledge, this is the first reported use of this methodology in research to understand the pathophysiology of disease. The methodology has made use of dark data and has enabled unexpected discoveries.
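The article does not describe how the four twin classes are represented, so the following is only a hedged sketch of one way they could be related. Every class name, field, and identifier here is an assumption made for illustration: a patient profile twin links to phenotype and gene variant twins, and a gene variant twin links to protein variant twins. The helper `variants_by_phenotype` is likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinVariantTwin:
    variant_id: str


@dataclass
class GeneVariantTwin:
    variant_id: str
    protein_variants: list = field(default_factory=list)


@dataclass
class PhenotypeTwin:
    phenotype_id: str


@dataclass
class PatientProfileTwin:
    patient_id: str
    phenotypes: list = field(default_factory=list)
    gene_variants: list = field(default_factory=list)


def variants_by_phenotype(patients, phenotype_id):
    """Group patients who show a phenotype by the gene variants they carry."""
    grouped = {}
    for patient in patients:
        if any(p.phenotype_id == phenotype_id for p in patient.phenotypes):
            for variant in patient.gene_variants:
                grouped.setdefault(variant.variant_id, []).append(patient.patient_id)
    return grouped


# Placeholder twins; identifiers are invented and carry no clinical meaning.
gv1 = GeneVariantTwin("gv1", [ProteinVariantTwin("pv1")])
gv2 = GeneVariantTwin("gv2")
patients = [
    PatientProfileTwin("p1", [PhenotypeTwin("ph1")], [gv1]),
    PatientProfileTwin("p2", [PhenotypeTwin("ph1")], [gv1, gv2]),
    PatientProfileTwin("p3", [PhenotypeTwin("ph2")], [gv2]),
]

shared = variants_by_phenotype(patients, "ph1")
print(shared)
```

Keeping each twin class as its own type lets the links between them be queried in either direction, for example from a phenotype to the gene variants carried by the patients who show it.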
This example is illustrative, and the methodology applies to all information domains that are complex, multidimensional, multiscale, or dynamic. Any executive who needs to weigh complex options and tradeoffs or manage dynamic, multifactor risks can benefit from the methodology.