Emerging Science
LLMs are great at analyzing DNA sequences because genomic data is a big and narrow data problem: patterns of repeating base pairs. But then come additional layers of bioinformatics and analytics that try to understand genotype-phenotype connections. This is a diverse and sparse data problem, which is the Achilles heel of LLMs. Understanding the human genome alone, however, is not understanding human life.
Recently we have come to appreciate the human microbiome: the full array of microorganisms (the microbiota) that live on and in humans and, more specifically, the collection of microbial genomes that contributes to the broader genetic portrait, or METAGENOME, of a human. The genomes that constitute the human microbiome represent a remarkably diverse array of microorganisms, including bacteria, archaea (primitive single-celled organisms), fungi, and even some protozoans and nonliving viruses. Bacteria are by far the most numerous members of the human microbiome: the bacterial population alone is estimated at between 75 trillion and 200 trillion individual organisms, while the entire human body consists of about 50 trillion to 100 trillion somatic (body) cells. This sheer microbial abundance suggests that the human body is in fact a “SUPRAORGANISM”: a collection of human and microbial cells and genes, and thus a blend of human and microbial traits.
But it doesn’t stop there. Growing research into the PHAGEOME is exposing additional complexity.
Biomedical discovery focused on what some researchers have labeled biological “dark matter” (as-yet-unidentified viruses and organisms of extraordinary variety) may be expanding the human metagenome further. (See “The phageome: A hidden kingdom within your gut,” Knowable Magazine.)
Emerging science is continually expanding the scale and scope of the diverse and sparse data that are critical to learning but invisible to LLMs. But there is an even more significant roadblock.
About the author: Joe Glick, Co-Founder, Chief Innovation Officer, RYLTI