Courtesy of Joe Glick, Chief Data Scientist, RYLTI
Natural language processing (NLP) is a branch of AI that is being rapidly adopted, especially in biomedicine. The hope is to incorporate the mushrooming volume of publications into data-driven R&D. This technology carries risk, driven especially by ambiguity, but mitigation strategies and tools exist.
NLP AMBIGUITY SOURCES: the nature of human language and algorithmic bias.
The nature of language – Chomsky’s generative linguistics work in the 1950s was seminal to modern generative AI, but in 2011 he stood before a packed auditorium at a Cognitive Science Society conference and asked: “What is human language?” He stated that no theory of the mind to date explains what human language is or how it came about, and affirmed that he would not look at another paper on the subject unless it accounted for the sign languages used by deaf communities around the world, whose visual and spatial grammar differs markedly from that of spoken language. No one challenged his view that the gap between human language and that of lower primates is so huge and complex that it proves we all descended from one individual. His reasoning: “That it happened in one individual is statistically inexplicable; that it happened in two is inconceivable.” In 2022, clues began to appear about the genetic differences between humans and other species in what used to be viewed as junk DNA (search HAQER).
Since we cannot explain fundamentally what human language is, we cannot write software that understands it. This drives the risk of interpretation errors. In an interview with MIT Technology Review (7/5/23), Eric Schmidt, former Google CEO, said, “we should be cognizant of the limitations—and even hallucinations—of current LLMs before we offload much of our paperwork, research, and analysis to them.”
Algorithmic bias – LLMs are biased in three ways:
- The designers’ understanding of the problem space, which limits what the algorithm classifies as relevant to a search
- The volume of poor-quality internet content on which LLMs have been trained, which accounts for many of their “hallucinations”
- The differentiated meanings of terms across scientific and technical domains, which impede precise classification
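The third source of bias can be made concrete with a toy sketch. All domain names, corpora, and term counts below are invented for illustration; this is a minimal frequency-scoring classifier, not any production NLP system, showing how a term like “culture” flips its label depending on surrounding domain vocabulary.

```python
# Hypothetical sketch: the same term carries different meanings in
# different domains, so a naive frequency-based classifier can flip
# its answer on nearly identical sentences. All counts are invented.

from collections import Counter

# Toy per-domain term frequencies standing in for two training corpora.
DOMAIN_TERMS = {
    "microbiology": Counter({"culture": 9, "colony": 7, "plate": 6, "growth": 5}),
    "sociology":    Counter({"culture": 8, "society": 7, "norms": 6, "identity": 5}),
}

def classify(text: str) -> str:
    """Label a document with the domain whose term frequencies it best matches.

    Counter returns 0 for unseen tokens, so out-of-vocabulary words
    contribute nothing to a domain's score.
    """
    tokens = text.lower().split()
    scores = {domain: sum(freqs[t] for t in tokens)
              for domain, freqs in DOMAIN_TERMS.items()}
    return max(scores, key=scores.get)

# One context word is enough to flip the label of "culture":
print(classify("the culture showed rapid growth"))   # microbiology
print(classify("the culture shaped social norms"))   # sociology
# With no disambiguating context, the result hinges on arbitrary counts:
print(classify("the culture was studied"))           # microbiology, by one count
```

Real LLM embeddings blur this boundary in the same way, just in a higher-dimensional space: without domain context, precise classification of polysemous scientific terms is not guaranteed.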
NLP AMBIGUITY STRATEGY: Human-led governance of AI risks and bias
The FDA has begun addressing this issue with emerging requirements, and the introductory FDA article, with links to the discussion papers, is here:
NLP AMBIGUITY TOOLS: Biomimetic Digital Twins
To address these issues and to provide guidance to the biomedical community, the National Academies of Sciences, Engineering, and Medicine (NASEM), sponsored by the National Institutes of Health (NIH), the National Science Foundation (NSF), and the Department of Energy (DOE), began advocating research into biomedical digital twins technology to model multidimensional, multi-scale biological complexity more effectively (https://www.nationalacademies.org/event/01-30-2023/opportunities-and-challenges-for-digital-twins-in-biomedical-sciences-a-workshop).
Conclusion
Implementing the methodology described here has two critical requirements:
- An early-adopter mindset
- A network of innovative partners to evolve the ecosystem
We are committed to both. For more information, contact us at hello@causaility.com or 408-908-8900.