Revolutionizing Disease Detection with Data
Foundational Technology
A critical innovation developed in the last decade is metagenomic next-generation sequencing (mNGS). Traditional DNA sequencing attempts to identify a target organism by searching for a specific DNA or RNA sequence. While this has its advantages, it falls short when, for example, you don’t know what organism you’re looking for. With mNGS, you can take a sample of any biological origin and sequence all of the genetic material in it at once. You can then cross-reference these sequences against a central database of known organisms and determine which organism each one came from. In a medical context, you can use this to identify a pathogen without even having to consider the symptoms, or to determine whether the pathogen is known to medicine in the first place.
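The cross-referencing step can be sketched in a few lines of Python. This is only a toy illustration, not how any production mNGS pipeline works: the two reference sequences, the organism names, the reads, and the k-mer length of 8 are all made up for the example, and real tools use far larger databases and more careful alignment.

```python
from collections import Counter

# Hypothetical toy reference "database": organism name -> sequence fragment.
# Real databases hold millions of sequences; these strings are invented.
REFERENCE = {
    "organism_A": "ATGGCGTACGTTAGCCGATAGGCTTAACG",
    "organism_B": "TTGACCGGTATCCGAGCTTAAGGCCATGA",
}

K = 8  # k-mer length, chosen arbitrarily for this sketch


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Index every reference k-mer by the organisms that contain it.
INDEX = {}
for name, genome in REFERENCE.items():
    for kmer in kmers(genome):
        INDEX.setdefault(kmer, set()).add(name)


def classify(read):
    """Assign a read to the organism sharing the most k-mers with it.

    Reads that share no k-mer with any reference come back as "unmatched",
    which is the toy analogue of a sequence the database does not know.
    """
    votes = Counter()
    for kmer in kmers(read):
        for name in INDEX.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return "unmatched"
    return votes.most_common(1)[0][0]


reads = [
    "GCGTACGTTAGCCGA",  # drawn from organism_A
    "CGGTATCCGAGCTTA",  # drawn from organism_B
    "CCCCCCCCCCCCCCC",  # matches nothing in the reference
]
summary = Counter(classify(read) for read in reads)
print(summary)
```

Each read is matched without anyone having to guess in advance which organism to look for, and the unmatched read is the interesting case: it marks material the database cannot explain.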
Medical diagnosis works much like traditional DNA sequencing. When presented with a patient, the doctor hypothesizes which pathogen is causing the patient’s symptoms. They then order a test, or a series of tests, to confirm this hypothesis and decide on a course of treatment. While this method broadly works, several points of failure compromise its effectiveness.
One of these is rare and unusual pathogens. Doctors can only know so many diseases, collections of symptoms, and test protocols for identifying a suspect pathogen. There are a host of weird bacteria, fungi, and viruses that doctors may have only heard of 22 years ago in medical school, if ever. Patients who present with these pathogens are highly unlikely to receive proper treatment due to this gap in our medical system’s capacity.
An example comes from a 2015 article published in the Annals of Neurology.1 A woman presented twice to a local hospital with fevers, an altered mental status, and later vision issues. Over these two visits, she was tested for various viruses and parasites, but none came up positive. Eventually, while she was being tested for tuberculosis, her mental status declined so rapidly and severely that she was admitted to the Emergency Room. The subsequent MRI showed that her brain tissue had been decimated. While the patient could not be saved, mNGS was used to identify the causal agent. The amoeba Balamuthia mandrillaris was found to have somehow gotten into the woman’s body and consumed nearly all of her brain tissue.
Balamuthia infection is extremely rare. Fewer than ten people are diagnosed yearly.2 We have no idea where it comes from: water, the soil, the air, etc. Given these conditions, diagnosis is extremely difficult and intervention next to impossible.
This is where mNGS as a diagnostic tool is so powerful. It is hypothesis-free. Instead of relying on doctors (who, despite their incredible skillset, are still human), data can lead us to what is causing illness in a patient. Instead of identifying a lead, ordering a test, seeing if that test comes back positive, and then moving on to the next lead, medicine can identify the pathogen directly from the sample. Not only does this have the potential to increase the speed and effectiveness of treatment for patients, it can also save a lot of money. The Balamuthia patient’s care, between her initial treatments and the boatload of emergency care she received, cost over $1 million. An mNGS test, which would have identified the responsible pathogen from the start, costs around $2,200.3
While not cheap, mNGS can be an incredible first step in the diagnosis process for patients presenting with outlier sets of symptoms. Consider that just 20 years ago, sequencing the first human genome cost $2.7 billion and took 13 years. A couple of grand and two weeks is a major leap forward, and it suggests that costs will hopefully decline further.
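To put those figures side by side, here is a small back-of-the-envelope calculation. It takes the numbers quoted in the text at face value, treats "over $1 million" as exactly $1 million, and assumes 52 weeks per year; the variable names are just for this sketch.

```python
# Figures quoted in the text (all approximate).
balamuthia_case_cost = 1_000_000    # dollars; the text says "over $1 million"
mngs_test_cost = 2_200              # dollars per mNGS test
first_genome_cost = 2_700_000_000   # dollars for the first human genome
first_genome_years = 13
mngs_turnaround_weeks = 2

# How many mNGS tests the cost of that single case would have paid for.
tests_per_case = balamuthia_case_cost / mngs_test_cost

# How much cheaper and faster a test is than the first human genome.
cost_ratio = first_genome_cost / mngs_test_cost
time_ratio = first_genome_years * 52 / mngs_turnaround_weeks

print(f"mNGS tests for the cost of one case: about {tests_per_case:.0f}")
print(f"Cost ratio, first genome vs. one test: about {cost_ratio:,.0f}x")
print(f"Time ratio, first genome vs. one test: about {time_ratio:.0f}x")
```

On these numbers, the cost of that one case would cover roughly 455 tests, and a single test is over a million times cheaper and a few hundred times faster than the first human genome.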
Another major reason for implementing mNGS at the frontline of our medical system is its potential for novel pathogen identification. Since you are cross-referencing your sample with every known organism on Earth, if something new pops up, you will know about it right away. This has serious implications for epidemic detection and response. There are a plethora of bugs we do not know about; in one survey of just 94 random patients in Uganda, three new (thankfully low-risk) viruses were discovered.4
If mNGS were a standard diagnostic tool, we would not only know when a new bug pops up, we could also track its spread through the population, boosting our early epidemic response. If that doesn’t convince you, consider that the virus that causes COVID-19 was initially identified by mNGS.5 Had that method been standard procedure, we might have had several weeks’ head start on the virus.
Creating a Network
Of course, simply making mNGS a common test at hospitals and doctors’ offices does not realize its full potential. For that, we need to link mNGS to a wider network.
One initiative towards such a network is called IDseq.6 It is building a central clearinghouse into which mNGS data from around the world can be fed. This data will then be aggregated into a dashboard that reports where pathogens are active and flags when a new or unusual virus, bacterium, fungus, or parasite pops up. Such a dashboard would be a game changer for our public health infrastructure.
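The aggregate-and-flag idea can be sketched as follows. This is not IDseq's actual data model or API: the report format, the region and organism names, and the "baseline" of organisms already seen in each region are all hypothetical, chosen only to show the shape of the logic.

```python
from collections import Counter, defaultdict

# Hypothetical, made-up reports: (region, organism identified by mNGS).
reports = [
    ("Region A", "Influenza A virus"),
    ("Region A", "Influenza A virus"),
    ("Region A", "unclassified virus"),
    ("Region B", "Plasmodium falciparum"),
    ("Region B", "Influenza A virus"),
]

# Hypothetical baseline: organisms already known to circulate in each region.
baseline = {
    "Region A": {"Influenza A virus"},
    "Region B": {"Influenza A virus", "Plasmodium falciparum"},
}


def summarize(reports, baseline):
    """Count detections per region and flag organisms outside the baseline."""
    counts = defaultdict(Counter)
    flags = []
    for region, organism in reports:
        counts[region][organism] += 1
        is_new = organism not in baseline.get(region, set())
        if is_new and (region, organism) not in flags:
            flags.append((region, organism))
    return counts, flags


counts, flags = summarize(reports, baseline)
for region, tally in counts.items():
    print(region, dict(tally))
print("Flagged for review:", flags)
```

The per-region counts are what a dashboard would plot, while the flagged pairs are the detections a public health team would want to look at first.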
IDseq supports hospitals in developing countries with training and infrastructure to make this happen while also building a centralized cloud for storing and processing this data. While sequencing is getting cheaper and cheaper, a side effect of the wealth of data metagenomics is giving us is that the computing needed to process it is getting far more expensive, as genetic data is growing faster than computing power.
The ideal goal is to have nearly every patient in the world who presents with an infectious disease be sequenced with mNGS. This would allow for fine-grained, real-time observation of epidemiological trends while improving individual patients' diagnosis and treatment. Of course, such data needs to be kept anonymous in the interests of patient privacy. Establishing protocols and regulations that keep individuals from being identified by anyone other than public health professionals is a paramount component of this rollout.
Think of our weather forecasting system. The National Weather Service maintains thousands of data collection points nationwide: rain gauges, radar stations, thermometers, moisture meters, and weather balloons. This vast network produces abundant local data that feeds the models that give us sophisticated and accurate weather forecasts. By laying out a large network of metagenomic collection points, we can develop a similar network for microbes the world over. We can know precisely when the flu is beginning to hit a city, when a new Ebola outbreak occurs, or when, what, and where the next pandemic-level virus is unleashed upon the world.
What I’m Reading, Watching, and Listening to
You Just Never Know When the Pack Will Quit Squealing: A short essay of life in Iowa towns that rely on their packing plants.
Netflix Missed the Point of Avatar: What a shocker! A massive studio didn’t understand the story they were creating and absolutely fumbled the bag, wasting millions of dollars and tons of talent.
1. Wilson, M.R., N.M. Shanbhag, M.J. Reid, N.S. Singhal, J.M. Gelfand, et al. 2015. Diagnosing Balamuthia mandrillaris Encephalitis With Metagenomic Deep Sequencing. Ann Neurol 78(5): 722–730. doi: 10.1002/ana.24499.
2. Cope, J.R., J. Landa, H. Nethercut, S.A. Collier, C. Glaser, et al. 2019. The Epidemiology and Clinical Features of Balamuthia mandrillaris Disease in the United States, 1974–2016. Clin Infect Dis 68(11): 1815–1822. doi: 10.1093/cid/ciy813.
3. Levenson, D. 2020. Metagenomic Next-generation Sequencing. Clinical Laboratory News. https://www.myadlm.org/cln/articles/2020/janfeb/metagenomic-next-generation-sequencing (accessed 21 March 2024).
4. Ramesh, A., S. Nakielny, J. Hsu, M. Kyohere, O. Byaruhanga, et al. 2018. Etiology of fever in Ugandan children: identification of microbial pathogens using metagenomic next-generation sequencing and IDseq, a platform for unbiased metagenomic analysis. bioRxiv: 385005. doi: 10.1101/385005.
5. Chen, L., W. Liu, Q. Zhang, K. Xu, G. Ye, et al. 2020. RNA based mNGS approach identifies a novel human coronavirus from two individual pneumonia cases in 2019 Wuhan outbreak. Emerg Microbes Infect 9(1): 313–319. doi: 10.1080/22221751.2020.1725399.
6. Kalantar, K.L., T. Carvalho, C.F.A. de Bourcy, B. Dimitrov, G. Dingle, et al. 2020. IDseq-An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring. Gigascience 9(10): giaa111. doi: 10.1093/gigascience/giaa111.