Machine Learning Creates Artificial Human Genomes

February 16, 2021

We’ve mapped the human genome, so the only thing left to do is to start replicating it…

Nothing sounds far-fetched anymore, but hey, we digress.

The study
In a recent study published in the international journal PLOS Genetics, scientists used machine learning to generate chunks of human genetic code that do not belong to real humans. In other words, they trained a machine learning model on publicly available genome data to replicate the real thing.

The genetic code created by the computer has similar characteristics to real genomes, so it is useful for scientific research.

Ethics was the driving force
Apparently, ethics is the driving force behind all of this work.  You can’t ethically use a patient’s genetic data for genomic research, so that’s a key missing raw material if you are a genomic scientist.

The researchers devised these machine-generated genomes, or artificial genomes, to help overcome the scarcity of usable genetic data within a “safe ethical framework.” Hmm, that needs some unpacking.

But first, let’s talk about how well they did in actually replicating portions of genomes.

How good are the copies?
The research team behind the study ran multiple analyses to assess the quality of the generated genomes compared to real ones. Their verdict:

  • "Surprisingly, these genomes emerging from random noise mimic the complexities that we can observe within real human populations and, for most properties, they are not distinguishable from other genomes from the biobank we used to train our algorithm, except for one detail: they do not belong to any gene donor." 

Key point: the artificial human genomes are not complete copies of an individual’s genome
The team was unable to generate entire artificial genomes due to computational and algorithmic limitations, as described in an article in Gizmodo. Instead, the team “stitched together” multiple chunks of genome data to get to the complete genomic picture of one made-up individual.

  • “The training of the model is the bottleneck here. Once the model is trained, you can generate as many artificial genomes as you want in seconds,” Burak Yelmen, a geneticist involved in the study, said. “Training of a 10,000-position genome chunk can vary dramatically depending on multiple factors.”

Yelmen explained that the models can sometimes have a difficult time generating accurate results out of randomness.
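For the curious, here is a toy sketch of the "train once, generate from random noise, then stitch the chunks together" idea. To be clear, this is not the study's model: as a stand-in we assume a far simpler generator that just learns how often each position carries the alternate allele and samples every position independently. The function names and the tiny training set are made up for illustration.

```python
import random

def fit_allele_frequencies(training_chunks):
    """'Training': estimate, per position, how often the alternate allele (1) appears."""
    n = len(training_chunks)
    length = len(training_chunks[0])
    return [sum(chunk[i] for chunk in training_chunks) / n for i in range(length)]

def generate_chunk(frequencies, rng):
    """Turn random noise into one artificial chunk of 0/1 alleles."""
    return [1 if rng.random() < f else 0 for f in frequencies]

def stitch(chunks):
    """Concatenate separately generated chunks into one longer sequence."""
    return [allele for chunk in chunks for allele in chunk]

# Hypothetical toy data: four "real" chunks of five positions each.
training = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
]

rng = random.Random(0)  # fixed seed so the run is repeatable
freqs = fit_allele_frequencies(training)

# Once "trained", generating more chunks is cheap; here we stitch three together.
artificial = stitch([generate_chunk(freqs, rng) for _ in range(3)])
print(len(artificial), artificial)
```

A sampler like this ignores how neighboring positions tend to travel together, which is the kind of structure a real generative model would have to learn, and presumably part of why training is the bottleneck.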

The Scroll takeaway. So, computational limitations are the only reason they couldn’t generate an entire human genome. In our humble opinion, removing that limit will create an even greater ethical dilemma than the one that inspired this research in the first place. What happens if and when AI can generate the genetic blueprint for an entire human being? Pandora’s box, anyone?
