By Laxmi Parida
The USDA, Mars Inc. and IBM recently announced progress in identifying the relationship between a cacao tree’s pod color and its sustainability and taste. Marking phenotypes (traits) on a genome in this way is still a biological challenge, one that requires complex algorithms to crack the Big Data of genomes.
While a number of plant and animal genomes have been assembled, the amount of information they represent is massive. An organism has about 30,000 genes on average, but the number varies widely across species. In fact, plants can be genomically more complex than animals, because some have more than two homologous sets of chromosomes (polyploidy). Just as large is the number of potential experiments needed to validate any hypothesis about these genomes! Linking traits to the genome is still the most difficult question in biology, the Holy Grail of genetics.
Algorithms to the rescue
Anecdotally, cacao breeders in the field have observed a match between pod color and taste. Now our algorithms can verify the connection between these traits, down to connecting RNA sequences.
We used algorithms to sift through the cacao genome to find associations between markers and pod color. Because the algorithms are not specific to the cacao genome, they could potentially be used on other plant or even animal genomes. In fact, one of our goals with the cacao work is to provide a framework that helps other scientists find traits beyond pod color, and in species beyond cacao, which is why we made the algorithms available.
Scientists across the world can add to the accumulating knowledge, improve the algorithms’ accuracy, and help refine searches for other genetic characteristics. For example, researchers from different labs are applying our algorithms to other plant genomes, such as avocados, sugar beets, and even wine grapes. While it’s too early to point to results, our progress stems from continuing to fine-tune the algorithms we and others develop.
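To make the idea of a marker-to-trait association concrete, here is a minimal sketch in Python. It is not the algorithm used in the cacao work. It swaps in a plain Pearson chi-square test, one of the simplest ways to ask whether a marker's genotype and pod color vary together. The marker names, genotype calls, and pod colors are made-up toy data, and the function names `chi_square` and `rank_markers` are hypothetical.

```python
from collections import Counter


def chi_square(genotypes, colors):
    """Pearson chi-square statistic for a genotype-by-color contingency table."""
    n = len(genotypes)
    observed = Counter(zip(genotypes, colors))  # counts per (genotype, color) pair
    genotype_totals = Counter(genotypes)
    color_totals = Counter(colors)
    stat = 0.0
    for genotype, g_count in genotype_totals.items():
        for color, c_count in color_totals.items():
            # Count expected in this cell if genotype and color were independent
            expected = g_count * c_count / n
            stat += (observed[(genotype, color)] - expected) ** 2 / expected
    return stat


def rank_markers(marker_table, colors):
    """Score every marker against the trait and sort, strongest association first."""
    scores = {name: chi_square(calls, colors) for name, calls in marker_table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (assumption): eight trees, two hypothetical markers.
pod_colors = ["red", "red", "red", "red", "green", "green", "green", "green"]
markers = {
    "marker_A": ["T", "T", "T", "T", "C", "C", "C", "C"],  # tracks pod color exactly
    "marker_B": ["G", "A", "G", "A", "G", "A", "G", "A"],  # unrelated to pod color
}

ranking = rank_markers(markers, pod_colors)
for name, score in ranking:
    print(f"{name}: chi-square = {score:.2f}")
```

In this toy set, `marker_A` ranks first because its genotype splits the trees exactly by pod color, while `marker_B` scores zero. A real analysis would cover many thousands of markers and would need to correct for multiple testing and for relatedness among the trees, which this sketch ignores.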
Putting algorithms in the field
Don’t let the improvements made through algorithms fool you. While we can see a day when a plant breeder could immediately sample the genetic information of a seedling, searching for and attributing phenotypes won’t be embedded in a mobile app any time soon. It will take years to close the gap between understanding a genome and providing guidance to a breeder in the field.
In the meantime, we continue to enrich the database. The more thoroughly and reliably it is augmented, the faster we and other scientists will understand the cacao genome and, through comparative genomics, other genomes as well.
Biology + Algorithms
IBM’s computational biology team is cross-disciplinary, bringing together computer scientists, biologists and mathematicians. Over the last 10 to 15 years, this mix of skills has become necessary to understand the data of biology. Today, universities are offering courses in bioinformatics because insight now happens at the intersection of biology and algorithmics.