
GPU Advances Genomic Science: A Possible Nobel Prize?

Professor Erez Lieberman Aiden on stage

The Problem

"The human genome is a sequence of 3 billion chemical letters inscribed in a molecule called DNA. Famously, short stretches (~10 letters, or 'base pairs') of DNA fold into a double helix. But what about longer pieces? How does a 2 meter long macromolecule, the genome, fold up inside a 6 micrometer wide nucleus? And, once packed, how does the information contained in this ultra-dense structure remain accessible to the cell?"

2 m genome inside a 6 µm nucleus…
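Taking the two figures in the quote at face value and treating the problem as purely one-dimensional (a deliberate oversimplification that ignores how the packing is actually arranged in 3D), the genome has to be compacted lengthwise by a factor of roughly a third of a million:

```latex
\frac{L_{\text{genome}}}{d_{\text{nucleus}}}
  = \frac{2\ \text{m}}{6 \times 10^{-6}\ \text{m}}
  \approx 3.3 \times 10^{5}
```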

A keynote address given last Wednesday at Nvidia's GTC 2013 conference by Erez Lieberman Aiden presented results of his team's investigation into how the human genome folds within the cell nucleus. The team has been collaborating on a long-term investigation into the dynamics of the human genome and how its folding relates to gene transcription. (Professor Erez Lieberman Aiden is Assistant Professor in the Department of Genetics at Baylor College of Medicine and in the Departments of Computer Science and of Computational and Applied Mathematics at Rice University, Houston, Texas.)

The team has developed a method called "Hi-C" that covers the entire process behind their conclusions, and that others can use to validate those conclusions. The Hi-C team and Professor Lieberman Aiden are now receiving worldwide recognition for their "breakthrough science" contribution to the exploration of the human genome.

GPU Connection
Lieberman Aiden used photos of his family on Facebook as an analogy for how Hi-C works. For instance, if you see a picture of Lieberman Aiden in a Facebook photo album, there is a good chance of finding pictures of his immediate family nearby. Hi-C has borne out the analogy: regions of the chromosomes that are physically close to one another turn up together, much like family members in the same album.
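To make the analogy concrete: in a Hi-C experiment, pieces of DNA that sit close together in the nucleus are joined and sequenced as pairs, and counting how often each pair of genomic regions turns up together yields a contact matrix. The sketch below bins such pairs into a matrix. It is a simplified illustration, not the team's pipeline: the read-pair positions are invented, the `contact_matrix` function and its `bin_size` parameter are hypothetical names, and real pipelines also filter, deduplicate and normalize the data.

```python
def contact_matrix(read_pairs, chrom_length, bin_size):
    """Count how often each pair of genomic bins is seen joined in a read pair.

    read_pairs: (pos_a, pos_b) positions on one chromosome. This input format is
    a hypothetical simplification; real pipelines start from aligned reads.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # a contact has no direction, so keep the matrix symmetric
    return matrix


# Toy example (invented data): a 1,000-letter "chromosome" split into 100-letter bins.
pairs = [(10, 20), (15, 250), (260, 30), (900, 950), (905, 12)]
m = contact_matrix(pairs, chrom_length=1000, bin_size=100)
for row in m:
    print(row)
```

Bins 0 and 2 end up with a count of 2 because two different read pairs link them, which is the "seen together often" signal from the photo-album analogy.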

Lieberman Aiden's research draws on a construction first described by David Hilbert in 1891: the Hilbert curve, a continuous fractal space-filling curve. Lieberman Aiden has extended it to three dimensions and applied a function that computes the most likely fold of the genome for a given chromosomal ordering.
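For readers unfamiliar with the Hilbert curve, the standard two-dimensional version can be generated with a short, well-known routine that maps a position d along the curve to grid coordinates (x, y). This is the textbook 2D construction, not the team's three-dimensional genome model, and the name `hilbert_d2xy` is our own. The property that makes it a useful picture of genome folding is locality: the curve visits every cell of a 2^k by 2^k grid exactly once, and consecutive positions always land in neighboring cells, so letters that are close along the sequence stay close in space.

```python
def hilbert_d2xy(order, d):
    """Map position d (0 <= d < 4**order) along a 2D Hilbert curve to (x, y)
    on a 2**order by 2**order grid, using the standard iterative construction."""
    n = 2 ** order
    x = y = 0
    t = d
    s = 1
    while s < n:
        # Two bits of d choose which quadrant of the current s-by-s block we are in.
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the sub-curve so its endpoints join up with its neighbors.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        # Shift into the chosen quadrant.
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Walk the 16 cells of a 4 x 4 grid in Hilbert order.
for d in range(16):
    print(d, hilbert_d2xy(2, d))
```

Every step moves to an adjacent cell, so no part of the path has to jump across the grid to reach the next letter.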
On the computing power the work demands, Lieberman Aiden said:
"The types of data involved are massive and unprecedented in size. When you talk about a 3 billion by 3 billion matrix, you are talking about forms of computation that are beyond the limits of many of the computational paradigms I was familiar with when our data started to really scale up.
And this is what has driven my group to adopt GPUs very, very aggressively. Manipulating these types of matrices is essentially impossible unless you take advantage of the parallelization of the tasks you are taking on, and of the awesome improvements that have been made in GPU technology, which we benefit from every day."
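To put the "3 billion by 3 billion matrix" in perspective, a back-of-the-envelope calculation helps. The sketch assumes one 4-byte counter per matrix cell, which is our assumption rather than a figure from the talk, and shows how storage shrinks when the genome is grouped into larger bins (the bin sizes used here are only examples).

```python
def dense_matrix_bytes(genome_length, bin_size, bytes_per_cell=4):
    """Bytes needed to store a dense, square contact matrix at a given bin size.

    bytes_per_cell=4 (one 32-bit counter per cell) is an illustrative assumption.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    return n_bins * n_bins * bytes_per_cell


GENOME_LENGTH = 3_000_000_000  # ~3 billion letters, the figure quoted in the talk

for bin_size in (1, 1_000, 1_000_000):
    size = dense_matrix_bytes(GENOME_LENGTH, bin_size)
    print(f"bin size {bin_size:>9,} bp: {size:.1e} bytes")
```

At single-letter resolution the dense matrix would need about 3.6 × 10^19 bytes (roughly 36 exabytes); with 1,000-letter bins, about 36 terabytes; with 1,000,000-letter bins, about 36 megabytes. Because the matrix is square, each thousand-fold coarsening of the bins cuts storage a million-fold.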

Fractal Globules unfold, don’t knot and follow a -1 power law

Another discovery came when the computationally assembled results revealed that the genome is naturally organized into compartments ("departments," in Lieberman Aiden's words) and folds into a "fractal globule" rather than an "equilibrium globule," the formerly accepted model. Lieberman Aiden's theory advances this idea further: the fractal globule unfolds without knotting, remains organized as it unfolds, and follows a -1 power law (the probability that two loci are in contact falls off roughly in inverse proportion to the genomic distance between them). As he explained:

"What we are seeing here is spatial compartmentalization between the parts of the human genome that are on and the parts that are off. The fancy biology term for this, again, is open and closed chromatin.
But the basic idea is very, very simple. In any given cell, the genome is essentially picking out the bits that are interesting and putting them in one place, and picking out the bits it's not terribly interested in (it doesn't want to turn those genes on, for instance) and putting them in a different place, spatially segregating the two."
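A widely used way to detect this on/off segregation in Hi-C data is to correlate the contact profiles of every pair of genomic bins and then split the bins by the sign of the leading eigenvector of that correlation matrix. The sketch below follows that idea in plain Python under simplifying assumptions: the toy matrix is invented, the function names are hypothetical, and the usual first step of dividing each contact count by the average count expected at that genomic distance is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leading_eigenvector(mat, iterations=100):
    """Power iteration: repeatedly multiply a start vector by the matrix and rescale."""
    n = len(mat)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def compartments(contacts):
    """Label each bin 'A' or 'B' by the sign of the leading eigenvector of the
    bin-by-bin correlation matrix (sign convention is arbitrary here)."""
    n = len(contacts)
    corr = [[pearson(contacts[i], contacts[j]) for j in range(n)] for i in range(n)]
    return ["A" if v > 0 else "B" for v in leading_eigenvector(corr)]


# Invented toy data: bins of the same hidden type contact each other more often.
hidden_types = "AABBAB"
toy = [[3 if hidden_types[i] == hidden_types[j] else 1 for j in range(6)] for i in range(6)]
labels = compartments(toy)
print(labels)
```

With this toy matrix, bins 0, 1 and 4 land in one group and bins 2, 3 and 5 in the other, even though the groups are interleaved along the sequence. Because an eigenvector's sign is arbitrary, which group is called "A" (open) has to be settled separately, for instance by checking which group is richer in genes.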

Fractal globule vs. equilibrium globule comparison

"And it turns out that as genes turn on and off, things move from one department to the other and vice versa. So what you see is a very striking fact that had never been observed at this scale before: an integral part of what happens when a cell specializes is that it's turning on particular genes and turning off other genes, which means it is dynamically folding its genome. It's taking some genes and saying, 'Go over to the inactive department, we're shutting you off.' It's taking other genes, pulling them into the active department and saying, 'Hey, I need you to do whatever my function is going to be.' And so the process of cell specialization is also, at the same time, a process of genomic origami."

Professor Lieberman Aiden summarized the team's findings:
"So, what we can see after having done all this parallel processing on the genome is that the genome's architecture itself seems optimized to be a parallel processor. What we discovered are two really interesting facts:
One is that as genomes specialize in function, they're actually folding. As cells specialize, the genomes inside them are folding into new configurations that enable that cell to be in one state versus another.
And two, we also got a sense of how folds like the fractal globule can enable the genome to keep this unbelievably long stretch of information fully accessible at all times, whenever it is needed.

Using Hi-C and techniques like it, we're going to be able to learn a lot more about how genomes fold, and unravel more of the mystery of what is going on when a cell specializes and how that goes wrong in a disease like cancer.

Looking at some of the types of results we've gotten, we're pretty optimistic that we can shed light on these problems and contextualize them in ways that would have been impossible without the power of both the experimental techniques and the computational resources, such as Nvidia GPUs, that we're pointing at them."
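Returning to the -1 power law that separates the fractal globule from the equilibrium globule: it describes how the contact probability P(s) between two loci changes with their separation s along the chromosome. For a fractal globule P(s) falls off roughly as s^-1, while an equilibrium globule predicts a steeper s^-3/2. A minimal way to estimate the exponent is to fit a straight line to log P(s) against log s. The curves below are synthetic, generated from the two ideal exponents rather than measured, and `fit_power_law` is a hypothetical helper name; real Hi-C curves are noisy and are typically fitted only over a limited range of distances.

```python
import math


def fit_power_law(distances, contact_probs):
    """Least-squares slope of log(P) versus log(s).

    Returns the exponent alpha in P(s) ~ s**alpha.
    """
    xs = [math.log(s) for s in distances]
    ys = [math.log(p) for p in contact_probs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic curves (not measured data); genomic distances in base pairs.
distances = [10 ** k for k in range(5, 9)]  # 100 kb to 100 Mb
fractal = [0.5 * s ** -1.0 for s in distances]
equilibrium = [0.5 * s ** -1.5 for s in distances]

print("fractal globule exponent:", round(fit_power_law(distances, fractal), 3))
print("equilibrium globule exponent:", round(fit_power_law(distances, equilibrium), 3))
```

The slope on a log-log plot is the whole test: a measured exponent near -1 points to the fractal globule, one near -1.5 to the equilibrium globule.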

BSN* Take
GTC always makes room to highlight a medical advance enabled by GPUs. Our take is that the results reported by Professor Lieberman Aiden are the beginning of a torrent of similar advances.

Simply put, it takes 6 to 7 years to develop and test a complex algorithm before deciding whether it is even marketable. Many GPU-based developments are so new that markets will necessarily have to adjust to their presence. The GPGPU as an accelerator is just 7 years old, and the big iron it needs is only now arriving in this new and rapidly expanding market.

We see a step-function effect coming to the GPU marketplace, affecting existing markets through vertical capture and envelopment while creating new opportunities where none existed before. What now appears to be a period of exponential growth for GPUs has only just begun.

A complete understanding of cancer and its underlying causes no longer seems quite as far away as it did just a week ago. In fact, we lean toward the optimistic view that it could come a lot sooner. We are also left wondering why so many are so soundly asleep, with the exception, of course, of Nvidia.