Mon, 05/08/2019 - 12:24
Big data is revolutionising science. But as well as changing physics, chemistry and biology, it’s changing the nature of science itself. Institute researchers Wolf Reik and Stefan Schoenfelder and bioinformatics expert Simon Andrews reflect on how big data is re-shaping not only the way they work, but how they think. And we discover how bioinformatics, once considered a geeky corner of biology by some, has become central to scientific progress.
When Professor Wolf Reik, head of the Epigenetics programme, thinks about how big data has revolutionised his field he remembers the Swiss anatomist Karl Theiler. Theiler spent a lifetime creating ‘The House Mouse: Atlas of Embryonic Development’, painstakingly taking embryos at different stages of development and using staining and microscopy to identify each tissue type. “Now, we take the same embryos and put them in a big machine which sequences up to 100,000 cells at a time. Through gene expression, it gives us an equally detailed atlas of development, but because we can now use multi-omics methods that link together different layers in a single cell, the epigenome and the transcriptome, we can ask much deeper questions about how these patterns arise mechanistically. That’s what we really want to know,” says Reik. Despite being a relic of an earlier scientific age, Theiler’s atlas remains on Reik’s bookshelves: an illustration of how big data has transformed the scientific questions he can ask and an embodiment of how it’s reshaped the way his younger colleagues think.
Before big data, researchers thought about and worked on single genes: how they were regulated and their role in development, health and ageing. Now, thanks to recent developments in next-generation sequencing, the focus is firmly on the genome as a whole. “We can now look at 20,000 genes or 20,000 promoters and get huge amounts of information. The younger members of my group get excited about the whole genome and what it’s doing, whereas I was brought up in an era of asking what single genes do; it’s a fundamental difference in thinking,” says Reik. Big data brings huge opportunities, but using techniques that generate massive amounts of increasingly complex data also presents huge computational challenges. So how do Reik and other researchers extract meaning from this deluge of data?
The answer lies in bioinformatics, the science that has emerged at the intersection of biology, computer science and statistics. Dr Simon Andrews, head of Bioinformatics, belongs to this new breed of experts. Since joining the Institute in 2001, he’s seen the group expand from two to 10 staff, many of whom have their roots in biology. “Lots of people in my group were once biologists who happened to play with computers for fun. My mother was a primary school teacher. Sometimes she’d turn up with a computer that had been donated to the school, point me towards it and say ‘make this do something that I can take back into the classroom!’” Andrews recalls. “At university I built my own computers because we couldn’t afford to buy them, and when I started research we were beginning to get electronically-generated data.” His PhD generated a respectable 1,000 bases of DNA sequence. Today, a single sample at the Institute yields 40 billion. “The fundamental change is that many experiments generate amounts of data that are impossible to understand without a computer. Before, computers were a nice add-on; now they are fundamental,” he says.
One of the Institute’s core facilities, the Bioinformatics group provides computational power and data analysis plus expert advice and bespoke development work. “What fires me up are computational problems that spring from biology,” says Andrews, and what researchers often need most are ways of making their data more accessible. Over several years, Andrews’ group has developed packages capable of visualising sequencing data sets with billions of data points. “These are unfathomable on their own, but we can turn them into billions of positions in a genome, and visualise what they look like,” he explains.
Like Reik and Andrews, Dr Stefan Schoenfelder has lived through the revolution wrought by next-generation sequencing and big data. “It changes the way you think and changes the way we work,” he says. “When I did my PhD 15 years ago I spent all my time doing experiments in the lab. Now it’s the analysis that takes the time.” Schoenfelder is interested in how gene function and gene expression are controlled by non-coding bits of DNA known as regulatory sequences. In linear terms, genes and their regulatory elements may be some distance apart, so how the genome is organised in three dimensions is one of his key questions. “Whereas we used to look at individual examples, now it’s possible to address those questions genome wide. We can get a complete picture of all the interacting sequences in a cell,” he says. “When I came here after my PhD, it was something I thought might happen at the end of my career. That it’s happened so quickly is incredible.”
It also means that researchers need to learn how to interpret data, so the Institute’s Bioinformatics group makes a major difference. “The skills I was equipped with in my PhD are not enough anymore. It’s normal to keep learning in science, but this is a quantum leap,” says Schoenfelder. “In a competitive field you need to work rapidly. I often work with dedicated bioinformaticians because it’s almost impossible to be an expert in both.”
The next scientific revolution is anyone’s guess, but Schoenfelder is sure it will only underscore how much more we need to understand. “Sequencing and its impact on personalised medicine will continue to grow. High-resolution microscopy, observing live cells and even individual molecules, will be another game changer,” he concludes. “We make contributions all the time, but we know so little. That’s humbling, but it’s also very exciting to be a part of.”
This feature was written by Becky Allen for the
05 August 2019
By Becky Allen