Andy Sharp, PhD, is an Associate Professor in the Department of Genetics and Genomic Sciences.
So, how long have you been at Sinai? Where were you before?
I started here just over a year ago—officially first of March last year. I’m from the UK—you can probably tell—did my Ph.D. in England; did post-doc work at Case Western Reserve University, Cleveland; then at University of Washington, Seattle; before I was here, I was in Geneva, in Switzerland, for about two and a half years.
Can you give us an overview of your research?
Yeah, so, I focus on human genetics, basically I’m interested in understanding the genome and figuring out what role genetic and epigenetic variation play in different human phenotypes. So, that’s pretty broad, but the main approach is—start with genome-scale analysis and then figure out what bits of the genome are interesting, and then what diseases those could be linked with. Some people call this “reverse genetics” or a “genome first” approach, as rather than starting with a disease and figuring out the gene, we do the opposite. So, we do lots of stuff: whole-genome DNA methylation profiling, gene expression datasets, copy number variation, all kinds of things—and we use both experimental and bioinformatic approaches.
I’ve done a whole lot of work before looking at recurrent deletions and duplications in the genome. Most of the stuff we do is hypothesis-driven—so that was based on the idea that certain parts of our genome, because of their architecture, the higher-order structure of the DNA, will be more prone to rearrangement. I developed custom assays to look at these specific regions and investigated them in different disease cohorts.
My lab right now is pretty small; we have two bioinformaticians and two people on the lab side, and there’s an interplay between those sides, so, people in the lab will generate big data sets with arrays or sequencing, and then the bioinformaticians will analyse it in different ways, pull out interesting stuff, get it back to the lab, and validate things. Or sometimes we have projects that start out purely as bioinformatic exercises, and then often they get to a point where we have a bunch of interesting regions, that we then want to go to the lab and do some follow-up studies on.
How did you become interested in this field, human genetics and genomics?
The honest answer is, complete chance. (Laughs.) When I was doing my undergrad, I hated the research project I was doing so much (it wasn’t genetics, it was some plant hormone), I swore at that time I would never get a job in a lab. Now, here I am. I guess my dream’s come true: I’m not in the lab anymore! When I finished my degree, I did all kinds of stuff, traveling around the world, lived in Vietnam for a while… ended up by chance getting a job as a technician in a research lab… just, that was what was available, I needed a job – got offered the chance to do a Ph.D. by the professor who ran the lab. When I finished that she set me up with a job in the U.S., where I did my post-doc, and I went on from that.
What are the challenges of working with genome-scale data sets?
With all these kinds of genomic techniques there are always biases inherent in them, and even small technical differences in how somebody does a high-throughput sequencing run, or how they run an array, dictate what questions you can and cannot ask of the data, and what makes biological sense. So, it’s always really important to know, OK, this is the underlying biology of what we are looking at, and this is how this experiment was done. Once you understand those two aspects, then you can sort out the real interesting biology from the experimental artifacts and make those big discoveries. One project that we’re doing is looking at repetitive gene families. In the past I’ve done a lot of work looking at copy number variations of genes, but a lot of the traditional ways of doing that are microarray-based, where you’re measuring intensity of spots on the array. They work OK for small changes in copy number, but we’re looking at genes where some people have more than 500 copies and other people may have, say, 10 copies, so there’s huge variation. So, to do that properly, you need some platform that has that dynamic range, and it’s no good using a microarray, where in that situation the spots on your array get so intense that the camera can’t read the changes anymore. So we’re developing digital assays that are able to give you accurate genotypes across that wide variation. So, it’s knowing what the limits of your data are, and designing ways that’ll give you good information about the question you want to ask.
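The dynamic-range point can be illustrated with a toy model. This is only a sketch under stated assumptions, not the lab's assay: the function names, the signal-per-copy values and the 16-bit detector ceiling are all hypothetical numbers chosen for illustration. It shows why an intensity readout that clips cannot tell 500 copies from 800, while a counting-style readout keeps scaling.

```python
# Toy comparison of an intensity-based readout that saturates and a
# counting-based ("digital") readout. All constants are hypothetical.

def analog_readout(copies, signal_per_copy=1000.0, ceiling=65535.0):
    """Intensity-style signal: proportional to copy number, clipped at the detector ceiling."""
    return min(copies * signal_per_copy, ceiling)


def digital_readout(copies, molecules_per_copy=20):
    """Counting-style signal: proportional to copy number, with no upper clip."""
    return copies * molecules_per_copy


def estimate_copies(signal, signal_per_unit):
    """Convert a readout back into a copy-number estimate."""
    return signal / signal_per_unit


if __name__ == "__main__":
    for true_copies in (2, 10, 500, 800):
        analog = estimate_copies(analog_readout(true_copies), 1000.0)
        digital = estimate_copies(digital_readout(true_copies), 20)
        print(f"true={true_copies:4d}  analog estimate={analog:8.2f}  digital estimate={digital:8.2f}")
```

In this toy setup the intensity estimate is accurate at low copy number but flattens once the signal hits the ceiling, so high-copy samples all look alike. The counting estimate stays proportional across the whole range.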
Beyond the technical issues, most of the stuff I try and do has a good underlying hypothesis – or at least what I think is a good hypothesis! Especially these days with genomic technologies becoming more accessible, you can do so many 'fishing experiments' where you’re just throwing crap at the wall and seeing what sticks, which sometimes works, but it’s not very smart. So, for example, we are taking monozygous twins with discordant phenotypes – one twin is normal, one twin is abnormal – and then screening their epigenomes to look for epigenetic differences that might underlie their phenotypes. Yes, it’s a screening exercise – but it has a basic underlying hypothesis: we think these identical twins will be enriched for epigenetic differences.
Where do you see the field going?
Genetics is going to become more and more widespread in medicine in general. Over the last few years the cost of sequencing DNA, sequencing someone’s genome, has come down. Now I wouldn’t say it’s routine, but it’s very feasible that I could sequence all the genes in your genome and have that result in a week or two at a cost of a couple of thousand bucks. So it’s starting to get into the realm where things like that are similar to getting an MRI done. Once we start to understand what all the variations in somebody’s genome actually do – which is the big challenge! – then it’s going to be something that’s used a lot. You could imagine in 10 or 20 years, maybe when someone’s born, they’ll have their DNA sequenced and figure out, these are the diseases this person’s going to be highly susceptible to, so that treatments and screening become tailored to an individual – this is the idea of personalized medicine. It’s not a reality yet, but, give it 10 years, and I think it’ll happen.
What are your own goals for your research?
Specific things? So, I would like to find epigenetic variants that underlie diseases; I think that’s a big under-explored area right now in human genetics. How do changes in the way our genes are regulated affect common or rare disease? I’m interested in looking at secondary structures of DNA, how DNA folds and adopts hairpins and other structures and what these do in regulating the genome. Again, I think that’s another big under-explored area of biology, because people traditionally think of DNA as As, Ts, Cs, and Gs in a row and that’s it, whereas in reality DNA is dynamic and folds in three dimensions.
So in terms of epigenetics, it sounds like you have been focusing mainly on DNA methylation, is that correct?
Yeah, mostly, that’s what I’ve been doing, just partly because it’s easy – or easier than looking at things like histone modifications, I should say. Epigenetics encompasses a thousand and one different things. But a lot of it requires live cells, where you need to do immunoprecipitations of proteins. For looking at DNA methylation you just need DNA. It’s kind of easy to look at in that way, in that most people have a tube of DNA from their patient, whereas if you say, hey, I want a freshly sampled piece of tissue snap-frozen in liquid nitrogen, that doesn’t usually happen so much. So with DNA methylation you have access to so many more samples. But there are many different epigenetic marks; DNA methylation is generally thought of as a longer-term mark, whereas things like chromatin modifications, or even nucleosome positioning, may be more dynamic. What we know about epigenetics is it can change with time, with environment, between different tissues, in different contexts. So it’s going to be a really hard problem to crack. And I think all of genetics is becoming more high-throughput, and that’s the kind of data that you need to understand these things.
If you could wave a magic wand, what sort of technology would you have that doesn’t currently exist?
I guess being able to sequence DNA and also know all of the epigenetic variants that are on that DNA as well – and that’s maybe starting to become available? Pacific Biosciences claims you can measure some of those epigenetic marks – but having the ability to sequence DNA and know the variant regulatory changes on that DNA, I think would be incredibly helpful.
What advice do you have for people who are starting out in science now?
Do something you’d enjoy! In terms of careers or just projects, something you like, something you’re interested in – and I’d say, the most important thing in science is always to have good ideas and have a good understanding of what it is you’re working on.
How would you advise someone trained at the bench who needed (or wanted) to branch out into bioinformatics? Your own scientific background is in the wet lab, and you now mentor people with much more formal computer science training than you do—
Oh, for sure – I’ve never written a line of code in my life! Advice for people – I’d say, learn a little bit of coding, even if it is just some simple Perl – I’ve been told it’s not that hard to pick up. And have a bit of an understanding of statistics. I think that’s more and more important. A lot of what we do boils down to really simple statistics like t-tests, correlations and permutations, things like that, it’s not like we’re doing super-hard stats. So, it’s not like you need to have a Ph.D. in stats or anything, it’s just knowing how to apply those things to get the information you want out of the data you have. And then being able to do that test a million times on a giant table of data, which is where some coding comes in.
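As a rough illustration of "doing that test a million times on a giant table", here is a minimal sketch in Python using only the standard library. It is not the lab's pipeline. The table layout (one row per site, cases first and controls after), the function names and the example values are all invented for illustration. Each row gets a Welch t statistic and a permutation p-value for the difference in group means.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumes non-zero variance)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def permutation_p(a, b, rng, n_perm=2000):
    """Two-sided permutation p-value for the difference in group means."""
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def scan_table(table, n_cases, seed=0):
    """Run both tests on every row and return rows sorted by permutation p-value.

    `table` maps a row name to a list of values; the first `n_cases`
    values are cases and the rest are controls (a hypothetical layout).
    """
    rng = random.Random(seed)
    results = []
    for name, values in table.items():
        cases, controls = values[:n_cases], values[n_cases:]
        results.append((name, welch_t(cases, controls), permutation_p(cases, controls, rng)))
    return sorted(results, key=lambda row: row[2])


# Invented example values: four cases followed by four controls per site.
example_table = {
    "site_A": [0.81, 0.79, 0.85, 0.80, 0.20, 0.22, 0.18, 0.25],
    "site_B": [0.50, 0.47, 0.55, 0.52, 0.51, 0.49, 0.53, 0.48],
}

if __name__ == "__main__":
    for name, t_stat, p_value in scan_table(example_table, n_cases=4):
        print(f"{name}: t = {t_stat:.2f}, permutation p = {p_value:.4f}")
```

On a real table with many thousands of rows, the same loop applies, but the p-values would need a multiple-testing correction before any row is called interesting.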
What kinds of projects could students interested in your lab come and discuss with you?
I’m open to many different projects, but I have lots of different ideas, way more than four people can do right now! I have lots of cool ideas that are just kind of sitting on the back burner for someone to grab them and run with them.
—We’re just starting to look at epigenetic differences in twins as a way to dissect epigenetic variation in disease, but I also want to apply epigenetic profiling to wider disease in general – the role of DNA methylation in splicing regulation, for example – and also to investigate things like in vitro fertilization.
—The role of variation in tandem repeats as regulators of gene expression: we have some data that when tandem repeats expand and contract, nearby genes go up and down in their expression.
—Also, investigating the role of multi-copy genes in different human phenotypes. We just got a whole bunch of data on about 200 multi-copy genes, and some of them are great candidates for human disease: they are highly polymorphic, and have essentially been missed by most association studies, because they have no correlation with flanking SNP markers. Many of these genes change all the time on their own and are very dynamic.
—Looking at DNA secondary structures, and we have evidence suggesting these have roles in regulating things like imprinting (parent-of-origin effects).
—We have a fantastic data set I just got in right now, we’re basically screening the genome for novel sites of imprinting. We’ve also developed a novel bioinformatic way of looking for these based on association analysis in families using gene expression.
—One of my people is doing exome sequencing, and again this can be applied to many diseases.
So basically, I’m interested in kind of any genomic things that have relevance to human phenotypes in a very broad sense. We have data sets coming in every week – there’s always lots of cool stuff in there that’s just like, hey! here’s an idea! let’s check it out. And now that all these arrays and high-throughput sequencing are available, there’s all kinds of questions you can ask that even a year or two ago weren’t feasible.