What I’ve learned: Toby Gibson
/ via embl/
After a childhood spent enjoying the nature around him, Toby Gibson steered his career towards computational biology, where he collaborated with Des Higgins and Julie Thompson to develop the groundbreaking Clustal W bioinformatics tool, the subject of one of the most cited scientific papers of all time.
It’s spring in Heidelberg, and one can still occasionally find Toby Gibson at EMBL Heidelberg, despite his retirement late last year.
The retired team leader isn’t there all the time, but he is still working on packing up a lab that was active for 27 years. And others – like myself, the archivist, alumni relations, and former collaborators – continue to hunger for just one more story or a bit of advice.
That’s how we caught up with Gibson to gather reflections on his career and what he has learned along the way. But let us start with a brief introduction.
Gibson earned his undergraduate biology degree from Edinburgh University, specialising in molecular biology. He went on to the Laboratory of Molecular Biology in Cambridge for his PhD, working with Bart Barrell as part of the DNA sequencing department overseen by Fred Sanger, the Nobel laureate who invented DNA sequencing. Barrell had worked with Sanger on developing methods that enabled increasingly large sequencing projects. “In those days, this was fully experimental, bench-based work, and we were running sequencing gels. We were sequencing the Epstein-Barr virus, whose genome spanned ~172,000 bases. I did 20,672 bases,” Gibson explained.
“Toby inspired me to love linear motifs. In fact, I’d say that I’m now a lifelong motif campaigner! They’re everywhere in the cell! Toby not only crystallised the idea of the ELM database, he also brought together the best minds and meticulously planned courses and meetings to develop and sustain the motif biology community in a selfless manner. Today, ELM is much more than a resource and is integral to many researchers who want to really understand cell processes. In retrospect, Toby has founded a community while also doing high-quality science and developing ELM.”
Gibson started in the Argos group at EMBL in 1986. “I came on a two-year postdoc, and now I’m actually retiring from that two-year postdoc at EMBL!” He is fond of saying this, even though he became a team leader with his own lab in 1996.
His seminal collaboration with Des Higgins, a bioinformatician, began at EMBL. The two met in 1990 when Higgins started at EMBL, working on the nucleotide sequence database. They would chat about proteins, their evolution, and their alignments (an effective way to compare related DNA or protein sequences).
Over coffee one day, Gibson showed Higgins an alignment of a family of proteins he had made by hand, using coloured pencils. Higgins responded by saying he had already developed a program (Clustal) that could do it automatically. Gibson was sceptical that it could be as good, but he tried it and was surprised that it worked quite well. Julie Thompson joined Gibson as a software developer and took on the coding. The three would tweak this program, which eventually became Clustal W. Their paper, published in Nucleic Acids Research in 1994, became the most highly cited bioinformatics paper of all time, and the 10th most cited paper across all scientific fields, according to a 2014 analysis by Nature.
Clustal W surpassed expectations. It caught the wave of genome sequencing, so everyone from undergraduates to senior bioinformaticians was using this tool thousands of times each day around the world. Clustal W is credited with aiding research in myriad biological directions, including vaccine design and illuminating our understanding of diseases like Epstein-Barr, COVID, and cancer.
“We continued to collaborate on various Clustal programs for a further 17 years [after 1991]. This collaboration started because of the spirit of collaboration in EMBL and would not have been possible without the enthusiastic agreement of our group leaders or the skill of Julie or the ideas from Toby in making it happen.”
Gibson had a way of seeing what tools were missing in molecular biology research and then finding ways to create them. His lab also developed and continues to host the Eukaryotic Linear Motif (ELM) resource, which has become a hugely popular tool among life scientists.
What follows is the barest of glimpses into the career of Toby Gibson and his reflections on his work, the state of science, and his time at EMBL, in his own words.
The early years
I grew up partly in the country and enjoyed milking cows and things like this. I was always a bit of a naturalist – one of those guys who always had a bird book and knew the Latin names of the British birds.
When your family doesn’t have a background in academia, you tend to think that to be a scientist, you have to be like Einstein. In reality, if you study some of the right things, understand the concepts, and have a personality suited to being a scientist, it can work.
During my PhD, I was in a group that sequenced Epstein-Barr Virus, which causes mononucleosis or ‘kissing disease’. Nowadays it’s apparently very strongly associated with multiple sclerosis, and I think this finding is likely to hold up.
As a young scientist, one always has fantasies of doing the ‘beautiful experiment’. But I was a specialised DNA sequencer, so I couldn’t do that. My daily work was, truth be told, very, very, very monotonous routine stuff.
“During my postdoc in EMBL’s Gene Expression Programme (1992-1995), I met Toby and colleagues. Toby mentored me on how to hunt for new conserved protein domains and how to look at and interpret multiple sequence alignments. I became infected by Toby’s passion for protein sequence and structure analysis – subjected to his contagious and boundless generosity and collegiality. Hence, I emerged from my postdoc as a skilled, applied bioinformatician.”
I was just sequencing DNA. The intellectual aspect was wonderful, however, and I enjoyed the experimental work up to a point.
We were sequencing by hand, which meant we labelled DNA with radioactive phosphorus P32, a beta emitter. It decays very rapidly, and its radiation spread into our fingers in much the same way that the beta electrons spread out on the X-ray film we used for sequencing. P32 posed some challenges for us: it didn’t last long, there were those small safety issues, and it produced smeary bands rather than clear ones with sharp edges on the X-rays. I conjectured we might be able to use a weaker beta emitter isotope instead – S35 instead of P32. It decays more slowly, so we could store it in our freezer longer. It also wouldn’t penetrate our fingers because it didn’t spread like P32. Best of all, it should produce sharper bands. So, we got some S35 to try, and everything that I thought was going to happen…well, happened. So we switched straight away.
The main problem with my career route was that I knew how to sequence DNA, but I didn’t really know much else. I was too junior to start a sequencing lab and a bit stuck.
I had one of those sleepless nights when I realised I needed to move from the bench to the computer because the data from all the sequences were going to go there. EMBL already had a data library – it was well established – and it also had a vigorous structural biology programme. As it turned out, EMBL was also setting up a bioinformatics unit.
Clustal and the SLiMs database
Most of what I’ve done has been quite incremental, but that’s the way most of science works.
Clustal was a game-changer in my career. Des developed the first version of Clustal when he was a PhD student in Dublin, working on HIV and the evolution of retroviruses and pathogens.
Scientists use alignments all the time to ascertain where function might be in protein sequences. If you were interested in HIV, for example, and wanted to line up some HIV proteins with those of other retroviruses to understand whether the conserved amino acids in the proteins were structurally or functionally important, then alignment enables this. Making alignments also helps study evolution, such as understanding where coronaviruses come from.
“I met Toby while I was a postdoc beginning to work on linear motifs at an EMBO workshop he taught. The workshop and the interaction I had with Toby set the direction of my research. Toby set an example on conducting research and creating new knowledge at the highest standards, and he had an incredibly clear view of cell signalling! Debating and brainstorming with him was one of my favourite activities. Undoubtedly, Toby is one of the strongest influences I’ve had in my career.”
I’d become a staff scientist and could hire one person to work with me, which turned out to be Julie Thompson, a mathematician and C programmer. I thought we could improve upon Des’s earlier versions of Clustal. Des had several ideas he had not yet implemented, too.
Together we improved the quality of the alignment output, and the result had all the hallmarks of great software tools: stability, efficiency, and ease of use. It was also freely available. I am habitually sceptical of these things and always check things by eye, but Clustal W marked the point where I could pretty much flip from spending my time aligning by hand to doing the alignments first by computer and then checking them.
After the 1994 paper – and I have to add how nice it was to have a female first author on our paper – the software became massively widespread. It’s very satisfying when everybody’s using your software.
In the 1990s it was clear many proteins in the cell weren’t single-folded units that just bump into each other like billiard balls. They could be much bigger and have multiple folded domains which, during evolution, had become mixed and matched in many different combinations. So, it became a big activity to identify these domains. At EMBL, many of us – like Peer Bork – developed an interest in discovering novel domains and working on their structure/function. I partook in many enjoyable collaborations.
Around the millennium, I started a new bioinformatics database. People had slowly but steadily been finding very short little bits of proteins that embodied biochemical functionality. Usually these were in the “linkers” between the folded domains in modular proteins. These came to be known as Short Linear Motifs (aka SLiMs). By the late 1990s, it was clear there were enough experimental examples that they were an intrinsic part of protein function, and we needed a bioinformatics resource dedicated to SLiMs. In 2001 – thanks to an EU infrastructure grant – we began to establish the ELM database, along with several key partners. We published the first version in 2003, but we’d only just begun. We now believe the number of SLiMs in the human proteome will be more than a million.
“Toby introduced me to bioinformatics and especially protein sequence analysis. He changed the course of my life, as I went on to build a career in the field, and Toby has remained an important mentor throughout.”
Once a database exists, it can be hard to continue to get funding to maintain it because funders always want to fund novelty. My next few grant attempts failed. I sulked. Once I got bored with sulking, I resolved that we would develop ELM as a ‘cottage industry’. From my perspective, something rather wonderful happened. Over time, we made more and more contacts with SLiM researchers worldwide – many of whom then annotated entries in the ELM database with their specialist knowledge. Their reward for several person-months of effort has been co-authorship of the ELM database update publication, but also being part of a larger network of SLiM collaborators around the world.
When I’m being very arrogant (which you sometimes have to be, for example, on grant applications), I say that ‘Toby Gibson ignited the bioinformatics of short linear motifs’.
Life at EMBL and the state of science
There’s more and more evidence that complexity in cell regulation and cell signalling has been badly underestimated. There have been a lot of oversimplified models, although it is getting better now.
Since I’m retiring, I won’t get to see the full flowering of powerful new technologies. When AlphaFold first came out, people were amazed at how good it was, but its predictions were based on known structures and huge bulk sequence alignments. Yet if you give it a protein sequence for which there isn’t a directly determined structure, it may still be able to work out what the structure is, yielding insight into what the protein might do. This was unexpected, and even I got very excited.
“It’s hard to over-emphasise the impact that Toby had on me. I came to his lab as a curious and naïve bioinformatician, and Toby taught me to ‘think biology’ in the very unique way that he saw it. I’ve been lucky to have so many inspirational mentors over the years but nobody formed my understanding of the functions of the cell to the degree Toby has. I still work on motifs, and I’m very aware that the current generation that Toby inspired now have the responsibility to push forward the motif biology field from the solid foundation he laid.”
It is definitely an exciting time intellectually. On the other hand, it’s perhaps not such a nice time if you’re starting out, given how the scientific enterprise is now structured. Short-termism and the obsession with what is ‘hot’ have been growing. And a paradox of grant funding is that hot, blue sky research is desired, but actually, the system selects very strongly for the norm.
In the previous SARS scare, you saw a lot of targeted funding, and a lot of new coronavirus results. But the funding wound down when the crisis ended. So, the insights paused for 15 years. Similarly, when people get scared of bacteria, funding becomes available. When they’re not scared of them, the funding gets directed to cancer, ageing, and so on. I’m not saying these things shouldn’t be funded, but you want to be able to try to stably fund things that are important, not just ‘hot’. Too much faddism controlling the allocation of research funds is damaging.
At EMBL, an open-door research culture became established, and it’s been here for decades now. It’s why I think EMBL is the most collaborative institute in the world.
I’ve always had a small group, and that has meant I had to collaborate horizontally. Last year for my retirement, my colleagues organised a meeting with more than 80 participants, confirming we’ve built up quite a worldwide community around these short protein motifs. This is a very precious horizontal network, which hopefully will live on. Indeed, the SLiM field is now so established that my retirement will have a negligible effect on its scientific advancement.
“Always driven by his genuine curiosity, Toby had a marked effect on those around him. His scientific contributions are undeniable: he is a prototypical or consummate scientist. But what makes him stand out even further is how his kindness and concern for others at EMBL created a particularly welcoming atmosphere. When you talked with Toby, you were talking with someone genuinely interested in science, eager to exchange his own thoughts on the subject while equally interested in the other person’s perspectives.”
EMBL has meant so many things. Core research funding, such as EMBL provides, is incredibly important for following up ideas that grant funders are not interested in funding. Most of all, it has shown – and continues to show – the value of interdisciplinary collaboration.
I’m going to keep nurturing my intellect by having an interest in the field that I like and interacting with EMBL and my other collaborators from my home office.
EMBL has played an enormous role in networking Europeans, training up young group leaders who then go back to the country of their original nationality. Even if they go elsewhere in Europe or elsewhere in the world, it doesn’t matter. It’s all part of this network of people who have learned the value of this special, open, interactive environment we have here.