Machine learning algorithm brings long-read sequencing to the clinic
Via EMBL
SAVANA is a new tool designed for accurate detection of structural variations in clinical samples
Summary
- SAVANA uses a machine learning algorithm to identify cancer-specific structural variations and copy number aberrations in long-read DNA sequencing data.
- The complex structure of cancer genomes means that standard analysis tools often produce false-positive results, which can lead to erroneous clinical interpretations of tumour biology. SAVANA significantly reduces such errors.
- SAVANA offers rapid and reliable analysis of clinical samples, which can inform cancer diagnosis and therapeutic interventions.
Long-read sequencing technologies read long, continuous stretches of DNA. These methods have the potential to improve researchers’ ability to detect complex genetic alterations in cancer genomes. However, the complex structure of cancer genomes means that standard analysis tools, including methods developed specifically for long-read sequencing data, often fall short, producing false-positive results and unreliable interpretations of the data. These misleading results can compromise our understanding of how tumours evolve and respond to treatment, and ultimately affect how patients are diagnosed and treated.
To address this challenge, researchers developed SAVANA, a new algorithm recently described in the journal Nature Methods. SAVANA uses machine learning to accurately identify structural variants (large genomic alterations such as insertions, deletions, duplications or rearrangements) and the resulting copy number aberrations in cancer genomes from long-read sequencing data.
It is important to have the right tool for the job. You can eat soup with a fork, but a spoon does the job far better. Like a spoon, SAVANA is tailored to its task and designed to deliver reliable results efficiently.
This algorithm was developed and tested across 99 human tumour samples by researchers at EMBL’s European Bioinformatics Institute (EMBL-EBI) and the R&D laboratory of Genomics England, in collaboration with clinical partners at University College London (UCL), the Royal National Orthopaedic Hospital (RNOH), Instituto de Medicina Molecular João Lobo Antunes, and Boston Children’s Hospital.
“Because other analysis tools are not developed to account for the particularities of cancer genomics data, they often pick up false positives that could lead to incorrect clinical and biological interpretations,” said Isidro Cortes-Ciriano, Group Leader at EMBL-EBI. “SAVANA changes this. By training the algorithm directly on long-read sequencing data from cancer samples, we created a new method that can tell the difference between true cancer-related genomic alterations and sequencing artefacts, thereby enabling us to elucidate the mutational processes underlying cancer using long-read sequencing with unprecedented resolution.”
Optimised for clinical use
“When we developed SAVANA, our focus was clear: create a tool sophisticated enough to characterise complex cancer genomes but practical enough for clinical use,” explained Hillary Elrick, former Predoctoral Fellow at EMBL-EBI and now Postdoctoral Fellow at the Francis Crick Institute.
“As a result, SAVANA can accurately distinguish somatic structural variants, copy number aberrations, tumour purity, and ploidy – all key to understanding tumour biology and guiding clinical treatment decisions,” added Carolin Sauer, Postdoctoral Fellow at EMBL-EBI.
Its rapid analysis and robust error correction make SAVANA well suited for clinical use. The method was recently applied to study osteosarcoma, a rare and aggressive bone cancer that mostly affects young people. There, it helped researchers uncover new genomic rearrangements, providing novel insights into how osteosarcoma evolves and progresses. The team also compared SAVANA’s results on long-read data with short-read Illumina sequencing of the same samples, processed with a whole-genome sequencing analysis pipeline used to deliver clinical reports. The findings were highly consistent across technologies, demonstrating that SAVANA performs on par with current clinical standards while revealing additional cancer-relevant alterations.
“The capability to accurately detect structural variants is transformative for clinical diagnostics,” said Adrienne Flanagan, Professor at UCL and Consultant Histopathologist at RNOH. “SAVANA could help us confidently identify genomic alterations relevant for diagnosis and prognosis. Ultimately, this means we would be better placed to deliver personalised treatments for cancer patients.”
UK investment in clinical genomics
The UK is investing significantly in genomic sequencing technologies as part of the NHS Genomic Medicine Service. This initiative is the first in the world to offer whole genome sequencing as part of routine care. By embedding genomics into everyday clinical practice, it aims to improve diagnostic accuracy and support personalised cancer treatments.
However, investments in clinical genomics will only achieve their intended impact if genomic data are interpreted accurately, and this relies on specialised analytical tools. Genomics England evaluated SAVANA as part of its work exploring the clinical potential of long-read sequencing technology to support earlier, faster diagnosis of cancer.
“Using SAVANA will ensure clinicians receive accurate and reliable genomic data, enabling them to confidently integrate advanced genomic sequencing methods such as long-read sequencing into routine patient care,” said Greg Elgar, Director of Sequencing R&D at Genomics England.
SAVANA is also being deployed as part of nationwide initiatives, such as the UK Stratified Medicine Paediatrics project funded by Cancer Research UK and Children With Cancer UK, and co-led by Cortes-Ciriano. This project is focused on developing more efficacious and less toxic treatments for childhood cancers using advanced sequencing technologies to better understand tumour biology and monitor disease recurrence.
Additionally, SAVANA is being used in Societal, Ancestry, Molecular and Biological Analyses of Inequalities (SAMBAI), a Cancer Grand Challenges funded project aimed at addressing cancer disparities in recent African heritage populations.
Funding
The researchers would like to thank their funders. These include EMBL, Wellcome Trust, Sarcoma Foundation of America, Connective Tissue Oncology Society, NF Research Initiative at Boston Children’s Hospital, National Institute for Health Research (NIHR), UCLH Biomedical Research Centre, UCL Experimental Cancer Centre, Pathological Society of Great Britain and Ireland, Jean Shanks Foundation, Tom Prince Cancer Trust, Rosetrees Trust, Skeletal Cancer Trust, Sarcoma UK, Bone Cancer Research Trust, Cancer Research UK, Medical Research Council, NHS England, and Wellcome Trust infrastructure funding. Access to patient data was provided through Genomics England’s National Genomic Research Library. Carolin Sauer from the Cortes-Ciriano group at EMBL-EBI also received a Marie Skłodowska-Curie fellowship to support this work.
Thank you
The scientists involved in this study would like to thank all of the patients and their families who donated samples used in this research.
Source article(s)
SAVANA: reliable analysis of somatic structural variants and copy number aberrations using long-read sequencing
Nature Methods 28 May 2025
https://doi.org/10.1038/s41592-025-02708-0