---
_id: '7181'
abstract:
- lang: eng
  text: Multiple sequence alignments (MSAs) are used for structural [1,2] and evolutionary
    predictions [1,2], but the complexity of aligning large datasets requires the use
    of approximate solutions [3], including the progressive algorithm [4]. Progressive MSA
    methods start by aligning the most similar sequences and subsequently incorporate
    the remaining sequences, from leaf-to-root, based on a guide-tree. Their accuracy
    declines substantially as the number of sequences is scaled up [5]. We introduce
    a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard
    workstation and substantially improves accuracy on datasets larger than 10,000
    sequences. Our regressive algorithm works the other way around to the progressive
    algorithm and begins by aligning the most dissimilar sequences. It uses an efficient
    divide-and-conquer strategy to run third-party alignment methods in linear time,
    regardless of their original complexity. Our approach will enable analyses of
    extremely large genomic datasets such as the recently announced Earth BioGenome
    Project, which comprises 1.5 million eukaryotic genomes [6].
article_processing_charge: No
article_type: original
author:
- first_name: Edgar
  full_name: Garriga, Edgar
  last_name: Garriga
- first_name: Paolo
  full_name: Di Tommaso, Paolo
  last_name: Di Tommaso
- first_name: Cedrik
  full_name: Magis, Cedrik
  last_name: Magis
- first_name: Ionas
  full_name: Erb, Ionas
  last_name: Erb
- first_name: Leila
  full_name: Mansouri, Leila
  last_name: Mansouri
- first_name: Athanasios
  full_name: Baltzis, Athanasios
  last_name: Baltzis
- first_name: Hafid
  full_name: Laayouni, Hafid
  last_name: Laayouni
- first_name: Fyodor
  full_name: Kondrashov, Fyodor
  id: 44FDEF62-F248-11E8-B48F-1D18A9856A87
  last_name: Kondrashov
  orcid: 0000-0001-8243-4694
- first_name: Evan
  full_name: Floden, Evan
  last_name: Floden
- first_name: Cedric
  full_name: Notredame, Cedric
  last_name: Notredame
citation:
  ama: Garriga E, Di Tommaso P, Magis C, et al. Large multiple sequence alignments
    with a root-to-leaf regressive method. <i>Nature Biotechnology</i>. 2019;37(12):1466-1470.
    doi:<a href="https://doi.org/10.1038/s41587-019-0333-6">10.1038/s41587-019-0333-6</a>
  apa: Garriga, E., Di Tommaso, P., Magis, C., Erb, I., Mansouri, L., Baltzis, A.,
    … Notredame, C. (2019). Large multiple sequence alignments with a root-to-leaf
    regressive method. <i>Nature Biotechnology</i>. Springer Nature. <a href="https://doi.org/10.1038/s41587-019-0333-6">https://doi.org/10.1038/s41587-019-0333-6</a>
  chicago: Garriga, Edgar, Paolo Di Tommaso, Cedrik Magis, Ionas Erb, Leila Mansouri,
    Athanasios Baltzis, Hafid Laayouni, Fyodor Kondrashov, Evan Floden, and Cedric
    Notredame. “Large Multiple Sequence Alignments with a Root-to-Leaf Regressive
    Method.” <i>Nature Biotechnology</i>. Springer Nature, 2019. <a href="https://doi.org/10.1038/s41587-019-0333-6">https://doi.org/10.1038/s41587-019-0333-6</a>.
  ieee: E. Garriga <i>et al.</i>, “Large multiple sequence alignments with a root-to-leaf
    regressive method,” <i>Nature Biotechnology</i>, vol. 37, no. 12. Springer Nature,
    pp. 1466–1470, 2019.
  ista: Garriga E, Di Tommaso P, Magis C, Erb I, Mansouri L, Baltzis A, Laayouni H,
    Kondrashov F, Floden E, Notredame C. 2019. Large multiple sequence alignments
    with a root-to-leaf regressive method. Nature Biotechnology. 37(12), 1466–1470.
  mla: Garriga, Edgar, et al. “Large Multiple Sequence Alignments with a Root-to-Leaf
    Regressive Method.” <i>Nature Biotechnology</i>, vol. 37, no. 12, Springer Nature,
    2019, pp. 1466–70, doi:<a href="https://doi.org/10.1038/s41587-019-0333-6">10.1038/s41587-019-0333-6</a>.
  short: E. Garriga, P. Di Tommaso, C. Magis, I. Erb, L. Mansouri, A. Baltzis, H.
    Laayouni, F. Kondrashov, E. Floden, C. Notredame, Nature Biotechnology 37 (2019)
    1466–1470.
date_created: 2019-12-15T23:00:43Z
date_published: 2019-12-01T00:00:00Z
date_updated: 2023-09-06T14:32:52Z
day: '01'
department:
- _id: FyKo
doi: 10.1038/s41587-019-0333-6
ec_funded: 1
external_id:
  isi:
  - '000500748900021'
  pmid:
  - '31792410'
intvolume: '37'
isi: 1
issue: '12'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894943/
month: '12'
oa: 1
oa_version: Submitted Version
page: 1466-1470
pmid: 1
project:
- _id: 26580278-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '771209'
  name: Characterizing the fitness landscape on population and global scales
publication: Nature Biotechnology
publication_identifier:
  eissn:
  - '15461696'
  issn:
  - '10870156'
publication_status: published
publisher: Springer Nature
quality_controlled: '1'
related_material:
  record:
  - id: '13059'
    relation: research_data
    status: public
scopus_import: '1'
status: public
title: Large multiple sequence alignments with a root-to-leaf regressive method
type: journal_article
user_id: c635000d-4b10-11ee-a964-aac5a93f6ac1
volume: 37
year: '2019'
...
