
Title
File formats used in next generation sequencing: A literature review
Author(s)
Keywords
Next-generation sequencing
File format
Data sharing
UNESCO classification
1203.17 Computer Science
2410.07 Human Genetics
Publication date
2022
Abstract
Next-generation sequencing (NGS) has revolutionized the field of genomics, allowing a detailed and precise look at DNA. As
this technology advanced, the need arose for standardized file formats to represent, analyze and store the vast data sets
produced. In this article, we review the key file formats used in NGS: FASTA, FASTQ, BED, GFF, and VCF.
The FASTA format, one of the oldest, provides a basic representation of genomic and protein sequences, identifiable by
unique headers. FASTQ is essential for NGS, as it stores both the sequence and the associated quality information. BED
provides a tabular representation of genomic loci, while GFF details the localization and structure of genomic features in
reference sequences. Finally, VCF has emerged as the predominant standard for documenting genetic variants, from simple
SNPs to complex structural variants.
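As a rough illustration of the difference between the two sequence formats described above, the following Python sketch reads FASTA records (a header line starting with ">" followed by one or more sequence lines) and FASTQ records (four lines per read: title, sequence, "+" separator, quality string). The function names and toy records are hypothetical, and the quality decoding assumes the common Phred+33 encoding and unwrapped four-line FASTQ records; it is not taken from the reviewed article.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a FASTQ text stream.

    Assumption: each record occupies exactly four lines and qualities
    use Phred+33 (ASCII code minus 33).
    """
    while True:
        title = handle.readline().strip()
        if not title:
            break  # end of input
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().strip()
        scores = [ord(char) - 33 for char in quality]
        yield title[1:], sequence, scores


# Toy inputs (hypothetical data, for illustration only).
fasta_demo = StringIO(">seq1 toy example\nACGT\nTTGA\n")
fastq_demo = StringIO("@read1\nACGT\n+\nII5!\n")

fasta_records = list(read_fasta(fasta_demo))
fastq_records = list(read_fastq(fastq_demo))
```

For real data, an established parser such as Biopython's `SeqIO` is usually a safer choice than hand-written readers, since it handles edge cases that this sketch ignores.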
The adoption and adaptation of these formats have been fundamental for progress in bioinformatics and genomics. They
provide a foundation on which to build sophisticated analyses, from gene discovery and function prediction to the
identification of disease-associated variants. With a clear understanding of these formats, researchers and practitioners are
better equipped to harness the power and potential of next-generation sequencing.
URI
Collections
- BISITE. Artículos [370]