| dc.contributor.author | Canal-Alonso, Ángel | |
| dc.contributor.author | Jiménez, Pedro | |
| dc.contributor.author | Egido, Noelia | |
| dc.contributor.author | Prieto Tejedor, Javier | |
| dc.contributor.author | Corchado Rodríguez, Juan Manuel | |
| dc.date.accessioned | 2023-10-03T10:10:22Z | |
| dc.date.available | 2023-10-03T10:10:22Z | |
| dc.date.issued | 2022 | |
| dc.identifier.uri | http://hdl.handle.net/10366/153123 | |
| dc.description.abstract | [EN] Next-generation sequencing (NGS) has revolutionized the field of genomics, allowing a detailed and precise look at DNA. As this technology advanced, the need arose for standardized file formats to represent, analyze and store the vast data sets produced. In this article, we review the key file formats used in NGS: FASTA, FASTQ, BED, GFF, and VCF.
The FASTA format, one of the oldest, provides a basic representation of genomic and protein sequences, identifiable by unique headers. FASTQ is essential for NGS, as it stores both the sequence and the associated quality information. BED provides a tabular representation of genomic loci, while GFF details the localization and structure of genomic features in reference sequences. Finally, VCF has emerged as the predominant standard for documenting genetic variants, from simple SNPs to complex structural variants.
The adoption and adaptation of these formats have been fundamental for progress in bioinformatics and genomics. They provide a foundation on which to build sophisticated analyses, from gene discovery and function prediction to the identification of disease-associated variants. With a clear understanding of these formats, researchers and practitioners are better equipped to harness the power and potential of next-generation sequencing. | es_ES |
| dc.description.sponsorship | This study has been funded by the AIR Genomics project (file number CCTT3/20/SA/0003), through the 2020 call R&D PROJECTS ORIENTED TO THE EXCELLENCE AND COMPETITIVE IMPROVEMENT OF THE CCTT, by the Institute of Business Competitiveness of Castilla y León and the FEDER fund. | es_ES |
| dc.language.iso | eng | es_ES |
| dc.subject | Next-generation sequencing | es_ES |
| dc.subject | File format | es_ES |
| dc.subject | Data sharing | es_ES |
| dc.title | File formats used in next generation sequencing: A literature review | es_ES |
| dc.type | info:eu-repo/semantics/article | es_ES |
| dc.subject.unesco | 1203.17 Informática | es_ES |
| dc.subject.unesco | 2410.07 Genética Humana | es_ES |
| dc.relation.projectID | CCTT3/20/SA/0003 | es_ES |
| dc.rights.accessRights | info:eu-repo/semantics/openAccess | es_ES |
| dc.type.hasVersion | info:eu-repo/semantics/publishedVersion | es_ES |