A Benchmarking Between Deep Learning, Support Vector Machine and Bayesian Threshold Best Linear Unbiased Prediction for Predicting Ordinal Traits in Plant Breeding

Montesinos-López, Osval A; Martín Vallejo, Francisco Javier; Crossa, José; Gianola, Daniel; Hernández-Suárez, Carlos M; Montesinos-López, Abelardo; Juliana, Philomin; Singh, Ravi

doi:10.1534/g3.118.200998

Author(s)

Montesinos-López, Osval A

Martín Vallejo, Francisco Javier

Crossa, José

Gianola, Daniel

Hernández-Suárez, Carlos M

Montesinos-López, Abelardo

Juliana, Philomin

Singh, Ravi

Keywords

Threshold

GBLUP

Deep learning

Support vector machine

Genomic selection

Plant breeding

Genomic Prediction

GenPred

Shared Data Resources

Publication date

2019

Publisher

Genetics Society of America. Oxford University Press.

Citation

Osval A Montesinos-López, Javier Martín-Vallejo, José Crossa, Daniel Gianola, Carlos M Hernández-Suárez, Abelardo Montesinos-López, Philomin Juliana, Ravi Singh, A Benchmarking Between Deep Learning, Support Vector Machine and Bayesian Threshold Best Linear Unbiased Prediction for Predicting Ordinal Traits in Plant Breeding, G3 Genes|Genomes|Genetics, Volume 9, Issue 2, 1 February 2019, Pages 601–618, https://doi.org/10.1534/g3.118.200998

Abstract

Genomic selection is revolutionizing plant breeding. However, better statistical models for ordinal phenotypes are still needed to improve the accuracy with which candidate genotypes are selected. For this reason, in this paper we explore the genomic-based prediction performance of two popular machine learning methods, the multilayer perceptron (MLP) and the support vector machine (SVM), vs. the Bayesian threshold genomic best linear unbiased prediction (TGBLUP) model. We used the percentage of cases correctly classified (PCCC) as the metric of prediction performance and seven real data sets to evaluate prediction accuracy. The best predictions in terms of PCCC (in four of the seven data sets) occurred under the TGBLUP model, while the worst occurred under the SVM method. In general, we found no statistically significant differences between using 1, 2 and 3 layers under the MLP models, which means that the conventional neural network model with only one layer is often enough. Although the TGBLUP model was better, the predictions of MLP and SVM were very competitive, and the SVM had the advantage of being the most efficient in terms of the computational time required.
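
The abstract names PCCC as the evaluation metric without giving its formula. The sketch below is one plausible reading: the share of cases whose predicted ordinal category equals the observed one, expressed as a percentage. The function name `pccc`, the 0-100 scale, and the example scores are assumptions for illustration only; they are not taken from the paper, which may report the metric as a proportion instead.

```python
from typing import Sequence


def pccc(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Percentage of cases correctly classified, on a 0-100 scale (assumed)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one case is required")
    # A case counts as correct only when the predicted category matches exactly.
    hits = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * hits / len(observed)


# Hypothetical ordinal scores (categories 1-5) for six lines; not from the paper.
observed = [1, 3, 2, 5, 4, 3]
predicted = [1, 3, 3, 5, 2, 3]
score = pccc(observed, predicted)
```

Under this definition a prediction that misses by one category is penalized the same as one that misses by four, so the metric treats ordinal classes as unordered labels when scoring.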

URI

https://hdl.handle.net/10366/160677

Publisher's version

https://doi.org/10.1534/g3.118.200998

Appears in collections