First results indicate that machine learning-based performance prediction in maize hybrid breeding programs can benefit from transcriptomic data
Computomics contributes to the BMBF-funded project MAZE Phase 2 together with academic and industry partners.
The project tackles the challenge of making the diversity within maize landraces accessible to breeding programs in order to overcome the genetic bottleneck of elite lines. At the same time, we aim to apply and improve machine learning-based performance prediction for maize hybrids.
In this subproject, one focus is the integration of transcriptome data into performance prediction.
The goal is to find ways of integrating omics data in machine learning-based performance prediction and to analyze which traits benefit most from the additional information.
To investigate this, the following dataset was kindly provided by Prof. Dr. Albrecht Melchinger:
Dry matter content
Root samples collected in 2 replicates in the seedling stage.
Expression data of all DH lines was mapped to B73 NAM 5.0, and only transcripts with high information content were used for machine learning-based performance prediction. Various ways of including transcriptome data, also in combination with genotyping data, were tested. We compared the effect of transcript count data on prediction accuracy with predictions based on genotyping data alone and on combined datasets.
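One simple way to picture this step is sketched below. It is an illustration only, under two assumptions that are not taken from the project: "information content" is approximated by the variance of a transcript across lines, and integration is done by plain concatenation of genotype and transcript features. The function names `filter_by_variance` and `combine_features` and all values are hypothetical.

```python
from statistics import pvariance

def filter_by_variance(expr, keep):
    """Keep the `keep` transcripts with the highest variance across lines.

    expr: one row per line, one column per transcript (toy values here).
    Variance is only an assumed proxy for "information content".
    """
    n_transcripts = len(expr[0])
    variances = [pvariance([row[j] for row in expr]) for j in range(n_transcripts)]
    top = sorted(range(n_transcripts), key=lambda j: variances[j], reverse=True)[:keep]
    top.sort()  # restore the original column order
    return [[row[j] for j in top] for row in expr], top

def combine_features(geno, expr):
    """Concatenate genotype and transcript features line by line."""
    return [g + e for g, e in zip(geno, expr)]

# Toy data: 3 lines, 2 markers (coded 0/2), 3 transcripts.
geno = [[0, 2], [2, 2], [0, 0]]
expr = [[5.0, 1.0, 0.2], [5.1, 3.0, 0.9], [4.9, 7.0, 0.1]]

filtered, kept = filter_by_variance(expr, keep=2)
combined = combine_features(geno, filtered)
```

Concatenation is only one of several possible integration strategies. Because markers and transcript counts are on different scales, features would usually be standardized before model training.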
A 5-fold cross-validation scheme was applied to test the prediction accuracy of machine learning models trained with the different combinations of -omics data.
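The evaluation scheme can be sketched roughly as follows. This is a minimal illustration, not the project's pipeline: the data is randomly generated, the nearest-neighbour model `knn_predict` is a placeholder for whichever learner is actually used, and all function names are hypothetical. It shows only how the same 5 folds can be used to score several feature sets by the Pearson correlation between predicted and observed values.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(X_train, y_train, X_test, n_neighbors=5):
    """Placeholder model: mean phenotype of the nearest training lines."""
    preds = []
    for x in X_test:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, row)), i)
            for i, row in enumerate(X_train)
        )
        preds.append(mean(y_train[i] for _, i in dists[:n_neighbors]))
    return preds

def cross_validate(X, y, k=5, seed=0):
    """Mean Pearson r between predicted and observed values over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = knn_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        scores.append(pearson(preds, [y[i] for i in test_idx]))
    return mean(scores)

# Synthetic stand-in data, NOT the project's dataset.
rng = random.Random(1)
n_lines = 60
geno = [[rng.choice([0, 2]) for _ in range(20)] for _ in range(n_lines)]
expr = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(n_lines)]
y = [sum(g[:5]) + 2 * e[0] + rng.gauss(0, 1) for g, e in zip(geno, expr)]

feature_sets = {
    "genotype": geno,
    "transcriptome": expr,
    "combined": [g + e for g, e in zip(geno, expr)],
}
results = {name: cross_validate(X, y) for name, X in feature_sets.items()}
for name, r in results.items():
    print(f"{name}: mean Pearson r = {r:.2f}")
```

Using the same seed for every feature set keeps the folds identical, so differences in the scores reflect the input data and not the split.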
Interestingly, first results indicate that, especially for complex polygenic traits like grain yield, the inclusion of transcriptome data can improve the prediction accuracy by up to 20% (Pearson correlation coefficient). However, this effect depends strongly on the way transcriptome data is integrated. Simpler traits, which are already predicted with high accuracy based on genotypic data, do not benefit from transcriptome data. In some cases, it can even be counterproductive.
These promising results, which show improved prediction accuracy for complex traits when transcriptome data is included, will be refined further and compared with other approaches, for example BLUP-based predictions.
The project is led by Prof. Dr. Chris-Carolin Schön from Technische Universität München.