Machine learning for the automated interpretation of mass spectrometry data

Mass spectrometry (MS) is an analytical technique that measures the mass-to-charge ratio of ions.

This technology is commonly used for quality control and quality assurance in the pharmaceutical industry. The results are typically presented as a mass spectrum, a plot of intensity as a function of the mass-to-charge ratio. Because the measurement process is complex, some outputs are corrupted or invalid. Currently, mass spectra are validated manually by experts, which makes this checkpoint a weak point from a quality assurance perspective. Automatic classification of MS image outputs is desirable and, given recent advances in image classification algorithms, feasible.

We investigated the suitability of support vector machines, k-nearest neighbors, and deep neural networks for classifying MS outputs as valid or invalid. The training data consisted of a mixed sample of images in which both classes were equally represented. For each algorithm, 10-fold cross-validation was applied to further reduce sampling bias. The algorithms classified samples with an average prediction accuracy of over 97%. The prediction accuracy is expected to increase further as more training data becomes available.
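
The evaluation protocol described above (balanced classes, 10-fold cross-validation, one accuracy value per fold) can be illustrated with a minimal sketch. It is not the implementation used in this work: it covers only the k-nearest neighbors case, uses the Python standard library alone, and replaces the MS images with synthetic four-dimensional feature vectors. The function names, the choice of k = 3, and the synthetic data generator are all assumptions made for illustration.

```python
import random
from collections import Counter


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into n_folds folds with balanced class counts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold stays balanced.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to the query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validate(features, labels, n_folds=10, k=3, seed=0):
    """Return the per-fold accuracy of k-NN under stratified cross-validation."""
    accuracies = []
    for held_out in stratified_folds(labels, n_folds, seed):
        held = set(held_out)
        train_x = [f for i, f in enumerate(features) if i not in held]
        train_y = [l for i, l in enumerate(labels) if i not in held]
        correct = sum(
            knn_predict(train_x, train_y, features[i], k) == labels[i]
            for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return accuracies


def make_synthetic_data(n_per_class=50, seed=0):
    """Hypothetical stand-in for image features: two Gaussian clusters."""
    rng = random.Random(seed)
    features, labels = [], []
    for label, centre in (("valid", 0.0), ("invalid", 3.0)):
        for _ in range(n_per_class):
            features.append([rng.gauss(centre, 1.0) for _ in range(4)])
            labels.append(label)
    return features, labels


if __name__ == "__main__":
    X, y = make_synthetic_data()
    scores = cross_validate(X, y)
    print(f"mean accuracy over {len(scores)} folds: {sum(scores) / len(scores):.3f}")
```

Each sample is held out exactly once, so the mean of the per-fold accuracies uses every sample for testing while never testing on data seen during training. In practice an established library would likely be preferable to hand-written folds, and the support vector machine and deep neural network classifiers would be evaluated with the same fold assignment.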