Comparison of missing data handling methods for variant pathogenicity predictors

Jan 09, 2026

Modern clinical genetic tests use next-generation sequencing (NGS) to comprehensively analyze a patient’s genetic variants. Out of millions of variants, the clinically relevant ones that match the patient’s phenotype must be identified accurately and rapidly. Manual evaluation cannot meet the speed and volume requirements of clinical genetic testing, so automated solutions are needed. Various machine learning (ML), artificial intelligence (AI), and in silico variant pathogenicity predictors have been developed to address this challenge. These predictors rely on comprehensive data and struggle with sparse genetic annotations. Careful treatment of missing data is therefore necessary, and the choice of method can strongly affect accuracy, reliability, speed, and computational cost.

Mikko Särkkä and co-authors presented AMISS, an open-source framework for evaluating the performance of different methods for handling missing genetic variant data in the context of variant pathogenicity prediction. Using AMISS, they evaluated 14 methods for handling missing values. The performance of these methods varied substantially in precision, computational cost, and other attributes.

The study concluded that sophisticated missing data methods are unnecessary when building variant pathogenicity metapredictors. Instead, simple unconditional imputation methods, and even zero imputation, give higher performance and save significant computational time, leading to considerable cost savings if adopted. This highlights the conceptual separation between missing data methods for prediction and imputation for statistical inference, the latter of which requires carefully constructed techniques to reach correct conclusions.
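To make the two simplest strategies concrete, here is a minimal sketch of zero imputation and of one unconditional method, column-mean imputation. It is not taken from AMISS: the function names, the use of `None` to mark a missing annotation, and the example scores are all assumptions made for illustration.

```python
from statistics import mean


def zero_impute(rows):
    """Replace every missing value (None) with 0.0."""
    return [[0.0 if value is None else value for value in row] for row in rows]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values
    in the same column, ignoring all other columns (unconditional)."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        col_means.append(mean(observed) if observed else 0.0)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical annotation scores for three variants (rows) and three
# annotation features (columns); None marks a missing annotation.
variants = [
    [0.5, None, 0.25],
    [None, 0.75, 0.75],
    [1.0, 0.25, None],
]

print(zero_impute(variants))
print(mean_impute(variants))
```

Both functions fill each gap using at most a single per-column statistic, which is why such methods are cheap compared with approaches that model relationships between features. In a prediction setting, the column means would normally be computed on the training data and reused for new variants.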

Särkkä M, Myöhänen S, Marinov K, Saarinen I, Lahti L, Fortino V, Paananen J. Comparison of missing data handling methods for variant pathogenicity predictors. NAR Genomics and Bioinformatics. 2025;7(4):lqaf133. doi:10.1093/nargab/lqaf133

Last modified: January 09, 2026
