Abstract Details
Abstract Title
Machine learning methods improve norovirus outbreak forecasting in the United States
Presenter
Sara Kim, Emory University
Co-Author(s)
Sara S Kim, Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
Wyatt Madden, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
Max SY Lau, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
Benjamin A Lopman, Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
Abstract Category
Epidemiology
Abstract
Introduction: Norovirus outbreaks are highly variable and underreported, complicating accurate forecasting. Time series (TS) regression models can be constrained by incomplete knowledge, whereas machine learning (ML) offers more flexibility but may lack interpretability. We evaluated ML methods for forecasting norovirus outbreaks in the United States, with the goal of improving disease control.

Methods: We trained models on the first 8-11 years of data from CDC’s National Outbreak Reporting System and forecasted the 7-day rolling outbreak average during 2018-2022. We used the mean squared error (MSE) to evaluate three methods for predicting outbreaks 1 to 28 days ahead: a traditional autoregressive TS model and two ML approaches, namely a regularized regression using the Least Absolute Shrinkage and Selection Operator (LASSO) and an ensemble model. We used the ensemble as the reference for the relative squared error (RSE).
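
The evaluation quantities named above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a trailing 7-day window for the rolling average and assumes the RSE is the ratio of a model's MSE to the reference (ensemble) model's MSE. The abstract does not state either detail, and the function names and numbers are hypothetical.

```python
def rolling_mean(counts, window=7):
    """Trailing rolling mean of daily outbreak counts.

    Assumption: a trailing window; the source does not say whether the
    7-day average is trailing or centered. The first window-1 days,
    which lack a full window, are dropped.
    """
    return [
        sum(counts[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(counts))
    ]


def mse(observed, predicted):
    """Mean squared error between observed and forecast values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def relative_squared_error(observed, model_pred, reference_pred):
    """MSE of a model divided by MSE of the reference model.

    Assumption: this ratio is what "RSE" denotes here. Under it, values
    above 1 mean the model is less accurate than the reference.
    """
    return mse(observed, model_pred) / mse(observed, reference_pred)


# Toy numbers for illustration only (not data from the study).
observed = [2.0, 4.0, 6.0]
ts_forecast = [3.0, 5.0, 7.0]        # hypothetical TS forecasts
ensemble_forecast = [2.5, 4.5, 6.5]  # hypothetical ensemble forecasts
rse_ts = relative_squared_error(observed, ts_forecast, ensemble_forecast)
```

Under this reading, an RSE of 1 would mean a model matches the ensemble's MSE at that forecast horizon.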

Results: The ensemble performed best in 2018-2019, with the LASSO-based and TS models performing similarly to each other. However, TS and LASSO RSEs decreased over longer horizons (6.30 vs. 6.26, respectively, for 1 day ahead; 1.29 vs. 1.19 for 28 days ahead). During pandemic-disrupted seasons, all methods struggled beyond 1-day forecasts, missing the early April 2020 decline by 2-3 weeks and forecasting the delayed 2020-2021 peak approximately 3 months early. The ensemble remained superior, but LASSO outperformed TS during these atypical seasons.

Conclusion: ML generally outperformed TS. The ensemble was most accurate, particularly short-term, but LASSO offered comparable performance over longer horizons, reduced reliance on predefined models, and greater interpretability. Improved norovirus forecasts can enhance public health alerts and proactive outbreak response.