Forecasting infectious diseases in Brazilian cities: Integrating socio-economic and geographic data from related cities through a machine learning approach

Sep 5, 2024
Luiza Lober (equal contribution), Francisco A. Rodrigues (equal contribution), Kirstin O. Roster
Visual representation of the feature engineering method presented in Section 2.3. (A) Cities are selected according to spatial distance or to similarity between the time series of GDP or outbreak size. (B) The prediction is made by combining the time series of the selected cities. The forecast for the target city is shown in red.
Abstract
Supervised machine learning models and public surveillance data have been employed for infectious disease forecasting in many settings. These models leverage various data sources capturing drivers of disease spread, such as climate conditions or human behavior. However, few models have incorporated the organizational structure of different geographic locations for forecasting. Traveling waves of seasonal outbreaks have been reported for dengue, influenza, and other infectious diseases, and many of the drivers of infectious disease dynamics may be shared across different cities, due to either their geographic or their socioeconomic proximity. In this study, we developed a machine learning model to predict case counts of four infectious diseases across Brazilian cities one week ahead by incorporating information from related cities. We compared two criteria for selecting related cities: geographic distance and GDP per capita. Incorporating information from geographically proximate cities improved predictive performance for two of the four diseases, specifically COVID-19 and Zika. We also discuss how anomalous contagion patterns affect the forecasts, as well as the limitations of the proposed methodology.
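The idea of borrowing information from related cities can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "related" means the k geographically nearest cities by great-circle distance, and that features are simply the last few weekly counts of the target and of each related city. The function names (`haversine_km`, `nearest_cities`, `build_features`), the approximate coordinates, and the case counts are all hypothetical and serve only as a toy example.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_cities(target, coords, k):
    """Names of the k cities closest to `target`, excluding the target itself."""
    others = [c for c in coords if c != target]
    return sorted(others, key=lambda c: haversine_km(coords[target], coords[c]))[:k]

def build_features(target, related, series, n_lags):
    """Build a one-week-ahead design matrix.

    Each row holds the previous n_lags weekly counts of the target city
    followed by those of each related city; the label is the target's
    count in the week that follows those lags.
    """
    cities = [target] + related
    X, y = [], []
    for t in range(n_lags, len(series[target])):
        row = []
        for c in cities:
            row.extend(series[c][t - n_lags:t])
        X.append(row)
        y.append(series[target][t])
    return X, y

# Approximate coordinates (lat, lon), for illustration only.
coords = {
    "São Paulo": (-23.55, -46.63),
    "Campinas": (-22.91, -47.06),
    "Rio de Janeiro": (-22.91, -43.17),
    "Manaus": (-3.12, -60.02),
}

# Made-up weekly case counts (toy data, not surveillance data).
series = {
    "São Paulo": [10, 12, 15, 20, 18, 25],
    "Campinas": [3, 4, 6, 9, 8, 11],
    "Rio de Janeiro": [7, 9, 9, 14, 16, 19],
    "Manaus": [1, 0, 2, 1, 3, 2],
}

related = nearest_cities("São Paulo", coords, k=2)
X, y = build_features("São Paulo", related, series, n_lags=2)
```

The resulting `X` and `y` could be passed to any supervised regressor. Selecting cities by GDP per capita instead would only require replacing the distance function used as the sort key in `nearest_cities`.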
Type
Publication
Chaos, Solitons and Fractals