by Y. Qiang Sun, Pedram Hassanzadeh, Mohsen Zand, Ashesh Chattopadhyay, Jonathan Weare, and Dorian S. Abbot
Significance
AI models produce skillful weather forecasts, including for some extreme events. However, forecasting the strongest events, which are so rare that they are absent from the training set (so-called gray swans), remains a major concern for the operational use of these models, especially as climate change introduces unprecedented conditions. Here, we train an AI weather model after removing Category 3–5 tropical cyclones from its training set and test it on Category 5 storms. The model could not accurately forecast these unseen cyclones. However, the model shows promise in learning from strong storms in one region and forecasting them in another. Our work highlights the need for a better understanding of the limitations of AI weather models and for innovations to improve them.
Abstract
Predicting gray swan weather extremes, which are possible but so rare that they are absent from the training dataset, is a major concern for AI weather models and long-term climate emulators. An important open question is whether AI models can extrapolate from weaker weather events present in the training set to stronger, unseen weather extremes. To test this, we train independent versions of the AI weather model FourCastNet on the 1979–2015 ERA5 dataset, either with all data or with Category 3–5 tropical cyclones (TCs) removed, globally or only over the North Atlantic or Western Pacific basin. We then test these versions of FourCastNet on 2018–2023 Category 5 TCs (gray swans). All versions yield similar accuracy for global weather, but the version trained without any Category 3–5 TCs cannot accurately forecast Category 5 TCs, indicating that these models cannot extrapolate from weaker storms. The versions trained without Category 3–5 TCs in only one basin show some skill in forecasting Category 5 TCs in that basin, suggesting that FourCastNet can generalize across tropical basins. This result is encouraging, and also surprising, because regional information is implicitly encoded in the model inputs. Given that current state-of-the-art AI weather and climate models use similar learning strategies, we expect our findings to apply to other models. Other types of weather extremes need to be investigated similarly. Our work demonstrates that novel learning strategies are needed for AI models to reliably provide early warnings or statistical estimates for the rarest, most impactful TCs and, possibly, other weather extremes.
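To make the idea of "removing Category 3–5 TCs, globally or in one basin" concrete, the following is a minimal sketch that is not the authors' procedure. It assumes a storm track table in which each point is tagged with a hypothetical training-snapshot index, an IBTrACS-style basin code (for example "NA" or "WP"), and a 1-minute sustained wind in knots. It classifies intensity with the Saffir–Simpson thresholds and drops every snapshot in which a qualifying storm appears. Dropping whole snapshots is only one plausible design; an actual pipeline might instead mask or replace the affected region.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Saffir–Simpson lower bounds (1-min sustained wind, knots) for Categories 1..5.
_CATEGORY_LOWER_BOUNDS_KT = [64, 83, 96, 113, 137]


def saffir_simpson_category(wind_kt: float) -> int:
    """Return 0 for below-hurricane strength, otherwise the category 1-5."""
    category = 0
    for cat, bound in enumerate(_CATEGORY_LOWER_BOUNDS_KT, start=1):
        if wind_kt >= bound:
            category = cat
    return category


@dataclass(frozen=True)
class TrackPoint:
    time_index: int  # hypothetical index of the training snapshot
    basin: str       # assumed IBTrACS-style basin code, e.g. "NA" or "WP"
    wind_kt: float   # 1-min sustained wind speed in knots


def snapshots_to_drop(
    track: Iterable[TrackPoint], min_category: int = 3, basin: Optional[str] = None
) -> Set[int]:
    """Snapshot indices containing a storm at or above min_category.

    basin=None removes such storms globally; otherwise only in that basin.
    """
    return {
        p.time_index
        for p in track
        if saffir_simpson_category(p.wind_kt) >= min_category
        and (basin is None or p.basin == basin)
    }


def filter_training_indices(
    all_indices: Iterable[int],
    track: Iterable[TrackPoint],
    min_category: int = 3,
    basin: Optional[str] = None,
) -> List[int]:
    """Keep only the snapshots that contain no qualifying storm."""
    drop = snapshots_to_drop(track, min_category, basin)
    return [i for i in all_indices if i not in drop]


if __name__ == "__main__":
    # Toy track: a Cat 3 point in the North Atlantic, a Cat 5 point in the West Pacific.
    toy_track = [
        TrackPoint(0, "NA", 50.0),
        TrackPoint(1, "NA", 100.0),
        TrackPoint(2, "WP", 140.0),
        TrackPoint(3, "WP", 90.0),
    ]
    print(filter_training_indices(range(5), toy_track))              # global removal
    print(filter_training_indices(range(5), toy_track, basin="NA"))  # North Atlantic only
```

With the toy track, global removal keeps snapshots `[0, 3, 4]`, while removal restricted to the North Atlantic keeps `[0, 2, 3, 4]`, so the West Pacific Category 5 snapshot stays in training. This mirrors the distinction between the globally filtered and basin-filtered training sets.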