Statistical Analysis and Machine Learning Forecasting for the Fuel and Energy Balance of Kazakhstan (2019–2025)

Authors

  • Gabit Sekenov, Astana IT University, School of Applied Data Analytics, Astana, Kazakhstan
  • Sabit Abzal, L.N. Gumilyov Eurasian National University, Astana, Kazakhstan
  • Nurkhat Zhakiyev, Astana IT University, School of Applied Data Analytics, Astana, Kazakhstan

Keywords:

fuel and energy balance, Kazakhstan, electricity deficit, LMDI decomposition, Bai–Perron structural break, SARIMA-LSTM, transfer learning, electricity demand forecasting, Power BI, ERA5-Land, power-sector CO₂ emissions

Abstract

This paper presents a statistical and machine learning analysis of the fuel and energy balance of the Republic of Kazakhstan over 2019–2025. In 2023, Kazakhstan shifted from net electricity exporter to net importer, which motivates both a reproducible diagnosis of why the balance has shifted and a forecasting framework to anticipate further pressure on it. Eight publicly available datasets (Ember, KEGOC, Our World in Data, PyPSA-Earth, ERA5-Land, KOREM, IEA, national statistics) are integrated into a single SQLite database (18 tables, 34,362 rows) and exposed through an interactive Power BI dashboard. We decompose power-sector CO₂ emissions with the additive logarithmic mean Divisia index (LMDI), formally test the 2023 balance shift with the Bai–Perron multiple structural-break procedure, and build two complementary forecasting models: a hybrid SARIMA-LSTM for wind capacity factor, and a transfer-learning gradient-boosting model, pretrained on European load data, for hourly demand in Kazakhstan.
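
As a minimal sketch of the additive LMDI step, assume the three-factor identity C = Σ_i Q · S_i · I_i, where Q is total generation, S_i the generation share of fuel i, and I_i its emission intensity. The change in emissions then splits exactly into activity, structure, and intensity effects, each term weighted by the logarithmic mean L(C_i^T, C_i^0). The abstract names only the activity and structure effects, so the intensity term, the function names, and the toy generation and emission figures below are illustrative assumptions, not the paper's data or code.

```python
import math


def log_mean(a: float, b: float) -> float:
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    if a == b:
        return a
    return (a - b) / (math.log(a) - math.log(b))


def lmdi_additive(gen0, gen1, em0, em1):
    """Additive LMDI-I split of the emission change into activity, structure, intensity.

    gen0, gen1: fuel -> generation (TWh) in the base and target year, including
                zero-carbon sources, which enter total generation Q only.
    em0, em1:   fuel -> emissions (Mt CO2); fuels with zero emissions in either
                year are skipped (the logarithms would be undefined).
    """
    q0, q1 = sum(gen0.values()), sum(gen1.values())
    effects = {"activity": 0.0, "structure": 0.0, "intensity": 0.0}
    for fuel, c0 in em0.items():
        c1 = em1.get(fuel, 0.0)
        if c0 <= 0.0 or c1 <= 0.0:
            continue
        w = log_mean(c1, c0)                        # L(C_i^T, C_i^0)
        s0, s1 = gen0[fuel] / q0, gen1[fuel] / q1   # fuel shares S_i
        i0, i1 = c0 / gen0[fuel], c1 / gen1[fuel]   # intensities I_i (Mt/TWh)
        effects["activity"] += w * math.log(q1 / q0)
        effects["structure"] += w * math.log(s1 / s0)
        effects["intensity"] += w * math.log(i1 / i0)
    return effects


# Hypothetical toy inputs (NOT the paper's data): coal loses share to gas while
# total generation grows by 12%.
gen0 = {"coal": 70.0, "gas": 20.0, "hydro": 10.0}
gen1 = {"coal": 66.0, "gas": 34.0, "hydro": 12.0}
em0 = {"coal": 70.0, "gas": 8.0}
em1 = {"coal": 66.0, "gas": 13.6}

effects = lmdi_additive(gen0, gen1, em0, em1)
print(effects)
print("sum of effects:", sum(effects.values()))
print("total change:  ", sum(em1.values()) - sum(em0.values()))
```

Because ln(C_i^T/C_i^0) equals the sum of the three log-ratios for each fuel, the effects add up to the total emission change with no residual. In this toy case a positive activity effect outweighs a negative structure effect, the same pattern the abstract reports as the "gas-substitution trap."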

The coal share fell from 70.4% (2019) to 54.0% (2024) while the gas share rose from 18.6% to 29.3%, yet power-sector emissions increased from 89.38 to 95.05 Mt CO₂. LMDI attributes this rise to an activity effect of +9.8 Mt that outweighs the coal-to-gas structure effect of −4.2 Mt, a quantified "gas-substitution trap." The Bai–Perron test identifies 2023 as a statistically significant break in the annual balance (sup F = 14.7, p < 0.01), with a pre-break mean of +2.1 TWh/yr and a post-break mean of −2.5 TWh/yr. The hybrid SARIMA-LSTM reduces the wind capacity-factor MAE by 25.2% relative to the naive baseline (p < 0.001, Diebold–Mariano test), and the transfer-learning demand forecaster reduces the 24-hour-ahead sMAPE from 6.85% to 4.10%, a 40.1% relative improvement. All inputs, code, and figure scripts are released for reproducibility.
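
The headline forecast percentages can be checked with a short evaluation sketch. The abstract does not specify the sMAPE variant, the Diebold–Mariano loss function, or the long-run variance estimator, so the choices below (the common 2|F − A|/(|A| + |F|) form of sMAPE, absolute-error loss to match the MAE comparison, and a rectangular-kernel variance over lags 0 to h − 1) are assumptions, and all function names are illustrative. Applied to the reported sMAPE values, the relative-improvement helper reproduces the 40.1% figure: (6.85 − 4.10)/6.85 ≈ 0.401.

```python
import math
from statistics import fmean


def smape(actual, forecast):
    """sMAPE in percent: mean of 2|F - A| / (|A| + |F|); A = F = 0 contributes 0."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty series of equal length")
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        terms.append(0.0 if denom == 0 else 2.0 * abs(f - a) / denom)
    return 100.0 * fmean(terms)


def relative_improvement(baseline_score, model_score):
    """Percentage reduction of an error score (MAE, sMAPE, ...) versus a baseline."""
    return 100.0 * (baseline_score - model_score) / baseline_score


def diebold_mariano(err_base, err_model, h=1):
    """DM test under absolute-error loss (assumed loss function).

    d_t = |e_base,t| - |e_model,t|; a positive statistic means the model beats
    the baseline. The long-run variance uses autocovariances at lags 0..h-1
    (rectangular kernel); the p-value is two-sided from the standard normal.
    """
    if len(err_base) != len(err_model):
        raise ValueError("error series must have equal length")
    d = [abs(b) - abs(m) for b, m in zip(err_base, err_model)]
    n = len(d)
    d_bar = fmean(d)

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # P(|Z| > |stat|)
    return stat, p_value


# The abstract's headline demand result: sMAPE 6.85% -> 4.10%.
print(round(relative_improvement(6.85, 4.10), 1))  # → 40.1
```

For 24-hour-ahead forecasts one would pass h = 24 so that the autocorrelation of overlapping forecast errors enters the variance; on short test sets, the Harvey–Leybourne–Newbold small-sample correction is a common refinement.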

Published

2026-05-17

How to Cite

Gabit Sekenov, Sabit Abzal, & Nurkhat Zhakiyev. (2026). Statistical Analysis and Machine Learning Forecasting for the Fuel and Energy Balance of Kazakhstan (2019–2025). World Scientific Reports, (13). Retrieved from https://ojs.scipub.de/index.php/WSR/article/view/8710

Section

Technical Sciences