1 Department of Mathematics and Statistics, Austin Peay State University, Tennessee, USA.
2 Department of Healthcare Administration, University of the Potomac, Washington, USA.
International Journal of Science and Research Archive, 2025, 16(03), 219–234
Article DOI: 10.30574/ijsra.2025.16.3.2539
Received on 27 July 2025; revised on 01 September 2025; accepted on 04 September 2025
This study investigates the predictive capability of machine learning techniques in forecasting student academic performance using school-level and demographic data. A structured, publicly available dataset from the District of Columbia Public Schools was employed, comprising 1,163 records representing various student groups and institutional contexts. After preprocessing and feature selection, three regression models were developed and evaluated: a baseline Linear Regression model, a Random Forest Regressor, and an XGBoost Regressor. The baseline model demonstrated limited predictive strength (R² = .32, MAE = 13.79), while both ensemble models substantially outperformed it. Random Forest achieved an R² of .69 and an MAE of 7.74, capturing complex interactions more effectively. XGBoost slightly outperformed Random Forest with an R² of .70 and an MAE of 7.19, showing stronger generalization and greater sensitivity to underrepresented groups. Feature importance analysis revealed that institutional factors such as Framework Points Earned strongly influenced predictions in Random Forest, whereas XGBoost emphasized subgroup characteristics, including Students with Disabilities, English Learners, and At-Risk populations. These findings highlight the strengths of ensemble methods in modeling non-linear and multidimensional educational data while raising questions about the trade-offs between model accuracy and equity. The study concludes that predictive models should be evaluated not only by statistical performance but also by their capacity to inform equitable interventions in education. Recommendations include the ethical deployment of predictive systems, incorporation of contextual data, and prioritization of fairness in model selection to support inclusive, data-informed educational policy and practice.
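For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how MAE and R² are conventionally computed from observed and predicted scores. It is an illustration under stated assumptions, not the authors' code: the function names and the five sample scores are hypothetical and are not drawn from the study's dataset.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between observed and predicted scores."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of variance in y_true explained by y_pred."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical scores, invented for illustration only (not from the study).
observed = [40.0, 55.0, 60.0, 75.0, 90.0]
predicted = [45.0, 50.0, 65.0, 70.0, 85.0]

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MAE = {mae:.2f}, R^2 = {r2:.2f}")  # prints "MAE = 5.00, R^2 = 0.91"
```

A lower MAE means predictions sit closer to the observed scores on average, while an R² nearer to 1 means the model accounts for more of the variance in those scores. This is the sense in which the ensemble models above improve on the linear baseline.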
Academic Performance; Machine Learning; Student Group Score; Random Forest; XGBoost; Educational Data Mining; Predictive Modeling; Feature Importance; Fairness; Public School Analytics.
Abdul-waliyyu Bello, Idris Ajibade, Idris Wonuola and Darlington Ekweli. Developing a predictive model for student academic performance using machine learning techniques. International Journal of Science and Research Archive, 2025, 16(03), 219–234. Article DOI: https://doi.org/10.30574/ijsra.2025.16.3.2539.
Copyright © 2025. The author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.







