https://doi.org/10.65770/IJXV3497
ABSTRACT
Secondary education systems face the critical challenge of early student attrition and academic underperformance. Schools collect extensive demographic and behavioral data, but its high dimensionality and redundancy obscure actionable insights for guidance counselors. This study addresses that gap by developing an early warning system using the UCI Student Performance dataset. We benchmarked three dimensionality reduction techniques, Uniform Manifold Approximation and Projection (UMAP), Neighborhood Components Analysis (NCA), and Partial Least Squares Discriminant Analysis (PLS-DA), each combined with a range of machine learning classifiers. Experimental results show that dimensionality reduction significantly improves model performance by suppressing noise, with the NCA-XGBoost combination achieving the highest accuracy (94.6%) and a recall of 94.1%. The analysis identified study time, alcohol consumption, and family relationship quality as the strongest behavioral predictors of academic success. The framework gives guidance counselors a reliable decision support system for proactive intervention, enabling targeted support based on specific behavioral triggers rather than intuition alone.
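The reduce-then-classify design summarized above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes scikit-learn is available, uses synthetic data from `make_classification` in place of the UCI Student Performance dataset, and swaps in scikit-learn's `GradientBoostingClassifier` for XGBoost to avoid an extra dependency. The helper name `build_pipeline` and all parameter values (5 components, 30 features, 400 samples) are hypothetical choices for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(n_components=5, random_state=0):
    """Scale features, project them with NCA, then classify with boosted trees.

    GradientBoostingClassifier is a stand-in for XGBoost (assumption).
    """
    return Pipeline([
        ("scale", StandardScaler()),
        ("nca", NeighborhoodComponentsAnalysis(n_components=n_components,
                                               random_state=random_state)),
        ("clf", GradientBoostingClassifier(random_state=random_state)),
    ])


# Synthetic stand-in data (assumption): 30 partly redundant features
# and a binary label, loosely mimicking a pass/fail target.
X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                           n_redundant=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = build_pipeline()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

# model[:-1] is the pipeline without the classifier: scaling followed by NCA.
reduced = model[:-1].transform(X_test)
print(f"accuracy={accuracy:.3f} recall={recall:.3f} reduced shape={reduced.shape}")
```

Fitting the scaler and NCA inside the pipeline means both are learned from the training split only, so the held-out accuracy and recall are not inflated by information from the test students.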



