Keywords and phrases: decision tree technique, diabetes detection, data preprocessing, SMOTE balancing, K-fold cross-validation.
Received: December 30, 2022; Revised: March 22, 2023; Accepted: April 19, 2023; Published: June 5, 2023
How to cite this article: SILUE Kolo, COULIBALY Mamadou, KABORE Raogo, ASSEU Olivier and BOURGET Daniel, Impact of data preprocessing and balancing on diabetes prediction using the decision tree technique, International Journal of Numerical Methods and Applications 23(2) (2023), 157-180. http://dx.doi.org/10.17654/0975045223008
This open access article is licensed under a Creative Commons Attribution 4.0 International License.