Random Forest Method for Diabetes Classification
Abstract
Random forest is a supervised learning algorithm that extends decision trees by applying bootstrap aggregating (bagging). The method grows many decision trees to form a forest, and the resulting ensemble is called the random forest model. Each tree is grown on data drawn at random with replacement through the bagging process. Random forest is considered to perform better on diabetes data than other supervised learning methods because it has the lowest error rate among them. It is also an important technique for classifying medical data, particularly for diagnosing diabetes. In this study, classification was carried out on the Pima Indian Diabetes data, collected from the Pima Indians, a Native American tribe living in Arizona and Mexico. The analysis measured the accuracy of random forest classification on these data. The results show a classification accuracy of 74.78%, which falls in the fair classification category. The three most important variables in this random forest classification are glucose, BMI, and age, in that order.
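The bagging idea described in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: it uses one-split trees (decision stumps) in place of full decision trees, and the data are synthetic toy values, not the Pima Indian Diabetes data. All function names (`bootstrap_sample`, `fit_stump`, `fit_forest`, and so on) and the labelling rule are hypothetical choices made only for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging resampling step)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows, feature_indices):
    """Fit a one-split tree: pick the (feature, threshold) with fewest errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label


def fit_forest(rows, n_trees, n_features_per_tree, rng):
    """Grow each tree on its own bootstrap sample and random feature subset."""
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        features = rng.sample(range(n_features), n_features_per_tree)
        forest.append(fit_stump(sample, features))
    return forest


def predict_forest(forest, x):
    """Classify by majority vote over all trees."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


def accuracy(forest, rows):
    """Share of rows whose predicted label matches the true label."""
    return sum(predict_forest(forest, x) == y for x, y in rows) / len(rows)


# Synthetic toy data (an assumption for illustration, not the Pima data):
# features are (glucose, BMI, age); the label depends only on glucose.
rng = random.Random(0)
data = []
for _ in range(200):
    glucose = rng.uniform(70, 200)
    bmi = rng.uniform(18, 45)
    age = rng.uniform(21, 70)
    data.append(((glucose, bmi, age), int(glucose > 125)))

train, test = data[:150], data[150:]
forest = fit_forest(train, n_trees=25, n_features_per_tree=2, rng=rng)
test_accuracy = accuracy(forest, test)
print(f"Accuracy on held-out toy data: {test_accuracy:.2%}")
```

Because each stump has only one split, choosing a random feature subset per tree is equivalent here to choosing it per split, as random forest does. In practice, a library implementation with full trees and built-in variable importance would normally be used instead of a sketch like this.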