Machine Learning Algorithms Comparison for Gender Identification
DOI: https://doi.org/10.29103/micoms.v4i.885
Abstract. In this study, we present a comprehensive analysis of gender identification using eight classification models: K-Nearest Neighbours (KNN), Naive Bayes, Decision Tree, Random Forest, Logistic Regression, XGBoost, Support Vector Machine (SVM), and Neural Network. Gender identification has significant applications in marketing, social analysis, and security systems, which motivates comparing several methods to find the one that performs best. The dataset was normalised with Min-Max scaling, which rescales each feature to the range [0, 1] so that features with large numeric ranges do not dominate features with small ranges; this matters most for distance-based models such as KNN. KNN outperformed the other seven models, achieving an accuracy of 0.9758 with a support of 951 (that is, over 951 evaluated samples), which indicates that it is an effective and reliable choice for gender identification tasks requiring high accuracy. The study also highlights the importance of selecting an appropriate model for a machine learning task and the substantial impact of data normalisation on model performance. Overall, the results show that KNN is easy to implement and highly effective at achieving high precision in gender identification, with implications for future research and for practical applications in various fields.
Keywords: classification models; data normalisation; gender identification; K-Nearest Neighbours; machine learning.
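The pipeline summarised in the abstract (Min-Max normalisation, then KNN classification, then an accuracy score with its support) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy feature values, the labels, k = 3, and Euclidean distance are all hypothetical choices, since the abstract does not state the dataset's features, the value of k, or the distance metric. Min-Max scaling maps each feature value x to x' = (x - min) / (max - min), with min and max taken from the training data only so that test rows are scaled with the same parameters. Accuracy is the fraction of correct predictions, and support is the number of evaluated samples.

```python
import math
from collections import Counter

def fit_min_max(rows):
    """Compute per-feature (min, max) pairs from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def transform_min_max(rows, ranges):
    """Rescale each feature with x' = (x - min) / (max - min).

    A constant feature (max == min) is mapped to 0.0 to avoid division by zero.
    Test values outside the training range are not clipped, so they can fall
    outside [0, 1].
    """
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]

def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training rows.

    Assumption: Euclidean distance; the study's metric is not stated.
    """
    neighbours = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(query, pair[0])
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie in vote counts, most_common keeps first-encountered order,
    # so the label that appears first among the nearest neighbours wins.
    return votes.most_common(1)[0][0]

def accuracy_and_support(y_true, y_pred):
    """Return (accuracy, support): share of correct predictions and sample count."""
    support = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / support, support

# Hypothetical toy data: two numeric features on different scales
# (e.g. height in cm and weight in kg); not the study's dataset.
X_train = [[150, 45], [155, 50], [160, 52], [175, 75], [180, 80], [185, 90]]
y_train = ["female", "female", "female", "male", "male", "male"]
X_test = [[158, 48], [182, 85]]
y_test = ["female", "male"]

ranges = fit_min_max(X_train)  # scaling parameters fitted on training data only
train_scaled = transform_min_max(X_train, ranges)
test_scaled = transform_min_max(X_test, ranges)

y_pred = [knn_predict(train_scaled, y_train, q, k=3) for q in test_scaled]
acc, support = accuracy_and_support(y_test, y_pred)
print(y_pred, f"accuracy={acc:.4f}", f"support={support}")
# prints: ['female', 'male'] accuracy=1.0000 support=2
```

Fitting the scaling parameters on the training split alone keeps test-set information out of preprocessing. In practice, scikit-learn's `MinMaxScaler` and `KNeighborsClassifier` provide tested equivalents, and `sklearn.metrics.classification_report` prints accuracy alongside the support.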
License
Copyright (c) 2024 Aldo januansyah. H, Muhammad Fikry, Yesy Afrillia
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
COPYRIGHT NOTICE
Authors retain copyright and grant the journal the right of first publication, with the work licensed under a Creative Commons Attribution-ShareAlike 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
All articles in this journal may be disseminated provided that valid sources are cited and the article title is not omitted. The author is liable for the content of the article.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
When disseminating an article, the author must declare https://proceedings.unimal.ac.id/micoms/index as its first publisher.