TF-IDF Algorithm For Weighting In Determining The Similarity Of Text In Documents
Keywords:
Text mining (Information retrieval), Term Frequency-Inverse Document Frequency (TF-IDF)
Abstract
Research documents need to be grouped to make information retrieval easier. Without tool support, a reader often has to go through the contents of each document one by one in order to group it or to learn what information it contains. This research aims to help find the information in documents quickly. Information is located by calculating the Term Frequency (TF) and Inverse Document Frequency (IDF) values for each token (word) in each document. TF-IDF is the most commonly used algorithm in information retrieval for calculating the weight of each word, and it is also known to be efficient, easy to apply, and accurate. The accuracy of this algorithm in finding the information in a document is above 83.3%.
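As an illustration of the weighting described above, the following is a minimal sketch and not the implementation used in this research. It assumes raw term counts for TF, a base-10 logarithm for IDF, whitespace tokenization, and cosine similarity as the document-similarity measure; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. A fuller pipeline would
    # also strip punctuation, remove stop words, and possibly stem.
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dictionary per input document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # assumed TF variant: raw count in the document
        # Assumed IDF variant: log10(N / df). A term found in every
        # document therefore gets weight 0.
        weights.append({t: tf[t] * math.log10(n_docs / df[t]) for t in tf})
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document whose terms all have zero weight
    return dot / (norm_a * norm_b)


docs = [
    "text mining retrieval",
    "text mining documents",
    "database schema quality",
]
weights = tf_idf(docs)
print(cosine(weights[0], weights[1]))  # shared terms: similarity above 0
print(cosine(weights[0], weights[2]))  # no shared terms: similarity is 0.0
```

In this sketch, terms that occur in few documents receive higher weights, so two documents score as similar mainly when they share terms that are rare in the rest of the collection.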
License
Copyright (c) 2025 Bustami
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright Notice
Authors who publish in this journal agree to the following terms:
1. The copyright of each article is retained by the author(s).
2. The author grants the journal the first publication rights with the work simultaneously licensed under the Creative Commons Attribution License, allowing others to share the work with an acknowledgment of authorship and the initial publication in this journal.
3. Authors may enter into separate additional contractual agreements for the non-exclusive distribution of published journal versions of the work (for example, posting them to institutional repositories or publishing them in a book), with acknowledgment of their initial publication in this journal.
4. Authors are permitted and encouraged to post their work online (for example, in an institutional repository or on their website) before and during the submission process, as this can lead to productive exchanges, as well as earlier and more frequent citation of the published work.
5. Articles and all related material published are distributed under a Creative Commons Attribution-ShareAlike 4.0 International License.