Lung Cancer Prediction Using Machine Learning: A Comparative Analysis of KNN, SVM, and Random Forest
Keywords:
Lung cancer, Machine learning, Random Forest, Logistic Regression, Hyperparameter tuning
Abstract
Lung cancer is one of the leading causes of cancer-related deaths worldwide, and survival depends heavily on how early and how accurately the disease is detected. Traditional diagnostic methods are effective but often fail to identify the disease in its early stages. Machine learning may improve lung cancer prediction by identifying complex patterns in medical data. The effectiveness of a machine learning model, however, depends on the algorithm and the optimisation techniques used. This study compares four machine learning methods for predicting the occurrence of lung cancer: Random Forest, Logistic Regression, K-Nearest Neighbours (KNN), and Support Vector Machine (SVM). A Kaggle dataset was preprocessed, encoded, and subjected to feature selection to improve model performance. Model parameters were then refined through hyperparameter tuning to further improve accuracy. The models were evaluated using accuracy, precision, recall, and F1-score. Logistic Regression performed best, with an accuracy of 90%, while the other models showed varying degrees of performance. These results indicate that machine learning has potential for lung cancer prediction and that model selection and parameter optimisation are important. Future studies may investigate deep learning methods and use more patient data to improve predictive accuracy. Ultimately, using machine learning to support lung cancer diagnosis might lead to earlier detection, better long-term outcomes, and a substantial reduction in mortality.
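
The four evaluation metrics named above can be illustrated with a minimal sketch for the binary case (cancer present vs. absent). The function name `binary_metrics`, the `BinaryMetrics` container, and the example labels are hypothetical and introduced only for illustration; they are not the study's code or data, and the numbers produced are unrelated to the reported results.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    `positive` is the label treated as the positive class
    (assumed here to mean "lung cancer present").
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Made-up labels for illustration only (1 = cancer, 0 = no cancer).
    y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

In a screening setting, recall is often examined alongside accuracy, because a missed positive case (a false negative) lowers recall and is typically the costlier error.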



