Applicability of Data Mining and Predictive Analysis for Tobacco Cessation: An Exploratory Study

  • Kavita Rijhwani, Department of Public Health Dentistry, Maulana Azad Institute of Dental Sciences, Delhi, India
  • Vikrant R R Mohanty, Department of Public Health Dentistry, Maulana Azad Institute of Dental Sciences, Delhi, India
  • Aswini YB, Department of Public Health Dentistry, Maulana Azad Institute of Dental Sciences, Delhi, India
  • Vaibhav Singh, Department of Computer Science, Rameshwaram Institute of Technology and Management, Lucknow (U.P), India
  • Sumbul Hashmi, Department of Public Health Dentistry, Maulana Azad Institute of Dental Sciences, Delhi, India
Keywords: Data Mining, Tobacco Use Cessation, Algorithms


Objectives: Predictive analysis can be used to evaluate the enormous volume of data generated by the healthcare industry, extract information, and establish relationships among variables. It uses artificial intelligence to reveal associations that healthcare professionals may not suspect. Tobacco cessation is clearly beneficial; however, tobacco users respond differently to cessation efforts because the response depends on a multitude of factors. Our objectives were to assess data mining techniques using the WEKA tool, evaluate its role in predictive analysis, and predict the quit status of patients in tobacco cessation using prediction algorithms.
Materials and Methods: WEKA, a data mining tool, was used to classify the data, and the classifiers were evaluated using 10-fold cross-validation. The algorithms applied were Naïve Bayes, SMO, Random Forest, J48, and Decision Stump, each assessed for its role in determining the quit status of patients. Secondary data on 655 patients from a tobacco cessation clinic were used, with each patient described by 20 attributes for prediction of quit status.
Results: Decision Stump and SMO showed the best accuracy in predicting quit status. Of the 20 attributes, previous quitting attempt, type of intervention, and number of years since the habit was initiated were associated with the early quitting rate.
Conclusion: Data mining and predictive analytical tools such as WEKA will not only improve patient outcomes but also identify variables, or combinations of variables, for effective interventions in tobacco cessation.
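
Illustrative sketch: The evaluation procedure named in the Materials and Methods (a classifier scored by 10-fold cross-validation) can be outlined in a few lines. The code below is a minimal pure-Python stand-in, not WEKA's implementation and not the authors' pipeline. The patient records are synthetic, the two attributes (previous quit attempt, years of habit) and the labelling rule are invented for illustration, and the folds are not stratified, whereas WEKA stratifies by default. It trains a decision stump, a one-split decision tree, and reports its mean accuracy across the folds.

```python
import random
from collections import Counter

def make_synthetic_patients(n=200, seed=0):
    """Hypothetical records: (previous_quit_attempt, years_of_habit) -> quit (1) or not (0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        prev_attempt = rng.randint(0, 1)
        years = rng.randint(1, 40)
        # Invented rule with 15% label noise; NOT derived from the study data.
        label = prev_attempt if rng.random() > 0.15 else 1 - prev_attempt
        rows.append(((prev_attempt, years), label))
    return rows

def train_stump(rows):
    """Decision stump: keep the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            # If no record falls to the right of t, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label

def cross_validate(rows, train_fn, k=10, seed=0):
    """Mean accuracy over k folds; every record is used for testing exactly once."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        predict = train_fn(train)
        accuracies.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(accuracies) / k

data = make_synthetic_patients()
accuracy = cross_validate(data, train_stump)
print(f"10-fold CV accuracy of the stump: {accuracy:.2f}")
```

Because each fold's model is trained without the records it is tested on, the averaged accuracy estimates performance on unseen patients rather than fit to the training data, which is the reason for comparing classifiers this way.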


How to Cite
Rijhwani K, Mohanty VR, YB A, Singh V, Hashmi S. Applicability of Data Mining and Predictive Analysis for Tobacco Cessation: An Exploratory Study. Front Dent. 17.
Original Article