THE USE OF CLASSIFICATION ALGORITHMS TO ASSESS THE LIKELIHOOD OF HAVING A CHRONIC DISEASE
Background: Classification algorithms are used in a variety of medical domains for rule induction, prediction and classification. We present a comparative study of two classification algorithms (one that builds decision trees and one that is based on clustering) and logistic regression analysis for the task of classifying cardiac patients in a large database. Methods: As an application, we studied a database of coronary patients and apparently healthy individuals matched to them by age and sex, in order to explore the association of smoking habits and other medical conditions (such as hypertension, hypercholesterolemia and diabetes) with the risk of developing non-fatal acute coronary syndromes (ACS). Data were analyzed with logistic regression and with the two classification algorithms. Results: The odds ratios obtained from the classical approach (i.e., logistic regression) and from the two classification algorithms converged. Conclusion: Classification algorithms may prove particularly useful in the analysis of medical databases by providing a complete patient profile as a predictor of chronic diseases, such as cardiovascular disease.
Keywords: classification, logistic regression, predictive models, epidemiology.
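
The comparison of odds ratios described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the counts are hypothetical, the function names `odds_ratio` and `fit_logistic_binary` are invented for illustration, and the age and sex matching of the study design is ignored. The sketch only shows that, for a single binary risk factor (say, smoking), the odds ratio read directly from a 2x2 table equals the exponentiated coefficient of a logistic regression fitted to the same counts.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    return (a * d) / (b * c)


def fit_logistic_binary(a, b, c, d, iterations=25):
    """Fit logit P(case) = b0 + b1 * exposed by Newton-Raphson.

    Works on grouped counts, so no individual-level records are needed.
    Returns the intercept b0 and the exposure coefficient b1.
    """
    b0, b1 = 0.0, 0.0
    n_unexposed, n_exposed = c + d, a + b
    for _ in range(iterations):
        # Predicted case probability in each exposure group.
        p0 = 1.0 / (1.0 + math.exp(-b0))
        p1 = 1.0 / (1.0 + math.exp(-(b0 + b1)))
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = (c - n_unexposed * p0) + (a - n_exposed * p1)
        g1 = a - n_exposed * p1
        # Information matrix [[w0 + w1, w1], [w1, w1]].
        w0 = n_unexposed * p0 * (1.0 - p0)
        w1 = n_exposed * p1 * (1.0 - p1)
        det = w0 * w1
        # Newton step: solve information * delta = gradient.
        b0 += (w1 * g0 - w1 * g1) / det
        b1 += (-w1 * g0 + (w0 + w1) * g1) / det
    return b0, b1


# Hypothetical counts, chosen only for illustration.
a, b, c, d = 60, 40, 30, 70
b0, b1 = fit_logistic_binary(a, b, c, d)
print("Odds ratio from the 2x2 table:      ", odds_ratio(a, b, c, d))
print("Odds ratio from logistic regression:", math.exp(b1))
```

With several risk factors the two estimates are no longer identical by construction, which is why agreement between logistic regression and the classification algorithms is an empirical result rather than a given.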