M.P. Rocha et al. (Eds.): 6th International Conference on PACBB, AISC 154, pp. 79–86. springerlink.com © Springer-Verlag Berlin Heidelberg 2012

Case-Based Reasoning to Classify Endodontic Retreatments

Livia Campo, Vicente Vera, Enrique Garcia, Juan F. De Paz, and Juan M. Corchado*

Livia Campo · Vicente Vera · Enrique Garcia
Department of Estomatology II, Complutense University of Madrid, Plaza Ramón y Cajal, s/n 28040, Madrid, Spain
e-mail: lcampo@estumail.ucm.es, viventevera@odon.ucm.es, aegarcia@odon.ucm.es

Juan F. De Paz · Juan M. Corchado
Department of Computer Science and Automation, University of Salamanca, Plaza de la Merced, s/n, 37008, Salamanca, Spain
e-mail: {fcofds,corchado}@usal.es

Abstract. Within the field of odontology, an analysis of the probability of success of endodontic retreatment facilitates the diagnostic and decision-making process of medical personnel. This study presents a case-based reasoning system that predicts the probability of success and failure of retreatments in order to avoid extraction. Different classifiers were applied during the reuse phase of the case-based reasoning process. The system was tested on a set of patients who received retreatments, and a set of variables considered to be of particular interest was selected.

Keywords: case-based reasoning, classification, odontology.

1 Introduction

Predictive systems in medicine are relevant for determining the probability of success or failure of specific treatments. Their use currently extends to fields such as the study of cancer and odontology [1][2][3]. The decisions made by odontologists have traditionally been based on experience with previous treatment cases, so that experience in itself has been a necessary factor in the decision-making process. There are normally too many variables to consider, which has in fact resulted in the high failure rate of retreatments.
Consequently, it has become necessary to create a system that facilitates the decision-making process of odontologists and leads to decisions that minimize the failure of endodontic treatments and retreatments.

Endodontics makes up 20% of all treatments performed in dental clinics and has a 90% rate of success. The remaining 10% includes endodontic treatments that were unsuccessful to a greater or lesser degree, of which 40% are the result of root crown fractures, which in turn represent 5% of all dental fractures. The bacterial recolonization of the root canal and the subsequent appearance of radiological symptoms represent 15% of endodontic failures [8] [9] [10]. Many alternative methods for analyzing data in odontology have already been investigated. The techniques applied in this field are usually limited to the study of variables: a set of variables of interest is determined, followed by statistical tests and graphical representations of the data to extract the relevant variables. Statistical analysis is limited to the application of specific tests such as chi-squared [12], Mann-Whitney [18] or Kruskal-Wallis [11]. These tests identify which variables present different characteristics in different groups; the values of those variables can subsequently be taken into consideration for the final classification. Nevertheless, it is necessary to create a process that can combine all the information gathered in order to perform a final classification.

In previous work in the field of bioinformatics, CBR (case-based reasoning) systems have been successfully applied to predict leukemia. This study proposes a reasoning system to predict the success of retreatments. A set of variables is retrieved for a group of patients. This data set is used to generate a CBR system that incorporates different classification techniques during the reuse phase in order to generate a classification for a new element.
Traditional statistical techniques are applied during the revise phase to facilitate the interpretation of the results by selecting the variables that present different characteristics in the different groups of individuals. This article is organized as follows: section two reviews previous predictive studies in odontology; section three presents the proposed reasoning system; section four describes the results obtained and the conclusions.

2 Prediction System

The use of predictive techniques in medicine, and especially in the field of odontology, has been studied since the late 1980s, primarily through the statistical analysis of clinical data.

In 2001, Chugal, N.M. published data from a study of teeth extracted after unsuccessful endodontic treatments at the University of Connecticut School of Dental Medicine. The patients included in this study were treated between 1988 and 1992 in the graduate program and had experienced unsuccessful endodontic treatment within the previous four years. Variables were taken from both the clinical trial and x-rays taken at the time of the endodontic treatment. The data obtained in this case were studied with contingency tables and the chi-squared test. The risk factors were compared using t-tests for independent groups, or with non-parametric tests (Mann-Whitney or Kruskal-Wallis) [1].

Following a similar approach, Givol, N. published in 2010 the results of his study performed on patients from private clinics in Israel. In this case, all the possible clinical variables prior and subsequent to the endodontic treatment were gathered from 5217 patients treated between 1992 and 2008. The data were also studied using a statistical test, chi-squared [2].

In July of 2011, Song, M.
presented the data from a study performed on patients from the Department of Conservative Dentistry at the Dental College of Yonsei University, Seoul, Korea, between August 2004 and December 2008. Included in this study were patients who had undergone unsuccessful endodontic treatment and were in need of periapical surgery. Song considered clinical and x-ray data from prior to the treatment, demographic data, and data subsequent to the failed treatment. To analyze the factors that could predict endodontic failure, he applied a chi-squared statistical study [3].

None of the previously cited studies used artificial intelligence or case-based reasoning, nor did any use predictive tools other than statistical studies to analyze risk factors. The use of this type of system therefore offers a wide area of study within the field of odontology, in particular for the prediction of unsuccessful endodontic treatments.

3 Proposed Reasoning System

The purpose of CBR is to solve new problems by adapting solutions that have been used to solve similar problems in the past [4]. The primary concept when working with CBR systems is the concept of case. A case can be defined as a past experience, and is composed of three elements: a problem description, which describes the initial problem; a solution, which provides the sequence of actions carried out in order to solve the problem; and the final state, which describes the state achieved once the solution was applied. A CBR system manages cases (past experiences) to solve new problems. The way cases are managed is known as the CBR cycle, and consists of four sequential steps which are invoked every time a problem needs to be solved: retrieve, reuse, revise and retain. Each of the steps of the CBR life cycle requires a model or method in order to perform its mission.
The algorithms selected for the retrieval of cases should be able to search the case base and select the problem and corresponding solution most similar to the new situation. Once the most similar cases have been retrieved, the reuse phase begins, in which the solutions for the retrieved cases are adapted and a new solution is generated. The revise phase consists of an expert revision of the proposed solution. Finally, the retain phase allows the system to learn from the experiences obtained in the three previous phases, consequently updating the case memory.

During the retrieve phase, existing cases in which a retreatment was performed are selected from the case memory. This eliminates all cases that involve only an initial treatment.

During the reuse phase, the previously retrieved cases are selected and an associated classifier is built. In this case, the technique selected to carry out the classification corresponds to a Bayesian network. The new case is then introduced and classified according to the classifier built in this phase.

The Bayesian networks are constructed by following the Friedman-Goldszmidt [5] algorithm. Since there are two different classes, two Bayesian networks are generated, one for each of the classes.

The TAN (tree augmented naive Bayes) classifier is constructed from the retrieved cases that are most similar to the current case, distinguishing between successful and unsuccessful retreatments to generate the model (the tree). Thus, when applying the Friedman-Goldszmidt [5] algorithm, the two classes considered are successful and unsuccessful. The Friedman-Goldszmidt algorithm makes it possible to calculate a Bayesian network based on the dependence relationships established through a metric. The metric considers the dependence relationships between the variables conditioned on the class variable. In this case, the class variable is the outcome of the retreatment and the remaining variables are the attributes recorded for each case.
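As a rough illustration of this kind of structure learning, the sketch below estimates a class-conditional dependence score from frequency counts and then links the attributes with a maximum spanning tree. It assumes the score is the conditional mutual information used by the TAN method in [5]; the function names and the toy data are hypothetical and are not taken from the study. The sketch stops at the undirected tree and does not build the full classifier.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) from three parallel lists of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), k in n_xyc.items():
        # P(x,y,c) * log( P(x,y|c) / (P(x|c) * P(y|c)) ), using relative frequencies
        total += (k / n) * log((k * n_c[c]) / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total


def maximum_spanning_tree(n_nodes, weighted_edges):
    """Kruskal's algorithm over (weight, i, j) edges, heaviest edge first."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for _, i, j in sorted(weighted_edges, reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding the edge does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Hypothetical toy data: three categorical attributes and a binary outcome.
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
attributes = [
    [0, 0, 1, 1, 0, 0, 1, 1],  # attribute 0
    [0, 0, 1, 1, 0, 0, 1, 1],  # attribute 1 (copies attribute 0)
    [0, 1, 0, 1, 0, 1, 0, 1],  # attribute 2 (independent of the others)
]
edges = [
    (conditional_mutual_information(attributes[i], attributes[j], outcome), i, j)
    for i in range(len(attributes))
    for j in range(i + 1, len(attributes))
]
tree = maximum_spanning_tree(len(attributes), edges)
```

In this toy data attributes 0 and 1 are identical, so their edge carries the largest weight and is selected first; the remaining attribute is attached through a zero-weight edge.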
The metric proposed by Friedman can be defined as:

$$ I(X, Y \mid Z) = \sum_{x \in X} \sum_{y \in Y} \sum_{z \in Z} P(x, y, z) \log \frac{P(x, y \mid z)}{P(x \mid z) \cdot P(y \mid z)} \qquad (1) $$

Based on the previous metric, the probabilities are estimated according to the frequencies of the data. The Friedman-Goldszmidt [5] algorithm is broken down into the following steps:

• Calculate the value of I(X, Y | C) for the different variables/attributes X, Y that may be interconnected in the original graph, where the class C distinguishes the successful from the unsuccessful cases among the retrieved similar cases.
• Construct a complete undirected graph:
  o Establish the different attributes/variables as nodes.
  o Within the connections, establish the values obtained in the first step as weights. For the arcs that do not have connections, set the value to 0.
• Create a maximum spanning tree based on the Kruskal [6] algorithm.
• Convert the undirected tree into a directed tree. The initial connection and the selection of the next node to connect determine the direction of the connections.
• Finally, construct the TAN model by adding a node that represents the class C and an arc from C to each of the attributes.

The revise phase includes statistical techniques to extract the variables that are relevant to the classification process. Because there are many variables, an automatic method is needed to extract the relevant information and help the expert during the reviewing process. The chi-squared test [12], the Yates correction [15], the chi-squared test with Monte Carlo simulation [17], and Fisher's exact test [16] are applied to select the variables of interest that characterize the various pathologies. It is important to note that when an expected frequency is less than 5, the chi-squared result may be incorrect; consequently, the Yates correction is applied in an attempt to mitigate this issue.
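The role of the expected frequencies and of the Yates correction can be illustrated on a single 2×2 contingency table. The following is a minimal sketch using only the Python standard library; the table values and function names are hypothetical, the chi-squared p-value formula is valid only for one degree of freedom, and the two-sided Fisher p-value is computed by one common convention (summing the tables that are no more likely than the observed one). It is not the implementation used in the study.

```python
from math import comb, erfc, sqrt


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * k / n for k in col_totals] for r in row_totals]


def chi2_2x2(table, yates=False):
    """Chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table."""
    stat = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            stat += diff * diff / e
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, erfc(sqrt(stat / 2.0))


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical table: rows = two categories of one variable, columns = outcome.
table = [[8, 2], [1, 5]]
small_expected = min(min(row) for row in expected_counts(table)) < 5
stat_plain, p_plain = chi2_2x2(table)
stat_yates, p_yates = chi2_2x2(table, yates=True)
p_fisher = fisher_exact_2x2(table)
```

With small tables such as this one, at least one expected count falls below 5, the corrected statistic is smaller than the uncorrected one, and the exact test avoids the large-sample approximation altogether.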
The chi-squared results are also provided with the Monte Carlo simulation applied in order to verify them. Finally, Fisher's exact test is applied, which is the recommended method when the sample size is small and it is not possible to ensure that 80% of the cells of a contingency table have an expected frequency greater than 5. Medical studies such as [14] use a process similar to the one presented here for selecting variables that affect malformations; other biomedical studies include [15] [16] [17]. There are many alternative corrections, such as the one in [13]. Figure 1 shows the CBR cycle and the techniques for each step.

Fig 1 ROC curves with the classification accuracy.

4 Results and Conclusions

The selected cases were chosen from the patient files at the Faculty of Odontology, Masters of Endodontics, at the Complutense University of Madrid. All patients received root canal treatments between September 2000 and May 2011. Among all the patients treated during this time, we selected 35 cases that satisfied the inclusion criteria and were interested in a follow-up appointment. None of the patients from the selected cases who came for a follow-up appointment refused to participate in the study.

A total of 18 women and 17 men were selected, whose ages ranged from 18 to 85 years. The average age of the patients was 54.6 years and they all satisfied the inclusion criteria as previously established. The selected cases contained all the information needed to complete the 72 variables being considered. These variables take into account all information relevant to the patient: medical and dental history and habits. Data relative to the state of the tooth prior to treatment were also included, as well as the evolution, the clinical technique used, and the post-treatment results. Certain initial variables included a high number of categories, which resulted in
their recodification to ensure that the final number of categories per variable was around 3 or 4. The stage that was most thoroughly analyzed during the study was the reuse stage. BayesNet, NaiveBayes, AdaBoostM1, Bagging, DecisionStump, J48, IBK, JRip, LMT, Logistic, LogitBoost, OneR, SMO and Stacking were analyzed. The following table shows the percentage of correct classifications obtained for each of the methods when applying the leave-one-out technique to the CBR system, which was chosen because the number of cases was too small for cross validation with larger folds. The rate of correct classifications for the system was 89%.

Table 1 Correct classifications (%)

Classifier      Correct    Classifier    Correct
BayesNet        88.57      JRip          60.00
NaiveBayes      82.86      LMT           65.71
AdaBoostM1      68.57      Logistic      80.00
Bagging         68.57      LogitBoost    68.57
DecisionStump   42.86      OneR          65.71
J48             77.14      SMO           77.14
IBK             82.86      Stacking      74.29

The precision of BayesNet reached 0.89 and its recall reached 1. Precision and recall are defined as follows:

$$ \mathrm{precision} = \frac{t_p}{t_p + f_p} \qquad \mathrm{recall} = \frac{t_p}{t_p + f_n} $$

where t_p is the number of true positives, f_p the number of false positives, and f_n the number of false negatives.

A graphical representation with ROC curves was made from the previous results. The ROC curves facilitate the analysis of different classifiers according to the area beneath the curve: the bigger the area, the better the classifier. Their main advantage is that they make it possible to weigh the relevance of false negatives against that of false positives. In this case, a positive is understood as a successful retreatment, given that the aim is to avoid concluding that an extraction is required when it is not actually necessary. Figure 2 shows the ROC curve for each of the methods and the final result obtained. As shown, the result for the Bayesian network was satisfactory, since the area beneath the curve is high and there are no false negatives (no extractions were predicted for successful cases).
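The evaluation procedure described above (leave-one-out validation followed by precision and recall) can be sketched as follows. This is a minimal sketch on hypothetical toy data, not the patient data of the study; a 1-nearest-neighbour rule with Hamming distance is used purely as a stand-in classifier, and all function names are assumptions made for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision t_p / (t_p + f_p) and recall t_p / (t_p + f_n)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def nearest_neighbour_label(train_x, train_y, query):
    """Stand-in classifier: label of the closest case by Hamming distance."""
    def distance(i):
        return sum(a != b for a, b in zip(train_x[i], query))
    return train_y[min(range(len(train_x)), key=distance)]


def leave_one_out_predictions(xs, ys, classify):
    """Predict each case with a model built from all the other cases."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(classify(train_x, train_y, xs[i]))
    return predictions


# Hypothetical cases: two categorical attributes, label 1 = successful retreatment.
cases = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 0), (1, 1)]
labels = [1, 1, 1, 0, 0, 0]
predicted = leave_one_out_predictions(cases, labels, nearest_neighbour_label)
precision, recall = precision_recall(labels, predicted)
```

A recall of 1 corresponds to the situation highlighted in the text: no successful retreatment is predicted as an extraction, while any remaining errors appear as false positives and lower the precision.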
To facilitate the revise phase, an analysis was made to determine the difference between the values of the variables for the categories of successful retreatments and extractions. To perform this analysis, the chi-squared test, the Yates correction, the chi-squared test with Monte Carlo simulation, and Fisher's exact test were applied. Table 2 displays the set of variables that were considered relevant by any of the methods. We can see that the selection of variables coincides to a great degree for the different methods.

Fig 2 ROC curves with the classification accuracy (x-axis: false positive rate; y-axis: true positive rate; curves: BayesNet, NaiveBayes, AdaBoostM1, Bagging, DecisionStump, J48, JRip, LMT, Logistic, LogitBoost, OneR, SMO, Stacking)

Table 2 Relevant variables

Variable                       P value
                               Chi-Squared   Exact Fisher Test   Yates   Monte Carlo
Reason for treatment           0.007548049   0.004997501   0.008995502
Conde. Lateral or vertical     0.073673116   0.037481259   0.044117647
Clamps                         0.052565842   0.052473763   0.033983008
Type of pain                   0.038628647   0.033983008   0.034482759
Cause of fracture              0.00781095    0.008995502   0.002998501
Type of fracture               0.005016858   0.00049975    0.001470355
Location                       0.022018021   0.013493253   0.018990505
Signs of fissure/fracture      0.008699709   0.005997001   0.004272424
Probing                        0.005016858   0.001999      0.001470355
Visible fissure                0.022879307   0.009995002   0.013070078
Level                          0.076170975   0.037481259   0.036300838
Other                          0.027888372   0.044977511   0.015492254
Retreatment                    0.000876579   0.00049975    0.000411132

With the CBR analysis, the data obtained were relevant because, by ordering the established variables, particularly those with the highest risk factor, we could predict the final solution for treatment and retreatment in 89% of the cases without obtaining any false negatives.
Furthermore, the system makes it possible to extract the relevant variables that can distinguish the different types of individuals. Nevertheless, more cases are required to validate the results with greater accuracy.

Acknowledgments. This work has been supported by the MICINN TIN 2009-13839-C03-03.

References

1. Chugal, N.M., Clive, J.M., Spangberg, L.S.: A prognostic model for assessment of the outcome of endodontic treatment: Effect of biologic and diagnostic variables. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. Endod. 91(3), 342–352 (2001)
2. Givol, N., et al.: Risk management in endodontics. J. Endod. 36(6), 982–984 (2010)
3. Song, M., et al.: Prognostic factors for clinical outcomes in endodontic microsurgery: a retrospective study. J. Endod. 37(7), 927–933 (2011)
4. Kolodner, J.: Case-Based Reasoning. Morgan Kaufmann (1993)
5. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian Network Classifiers. Machine Learning 29, 131–163 (1997)
6. Castro, J.L., Navarro, M., Sánchez, J.M., Zurita, J.M.: Loss and gain functions for CBR retrieval. Information Sciences 179(11), 1738–1750 (2009)
7. Joyanes, L., et al.: Knowledge Management. University of Paisley, Salamanca (2001)
8. Jurisica, I., Glasgow, J.: Applications of case-based reasoning in molecular biology. Artificial Intelligence Magazine 25(1), 85–95 (2004)
9. Canalda, C., Brau, E.: Endodoncia: técnicas clínicas y bases científicas, vol. 2. Masson, Barcelona (2006)
10. Casanellas, J.M.: Restauración del Diente Endodonciado, 1st edn. Pues, Madrid (2006)
11. Kruskal, W., Wallis, W.: Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association (1952)
12. Kenney, J.F., Keeping, E.S.: Mathematics of Statistics, Pt. 2, 2nd edn. Van Nostrand, Princeton (1951)
13. Martín Andrés, A., Silva Mato, A.: Optimal correction for continuity and conditions for validity in the unconditional chi-squared test. Computational Statistics & Data Analysis 26(1), 609–626 (1996)
14.
Himmetoglu, O., Tiras, M.B., Gursoy, R., Karabacak, O., Sahin, I., Onan, A.: The incidence of congenital malformations in a Turkish population. International Journal of Gynecology & Obstetrics 55(2), 117–121 (1996)
15. Shaul, D.B., Scheer, B., Rokhsar, S., Jones, V.A., Chan, L.S., Boody, B.A., Malogolowkin, M.H., Mason, W.H.: Risk Factors for Early Infection of Central Venous Catheters in Pediatric Patients. Journal of the American College of Surgeons 186(6), 654–658 (1998)
16. Yang, X., Huang, Y., Crowson, M., Li, J., Maitland, M.L., Lussier, Y.A.: Kinase inhibition-related adverse events predicted from in vitro kinome and clinical trial data. 43(3), 376–384 (2010)
17. Nilsson, B.: A compression algorithm for pre-simulated Monte Carlo p-value functions: Application to the ontological analysis of microarray studies. Pattern Recognition Letters 29(6), 768–772 (2008)
18. John, M., Priebe, C.E.: A data-adaptive methodology for finding an optimal weighted generalized Mann–Whitney–Wilcoxon statistic. Computational Statistics & Data Analysis 51(9), 4337–4353 (2007)