C. Lemaître, C.A. Reyes, and J.A. Gonzalez (Eds.): IBERAMIA 2004, LNAI 3315, pp. 83–92, 2004. © Springer-Verlag Berlin Heidelberg 2004 Applying Rough Sets Reduction Techniques to the Construction of a Fuzzy Rule Base for Case Based Reasoning Florentino Fdez-Riverola1, Fernando Díaz1, and Juan M. Corchado2 1 Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain {riverola, fdiaz}@uvigo.es 2 Dept. de Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain corchado@usal.es Abstract. Early work on Case Based Reasoning reported in the literature shows the importance of soft computing techniques applied to different stages of the classical 4-step CBR life cycle. This paper proposes a reduction technique based on Rough Sets theory that is able to minimize the case base by analyzing the contribution of each feature. Inspired by the application of the minimum de- scription length principle, the method uses the granularity of the original data to compute the relevance of each attribute. The rough feature weighting and selec- tion method is applied as a pre-processing step previous to the generation of a fuzzy rule base that can be employed in the revision phase of a CBR system. Experiments using real oceanographic data show that the proposed reduction method maintains the accuracy of the employed fuzzy rules, while reducing the computational effort needed in its generation and increasing the explanatory strength of the fuzzy rules. 1 Introduction and Motivation Case-Based Reasoning (CBR) systems solve problems by reusing the solutions to similar problems stored as cases in a case base [1] (also known as memory). How- ever, these systems are sensitive to the cases present in the case memory and often its accuracy depend on the significance of these stored cases. Therefore, in CBR systems it is always important to reduce noisy cases in order to achieve a good generalisation accuracy. Case Based Reasoning is an application area where soft computing tools have had a significant impact during the past decade [2]. These soft computing techniques (fuzzy logic, artificial neural networks, genetic algorithms and rough sets, mainly) work in parallel for enhancing the problem-solving ability of each other [3]. In this paper we propose a reduction technique based on Rough Set theory applied to the revision stage of CBR systems. The proposed method was introduced into our Case-Based forecasting platform called FSfRT, specialized in making predictions for 84 F. Fdez-Riverola, F. Díaz, and J.M. Corchado changing environments. Presently, the FSfRT platform is able to combine artificial neural networks and fuzzy logic in the framework of a CBR system. The paper is structured as follows: section 2 introduces the Rough Set theory grounding; section 3 details the proposed Rough Set reduction technique; section 4 describes the Case-Based Reasoning platform used in this study; section 5 exposes the test bed of the experiments and the results obtained; finally, section 6 presents the conclusions and further work. 2 Rough Set Theory Rough Set theory, proposed by Pawlak [4,5], is an attempt to provide a formal frame- work for the automated transformation of data into knowledge. It is based on the idea that any inexact concept (for example, a class label) can be approximated from below and from above using an indiscernibility relationship. Pawlak [6] points out that one of the most important and fundamental notions to the Rough Set philosophy is the need to discover redundancy and dependencies between features. The main advantages of Rough Set theory are that it: (i) provides efficient algo- rithms for discovering hidden patterns in data; (ii) identifies relationships that would not be found while using statistical methods; (iii) allows the use of both qualitative and quantitative data; (iv) finds the minimal sets of data that can be used for classifica- tion tasks; (v) evaluates the significance of data and (vi) generates sets of decision rules from data. 2.1 Basic Concepts and Definition Briefly, the relevant Rough Set terminology is stated below. An information system is a pair S = 〈U, A〉, where U is a non-empty and finite set, called the universe, and A is a non-empty, finite set of attributes (or features). An equivalence relation, referred to as indiscernibility relation, is associated with every subset of attributes P ⊆ A. This rela- tion is defined as: )}()(every for :),{()( yaxP, aaUUyxPIND =∈×∈= . (1) Given any subset of features P, any concept X ⊆ U can be defined approximately by the employment of two sets, called lower and upper approximations. The lower approximation, denoted by PX , is the set of objects in U which can be certainty classi- fied as elements in the concept X using the set of attributes P, and is defined as fol- lows: }:)(/{ XYPINDUYXP ⊆∈∪= . (2) The upper approximation, denoted by XP , is the set of elements in U that can be possibly classified as elements in X, formally: }:)(/{ ∅≠∩∈∪= XYPINDUYXP . (3) Applying Rough Sets Reduction Techniques 85 The degree of dependency of a set of features P on a set of features R is denoted by γR(P), 0 ≤ γR(P) ≤ 1, and is defined as: )( ))(()( Ucard PPOScardP RR =γ . (4) where U )(/ )( PINDUX R XRPPOS ∈ = . (5) POSR(P) contains the objects of U which can be classified as belonging to one of the equivalence classes of IND(P), using only features from the set R. If γR(P) = 1, then R functionally determines P. Various extensions have been defined from the basic model proposed by Pawlak. Among these extensions stands out the Variable Precision Rough Set model (VPRS) [7] which is a generalisation that introduces a controlled degree of uncertainty within its formalism. This degree is established by an additional parameter φ. 2.2 Rough Sets as Reduction Technique A major feature of the Rough Set theory is to find the minimal sets of data that can be used for classification tasks. In this sense, the notions of core and reduct of knowledge are fundamental for reducing knowledge preserving information. After stating the formal definitions of these concepts, it is outlined the reduction process proposed by the methodology. P is an independent set of features if there does not exist a strict subset P' of P such that IND(P) = IND(P'). A set R ⊆ P is a reduct of P if it is independent and IND(R) = IND(P). Each reduct has the property that a feature can not be removed from it with- out changing the indiscernibility relation. Many reducts for a given set of features P may exists. The set of attributes belonging to the intersection of all reducts of P is called the core of P: I )(Re )( PductR RPcore ∈ = . (6) An attribute a ∈ P is indispensable if IND(P) ≠ IND(P \ {a}). The core of P is the union of all the indispensable features in P. The reduction technique stated by the methodology is specially suitable for re- ducing decision tables. A decision table is an information system of the form S = 〈U, A ∪ {d}〉, where d ∉ A is a distinguished attribute called the decision attribute or class attribute. The elements of the set A are referred to as condition attributes. A decision table is a classifier that has as its internal structure a table of labelled in- stances. Given a novel instance, the classification process is based on the search of all matching in-stances in the table. If no matching instances are found, unknown is 86 F. Fdez-Riverola, F. Díaz, and J.M. Corchado returned; otherwise, the majority class of the matching instances is returned (there may be multiple matching instances with conflicting labels). The indispensable attributes, reducts, and core can be similarly defined relative to a decision attribute or output feature. The precise definitions of these concepts can be fount in Pawlak's book on Rough Sets [5]. At this point, it is very important to use the classification rules (given by a decision table) with the minimal effort, and therefore, the simplification of decision tables is of primary importance. The simplification process comprises two fundamental tasks. On the one hand, reduction of attributes consists of removing redundant or irrelevant attributes, without losing any essential classification information. The computation of the reducts for the condition attributes relative to the decision attribute is carried out to achieve this goal. On the other hand, reduction of attribute values is related to the elimination of the greatest number of condition attribute values, maintaining also the classificatory power. 3 Feature Subset Selection Using Rough Sets The computation of the reducts and the core of the condition attributes from a deci- sion table is a way of selecting relevant features. It is a global method in the sense that the resultant reduct represents the minimal set of features which are necessary to maintain the same classificatory power given by the original and complete set of attributes. A straighter manner for selecting relevant features is to assign a measure of relevance to each attribute and choose the attributes with higher values. In the Rough Set framework, the natural way to measure the prediction success is the degree of dependency defined above. However, [8] have shown the weakness of this measure in order to assess an estimation of the predictive accuracy of a set of condition attributes Q with regard to a class attribute d. To overcome this deficien- cies, [9] define the notion of rough entropy. Based on this notion and its adaptation to the VPRS model (in order to exploit more efficiently the knowledge that is pro- vided for the observations in the boundary region or the uncertain area of the uni- verse), we have defined a coefficient that allows to asses the significance of an attribute within a set of attributes [10]. The significance of an attribute a ∈ Q is defined in a way that its value is greater when the removal of this attribute leads to a greater diminution of the complexity of the hypothesis Q \ {a}, and simultane- ously, to a lesser loss of accuracy of the hypothesis. Implicitly, the underlying prin- ciple used to evaluate the relevance of an attribute in this way is the Minimum De- scription Length Principle (MDLP) [11]. The associated complexity of a given set of condition attributes Q can be evalu- ated through the entropy of the partition U / IND(Q), which will be denoted by H(Q). On the other hand, the conditional rough entropy Hφ (d | Q) can be used to evaluate the accuracy that is achieved when the condition attributes Q are used to predict the value of the condition attribute d. Therefore, the formal definition of the φ-rough entropy, denoted by RHφ (d | Q), is given by the following expression: Applying Rough Sets Reduction Techniques 87 U X U X Ud U X U X Ud QHQdHQHdQRH i dx i Q i dx i Q i i 2 )(POS 2, 2 )(POS 2, loglog)}(1{ loglog)}(1{ )()|()(),( Q, Q, ∑ ∑ ⊆ ⊆ −−= ⎪⎭ ⎪⎬ ⎫ ⎪⎩ ⎪⎨ ⎧ +− +=+= φ φ φ φ φφ γ γ . (7) where Xi represents each one of the classes of the partition U / IND(Q), the set POSQ,φ (d) is the positive region of Q with regard to the decision attribute d, and γQ,φ (d) is the degree of dependence of attribute d on the set of attributes Q. Then, the φ-significance of a condition attribute, a ∈ Q, with regard to the decision attribute d, denoted by σa,φ (Q, d), is defined as the variation that the φ-rough entropy suffers when the considered attribute is dismissed from Q. Namely, it is computed the term ∆aRHφ (Q, d), given by the difference between RHφ (Q, d) and RHφ (Q \ {a}, d). Formally, )}|(}){\|({})}{\()({ )},{\(),(),(),( , QdHaQdHaQHQH daQRHdQRHdQRHdQ aa φφ φφφφσ −−−= −=∆= . (8) Fig. 1 provides a concise description of the algorithm that selects a subset of rele- vant features using the significant φ-rough coefficient to evaluate the relevance of a feature. The proposed algorithm for selecting relevant features is described according to the view proposed by [12]. These authors state that a convenient paradigm for viewing feature selection methods is that of heuristic search, with each state in the search space specifying a subset of the possible features. Following Blum and Langley viewpoint the four basic issues that characterise this method are: − The starting point in the space, which in turn influences the direction of search and the operators used to generate successor states. The proposed algorithm starts with all attributes and successively removes them (lines 1 and 15, respectively). This approach is known as backward elimination. − The organisation of the search. Any realistic approach relies on a greedy method to traverse the space considering that an exhaustive search is impractical. At each point in the search, the proposed algorithm considers all local changes, namely, it evaluates the significance of each attribute of the current set of attributes (loop for). − The strategy used to evaluate alternative subsets of attributes. In this paper, the variation of the normalised φ-rough entropy has been chosen for this purpose. Spe- cifically, at each decision point the next selected state is that one which results from removing the attribute with the least significant φ-rough coefficient (line 10). − A criterion for halting the search. In the algorithm, the criterion for halting is that the difference between the degree of dependency at initial state and the current one (both with respect to the decision) do not exceed a predefined threshold (line 14). 88 F. Fdez-Riverola, F. Díaz, and J.M. Corchado Fig. 1. Algorithm for feature subset selection 4 Description of the FSfRT Platform The study described in this paper was carried out in the context of the FSfRT plat- form. FSfRT is a structured hybrid system that can employ several soft computing techniques in order to accomplish the 4-steps of the classical CBR life cycle [1]. This section covers two main points: (i) details the architecture of the FSfRT platform and (ii) introduces the use of Rough Sets inside the whole system. 4.1 FSfRT Platform Architecture The FSfRT platform is an extension of a previous successful system [13] able to make predictions of red tides (discolourations caused by dense concentrations of micro- scopic sea plants, known as phytoplankton). The FSfRT platform allows us to com- bine several soft computing techniques in order to test their suitability working to- gether to solve complex problems. The core and the interfaces of FSfRT have been coded in Java language and new capabilities are being developed. The general idea is to have different programmed techniques able to work separately and independently in co-operation with the rest. The main goal is to obtain a general structure that could change dynamically depending on the type of problem. Fig. 2 shows a schematic view of the system. On the left of Fig. 2, it is shown the core of the platform that is composed by a KAM (Knowledge Acquisition Module). The KAM is able to store all the information needed by the different techniques employed in the construction of a final CBR sys- tem. In the retrieve and reuse stages, several soft computing techniques can be used [2,3], while in the revise stage, our platform employs a set of TSK fuzzy systems [14] in order to perform the validation of the initial solution proposed by the system. Applying Rough Sets Reduction Techniques 89 Fig. 2. FSfRT platform architecture Our aim in this work, is to perform a feature subset selection step before the induc- tion process carried out by the fuzzy revision method. The aim of this proposal is to reduce the original set of attributes and therefore decrease the computational effort for the generation of the different fuzzy models. 4.2 Rough Sets Inside the FSfRT Platform Fig. 3 shows the meta-level process when incorporating the Rough Sets as a pre- processing step before the generation of the fuzzy revision subsystem. Fig. 3. Rough Set pre-processing step For details related to the construction of the fuzzy systems starting from a Radial Basis Function neural network see [14]. The Rough Set process described here 90 F. Fdez-Riverola, F. Díaz, and J.M. Corchado involves the first step in the generation of the initial fuzzy system and it is divided into three phases: The first one discretises the cases stored in the case base. It is necessary in order to find the most relevant information using the Rough Set theory. The second one uses the significant φ-rough coefficient to select a subset of relevant features as described in section 3 (see Fig. 1). Finally, the last phase searches for reducts and core of knowledge from the features selected in the previous phase, as explained in section 2. The motivation of including the second phase is that the computation of reducts is a blind technique, where several combinations of a sufficient number of irrelevant fea- tures can become a reduct. The pre-selection of features leads to reducts with a lesser complexity and a higher predictive accuracy. 5 Empirical Study In order to evaluate the proposed method, we use a biological database composed by several physical variables (temperature, PH, oxygen, salinity, etc.) measured at dis- tinct depths and belonging to different monitoring points of the north west coast of the Iberian Peninsula. These data values are complemented with data derived from satel- lite images stored separately. The satellite image data values are used to generate cloud and superficial temperature indexes. The whole memory of the system consists on approximately 6300 cases, each one represented as a feature vector that holds 56 attributes. The FSfRT platform was configured to use the same techniques as in our previous work [14], where the fuzzy revision method was successfully tested: (i) a Growing Cell Structure (GCS) neural network as retrieval method, (ii) a Radial Basis Function (RBF) neural network for the reuse step and (iii) the aforementioned set of TSK fuzzy systems working as the revision mechanism. Specific information about these tech- niques and its integration inside the CBR life cycle can be found in [15]. The main goal of the previous work was to develop a forecasting biological system capable of predicting the concentration of diatoms (a type of single-celled algae) in different water masses. Although the experiments carried out in [14] showed the effectiveness and the straightforward improvement of the proposed fuzzy revision method over other ap- proaches, some issues remained unsolved in order to deploy the application for real use. Concisely, the main drawbacks of the tested method were: (i) the time needed for generating each one of the TSK fuzzy models and (ii) the explanatory complexity of the fuzzy rules used for the final solution proposed by the system. In order to solve these problems maintaining at the same time the accuracy level, we have proposed in this paper a feature subset selection algorithm based on Rough Set theory. As we can see in Table 1, several φ values have been tested in order to obtain the most accurate set of representative features defining each problem case. For the current domain of diatoms forecasting, the optimal number of features was 12 (φ = 0.01), corresponding to the physical magnitudes measured with a smaller level of depth and those generated from satellite images. A crucial aspect in this experiment is the accuracy level of the Rough Set based revision subsystem and its comparison with the original one. Starting from the error Applying Rough Sets Reduction Techniques 91 Table 1. Selected features depending on the value of the parameter φ parameterφ number of selected features 0.0 16 0.01 12 0.025 10 0.05 8 0.1 8 series generated by the different models, the Kruskall-Wallis test has been carried out. Since the P-value is less than 0.01, there is a statistically significant difference among the models at the 99.0% confidence level. Fig. 4 shows a multiple comparison proce- dure (Mann-Withney) used to determine which models are significantly different from the others. The experiments were made with a data set of 448 cases randomly taken from the case base. It can be seen that the CBR with TSK fuzzy revision subsystem (CBR TSK) presents statistically significant differences with the rest of the models, whilst it is as accurate as the simplified method presented here (CBR φ (TSK)). Fig. 4. Mann-Withney test carried out between each pair of models The time consumed in the execution of the pre-processing step plus the whole gen- eration of the TSK fuzzy systems (2 hours aprox. in a Pentium IV processor) was the 80% less than the amount needed for generating the original fuzzy revision subsystem. This timesaving operation is motivated by the simplified fuzzy rule base employed by the greedy algorithm used to generate each one of the TSK fuzzy systems. Another relevant circumstance derived from the adoption of the proposed schema was the increment in the explanatory strength of the justification generated by the final CBR system. Initially the feature vector describing a problem was composed of 56 attributes, the same as the fuzzy rule antecedents, now the system is able to produce an explanation based on only 12 main features with the same level of accuracy. 6 Conclusion and Further Work This paper introduces a new reduction technique based on Rough Set theory that can be applied for improving a previous successful method that automates the revision stage of CBR systems. 92 F. Fdez-Riverola, F. Díaz, and J.M. Corchado Empirical studies show that this reduction technique allow us to obtain a more gen- eral knowledge of the model and gain a deeper insight into the logical structure of the system to be approximated. Employing the simplified fuzzy rule base as the starting point to generate the fuzzy revision subsystem proposed in [14], leads to a dramatic decrease of the time needed for this task while maintaining an equivalent generalisa- tion accuracy. These benefits are augmented with the simplicity of the new fuzzy rules used by the CBR system as explanation of the final adopted solution. In this way, it is interesting the definition of a formal measure in order to rate and compare the explanation strength of these fuzzy rules. Due to the suitability showed by the Rough Set theory working together with other soft computing techniques, we are also interested in the development of new ways to put together this formalism with the existing techniques coded in the FSfRT platform. References 1. Riesbeck, C.K., Schank, R.C.: Inside Case-Based Reasoning, Lawrence Erlbaum Associ- ates, Hillsdale, NJ, US (1999) 2. Pal, S.K., Dilon, T.S., Yeung, D.S.: Soft Computing in Case Based Reasoning, Springer Verlag, London (2000) 3. Sankar, K.P., Simon, C.K.S: Foundations of Soft Case-Based Reasoning, Wiley- Interscience, Hoboken, New Jersey (2003) 4. Pawlak, Z.: Rough Sets. International Journal of Computer and Information Sciences, Vol. 11. (1982) 341–356 5. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991) 6. Pawlak, Z.: Rough sets: present state and the future. Foundations of Computing and Deci- sion Sciences, Vol. 11 (3-4). (1993) 157–166 7. Ziarko, W.: Variable Precision Rough Set Model. Journal of Computer and System Sci- ences, Vol. 46. (1993) 39–59 8. Düntsch, I., Gediga, G.: Statistical evaluation of rough set dependency analysis. Interna- tional Journal of Human-Computer Studies, Vol. 46. (1997) 589–604 9. Düntsch, I., Gediga, G.: Uncertainty measures of rough set prediction. Artificial Intelli- gence, Vol. 106. (1998) 77–107 10. Díaz, F., Corchado, J.M.: A method based on the Rough Set theory and the MDL principle to select relevant features. Proc. of the X CAEPIA - V TTIA, Vol. 1. (2003) 101–104 11. Rissanen, J.: Minimum description length principle. In Kotz, S. and Johnson, N. L. (eds.). Encyclopedia of Statistical Sciences. John Wiley and Sons, New York (1985) 523–527 12. Blum, A.L., Langley, P.: Selection of relevant features and examples in machine learning. Artificial Intelligence, Vol. 97. (1997) 245–271 13. Fdez-Riverola, F., Corchado, J.M.: FSfRT, Forecasting System for Red Tides. An Hybrid Autonomous AI Model. Applied Artificial Intelligence, Vol. 17 (10). (2003) 955–982 14. Fdez-Riverola, F., Corchado, J.M.: An automated CBR Revision method based on a set of β-TSK Fuzzy models. Proc. of the X CAEPIA - V TTIA, Vol. 1. (2003) 395–404 15. Fdez-Riverola, F.: Neuro-symbolic model for unsupervised forecasting of changing envi- ronments. Ph.D. diss., Dept. of Computer Science, Vigo University, Spain (2002)