Neuro-symbolic System for Forecasting Red Tides

Florentino Fdez-Riverola1, Juan M. Corchado2, and Jesús M. Torres3

1 Dpto. de Informática, E.S.E.I., University of Vigo, Campus Universitario As Lagoas s/n., 32004, Ourense, Spain
riverola@uvigo.es
2 Dpto. de Informática y Automática, University of Salamanca, Facultad de Ciencias, Plaza de la Merced, s/n., 37008, Salamanca, Spain
corchado@usal.es
3 Dpto. de Física Aplicada, University of Vigo, Facultad de Ciencias, Lagoas Marcosende, 36200, Vigo, Spain
jesu@uvigo.es

Abstract. A hybrid neuro-symbolic problem solving model is presented in which the aim is to forecast parameters of a complex and dynamic environment in an unsupervised way. In situations in which the rules that determine a system are unknown, the prediction of the parameter values that determine the characteristic behaviour of the system can be a problematic task. In such a situation, it has been found that a hybrid case-based reasoning (CBR) system can provide a more effective means of performing such predictions than other connectionist or symbolic techniques. The system employs a CBR model to wrap a growing cell structures network, a radial basis function network and a set of Sugeno fuzzy models to provide an accurate prediction. Each of these techniques is used in a different stage of the reasoning cycle of the CBR system: to retrieve historical data, to adapt it to the present problem and to review the proposed solution. The results obtained from experiments, in which the system operated in a real environment, are presented.

1 Introduction

Forecasting the behaviour of a dynamic system is, in general, a difficult task, especially if the prediction needs to be achieved in real time. In such a situation one strategy is to create an adaptive system which possesses the flexibility to behave in different ways depending on the state of the environment.
This paper presents the application of a novel hybrid artificial intelligence (AI) model to a forecasting problem in a complex and dynamic environment. The approach presented is capable of producing satisfactory results in situations in which neither artificial neural networks nor statistical models have been sufficiently successful.

This research was supported in part by the PGIDT00MAR30104PR project of the Xunta de Galicia, Spain.

The oceans of the world form a highly dynamic system for which it is difficult to create mathematical models [1]. Red tide is the name given to the discolourations caused by dense concentrations of the microscopic plants of the sea, the so-called phytoplankton. The discolouration varies with the species of phytoplankton, its pigments, size and concentration, the time of day, the angle of the sun and other factors. Red tides usually occur along the north-west coast of the Iberian Peninsula in late summer and autumn [2]. The rapid increase in dinoflagellate numbers, sometimes to millions of cells per liter of water, is what is known as a phytoplankton bloom (declared when the concentration rises above 100,000 cells per liter). The organism on which we focus in this study is the pseudo-nitzschia spp diatom, which is known to cause amnesic shellfish poisoning (ASP).

An AI approach to the problem of forecasting in the ocean environment offers potential advantages over alternative approaches, because it is able to deal with uncertain, incomplete and even inconsistent data. Several types of standard artificial neural network (ANN) have been used to forecast the evolution of different oceanographic parameters [3–5]. Our aim is to develop an autonomous and reliable forecasting mechanism. The results obtained to date suggest that the approach described in this paper fulfils this aim.
The work presented in this paper is based on the successful results obtained with the hybrid CBR system reported in [4–6], which was used to predict, in real time, the evolution of the temperature of the water ahead of a moving vessel. The hybrid system proposed in this paper extends and improves that earlier research. The retrieval, reuse, revision and learning stages of the CBR system have been modified or replaced for two reasons: to adapt the hybrid system to the red tide forecasting problem and to automate completely the reasoning process of the proposed forecasting mechanism.

The structure of the paper is as follows. First, a brief overview of CBR systems for forecasting is presented. Then the red tide problem domain is briefly outlined. The hybrid neuro-symbolic system is then explained and, finally, the results obtained to date with the proposed forecasting system are presented and analyzed.

2 CBR Systems for Forecasting

Several researchers [7, 8] have used k-nearest-neighbour algorithms for time series prediction. Although a k-nearest-neighbour algorithm does not, in itself, constitute a CBR system, it may be regarded as a very basic and limited form of CBR operation in numerical domains. Other examples of CBR systems that carry out predictions can be found in [9–13].

In most cases, the CBR systems used in forecasting problems have flat memories with simple data representation structures and use k-nearest-neighbour metrics in their retrieve phase. K-nearest-neighbour metrics are acceptable if the system is relatively stable and well understood, but if the system is dynamic and the forecast is required in real time, it may not be possible to redefine the k-nearest-neighbour metrics adequately. The adaptation stage of these models is dominated by similarity metrics or statistical models, although in some systems case adaptation is accomplished manually.
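For reference, flat-memory retrieval with a k-nearest-neighbour metric, as used by the systems mentioned above, can be sketched in a few lines of Python. This is only an illustration and is not taken from any of the surveyed systems: the function name `knn_forecast`, the Euclidean metric, the unweighted mean and the toy numbers are all assumptions made here.

```python
import math

def knn_forecast(cases, query, k=3):
    """Forecast by averaging the outcomes of the k stored cases closest to `query`.

    `cases` is a flat list of (features, outcome) pairs (hypothetical layout).
    """
    if not cases:
        raise ValueError("case base is empty")
    # Flat memory: every stored case is compared with the query, no index is used.
    ranked = sorted(cases, key=lambda case: math.dist(case[0], query))
    nearest = ranked[:k]
    # No adaptation step: the forecast is the plain mean of the retrieved outcomes.
    return sum(outcome for _, outcome in nearest) / len(nearest)

# Toy case base with two features per case and one numeric outcome (made-up values).
toy_cases = [
    ((0.0, 0.0), 10.0),
    ((1.0, 0.0), 20.0),
    ((0.0, 1.0), 30.0),
    ((5.0, 5.0), 100.0),
]
forecast = knn_forecast(toy_cases, (0.1, 0.1), k=3)
```

With `k=1` the function simply returns the outcome of the single most similar case, which corresponds to using the retrieved case directly without adaptation.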
If the problem is very complex, there may not be an adaptation strategy and the most similar case is used directly, but it is believed that adequate adaptation is one of the keys to a successful CBR paradigm. In the majority of the systems surveyed, case revision (if carried out at all) is performed by a human expert, and in all cases the CBR systems are provided with a small case base. A survey of such forecasting CBR systems can be found in [14].

In this paper a method for automating the CBR reasoning process is presented for the solution of problems in which the cases are characterised predominantly by numerical information. Traditionally, CBR systems have been combined with other technologies such as artificial neural networks, rule-based systems, constraint satisfaction problems and others, producing successful results [15]. Our proposal embeds two artificial neural networks and a set of fuzzy systems in the CBR life cycle.

3 Forecasting Red Tides

In the current work the aim is to develop a system for forecasting, one week in advance and at different geographical points, the concentration of pseudo-nitzschia spp, which is the diatom that produces the most harmful red tides.

The problem of forecasting, which is currently being addressed, may be simply stated as follows:

– Given: a sequence of data values (representing the current state and the immediately previous one) about some physical and biological parameters,
– Predict: the value of a parameter at some future point(s) or time(s).

In order to forecast the concentration of pseudo-nitzschia spp at a given point one week in advance, a problem descriptor is generated on a weekly basis. A problem descriptor consists of a sequence of N sampled data values (filtered and pre-processed) recorded from the water mass for which the forecast is required, together with the collection time and date.
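To make the shape of such a descriptor concrete, a minimal Python sketch follows. The class and field names are hypothetical, chosen here for illustration; the variables follow those listed in Table 1, the numeric values are invented, and keeping the date as metadata rather than as a numeric input is an assumption of this sketch.

```python
from dataclasses import dataclass, astuple
from datetime import date
from typing import List, Optional

@dataclass
class WeeklySample:
    """One week of filtered, pre-processed measurements for a water mass."""
    day: date
    temperature: float        # degrees Celsius
    oxygen: float             # milliliters/liter
    ph: float
    transmittance: float      # %
    fluorescence: float       # %
    cloud_index: float        # %
    diatom_count: float       # cells/liter
    pseudo_nitzschia: float   # cells/liter

@dataclass
class Case:
    previous: WeeklySample    # week W(n-1)
    current: WeeklySample     # week W(n)
    # Concentration at week W(n+1); unknown until it is actually measured.
    future_pseudo_nitzschia: Optional[float] = None

    def descriptor(self) -> List[float]:
        """Numeric input vector: all variables of both weeks, dates excluded."""
        return [value
                for week in (self.previous, self.current)
                for value in astuple(week)[1:]]

# Invented example values, for illustration only.
w_prev = WeeklySample(date(2001, 9, 3), 15.2, 5.1, 8.1, 80.0, 12.0, 30.0, 250000.0, 40000.0)
w_curr = WeeklySample(date(2001, 9, 10), 15.8, 5.0, 8.0, 78.0, 15.0, 20.0, 300000.0, 65000.0)
case = Case(previous=w_prev, current=w_curr)
vector = case.descriptor()
```

In this layout the forecast target is stored with the case but left empty until the following week, when the measured concentration becomes available.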
Every week the concentration of pseudo-nitzschia spp is added to a problem descriptor, forming a new input vector. The problem descriptor is composed of a vector with the variables that characterise the problem, recorded over two weeks. The prediction or output of the system is the concentration of pseudo-nitzschia spp one week later, as indicated in Table 1.

Table 1. Variables that define a case

Variable                         Unit                Week
Date                             dd-mm-yyyy          Wn−1, Wn
Temperature                      degrees Celsius     Wn−1, Wn
Oxygen                           milliliters/liter   Wn−1, Wn
pH                               acid/base           Wn−1, Wn
Transmittance                    %                   Wn−1, Wn
Fluorescence                     %                   Wn−1, Wn
Cloud index                      %                   Wn−1, Wn
Diatom count                     cells/liter         Wn−1, Wn
pseudo-nitzschia spp             cells/liter         Wn−1, Wn
pseudo-nitzschia spp (future)    cells/liter         Wn+1

The forecasted values are obtained using a neural network enhanced hybrid case-based reasoning system. Figure 1 illustrates the relationships between the processes and components of the hybrid CBR system. The cyclic CBR process shown in Figure 1 has been inspired by the work of [4] and [5]. The diagram shows the technology used in each stage, where the four basic phases of the CBR cycle are shown as rectangles.

The retrieval stage is carried out using a Growing Cell Structures (GCS) ANN [16]. The GCS facilitates the indexing of cases and the selection of those that are most similar to the problem descriptor. The GCS network groups similar cases and forms classes. When a new problem is presented to this network, it is associated with its most representative class and all members of that class are retrieved.

The reuse of cases is carried out with a Radial Basis Function (RBF) ANN [17], which generates an initial solution by creating a model with the retrieved cases. The GCS network guarantees that these cases are homogeneous and can be modeled by the RBF network. The revision is carried out using a group of weighted fuzzy systems that identify potentially incorrect solutions.
Finally, the learning stage is carried out when the real value of the concentration of pseudo-nitzschia spp is measured and the error value is calculated, updating the knowledge structure of the whole system. The cycle of operations of the hybrid system is explained in detail in Section 3.1.

3.1 System Operation

The forecasting system uses data from two main sources. The raw data (sea temperature, salinity, pH, oxygen and other physical characteristics of the water mass), which are measured weekly by the toxic proliferation monitoring network of the CCCMM (Centro de Control da Calidade do Medio Marino, Oceanographic Environment Quality Control Centre, Vigo, Spain), consist of a vector of discrete sampled values (at depths of 5, 10 and 15 meters) of each oceanographic parameter used in this experiment, in the form of a time series. These data values are complemented by additional data derived from satellite images, which are received and processed daily, and by data from ocean buoys, which record on a daily basis. Table 1 shows the variables that characterise the problem. Data from the last two weeks (Wn−1, Wn) are used to forecast the concentration of pseudo-nitzschia spp one week ahead (Wn+1).

The cycle of forecasting operations (which is repeated every week) proceeds as follows. When a new problem is presented to the system, the GCS neural network is used to obtain the k cases most similar to the given problem (identifying the class to which the problem belongs).

Fig. 1. Hybrid neuro-symbolic system

In the reuse phase, the values of the weights and centers of the neural network [17] used in the previous forecast are retrieved from the knowledge base. These network parameters, together with the k retrieved cases, are then used to retrain the RBF network and to obtain an initial forecast of the concentration of pseudo-nitzschia spp. During this process the values of the parameters that characterise the network are updated.
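The retrieval and reuse steps just described can be illustrated with a deliberately simplified Python sketch. It is not the implementation used in this work: each class is represented only by the centroid of its cases (a stand-in for a GCS cell), the RBF network has one Gaussian unit per retrieved case with a fixed width, and the output weights are trained with the delta rule, optionally starting from previously stored weights. All function names, parameters and data values are assumptions made for illustration.

```python
import math

def nearest_class(classes, problem):
    """Return the label of the class whose centroid is closest to `problem`.

    `classes` maps a label to a list of (descriptor, outcome) cases.
    """
    def centroid(cases):
        n = len(cases)
        dims = len(cases[0][0])
        return [sum(desc[i] for desc, _ in cases) / n for i in range(dims)]
    return min(classes, key=lambda label: math.dist(centroid(classes[label]), problem))

def rbf_outputs(centers, width, x):
    """Gaussian activation of each hidden unit for input x."""
    return [math.exp(-math.dist(c, x) ** 2 / (2.0 * width ** 2)) for c in centers]

def retrain_rbf(cases, weights=None, width=1.0, rate=0.1, epochs=500):
    """Fit the output weights by the delta rule, warm-starting from `weights` if usable."""
    centers = [desc for desc, _ in cases]          # one hidden unit per retrieved case
    if weights is None or len(weights) != len(centers):
        weights = [0.0] * len(centers)             # no usable previous weights
    else:
        weights = list(weights)
    for _ in range(epochs):
        for desc, target in cases:
            phi = rbf_outputs(centers, width, desc)
            error = target - sum(w * p for w, p in zip(weights, phi))
            weights = [w + rate * error * p for w, p in zip(weights, phi)]
    return centers, weights

def rbf_forecast(centers, weights, width, x):
    """Network output: weighted sum of the hidden-unit activations."""
    return sum(w * p for w, p in zip(weights, rbf_outputs(centers, width, x)))

# Toy, already-normalised descriptors with invented outcomes.
classes = {
    "low": [((0.0, 0.0), 10.0), ((0.0, 2.0), 20.0), ((2.0, 0.0), 30.0)],
    "high": [((10.0, 10.0), 200.0), ((12.0, 10.0), 220.0)],
}
problem = (0.5, 0.5)
label = nearest_class(classes, problem)            # retrieval: pick the class
retrieved = classes[label]                         # all members of that class
centers, weights = retrain_rbf(retrieved, width=1.0)
initial_forecast = rbf_forecast(centers, weights, 1.0, problem)
```

In a weekly cycle the returned `weights` would be stored and passed back to `retrain_rbf` the following week. In this sketch they are reused only when the number of retrieved cases is unchanged, which is a simplification of the incremental network of [17].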
In the revision phase, the initial solution proposed by the RBF neural network is modified according to the responses of the four fuzzy revision subsystems. Each revision subsystem has been created from the RBF network using neurofuzzy techniques [18]. For each class of the GCS neural network a vector of four values is maintained. This "importance" vector (see Figure 1) represents the accuracy of each revision subsystem with respect to a class. During the revision, the "importance" vector associated with the class to which the problem case belongs is used to weight the outputs of each fuzzy revision subsystem. Each value of the vector is associated with one of the four revision subsystems. For each forecasting cycle, the value of the importance vector associated with the most accurate revision subsystem is increased and the other three values are proportionally decreased. This is done to give more relevance to the most accurate revision subsystem.

The revised forecast is then retained temporarily in the forecast database. When the real value of the concentration of pseudo-nitzschia spp is measured, the forecasted value can be evaluated by comparing it with the actual value and computing the error. A new case, corresponding to this forecasting operation, is then stored in the case base. The forecasting error value is also used to update the importance vector associated with the revision subsystems of the retrieved class.

Fig. 2. Summary of technologies employed by the hybrid model

4 Results

The hybrid forecasting system has been tested on the north-west coast of the Iberian Peninsula with data collected by the CCCMM from the year 1992 until the present time. The prototype used in this experiment was set up to forecast, one week in advance, the concentration of the pseudo-nitzschia spp diatom in a water mass situated near the coast of Vigo (geographical area A0 ((42°28.90′ N, 8°57.80′ W) 61 m)).
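Each weekly forecast produced by such a prototype can be scored against the concentration measured one week later. A minimal Python sketch of this scoring is given below, using the bloom threshold of 100,000 cells/liter mentioned in the Introduction. The function and key names are hypothetical, the sample pairs are invented, and treating a value exactly at the threshold as a bloom is an assumption of the sketch.

```python
BLOOM_THRESHOLD = 100_000  # cells/liter

def score_forecasts(pairs, threshold=BLOOM_THRESHOLD):
    """Tally outcomes for a list of (forecast, measured) concentration pairs."""
    tally = {"correct": 0, "false_alarm": 0, "undetected_bloom": 0}
    total_error = 0.0
    for predicted, measured in pairs:
        total_error += abs(predicted - measured)
        predicted_bloom = predicted >= threshold
        real_bloom = measured >= threshold
        if predicted_bloom == real_bloom:
            tally["correct"] += 1            # bloom / no-bloom decision agrees
        elif predicted_bloom:
            tally["false_alarm"] += 1        # bloom forecast, none measured
        else:
            tally["undetected_bloom"] += 1   # bloom measured, none forecast
    tally["average_error"] = total_error / len(pairs) if pairs else 0.0
    return tally

# Invented (forecast, measured) pairs, in cells/liter.
toy_pairs = [
    (20_000, 35_000),
    (120_000, 150_000),
    (110_000, 60_000),
    (80_000, 130_000),
]
summary = score_forecasts(toy_pairs)
```

Note that a forecast can count as correct in this sense while still carrying a large absolute error, which is why the average error is tracked separately.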
Red tides appear when the concentration of pseudo-nitzschia spp is higher than 100,000 cells/liter. Although the aim of this experiment is to forecast the value of this concentration, the most important objective is to identify in advance whether the concentration is going to exceed this threshold.

The average error in the forecast was found to be 26,043 cells/liter and only 5.5% of the forecasts had an error higher than 100,000 cells/liter. Although the experiment was carried out using a limited data set, it is believed that these error values are sufficiently representative to be extrapolated over the whole coast of the Iberian Peninsula.

Two situations of special interest are the false alarms and the undetected blooms. The first occurs when the model predicts a bloom (concentration of pseudo-nitzschia spp ≥ 100,000 cells/liter) that does not take place (real concentration < 100,000 cells/liter). The second, which is more important, arises when a bloom really occurs and the model does not detect it.

Table 2 shows the successful predictions (in absolute value and %) and the erroneous predictions, differentiating the undetected blooms from the false alarms. This table also shows the average error obtained with each technique. As can be seen, the combination of different techniques in the form of the hybrid CBR system previously presented produces better results than an RBF neural network working alone or any of the tested statistical techniques. This is due to the effectiveness of the revision subsystem and the retraining of the RBF neural network with the cases retrieved by the GCS network. The hybrid system is more accurate than any of the other techniques studied during this investigation.

Table 2. Summary of results forecasting pseudo-nitzschia spp

Method                Correct predictions   % Correct   Undetected blooms   False alarms   Average error (cells/liter)
CBR-ANN-FS            191/200               95.5%       8                   1              26,044
RBF                   185/200               92.5%       8                   7              45,654
ARIMA                 174/200               87%         10                  16             71,918
Quadratic Trend       184/200               92%         16                  0              70,354
Moving Average        181/200               90.5%       10                  9              51,969
Simp. Exp. Smooth.    183/200               91.5%       8                   9              41,943
Lin. Exp. Smooth.     177/200               88.5%       8                   15             49,038

5 Conclusions

This paper has presented a problem solving method in which a CBR system is integrated with two artificial neural networks and a set of fuzzy inference systems in order to create a real-time, autonomous forecasting system. The forecasting system is able to produce a forecast with an acceptable degree of accuracy. The method uses a CBR system to wrap a growing cell structures network (to index, organize and retrieve relevant data), a radial basis function network (which contributes generalization, learning and adaptation capabilities) and a set of Sugeno fuzzy models (acting as experts that revise the initial solution) to provide a more effective prediction. The resulting hybrid system thus combines complementary properties of connectionist and symbolic AI methods.

The same technique may be used to provide forecasts further ahead, and it is believed that successful results may be obtained. However, the further ahead the forecast is made, the less accurate it may be expected to be.

In conclusion, our hybrid approach to problem solving provides an effective strategy for forecasting in an environment in which the raw data is derived from the previously mentioned sources.

References

1. Tomczak, M., Godfrey, J. S.: Regional Oceanography: An Introduction. Pergamon, New York, (1994)
2. Fernández, E.: Las Mareas Rojas en las Rías Gallegas. Technical Report, Department of Ecology and Animal Biology. University of Vigo, (1998)
3. Corchado, J. M., Fyfe, C.: Unsupervised Neural Network for Temperature Forecasting. Artificial Intelligence in Engineering, 13, num. 4, (1999) 351–357
4. Corchado, J. M., Lees, B.: A Hybrid Case-based Model for Forecasting. Applied Artificial Intelligence, 15, num. 2, (2001) 105–127
5. Corchado, J. M., Lees, B., Aiken, J.: Hybrid Instance-based System for Predicting Ocean Temperatures. International Journal of Computational Intelligence and Applications, 1, num. 1, (2001) 35–52
6. Corchado, J. M., Aiken, J., Rees, N.: Artificial Intelligence Models for Oceanographic Forecasting. Plymouth Marine Laboratory, U.K., (2001)
7. Nakhaeizadeh, G.: Learning prediction of time series. A theoretical and empirical comparison of CBR with some other approaches. Proceedings of First European Workshop on Case-Based Reasoning, EWCBR-93, Kaiserslautern, Germany, (1993) 65–76
8. Lendaris, G. G., Fraser, A. M.: Visual Fitting and Extrapolation. Weigend, A. S., Gershenfeld, N. A. (Eds.). Time Series Prediction, Forecasting the Future and Understanding the Past. Addison Wesley, (1994) 35–46
9. Lekkas, G. P., Arouris, N. M., Viras, L. L.: Case-Based Reasoning in Environmental Monitoring Applications. Artificial Intelligence, 8, (1994) 349–376
10. Faltings, B.: Probabilistic Indexing for Case-Based Prediction. Proceedings of Case-Based Reasoning Research and Development, Second International Conference, ICCBR-97, Providence, Rhode Island, USA, (1997) 611–622
11. McIntyre, H. S., Achabal, D. D., Miller, C. M.: Applying Case-Based Reasoning to Forecasting Retail Sales. Journal of Retailing, 69, num. 4, (1993) 372–398
12. Stottler, R. H.: Case-Based Reasoning for Cost and Sales Prediction. AI Expert, (1994) 25–33
13. Weber-Lee, R., Barcia, R. M., Khator, S. K.: Case-based reasoning for cash flow forecasting using fuzzy retrieval. Proceedings of the First International Conference on Case-Based Reasoning, ICCBR-95, Sesimbra, Portugal, (1995) 510–519
14. Corchado, J. M., Lees, B., Fyfe, C., Rees, N., Aiken, J.: Neuro-adaptation method for a case based reasoning system. Computing and Information Systems Journal, 5, num. 1, (1998) 15–20
15. Pal, S. K., Dillon, T. S., Yeung, D. S.: Soft Computing in Case Based Reasoning. Springer Verlag, London, (2000)
16. Azuaje, F., Dubitzky, W., Black, N., Adamson, K.: Discovering Relevance Knowledge in Data: A Growing Cell Structures Approach. IEEE Transactions on Systems, Man and Cybernetics, 30, (2000) 448–460
17. Fritzke, B.: Fast learning with incremental RBF Networks. Neural Processing Letters, 1, num. 1, (1994) 2–5
18. Jin, Y., Seelen, W. von., Sendhoff, B.: Extracting Interpretable Fuzzy Rules from RBF Neural Networks. Internal Report IRINI 00-02, Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, (2000)