Kernel Maximum Likelihood Hebbian Learning  
Jos Koetsier1, Emilio Corchado2, Donald MacDonald1,                                        
Juan Corchado3, Colin Fyfe1. 
1 Applied Computation Intelligence Research Unit, University of Paisley, Scotland. 
{koet-ci0,macd-ci0, fyfe-ci0}@paisley.ac.uk                                                                      
2Departamento de Ingenieria Civil. Universidad de Burgos. Spain. 
escorchado@ubu.es 
3Departamento de Informática y Automática. Universidad de Salamanca. Spain. 
corchado@usal.es 
Abstract. We present a novel method based on Maximum Likelihood Hebbian learning [2],
a recently proposed extension to a negative feedback network which uses simple Hebbian
learning to self-organise. We use the kernel version of the ML algorithm on data from a
spectroscopic analysis of a stained glass rose window in a Spanish cathedral. It is hoped
that classifying the origin and date of each segment will help in the restoration of this
and other historical stained glass windows.
1   Introduction 
One problem with the analysis of high dimensional data is identifying structure or
patterns which exist across dimensional boundaries. By projecting the data onto a
different basis of the space, these patterns may become visible. This presents a
problem: how does one decide which basis is optimal for the visualisation of the
patterns without foreknowledge of the patterns in the data?
One solution is Principal Component Analysis (PCA), a statistical technique aimed at
finding the orthogonal basis that maximises the variance of the projection for a given
dimensionality of basis. This involves finding the direction which accounts for most of
the data's variance, the first principal component; this variance is then filtered out.
The next component is the direction of maximum variance in the remaining data,
orthogonal to the first principal component.
We [3, 4] have over the last few years investigated a negative feedback implementation
of PCA defined by (1)-(3). Let x be an N-dimensional input vector and y an
M-dimensional output vector, with W_ij being the weight linking the jth input to the
ith output. The learning rate, η, is a small value which is annealed to zero over the
course of training the network. The activation passing from input to output through
the weights is described by (1). The activation is then fed back through the weights
from the outputs and the error, e, is calculated for each input dimension (2). Finally,
the weights are updated using simple Hebbian learning (3).
$$y_i = \sum_{j=1}^{N} W_{ij} x_j, \quad \forall i \tag{1}$$
$$e_j = x_j - \sum_{i=1}^{M} W_{ij} y_i, \quad \forall j \tag{2}$$
$$\Delta W_{ij} = \eta \, e_j \, y_i \tag{3}$$
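The three rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `train_negative_feedback`, the linear annealing schedule, the initial weight scale and the default values of `eta0` and `epochs` are all assumptions made for the example.

```python
import numpy as np

def train_negative_feedback(X, M, eta0=0.01, epochs=20, seed=0):
    """Train the negative feedback network of equations (1)-(3).

    X is a (samples, N) array of zero-mean inputs; returns W with shape (M, N).
    The linear annealing of the learning rate is an assumed schedule.
    """
    rng = np.random.default_rng(seed)
    n_samples, N = X.shape
    W = rng.normal(scale=0.1, size=(M, N))    # small random initial weights
    total_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for k in rng.permutation(n_samples):
            eta = eta0 * (1.0 - step / total_steps)  # anneal towards zero
            x = X[k]
            y = W @ x                   # (1) feedforward activation
            e = x - W.T @ y             # (2) residual after feedback
            W += eta * np.outer(y, e)   # (3) dW_ij = eta * e_j * y_i
            step += 1
    return W
```

With a single output (M = 1) this update coincides with Oja's rule, so the weight vector should align with the first principal component of the data.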
We have subsequently modified this network to perform clustering with topology 
preservation [5], to perform Factor Analysis [8, 1] and to perform Exploratory Projec-
tion Pursuit (EPP) [7, 6]. 
2   Maximum Likelihood Hebbian Learning 
This paper deals with a recently developed variation of the basic network which also
performs Exploratory Projection Pursuit. We show the validity of our method by
applying it to data from a spectroscopic analysis of the stained glass rose window in a
Spanish cathedral. It is hoped that classifying the origin and date of each segment
will help in the restoration of this and other historical stained glass windows.
Let us now consider the residual after the feedback to have the probability density
function
$$p(\mathbf{e}) = \frac{1}{Z} \exp\left(-|\mathbf{e}|^{p}\right) \tag{4}$$
Then we can denote a general cost function associated with this network as 
$$J = -\log p(\mathbf{e}) = |\mathbf{e}|^{p} + K \tag{5}$$
where K is a constant. Therefore performing gradient descent on J we have 
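$$\Delta W \propto -\frac{\partial J}{\partial W} = -\frac{\partial J}{\partial \mathbf{e}} \frac{\partial \mathbf{e}}{\partial W} \approx \mathbf{y} \left( p \, |\mathbf{e}|^{p-1} \, \mathrm{sign}(\mathbf{e}) \right)^{T} \tag{6}$$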
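The step from (5) to (6) can be unpacked for a single weight. The following is a sketch under two assumptions that the text does not state explicitly: that |e|^p is read componentwise as a sum over the residual components e_j, and that the output y is held fixed when the residual (2) is differentiated with respect to W, which would account for the approximation sign in (6).

```latex
\begin{align*}
J &= \sum_{j=1}^{N} |e_j|^{p} + K, \qquad e_j = x_j - \sum_{i=1}^{M} W_{ij} y_i \\
\frac{\partial J}{\partial e_j} &= p \, |e_j|^{p-1} \, \mathrm{sign}(e_j), \qquad
\frac{\partial e_j}{\partial W_{ij}} \approx -y_i \\
\Delta W_{ij} &\propto -\frac{\partial J}{\partial W_{ij}} \approx p \, y_i \, |e_j|^{p-1} \, \mathrm{sign}(e_j)
\end{align*}
```

For p = 2 this gives a weight change proportional to 2 y_i e_j, which is the simple Hebbian rule (3) up to a constant that can be absorbed into the learning rate.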
We would expect that for leptokurtic residuals (more kurtotic than a Gaussian
distribution), values of p < 2 would be appropriate, while for platykurtic residuals
(less kurtotic than a Gaussian), values of p > 2 would be appropriate.
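A single update of this rule for one input vector might look as follows. This is a hedged sketch: the function name `mlhl_step` is hypothetical, the constant factor p from (6) is assumed to be absorbed into the learning rate, and the small constant `eps` is an addition of this example (not part of the rule) that avoids dividing by zero when p < 1 and a residual component is exactly zero.

```python
import numpy as np

def mlhl_step(W, x, eta, p, eps=1e-8):
    """One Maximum Likelihood Hebbian update for a single input x.

    W has shape (M, N). Returns the updated weights and the residual e.
    eps is an assumed numerical guard for |e|**(p - 1) when p < 1.
    """
    y = W @ x                          # feedforward, as in (1)
    e = x - W.T @ y                    # residual after feedback, as in (2)
    g = np.sign(e) * (np.abs(e) + eps) ** (p - 1.0)   # sign(e) |e|^(p-1)
    W_new = W + eta * np.outer(y, g)   # dW_ij = eta * y_i * sign(e_j) |e_j|^(p-1)
    return W_new, e
```

Setting p = 2 recovers the simple Hebbian update (3), while p = 1 uses only the sign of the residual, so large residuals have less influence on the weights.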
3   Kernel Maximum Likelihood 
The first step in our kernel version of the ML algorithm is to project the data onto a
set of eigenvectors in feature space, which yields sphered data in feature space. We
can subsequently reduce the dimensionality as a further preprocessing step to the ML
method. As the transformed data are points in feature space, we can simply apply the
maximum likelihood method to the transformed data as though they were in data space.
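The preprocessing step can be sketched as a kernel PCA projection followed by rescaling each component to unit variance. The paper does not say which kernel is used, so the Gaussian (RBF) kernel, its width `gamma`, and the function names `rbf_kernel` and `kernel_sphere` are assumptions made purely for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian kernel matrix (an assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_sphere(X, n_components, gamma=1.0):
    """Project X onto the leading kernel principal components and sphere it.

    Returns Z with shape (samples, n_components). Assumes the selected
    eigenvalues are strictly positive.
    """
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    H = np.eye(n) - np.ones((n, n)) / n     # centring matrix
    Kc = H @ K @ H                          # centre the data in feature space
    evals, evecs = np.linalg.eigh(Kc)       # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[idx], evecs[:, idx]
    proj = evecs * np.sqrt(evals)           # projections onto the components
    Z = proj / np.sqrt(evals / n)           # rescale each column to unit variance
    return Z
```

Reducing the dimensionality then amounts to choosing a smaller `n_components`; the rows of `Z` play the role of the transformed data points on which the maximum likelihood method is run.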
Let z_k be the data point x_k transformed into feature space and projected onto the
principal components in feature space. The Kernel Maximum Likelihood (KML) learning
rules then become
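Feedforward: $$y_i = \sum_{j=1}^{N} W_{ij} z_{k,j}, \quad \forall i \tag{7}$$
Feedback: $$e_j = z_{k,j} - \sum_{i=1}^{M} W_{ij} y_i \tag{8}$$
Weight change: $$\Delta W_{ij} = \eta \, y_i \, \mathrm{sign}(e_j) \, |e_j|^{p-1} \tag{9}$$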
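Rules (7)-(9) can be applied to the transformed data in a simple training loop. The sketch below assumes the rows of `Z` are the already sphered feature-space projections z_k; the name `train_kml`, the linear annealing schedule, the initial weight scale and the guard `eps` are assumptions of this example rather than details given in the paper.

```python
import numpy as np

def train_kml(Z, M, p, eta0=0.01, epochs=20, eps=1e-8, seed=0):
    """Run the KML rules (7)-(9) over transformed data Z of shape (samples, N).

    Returns the weights W (M, N) and the projections Y (samples, M).
    eps is an assumed numerical guard for |e|**(p - 1) when p < 1.
    """
    rng = np.random.default_rng(seed)
    n, N = Z.shape
    W = rng.normal(scale=0.1, size=(M, N))
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for k in rng.permutation(n):
            eta = eta0 * (1.0 - step / total_steps)   # anneal towards zero
            z = Z[k]
            y = W @ z                                 # (7) feedforward
            e = z - W.T @ y                           # (8) feedback residual
            g = np.sign(e) * (np.abs(e) + eps) ** (p - 1.0)
            W += eta * np.outer(y, g)                 # (9) weight change
            step += 1
    return W, Z @ W.T
```

The columns of the returned `Y` are the projections onto the directions found, which can then be plotted against each other to look for clusters.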
4   Experiments 
The data used to illustrate our method is composed of samples from 76 different
sections of the stained glass rose window in a Spanish cathedral. The data set contains
450 data vectors, obtained from 90 samples each analysed 5 times. Each vector has
1020 dimensions, reduced to 390 dimensions after normalisation.
The data was previously analysed in [9], where it was found to contain three different
clusters. These clusters were distinguished by their chemical composition: one belongs
to the 16th century and the other two to the 13th century. The clusters were
characterised by the proportion of sodium and potassium in the glass.
We visualise the data using our KML method so that we can look for clusters by eye. As
we are interested in projections exhibiting clusters, we wish to extract sub-Gaussian
signals and therefore use a value of p = 0. We compare our results to those obtained
by standard ML.
4.1 Results 
In Figure 1, we show the projections obtained by KML (left) and ML (right). The first
row of each figure shows scatterplots of projections onto the first direction plotted
against projections onto the first through fourth directions. The second row shows
projections onto the second direction plotted against projections onto the first
through fourth directions, and so forth. In both figures we can identify clusters, but
the projections obtained by the KML method generally result in a better separation of
the clusters. The non-linear extension to the standard Maximum Likelihood method has
given the method greater flexibility, which gives us a different view of the data that,
in this case, yields much better clustering.
 
  
(a) Kernel Maximum Likelihood    (b) Maximum Likelihood
Figure 1. Projection of the glass data using the KML (left) and ML (right) methods. The
figures show the result of projection onto the four directions found. The Kernel
Maximum Likelihood method results in greater separation of the clusters.
5 Conclusion 
In this paper we have introduced a novel extension to the Maximum Likelihood Hebbian
learning algorithm that allows the method to use non-linear projections. The data is
actively being used in research projects to help in the future restoration of stained
glass windows. Using our new clustering methods we visualise the data to identify
relationships between the different chemical properties of the glass samples.
We have shown that our new non-linear extension of Maximum Likelihood has found
clusters in this data set that could not be identified by ML. This new method therefore
allows us to better analyse the data and gives us different visualisations of our data
set.
References 
1. Charles, D. and Fyfe, C. Modelling Multiple Cause Structure using Rectification
Constraints. Network: Computation in Neural Systems, 9:167-182, May 1998.
2. Corchado, E. and Fyfe, C. Maximum Likelihood Hebbian Rules. In Tenth European
Symposium on Artificial Neural Networks, ESANN2002, pages 143-148, 2002.
3. Fyfe, C. PCA Properties of Interneurons. In From Neurobiology to Real World Computing,
ICANN 93, pages 183-188, 1993.
4. Fyfe, C. Introducing Asymmetry into Interneuron learning. Neural Computation,
7(6):1167-1181, 1995.
5. Fyfe, C. Radial Feature Mapping. In International Conference on Artificial Neural
Networks, ICANN95, Oct. 1995.
6. Fyfe, C. A Comparative Study of Two Neural Methods of Exploratory Projection Pursuit.
Neural Networks, 10(2):257-262, 1997.
7. Fyfe, C. and Baddeley, R. Non-linear Data Structure Extraction using Simple Hebbian
Networks. Biological Cybernetics, 72(6):533-541, 1995.
8. Fyfe, C. and Charles, D. Using Noise to Form a Minimal Overcomplete Basis. In Seventh
International Conference on Artificial Neural Networks, ICANN99, 1999.
9. Lopez-Gejo, J., Colina, A., Lopez-Palacios, J. and Bravo, P. Principal Components Analysis
in the Classification of Medieval Glasses by Scanning Electron Microscopy Coupled with
Energy Dispersive X-ray Analysis. (submitted), 2003.