A. Abraham and S.M. Thampi (Eds.): Intelligent Informatics, AISC 182, pp. 175–181. 
springerlink.com                                           © Springer-Verlag Berlin Heidelberg 2012 
Comparative Genomics with Multi-agent 
Systems* 
Juan F. De Paz, Carolina Zato, Fernando de la Prieta, Javier Bajo, 
Juan M. Corchado, and Jesús M. Hernández 
Abstract. The detection of the regions with mutations associated with different 
pathologies is an important step for selecting relevant genes. The corresponding 
information of the mutations and genes is distributed in different public sources 
and databases, so it is necessary to use systems that can contrast different sources 
and select conspicuous information. This work proposes a virtual organization of 
agents that can analyze and interpret the results from Array-based comparative 
genomic hybridization, thus facilitating the traditionally manual process of the 
analysis and interpretation of results. 
Keywords: arrays CGH, knowledge extraction, visualization, multiagent system. 
1   Introduction 
Different techniques presently exist for the analysis and identification of 
pathologies at a genetic level. Along with massive sequencing, which allows the 
exhaustive study of mutations, the use of microarrays is highly extended. CGH 
arrays (aCGH) (Array-based comparative genomic hybridization) are a type of 
microarray that can analyze information on the gains, losses and amplifications [7] 
in regions of the chromosomes to detect mutations [5], [3]. Expression arrays 
measure the expression level of the genes. aCGH are currently used to detect 
relevant regions that may require deeper analysis. In these cases, it is necessary to 
                                                          
Juan F. De Paz · Carolina Zato · Fernando de la Prieta · Javier Bajo · Juan M. Corchado 
Department of Computer Science and Automation, University of Salamanca 
Plaza de la Merced, s/n, 37008, Salamanca, Spain 
e-mail: {fcofds,carol_zato,fer,corchado}@usal.es 
 
Jesús M. Hernández 
IBMCC, Cancer Research Center, University of Salamanca-CSIC, Spain 
e-mail: jhmr@usal.es 
 
Jesús M. Hernández 
Servicio de Hematología, Hospital Universitario de Salamanca, Spain 
176 J.F. De Paz et al.
 
work with vast amounts of information, which necessitates the creation of a 
system that can facilitate the automatic analysis of data that, in turn, facilitates the 
extraction of relevant information using different data bases. For this reason, it is 
necessary to automate the aCGH processing. 
aCGH, also called microarray analysis, is a new cytogenetic technology that 
evaluates areas of the human genome for gains or losses of chromosome segments 
at a higher resolution than traditional karyotyping. When working with aCGH, 
segments of DNA (Deoxyribonucleic Acid) are selected from public genome 
databases based upon their location in the genome. Computer software analyzes 
the fluorescent signals for areas of unequal hybridization of patient versus control 
DNA, signifying a DNA dosage alteration (deletion or duplication). These arrays 
offer genome-covering resolution that can offer precise delineation of breakpoints. 
This is important in determining common regions of overlap and implicated genes. 
Due to their small target size, oligonucleotide arrays suffer from poorer signal to 
noise ratios that often results in a significant number of false-positive outliers. At 
present, tools and software already exist to analyze the data of arrays CGH, such 
as CGH-Explorer [2], ArrayCyGHt [12], CGHPRO [1], WebArray [8] or 
ArrayCGHbase [4], VAMP [6]. The problem with these tools is the lack of 
usability and of an interactive model. For this reason, it is necessary to create a 
visual tool to analyse the data in a simpler way. 
The process of arrays CGH analysis is broken down into a group of structured 
stages, although most of the analysis process is done manually from the initial 
segmentation of the data. This study presents a multi-agent system [10] that 
defines roles to automatically perform the different stages of the analysis. In the 
first stage, the data are segmented [11] to reduce the number of gains or losses 
fragments to be analyzed. The following steps vary in terms of the type of analysis 
being performed and include: grouping, classification, visualization, or extraction 
of information from different sources. The system tries to facilitate the analysis 
and the automatic interpretation of the data by selecting the relevant genes, 
proteins and information from the previous classification of pathologies. The 
system provides several representations in order to facilitate the visual analysis of 
the data. The information for the identified genes, CNVs (Copy-number 
variations), pathologies etc. is obtained from public databases.  
This article is divided as follows: section 2 describes our system, and section 3 
presents the results and conclusions. 
2   Multi-agent System 
The multi-agent system designed to analyze our data is general enough that it can 
be adapted for other types of data analysis. The multi-agent system is divided into 
different layers: the analysis layer, the information management layer. The 
developed system receives data from the analysis of chips and is responsible for 
representing the data for extracting relevant segments on evidence and existing 
data. Working from the relevant cases, the first step consists of selecting the 
information about the genes and transcripts stored in the databases. This 
information will be associated to each of the segments, making it possible to 
Comparative Genomics with Multi-agent Systems 177
 
quickly consult the data and reveal the detected alterations at a glance. The data 
analysis can be carried out automatically or manually.  
2.1   Analysis Roles 
The analysis roles contains the agents responsible for performing the actual 
microarray analyses. The information management layer compiles the information 
from the database and generates local databases to facilitate their analysis. The 
visualization layer facilitates the management of both the information and the 
algorithms; it displays the information and the results obtained after applying the 
existing algorithms at the analysis layer.  
The agents at the analysis layer adapt to the specific class of microarray, in this 
case the aCGH, and within the aCGH they adapt to the different types of 
microarrays with which they work. To perform the data analysis, the agents are 
incorporated for: segmentation, Knowledge extraction, and Clustering. 
The segmentation process is performed by taking into account the differential 
normalization for gains and losses. The segmentation process is based on the 
mad1dr (median absolute deviation, 1st derivative) value for each of the arrays, 
which determines the threshold for gains or losses that is considered relevant for 
each case. This metric provides a surrogate measure of experimental noise. 
For this particular system, the use of chi Square was chosen because it is the 
technique that makes it possible to work with different qualitative nominal 
variables to study factor and its response. The contrast of Chi Square makes it 
possible to obtain as output the values that can sort the attributes by their 
importance, providing an easier way to select the elements. As an alternative, gain 
functions could be applied in decision trees, providing similar results. 
2.2   Information Management Roles 
Once the relevant segments have been selected, the researchers can introduce 
information for each of the variants. The information is stored in a local database. 
These data are considered in future analyses although they have to be reviewed in 
detail and contrasted by the scientific community. The information is shown in 
future analyses with the information for the gains and losses. However, because 
only the information from public databases is considered reliable, this information 
is not included in the reports. 
Besides the system incorporates a role to retrieve information from UCSC 
(University of California Santa Cruz) and use this information to generate reports. 
This information is important in order to select the relevant segments.  
3   Visual Analysis 
A visual analysis is performed of the data provided by the system and the 
information recovered from the databases. New visualizations are performed in 
order to more easily locate the mutations, thus facilitating the identification of 
178 J.F. De Paz et al.
 
mutations that affect the codification of genes among the large amount of genes. 
Visualization facilitates the validation of the results due to the interactivity and 
ease of use of previous information. Existing packages such as CGHcall [9] in R 
do not display the results in an intuitive way because it is not possible to associate 
segments with regions and they do not allow interactivity.  
The system provides a visualization to select the regions with more variants 
and relevant regions in different pathologies. The visualizations make is possible 
to extract information from databases using a local database.  
A visual analysis is performed of the data provided by the system and the 
information recovered from the databases. New visualizations are performed in 
order to more easily locate the mutations, thus facilitating the identification of 
mutations that affect the codification of genes among the large amount of genes. 
Visualization facilitates the validation of the results due to the interactivity and 
ease of use of previous information. Existing packages such as CGHcall [9] in R 
do not display the results in an intuitive way because it is not possible to associate 
segments with regions and they do not allow interactivity.  
The system provides a visualization to select the regions with more variants 
and relevant regions in different pathologies. The visualizations make is possible 
to extract information from databases using a local database.  
4   Results and Conclusions 
In order to analyze the operation of the system, different data types of array CGH 
were selected. The system was applied to two different kinds of CGH arrays: BAC 
aCGH, and Oligo aCGH. The information obtained from the BAC aCGH after 
segmenting and normalizing is represented in Tab. 1. As shown in the figure, there 
is one patient for each column. The rows contain the segments so that all patients 
have the same segments. Each segment is a tuple composed of three elements: 
chromosome, initial region and final region. The values vij represent gains and 
losses for segment i and patient j. If the value is positive, or greater than the 
threshold, it is considered a gain; if it is lower than the value, it is considered a 
loss.  
Table 1 BAC aCGH normalized and segmented 
Segment Patient 1 Patient 2 ... Pantient n 
Init-end v11 v12 ... v1n 
Init-end v21 v22 ... v2n 
 
The system includes the databases because it extracts the information from 
genes, proteins and diseases. These databases have different formats but basically 
there is a tuple of three elements for each row (chromosome, start, end, other 
information). Altogether, the files downloaded from UCSC included slightly more 
than 70,000 registries. 
Comparative Genomics with Multi-agent Systems 179
 
 
Fig. 1 Selection of segments and genes automatically 
Figure 1 displays the information for 18 oligo arrays cases. Only the 
information corresponding to chromosome 11 is shown. The green lines represent 
gains for the patient in the associated region of the chromosome, while the red 
lines represent losses. The user can select the regions and use these highlighted 
regions to generate reports. 
When performing the visual analysis, users can retrieve information from a 
local database or they can browse through UCSC. For example, figure 2 contains a 
 
 
Fig. 2 Report with relevant genes 
180 J.F. De Paz et al.
 
report with the information for the segment belonging to the irrelevant region 
shown in the previous image. 
In order to facilitate the revision and learning phases for the expert, a different 
visualization of the data is provided. This view helps to verify the results obtained 
by the hypothesis contrast regarding the significance of the differences between 
pathologies. Figure 3 shows a dendrogram with the information of the groups. The 
expert can review the clusters and modify the group belong each patient selecting 
each patient. 
 
 
Fig. 3 Reviewing clustering process  
The presented system facilitates the use of different sources of information to 
analyze the relevance in variations located in chromosomic regions. The system is 
able to select the genes, variants, genomic duplications that characterize 
pathologies automatically, using several databases. This system allows the 
management of external sources of information to generate final results. The 
provided visualizations make it possible to validate the results obtained by an 
expert more quickly and easily. 
Acknowledgments. This work has been supported by the MICINN TIN 2009-13839- 
C03-03.  
References 
[1] Chen, W., Erdogan, F., Ropers, H., Lenzner, S., Ullmann, R.: CGHPRO-a 
comprehensive data analysis tool for array CGH. BMC Bioinformatics 6(85), 299–
303 (2005) 
[2] Lingjaerde, O.C., Baumbush, L.O., Liestol, K., Glad, I.K., Borresen-Dale, A.L.: 
CGH-explorer, a program for analysis of array-CGH data. Bioinformatics 21(6), 
821–822 (2005) 
[3] Mantripragada, K.K., Buckley, P.G., Diaz de Stahl, T., Dumanski, J.P.: Genomic 
microarrays in the spotlight. Trends Genetics 20(2), 87–94 (2004) 
Comparative Genomics with Multi-agent Systems 181
 
[4] Menten, B., Pattyn, F., De Preter, K., Robbrecht, P., Michels, E., Buysse, K., 
Mortier, G., De Paepe, A., van Vooren, S., Vermeesh, J., et al.: Array CGH base: an 
analysis platform for comparative genomic hybridization microarrays. BMC 
Bioinformatics 6(124), 179–187 (2006) 
[5] Pinkel, D., Albertson, D.G.: Array comparative genomic hybridization and its 
applications in cancer. Nature Genetics 37, 11–17 (2005) 
[6] Rosa, P., Viara, E., Hupé, P., Pierron, G., Liva, S., Neuvial, P., Brito, I., Lair, S., 
Servant, N., Robine, N., Manié, E., Brennetot, C., Janoueix-Lerosey, I., Raynal, V., 
Gruel, N., Rouveirol, C., Stransky, N., Stern, M., Delattre, O., Aurias, A., Radvanyi, 
F., Barillot, E.: VAMP: Visualization and analysis of array-CGH transcriptome and 
other molecular profiles. Bioinformatics 22(17), 2066–2073 (2006) 
[7] Wang, P., Young, K., Pollack, J., Narasimham, B., Tibshirani, R.: A method for 
callong gains and losses in array CGH data. Biostat. 6(1), 45–58 (2005) 
[8] Xia, X., McClelland, M., Wang, Y.: WebArray, an online platform for microarray 
data analysis. BMC Bionformatics 6(306), 1737–1745 (2005) 
[9] Van de Wiel, M.A., Kim, K.I., Vosse, S.J., Van Wieringen, W.N., Wilting,  
S.M., Ylstra, B.: CGHcall: calling aberrations for array CGH tumor profiles. 
Bioinformatics 23(7), 892–894 (2007) 
[10] Argente, E., Botti, V., Carrascosa, C., Giret, A., Julian, V., Rebollo, M.: An abstract 
architecture for virtual organizations: The THOMAS approach. Knowledge and 
Information Systems 29(2), 379–403 (2011) 
[11] Smith, M.L., Marioni, J.C., Hardcastle, T.J., Thorne, N.P.: snapCGH: Segmentation, 
Normalization and Processing of aCGH Data Users’ Guide. Bioconductor (2006) 
[12] Kim, S.Y., Nam, S.W., Lee, S.H., Park, W.S., Yoo, N.J., Lee, J.Y., Chung, Y.J.: 
ArrayCyGHt, a web application for analysis and visualization of array-CGH data. 
Bioinformatics 21(10), 2554–2555 (2005)