Development of Electrolarynx by Multi-agent Technology and Mobile Devices for Prosody Control

Kenji Matsui1, Kenta Kimura1, Alberto Pérez2, Sara Rodríguez2 and Juan M. Corchado2

1Osaka Institute of Technology, Osaka, Japan
2Computer and Automation Department, University of Salamanca, Spain
matsui@elc.oit.ac.jp, e1610024@st.oit.ac.jp
{srg, alberto.pgarcia, corchado}@usal.es

Abstract. The feasibility of using a motion sensor to replace a conventional electrolarynx (EL) user interface was explored. A mobile phone motion sensor with a multi-agent platform was used to investigate on/off and pitch frequency control capability. A very small battery-operated ARM-based control unit was also developed to evaluate the motion sensor based user interface. The control unit was placed on the wrist and the vibration device was placed against the throat using a support bandage. Two different methods were used to convert the forearm tilt angle to pitch frequency: a linear mapping method and an F0 template-based method. A perceptual evaluation was performed with two well-trained normal speakers and ten subjects. The results of the evaluation showed that both methods produced speech of better quality in terms of naturalness.

Keywords: multi-agent system, agents, prosody, electrolarynx, hands-free.

1. Introduction and background

As a result of technological advances, intelligent systems have become an important part of our lives. These systems can be found in many places and provide a huge range of facilities. Disabled people face increasing difficulties daily. Most buildings and facilities are not adapted to their disabilities, so they encounter barriers that they cannot overcome alone. However, with the help of these new intelligent systems, disabled people are now able to overcome the difficulties they encounter.

People who have had laryngectomies have several options for restoration of speech, but no currently available device is satisfactory.
An electrolarynx (EL) introduces a source vibration into the vocal tract by producing vibrations in the external wall. It is easy for patients to master, but the intelligibility of the consonants they articulate is diminished and the speech is uttered at a monotone frequency, since the device does not produce airflow. Alternatively, esophageal speech does not require any special equipment, but it requires speakers to insufflate, or inject air into, the esophagus, and it limits the pitch range and intensity. Both esophageal speech and tracheoesophageal speech are characterized by low average pitch frequency, large cycle-to-cycle perturbation in pitch frequency, and low average intensity. Age has also been found to be an important factor in the use of esophageal speech. As patients get older, they have difficulty mastering esophageal speech, or continuing to use it, because of their waning strength. For that reason, the EL is an important device even for people who use esophageal speech.

There are many advantages of the EL. To begin, one can speak in long sentences that are easily understood. Additionally, no special care is needed; the EL need only be placed against the neck and turned on. Finally, the EL can be used by almost everybody, regardless of the post-operative changes in the neck. In those few cases where scarring prevents proper placement of the EL, an intraoral version can be used.

On the other hand, there are also some disadvantages. Firstly, the EL has a very mechanical tone that does not sound natural. There is usually little change in pitch or modulation. Secondly, its appearance is far from normal.

Pitch frequency control is one of the important mechanisms that allow EL users to generate natural-sounding speech. There are some commercially available EL devices that use a single push button with a pressure sensor to produce F0 contours [1], [2]. There are also similar studies of pitch control methods [3], [4].
However, none are hands-free. Some approaches for generating an F0 contour without manual interaction have been proposed. Saikachi et al. use the amplitude variation of EL speech [5]. Another approach is to generate an F0 contour using an air-pressure sensor that is placed on the stoma [6][7]. Also, recently, machine learning based F0 contour generation from EL speech has been proposed [8]. Although most of those studies are in the early stages of research, the results show substantial improvement of EL speech quality.

An EL system that has a hands-free user interface could be useful for enhancing communication by alaryngeal talkers. Also, the appearance can be almost normal because users do not need to hold the transducer against the neck by hand. Almost all people frequently use gestures when they talk. It would be quite convenient if EL users could use gestures to control the device. Furthermore, because hands can generate various types of motion, gesture control has the potential to handle not only the on/off function but many other functions as well. However, if users are not able to use hand gestures, we need to consider movement of other parts of the body, or a completely different technique, such as EMG-based hands-free EL control [9].

As for the management of the total system, multi-agent technology is one of the key technology trends. There are many multi-agent frameworks that facilitate working with agents [16][17][18][19][20][21][22][23][24][25][26][27]. The main drawback of these systems is that they are general-purpose. Being general-purpose was considered a major issue twenty years ago, but it is much less so now, at a time when the number of personal computers, devices, mobile phones and the like has grown exponentially. In addition, the required architecture must be able to take on tasks for integrating disabled people. Moreover, the needs of disabled people differ widely.
Some of the best-known European multi-agent system projects oriented in the direction of our research are [28][29][30][31][32].

The present study was undertaken to explore the feasibility of using multi-agent technology and mobile devices (replacing the conventional EL user interface) to control both the on/off function and the pitch frequency. The specific goals were: 1) to make the generated speech natural, and 2) to make the appearance normal.

The paper is structured as follows: Section 2 reviews the system requirements as indicated by the participants involved in the study. Section 3 then describes the system implementation so that the reader can fully understand the system design. Finally, Sections 4 and 5 briefly present the results and conclusions obtained from the study.

2. System Requirements

A set of techniques, including user observations, interviews, and questionnaires, was used to understand implicit user needs. The total number of laryngectomized participants in the questionnaire survey was 121 (87% male, 13% female), including 65% esophageal talkers, 12% EL users, 7% both, and 21% who communicated by writing messages.

Almost all of the participants claimed that oral communication is difficult in most public areas because of the noisy environment. Typical public areas include train stations, the inside of train cars, the inside of vehicles, restaurants/pubs, and conventions/gatherings.

The noisy environment issue is a well-known problem, and people usually use a portable amplifier; however, we have been investigating a smaller, lighter, and lower-profile speech enhancement system for both esophageal speech [10] and EL. Other needs confirmed by the survey are:
(i) Natural sounding voice, without a mechanical tone.
(ii) Light weight device.
(iii) Smaller device, low profile.
(iv) Hands-free, easy to use.
(v) Low cost.

Based on those survey results, the present study was conducted to meet the essential user needs.

3. System Implementation

The system implementation was carried out using PANGEA (Platform for Automatic coNstruction of orGanizations of intElligents Agents) [14], [33]. The system is modeled as a virtual organization of agents. These agents are connected with the PANGEA platform [22], a multi-agent platform designed on the basis of virtual organizations [34][35][36] and aimed at creating intelligent environments that can be connected to any kind of device, as explained in [37]. The system uses a smartphone, which has powerful processing capabilities, provides the functionality needed to connect to the platform, and allows the use of its integrated accelerometer to calculate the desired output.

The following subsections explain how the system is integrated, paying special attention to the UI design, the integration of the system with the PANGEA platform (the multi-agent design), the design of the hardware included in the system, and the two algorithms that were compared.

3.1 Hands Free UI Design: Gesture and Pitch Control

A gesture control UI can be developed using a system based on a photo detector, a camera, or an accelerometer. Based on the survey results, a three-axis MEMS accelerometer was used in this study. MEMS sensors are very small, low cost, and fit the system requirements well [12]. A MEMS accelerometer accurately measures acceleration, tilt, shock and vibration.

The challenge in designing a pitch control algorithm that uses the MEMS accelerometer output to control the pitch contour is to reconcile the numerical ranges of the two types of data. The MEMS output bytes are integers in the range -128 to 127, covering an acceleration range of ±2 G. This issue can often be reconciled easily by linearly mapping one range of values (such as the MEMS data values -128 to 127) onto another range (such as 67 to 205 Hz, the expected typical male pitch range).

Another possible pitch control method is to use a pitch contour generation model, such as Fujisaki’s model [11].
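As an illustration of such a model, the following minimal Python sketch evaluates only the phrase component of Fujisaki's model in its standard form, ln F0(t) = ln Fmin + Ap · Gp(t) with Gp(t) = α² t exp(-αt), using the parameter values reported in Section 3.5 (Fmin = 80 Hz, α = 1.5, Ap = 0.75). The function names, the single phrase command at t = 0, and the sampling step are assumptions made for illustration; the sketch does not reproduce the controller software.

```python
import math

F_MIN_HZ = 80.0   # minimum F0 of the speaker (value from Section 3.5)
ALPHA = 1.5       # natural angular frequency of the phrase control mechanism
A_P = 0.75        # magnitude of the phrase command


def phrase_component(t: float, alpha: float = ALPHA) -> float:
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, and 0 before the command."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def f0_template(t: float, f_min: float = F_MIN_HZ, a_p: float = A_P,
                alpha: float = ALPHA) -> float:
    """F0(t) in Hz, obtained from ln F0(t) = ln Fmin + Ap * Gp(t)."""
    return f_min * math.exp(a_p * phrase_component(t, alpha))


if __name__ == "__main__":
    # Sample the template every 0.5 s (the step is an arbitrary choice).
    for i in range(7):
        t = 0.5 * i
        print(f"t = {t:.1f} s  F0 = {f0_template(t):.1f} Hz")
```

With these values the contour starts at Fmin, rises to a peak of about 121 Hz at t = 1/α, and then decays back toward Fmin.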
The system needs to have a strategy to generate both the phrase component and the accent component from the MEMS output. The F0 template-based method makes it easier to generate a relatively stable pitch contour; however, it may lose some flexibility in generating varied pitch patterns.

In this study, both the simple linear mapping method and the F0 template-based method were prototyped and examined to evaluate pitch control performance. A comparison study was also performed between a conventional EL, the linear mapping method and the F0 template-based method.

Figure 1. Message format offered by the sensor agent in PANGEA

3.2. Multiagent Design

In order to integrate the system with the PANGEA platform, a virtual organization was developed, named the “alaryngeal talkers organization”. This organization includes the following three kinds of agents, all of which improve the complete system because of the advantages inherent to a multi-agent structure:
• Sensor agent: this kind of agent is in charge of obtaining measurements from the smartphone’s accelerometer and providing this data when it is requested by other agents in the system that are authorized to communicate with it.
• Config agent: this kind of agent allows certain configuration data to be set. This data is required to establish the base pitch frequency used when the frequency is readjusted. This is an important factor in fitting the frequency to the person’s physical appearance, which is estimated from the data entered. With this, an even more natural result is achieved.
Figure 2: Message format offered by the configuration agent in PANGEA
• Analogic agent: this kind of agent is responsible for generating and providing an analogue output from the data obtained from the agents involved (the sensor and configuration agents).
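The division of responsibilities among the three kinds of agents listed above can be sketched in plain Python as follows. This is not PANGEA code: it does not use the PANGEA API or the message formats of Figures 1 and 2. The class names, the fields, the authorization check by agent name, and the simple base-plus-span pitch rule are all hypothetical and serve only to illustrate the flow of data from the sensor and configuration agents to the analogic agent.

```python
from dataclasses import dataclass, field


@dataclass
class SensorAgent:
    """Holds the latest accelerometer sample and serves it to authorized agents."""
    authorized: set = field(default_factory=set)  # names allowed to query (hypothetical scheme)
    x_axis: int = 0                               # latest MEMS x-axis byte, -128..127

    def read(self, requester: str) -> int:
        if requester not in self.authorized:
            raise PermissionError(f"{requester} is not authorized to query the sensor agent")
        return self.x_axis


@dataclass
class ConfigAgent:
    """Stores per-user configuration, here only a base pitch and a pitch span."""
    base_pitch_hz: float = 80.0  # illustrative default
    span_hz: float = 60.0        # illustrative default


@dataclass
class AnalogicAgent:
    """Combines sensor and configuration data into an output pitch value."""
    name: str
    sensor: SensorAgent
    config: ConfigAgent

    def output_pitch_hz(self) -> float:
        x = max(0, self.sensor.read(self.name))  # downward tilt is ignored in this sketch
        return self.config.base_pitch_hz + self.config.span_hz * x / 127


if __name__ == "__main__":
    sensor = SensorAgent(authorized={"analogic"}, x_axis=64)
    analogic = AnalogicAgent("analogic", sensor, ConfigAgent(base_pitch_hz=100.0))
    print(f"{analogic.output_pitch_hz():.1f} Hz")
```

Keeping the base pitch in a separate configuration object mirrors the idea that a user profile can be changed without touching the sensor or the output stage.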
At present, these agents can communicate only with each other and with the control agents that the PANGEA platform offers; this restriction could be lifted, or the system expanded, in future extensions.

3.3. Hardware System Design

We have been using Android-based mobile devices. Android is an open-source operating system (OS) with a large share of the market for smartphone and tablet operating systems. The basic idea is to use the accelerometer of an Android mobile device to control the on/off function and the pitch frequency. Because control relies on the sensors, users can operate the device without looking at the display.

A block diagram of the hardware architecture is shown in Figure 3. An Android mobile device sends PWM signals to a pair of EL transducers through an amplifier. The amplifier requires a 9 V battery so that the EL transducers can generate sufficient speech output. The EL transducer is placed against the neck with the neck bandage. Figure 4 shows the entire system, including the EL transducer and the amplifier.

Figure 3: Block diagram of the Hardware Architecture

Figure 4: EL transducer, amplifier and entire system (upper left: EL transducer with neck bandage, upper right: a pair of EL transducers, lower left: amplifier in a box, lower right: entire system)

3.4. Pitch Control (linear mapping method)

Hand gestures are a very important part of language. A preliminary UI study using forearm movement was conducted to evaluate the feasibility of the pitch control mechanism. Figure 5 shows the forearm tilt and the MEMS output (x-axis) when the controller was placed on the wrist. The normal pitch control zone extends from the horizontal position (0°) to the 75° upward position. The fading-out zone extends from the horizontal position to the -25° downward position; in this zone the phrase-ending pitch pattern is adjusted based on the speed of the forearm movement. As for the conversion from the MEMS output to the pitch frequency, there are four pitch ranges.
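A minimal sketch of the linear mapping idea follows, assuming the tilt angle has already been derived from the MEMS output. Angles in the normal pitch control zone (0° to 75°) are mapped linearly onto a pitch range, and angles between -25° and 0° are reported as the fading-out zone. The range of 67 to 205 Hz is borrowed from the typical male range mentioned in Section 3.1; the four selectable ranges of the prototype are not reproduced here. The function names and the treatment of angles outside both zones are assumptions.

```python
NORMAL_ZONE_DEG = (0.0, 75.0)     # normal pitch control zone (Section 3.4)
FADE_OUT_ZONE_DEG = (-25.0, 0.0)  # fading-out zone (Section 3.4)


def linear_map(value, in_lo, in_hi, out_lo, out_hi):
    """Map value linearly from [in_lo, in_hi] to [out_lo, out_hi], clamping to the input range."""
    value = max(in_lo, min(in_hi, value))
    return out_lo + (value - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)


def mems_byte_to_pitch_hz(byte, f_lo=67.0, f_hi=205.0):
    """Direct mapping of a MEMS output byte (-128..127) onto a pitch range (Section 3.1)."""
    return linear_map(byte, -128, 127, f_lo, f_hi)


def classify_zone(tilt_deg):
    """Return 'normal', 'fade_out', or 'outside' for a forearm tilt angle in degrees."""
    if NORMAL_ZONE_DEG[0] <= tilt_deg <= NORMAL_ZONE_DEG[1]:
        return "normal"
    if FADE_OUT_ZONE_DEG[0] <= tilt_deg < FADE_OUT_ZONE_DEG[1]:
        return "fade_out"
    return "outside"  # assumption: angles beyond either zone are not mapped


def tilt_to_pitch_hz(tilt_deg, f_lo=67.0, f_hi=205.0):
    """Pitch in Hz for a tilt in the normal zone; None elsewhere."""
    if classify_zone(tilt_deg) != "normal":
        return None
    return linear_map(tilt_deg, *NORMAL_ZONE_DEG, f_lo, f_hi)


if __name__ == "__main__":
    for angle in (-30.0, -10.0, 0.0, 37.5, 75.0):
        print(angle, classify_zone(angle), tilt_to_pitch_hz(angle))
```

Selecting a different pitch range then amounts to passing other values for `f_lo` and `f_hi`; the speed-dependent phrase-ending behavior of the fading-out zone is deliberately left out of this sketch.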
Figure 6 shows the relationship between the MEMS output and the four ranges of pitch frequency, i.e. high, mid-high, mid-low, and low. Users can select one of the four ranges.

Figure 5: Forearm Tilt and Pitch Control

Figure 6: Relation between MEMS Output and Pitch (linear mapping method)

3.5. Pitch Control (F0 template-based method)

The linear mapping method is a straightforward approach; however, it requires precise sensor control to avoid unnatural pitch behavior. The F0 template-based method applies a basic F0 template to the fine F0 contour generation. The phrase component of Fujisaki’s model was used to generate the F0 template F0(t). While the system is intended to generate both phrase control and accent control, in this first step of testing the template we used only the phrase component:

$$\ln F_0(t) = \ln F_{min} + A_p \, G_p(t) \qquad (1)$$

where

$$G_p(t) = \alpha^2 \, t \, \exp(-\alpha t) \qquad (2)$$

The symbols in equations (1) and (2) indicate:
• Fmin is the minimum value of the speaker’s F0.
• Ap is the magnitude of the phrase command.
• α is the natural angular frequency of the phrase control mechanism.

Figure 7: Relation between Forearm tilt and F0 template generation (F0 model-based Method). A-zone: -35°~0°, B-zone: 0°~20°, and C-zone: 20°~.

In this study, those values are Fmin = 80 Hz, α = 1.5 and Ap = 0.75. The calculated F0 template data is stored in the controller software. Figure 7 shows the F0 contour generation mechanism using the MEMS sensor output and the F0 template. The oscillation starts at 10° upward from the horizontal position. The template duration is controlled based on the forearm tilt angle, as shown in Figure 7. The figure also shows how to restart the F0 template: basically, the forearm movement C-zone → B-zone → C-zone is required. The A-zone is -35°~0°, the B-zone is 0°~20°, and the C-zone is 20° and above.

4. Evaluation and Results

Although we confirmed both types of pitch control function on the mobile device based system, for this evaluation of the pitch control algorithms we used an ARM-based hardware unit. Subjective evaluation tests (by the rating scale method) were conducted with two well-trained male normal speakers and ten subjects (one female and nine male). Each speaker read the phonetically balanced test materials shown in Table 1. We used one commercially available EL device (SECOM EL-X0010), prototype-A (linear mapping method, with 70 Hz mode), and prototype-B (F0 template-based method). The resulting 60 speech stimuli (2 speakers × 3 devices × 10 sentences) were recorded, and two differently randomized sets of stimuli were prepared. Five subjects evaluated one set of stimuli, and the other five subjects rated the other set. Each speech stimulus was presented twice.

Table 1: Phonetically balanced Japanese test sentences

The subjects rated the speech stimuli in terms of “intelligibility (clarity)”, “naturalness of the prosody”, and “stability of the prosody” on a five-level scale. As shown in Figure 8, the subjective evaluation indicated that both prototype-A (LM) and prototype-B (FU) obtained higher naturalness scores than the EL device (EL). On the other hand, intelligibility (clarity) and stability showed almost no difference among the devices.

                   LM      FU      EL
Intelligibility    2.99    2.98    2.98
Naturalness        3.11    3.03    2.255
Stability          3.3     3.17    3.345

Figure 8: Average evaluation scores

Without losing intelligibility (clarity) or stability of the prosody, both prototype-A and prototype-B showed substantial improvement in the naturalness of the prosody.

Results of this study indicate that both the usability and the speech quality of EL speakers could be improved by a MEMS accelerometer based hands-free UI controller.
The ability to control the pitch contour of EL speech with the proposed linear mapping method and F0 template-based method implies that hand gesture control may be adequate for implementing the hands-free user interface of the EL device. We had assumed that, of the two proposed methods, the F0 template-based method would be easier to learn and would make the pitch contour easier to stabilize. However, there was almost no difference between the two methods. We plan to run the same evaluation with actual EL users to confirm whether the proposed methods perform similarly. Also, a more detailed and precise study across talkers, sentences, and the learning curve has to be performed. As for the gesture control, we tested only forearm movement; however, it is necessary to test other body locations where users might be able to control the EL device more easily and naturally. In line with the user requirements, the evaluation of appearance also needs to be considered. In the study, we set a relatively narrow pitch range in order to avoid wild swings in pitch. A better pitch control range needs to be investigated.

5. Conclusions

A hands-free UI for an EL device, based on a MEMS accelerometer integrated in a smartphone, was proposed. A hand gesture system was designed and prototyped using a smartphone. Two types of pitch contour generation methods were proposed and tested together with a conventional EL device. Results of the evaluation indicated that the proposed methods have the potential to make the EL output prosody more natural, the device easier to use, and its appearance less conspicuous.

In addition, the developed multi-agent system provides several advantages. A simple application with multi-profile capacity is achieved, which allows the speaker to obtain an even more natural way of speaking.
Similarly, the system could be expanded in terms of sensors, or even complexity, thanks to the characteristics provided by the integration with the PANGEA multi-agent platform [13]. It also allows us to keep a record of all messages produced in the system [14], which can lead to future studies to improve the system based on the generated knowledge.

A disadvantage of using the PANGEA platform is that the mobile device must have a connection to the server, either through a local network or the Internet. For situations where a connection is not possible, there is an alternative design, already developed and presented, which can be seen in [15]. However, a more detailed and precise study across talkers, sentences, and the learning curve has to be performed.

Acknowledgements. This work has been carried out by the project Sociedades Humano-Agente: Inmersión, Adaptación y Simulación. TIN2012-36586-C03-03. Ministerio de Economía y Competitividad (Spain). Project co-financed with FEDER funds.

References

[1]. SECOM company Ltd., Electrolarynx “MY VOICE”, (http://www.secom.co.jp/personal/medical/myvoice.html)
[2]. Griffin laboratories, Tru Tone users guide.
[3]. Y. Kikuchi, and H. Kasuya: "Development and evaluation of pitch adjustable electrolarynx", In SP-2004, 761-764, 2004
[4]. H. Takahashi, M. Nakao, T. Ohkusa, Y. Hatamura, Y. Kikuchi, and K. Kaga, 2001. Pitch control with finger pressure for electrolaryngial or intra-mouth vibrating speech. Jp. J. Logopedics and Phoniatrics, 42(1), 1-8.
[5]. Y. Saikachi, “Development and Perceptual Evaluation of Amplitude-Based F0 Control in Electrolarynx Speech”, Journal of Speech, Language, and Hearing Research Vol.52 1360-1369 October 2009
[6]. N. Uemi, T. Ifukube, M. Takahashi and J. Matsushima, “Design of a new electrolarynx having a pitch control function”, In Proceedings of 3rd IEEE International Workshop on Robot and Human Communication, RO-MAN p.198-203, Nagoya, Japan, July 18-20, 1994.
[7]. K. Nakamura, T. Toda, H. Saruwatari and K. Shikano, “The use of air-pressure sensor in electrolaryngeal speech enhancement”, INTERSPEECH, p.1628-1631, Makuhari, Japan, Sept 26-30, 2010.
[8]. A. K. Fuchs and M. Hagmüller, “Learning an Artificial F0-Contour for ALT Speech”, INTERSPEECH, Portland, Oregon, Sept. 9-13, 2012.
[9]. H.L. Kubert, “Electromyographic control of a hands-free electrolarynx using neck strap muscles”, J Commun Disord. 2009 May-Jun;42(3):211-25
[10]. K. Matsui, et al., “Enhancement of Esophageal Speech using Formant Synthesis”, Journal of Acoustical Society of Japan (E) 23,2 pp.66-79, 2002
[11]. H. Fujisaki, In Vocal Physiology: Voice Production, Mechanisms and Functions, Raven Press, 1988
[12]. K. Matsui, et al., “A preliminary user interface study of speech enhancement system”, Proc. of the 1st International Conference on Industrial Application Engineering 2013, 53-56
[13]. A. Sánchez et al., “The gateway protocol based on FIPA-ACL for the new agent platform PANGEA” 2013. 11th International Conference on Practical Applications of Agents and Multi-Agent Systems. In Trends in Practical Applications of Agents and Multiagent Systems (pp. 41-51).
[14]. C. Zato, Villarrubia, G., Sánchez, A., Bajo, J., & Corchado, J. M. (2013). PANGEA: A New Platform for Developing Virtual Organizations of Agents. International Journal of Artificial Intelligence, 11(A13), pp. 93-102.
[15]. Kenji Matsui, et al, "Development of Electrolarynx with Hands-Free Prosody Control", The Proc. of the 8th ISCA , pp.273-277, Aug.31, 2013
[16]. S. Poslad, P. Buckle, R. Hadingham, The FIPA-OS agent platform: Open Source for Open Standards. In Proceedings of Autonomous Agents AGENTS-2000, Barcelona, 2000.
[17]. E. Argente, A. Giret, S. Valero, V. Julian, V. Botti, Survey of MAS Methods and Platforms focusing on organizational concepts. In: Vitria, J, Radeva, P and Aguilo, I (ed) Recent Advances in Artificial Intelligence Research and Development, Frontiers in Artificial Intelligence and Applications: 2004, pp. 309–316
[18]. F.G. McCabe, K.L. Clark. APRIL - Agent PRocess Interaction Language. In Proceedings of the workshop on agent theories, architectures, and languages on Intelligent agents (ECAI-94), Michael J. Wooldridge and Nicholas R. Jennings (Eds.). Springer-Verlag New York, Inc., New York, NY, USA, 1995, 324-340.
[19]. A. C. Bicharra García, N. Sanchez-Pi, L. Correia, J. M. Molina (2013). Multi-agent simulations for emergency situations in an airport scenario. Advances in Distributed Computing and Artificial Intelligence Journal. ISSN: 2255-2863
[20]. R.H. Bordini, J.F. Hübner, R. Vieira. Jason and the Golden Fleece of agent-oriented programming. In Bordini, R. H., Dastani, M., Dix, J., and El Fallah Seghrouchni, A., eds., Multi-Agent Programming: Languages, Platforms and Applications. Springer-Verlag. chapter 1, 2005, pp. 3-37.
[21]. DI Tapia, A Abraham, JM Corchado, RS Alonso. Agents and ambient intelligence: case studies. Journal of Ambient Intelligence and Humanized Computing 1 (2), 85-93. 2010.
[22]. J Bajo, JM Corchado. Evaluation and monitoring of the air-sea interaction using a CBR-Agents approach. Case-Based Reasoning Research and Development, 50-62. 2005.
[23]. DI Tapia, S Rodríguez, J Bajo, JM Corchado. FUSION@, a SOA-based multi-agent architecture. International Symposium on Distributed Computing and Artificial Intelligence 2008 (DCAI 2008), 99-107. 2008.
[24]. S Rodríguez, B Pérez-Lancho, JF De Paz, J Bajo, JM Corchado. Ovamah: Multiagent-based adaptive virtual organizations. Information Fusion, 2009. FUSION'09. 12th International Conference on, 990-997. 2009.
[25]. DI Tapia, JF De Paz, S Rodríguez, J Bajo, JM Corchado. Multi-agent system for security control on industrial environments. International Transactions on System Science and Applications Journal 4 (3), pp. 222-226. 2008.
[26]. S Rodríguez, Y de Paz, J Bajo, JM Corchado. Social-based planning model for multiagent systems. Expert Systems with Applications 38 (10), 13005-13023. 2011.
[27]. CI Pinzón, J Bajo, JF De Paz, JM Corchado. S-MAS: An adaptive hierarchical distributed multi-agent architecture for blocking malicious SOAP messages within Web Services environments. Expert Systems with Applications 38 (5), 5486-5499
[28]. CommonWell Project. (2010). http://commonwell.eu/index.php
[29]. Monami project. (2010). http://www.monami.info/
[30]. DISCATEL. (2010). http://www.imsersounifor.org/proyectodiscatel/
[31]. INREDIS. (2011). http://www.inredis.es/
[32]. INCLUTEC. (2011) http://www.idi.aetic.es/evia/es/inicio/contenidos/documentacion/documentacion_grupos_de_trabajo/contenido.aspx
[33]. Zato, C., Villarrubia, G., Sánchez, A., Bajo, J., & Corchado, J.M. (2013). PANGEA: A New Platform for Developing Virtual Organizations of Agents. International Journal of Artificial Intelligence, 11(A13), pp. 93-102
[34]. J Pavon, C Sansores, JJ Gomez-Sanz (2008). Modelling and simulation of social systems with INGENIAS. International Journal of Agent-Oriented Software Engineering 2 (2), 196-221
[35]. S Rodríguez, Y de Paz, J Bajo, JM Corchado. Social-based planning model for multi-agent systems. Expert Systems with Applications 38 (10), 13005-13023. 2011.
[36]. F Garijo, JJ Gómes-Sanz, J Pavón, P Massonet (2001). Multi-agent system organization: An engineering perspective. Pre-Proceeding of the 10th European Workshop on Modeling Autonomous Agents in a Multi-Agent World (MAAMAW’2001)
[37]. Carolina Zato, Alejandro Sánchez, Gabriel Villarrubia, Javier Bajo and Sara Rodríguez “Integration of a proximity detection prototype into a VO developed with PANGEA”. 20th International Symposium on Methodologies for Intelligent System (ISMIS 2012), 5th December 2012, Macau (China)