Article 3 :
ANALYTIC IMPORTANCE OF THE CODING FEATURES FOR
THE DISCRIMINATION OF VOWELS IN THE COCHLEAR IMPLANT SIGNAL
C. Berger-Vachon, S. Gallego, A. Morgon, E. Truy
Ann Otol Rhinol Laryngol 1995;104(suppl 166):351-353
The objective of this study is to model, using fuzzy logic, the recognition of the vowels /a/, /i/, /u/, /e/ by 4 subjects wearing a Nucleus cochlear implant (Cochlear). Several recognition models, each using only part of the information transmitted to the patient via the cochlear implant, were tested (255 models).
Ten speakers (5 men and 5 women) were recorded, each uttering 48 items. Each item was presented in parallel to the implanted subject and, via the cochlear implant, to an acquisition card allowing recognition by computer.
For each patient, a comparison between the patient's confusion matrix and the one found by computer determines the best model characterizing his or her recognition.
The results show that the model best representing comprehension varies from patient to patient. Some subjects use tonotopy; others use only the energy. For some subjects it is therefore just as important to preserve the temporal envelope of the signal as the frequency information.
ANALYTIC IMPORTANCE OF THE CODING FEATURES FOR THE DISCRIMINATION
OF VOWELS IN THE COCHLEAR IMPLANT SIGNAL
C. BERGER-VACHON, MD, PhD; S. GALLEGO, MS; A. MORGON, MD; E. TRUY, MD
From the Department of Otorhinolaryngology, Edouard Herriot Hospital, Lyon, France.
The present study considers the analytic importance of the excitation pulse features delivered by a Nucleus cochlear implant using the F0F1F2 strategy. Four cochlear implant patients and 10 speakers uttering two 48-item lists constructed from four basic French vowels participated in this study. Patterns were presented to the patients and played at the input of an acquisition system in order to record the pulse features. Confusion matrices obtained with the patients and with automatic recognition procedures were then compared in order to find out the best-matching models simulating the patients' performances, out of 255 possibilities. The automatic recognition was carried out according to fuzzy logic based on the elementary features of the pulses coding the vowels. Results show that the essential features strongly depended on the subject.
INTRODUCTION
Coding of the acoustic signal by means of a cochlear implant (CI) is still under discussion.1 While several strategies have been used, results based only on clinical performances do not show with enough precision what elementary features are important in the coding for speech recognition. Basically, the strategies commonly used in a multichannel CI involve the phonetic aspect of language (Nucleus),2 the use of a spectrum (Digisonic),3 and the analog splitting up of energy such as the one produced by Symbion.4 Another step takes place in experiments in which the signal is artificially modified and presented to the patient in order to establish whether or not changes in the coding are significant.5-7
Preliminary experiments have shown that in some patients, vowels with high energy are not confused with low-energy vowels, even when their frequency configuration is similar. For example, the high energy of the French vowel /a/ makes it quite easy to recognize. The question that emerged from this finding is the following: "What characteristics in the signal coded by the speech processor allow for the acoustic distinction by the patient?"

Fig 1. Structure and parameters of the stimulating pulse on any given electrode (E, in text): amplitude (A), duration (D), and charge C = A x D.
Computer simulation allows the testing of a large number of combinations of elementary signal features, with the help of models, in order to determine the most important features for the patient. The study presented here aims to find out what is important in the coding that allows distinction between vowels.

TABLE 1. MOST LIKELY VALUES FOR FEATURES ESTABLISHED DURING LEARNING STAGE (CORRESPONDING TO ONE SPEAKER AND ONE PATIENT)

         /a/   /i/   /u/   /e/
  E1      19    21    21    20
  E2      16     9    18    16
  A1      11    22    26     5
  A2      24    11    20    13
  D1      26     7     8    24
  D2      17    29     6    15
  C1      29     4     4    26
  C2      27    30     2    25

Units are artificial numbers calculated from confusion matrix.

Fig 2. Block diagram of acquisition system: magnetic tape recorder, CI speech processor, acquisition card, and desktop computer with storage.
Some work already done by the Melbourne team8 established that the
second formant (F2) and the F1F2 representation were important in the patient's
recognition process. This testing could be extended by using theoretic models.
A more analytic study of the features of the stimulation pulse is also
possible. Models can be created to evaluate the contribution of each feature of
those elements that play a role in the distinction perceived by the patient.
This is the aim of our work.
PATIENTS
The four patients who collaborated in this work all used the
Nucleus 22 cochlear implant, and the Mini Speech Processor (MSP) programmed
with the FOF1F2 strategy and in the bipolar plus one mode. They were two women,
one man, and a young girl. The FOF1F2 strategy was chosen in order to limit the
number of features studied in this experiment.
The man (CO), 46 years of age, had all his electrodes working.
He became deaf at the age of 41. He had 20 channels active and was a star
patient. The first woman (BA), 40 years of age, also had all her electrodes
working and 20 channels active. Deafness occurred at the age of 2. The second
woman (LA), 32 years old, had an ossified cochlea with only 5 channels active (15 to 20; 18 was nonfunctioning). She became deaf at the age of 28. The girl
(AM), 12 years old, had 19 channels active (channel 7 was closed). She was
close to being a star patient. Deafness occurred at the age of 9.
In all four cases, the channels covered the frequency range from 280 Hz to 4 kHz; two fifths were in the F1 range (280 to 800 Hz) and the others were in the F2-F3 range (800 to 4,000 Hz). The band-pass filters were distributed according to a logarithmic scale.
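As an illustration of this logarithmic distribution, the channel band edges can be sketched as follows; the actual MSP map is patient-specific, so the function name and the uniform log spacing are assumptions for illustration only.

```python
def log_band_edges(f_lo, f_hi, n_channels):
    """Return n_channels + 1 band edges spaced logarithmically between f_lo and f_hi."""
    ratio = (f_hi / f_lo) ** (1.0 / n_channels)
    return [f_lo * ratio ** i for i in range(n_channels + 1)]

# 20 channels covering 280 Hz to 4 kHz, as described for these patients.
# With uniform log spacing, the 8th edge lands near 800 Hz, consistent with
# two fifths of the channels lying in the F1 range (280 to 800 Hz).
edges = log_band_edges(280.0, 4000.0, 20)
```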
ACOUSTIC MATERIAL
Speakers. Ten staff members, five men and five women, collaborated in this work by reading the acoustic material. They were from 20 to 30 years old and had clear voices. Two lists were read by each speaker, the first for learning and the second for recognition.
Vowels. Four French non-nasal vowels were chosen.
They were situated at the points and in the middle of the vowel triangle in the
F1F2 space.
Classic values for their formants are vowel /i/, F1 300 Hz and F2 2,000 Hz; vowel /u/, F1 300 Hz and F2 800 Hz; vowel /a/, F1 650 Hz and F2 1,250 Hz; and vowel /e/, F1 500 Hz and F2 1,500 Hz.
These vowels were well separated in the F1F2 space. Each vowel
was embedded in a sentence: "c'est /v/ ça," with /v/ standing for the
vowel. Each vowel was repeated 12 times at random to produce two 48-vowel
lists.
VOWEL RECOGNITION
Patient. Each patient was asked to recognize the vowels
spoken by each speaker. Confusion matrices were then established. The patient first listened to the training list in order to adapt his or her discriminating possibilities. Then, for each utterance of the recognition list, he or she was asked to give his or her best choice for the vowel. A confusion matrix was constructed for each speaker and for each patient, leading to a total of 40 matrices.

TABLE 2. FIRST AND SECOND BEST-MATCHING MODELS

Patient   First      Second
Single (1 feature)
  AM      C1         D1
  BA      C1         C2
  CO      E2         A2
  LA      D2         D1
Pair (2 features)
  AM      C1E1       C2E1
  BA      C1D1       A1C1
  CO      E1E2       A1E2
  LA      C1D1       A2D2
Triplet (3 features)
  AM      C1E1E2     C2E1E2
  BA      A2C2D2     C1D1D2
  CO      A1E1E2     A2E1E2
  LA      A2D1D2     C2D1D2
Quadruplet (4 features)
  AM      A2C1E1D2   A2C1C2D2
  BA      A1A2E1E2   A1A2C1D2
  CO      A1A2E1E2   A1E1E2D1
  LA      A2C1D1D2   A1C1D1D2
Computer. A previous study9 showed that fuzzy logic, close to a probabilistic decision, was well adapted to simulate the patient's recognition of the vowels. Let us introduce this method by supposing that k features are studied for each vowel. The recognition process can be broken into two stages. During the learning stage, a table is filled out which records, for each value of the feature, the number of occurrences corresponding to each class. Ranges have been normalized from 1 to 32 for each item, and only integer values were taken. In each box of this table, there is a histogram showing the occurrence of the 32 values. This histogram was established in order to indicate the "probability" of each of the 32 values. The CI mapping was adapted to each patient and a table constructed for each implantee and each speaker. Consequently, a full table contains 32 (values) x 4 (vowels) x 8 (features) = 1,024 numbers.
The features are the electrode number (E), the amplitude (A), the duration (D), and the charge (C) (Fig 1). An example of the most likely values, for each feature and for each vowel, is given in Table 1.
At the recognition stage, an "unknown" pattern "x" needed to be classified. This pattern was represented by eight values (one for each feature). For each feature, a score was attributed to each class. This score was obtained from Table 1. The sum of the scores was calculated for each vowel, and x was classified to the closest vowel (having the highest sum). When all the vowels of the recognition list were classified, a confusion matrix was established characterizing the automatic recognition.
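The two-stage procedure above (learning histograms, then summed feature scores) can be sketched as follows; raw occurrence counts are used directly as scores, which is an assumption for illustration, since the paper does not detail how the histograms were turned into "probabilities".

```python
from collections import defaultdict

VOWELS = ["a", "i", "u", "e"]
N_FEATURES = 8   # E1 E2 A1 A2 D1 D2 C1 C2
N_VALUES = 32    # each feature normalized to integer values 1..32

def learn(training_items):
    """Learning stage: for each feature and each of its 32 values, count
    occurrences per vowel class. training_items is a list of
    (features, vowel) pairs, features being a tuple of 8 ints in 1..32."""
    hist = [[defaultdict(int) for _ in range(N_VALUES + 1)]
            for _ in range(N_FEATURES)]
    for features, vowel in training_items:
        for f, v in enumerate(features):
            hist[f][v][vowel] += 1
    return hist

def recognize(hist, features):
    """Recognition stage: each feature contributes its histogram count to
    every vowel's score; the vowel with the highest sum wins."""
    scores = {vowel: 0 for vowel in VOWELS}
    for f, v in enumerate(features):
        for vowel in VOWELS:
            scores[vowel] += hist[f][v][vowel]
    return max(scores, key=scores.get)
```

A model restricted to a subset of the eight features would simply sum the scores over that subset instead of all eight.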
Score of Model. A Hamming distance was constructed
between the confusion matrix observed with the patient, and the confusion
matrix of the automatic recognition (each automatic recognition corresponded to
a model). The sum of the absolute values of the difference, calculated box by
box between the two matrices, gave us the score of the model.
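The score of a model, as defined above, reduces to a few lines; the matrix representation (nested lists) is an implementation choice, not the paper's.

```python
def model_score(patient_matrix, auto_matrix):
    """Hamming-style distance between two confusion matrices: the sum of
    absolute box-by-box differences. The best-matching model is the one
    that minimizes this distance to the patient's matrix."""
    return sum(abs(p - a)
               for prow, arow in zip(patient_matrix, auto_matrix)
               for p, a in zip(prow, arow))
```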
PARAMETERS
Signal Acquisition. The signal coming from the speech processor was fed into a computer. In order to keep the same signal (for the patients and for the automatic recognition), the lists were recorded on a high-fidelity Revox tape recorder. The signal, transformed by the processor, was taken by an acquisition card designed for this task. The processor was set according to the patient's map values. The system worked under the control of the computer. Last, data were kept on disks (Fig 2).
It should be kept in mind that for each stimulating pulse,
representing a formant, the Nucleus device delivers six elementary
pulses bearing the information of the electrical stimulation. Out of these six
elementary-pulse sets, it is possible to extract the features of the
stimulating pulse (Fig 1): E, A,
D, and C. Positive and negative phases have the same
duration.
Set of Features. To facilitate the analytic study, the Nucleus system was used according to the simple F0F1F2 strategy, and only the information on the voice formants was considered. For each pitch period during the utterance of a vowel, the Nucleus delivers two stimulating pulses (one for each formant) containing the following information (eight features): E1 E2 A1 A2 D1 D2 C1 C2, with 1 and 2 referring to the pulse. Thus, 255 recognition spaces can be constructed with these features. Each space (corresponding to a model) has a base that combines these eight features according to combinative analysis.
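The count of 255 follows from combinative analysis: the nonempty subsets of eight features number 2^8 - 1 = 255. A minimal enumeration sketch:

```python
from itertools import combinations

FEATURES = ["E1", "E2", "A1", "A2", "D1", "D2", "C1", "C2"]

# Every nonempty subset of the eight pulse features defines one recognition model.
models = [subset
          for k in range(1, len(FEATURES) + 1)
          for subset in combinations(FEATURES, k)]

assert len(models) == 2 ** 8 - 1  # 255 models, as in the paper
```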
RESULTS
As the aim of the work is to estimate which features are likely to be used by CI subjects in the recognition process, a set of features received a high score when its confusion matrix was close to the confusion given by the patient ("best-matching" model). Models were ranked according to this proximity. Results are given in Table 2, averaging the 10 speakers. They are grouped according to the number of features.
DISCUSSION
Results showed that the recognition given by the models with a single parameter did not always put the tonotopic information of the second formant (E2) in top best-matching position. When two parameters were used, the E1E2 combination was not systematically the best. Three times out of four, the best-matching model was based on the first formant properties only (number and energy).
When a third parameter was added, best-matching models took information mostly from the two formants. It is worth noting that the best-matching model changed from one patient to another. The settings of a speech processor should take into account the patient's recognition strategy in some way. This is now possible with the present CI versatility.
Schematically, we suggest the following interpretation of the
patients' results. For the star patient, CO, the results corresponded to the
tonotopic representation. For patient BA (prelingual deafness), the first
formant was mostly used, and the patient did not take full advantage of the
tonotopic representation. Patient LA, with only a small number of electrodes,
made excellent use of the information given by the charge. Patient AM (good performer) had a tendency to take data from F1 and F2, which was not specifically the electrode position.
It could be interesting in the future to generate the pulses
on a speech processor simulator, and to test directly, with the patients, the
best combinations given by the models.
CONCLUSION
This work considers, through the use of models, the importance
of some features of the stimulating pulse of a Nucleus speech
processor. This was done with CI patients using a corpus of four French vowels.
The main results can be summarized thus. The second formant position was not
always the best strategy for making the distinction. Data on the first formant
(including the charge) were also important. The classic phonetics model E1E2
was not always the best-matching model. Again, data on the energy turned out to be equally important. Relevant features differed from one patient to another, suggesting that a strategy should be adapted to each subject. These results need to be confirmed by testing in direct stimulation with the patients.
ACKNOWLEDGMENTS -- The authors thank the people and institutions that supported their work: the Civil Hospitals of Lyon, the French Council for Research, the Rhône-Alpes Region, the API company, the University of Lyon, and Professor L. Collet from the Edouard Herriot Hospital.
REFERENCES
1. Wilson BS, Finley CC, Farmer JC, Lawson DT. Comparative studies of speech strategies for cochlear implant. Laryngoscope 1988;98:1039-97.
2. Clark GM,Blamey PJ, Brown AM, et al. The University of
Melbourne Nucleus multi-electrode cochlear implant (monograph). Adv
Otorhinolaryngol 1987;38:1-190.
3. Beliaeff M, Dubus P, Leveau JM, Repetto JC, Vincent P. Sound processing and stimulation coding of the Digisonic DX10 15-channel cochlear implant. In: Hochmair-Desoyer IJ, Hochmair ES, eds. Advances in cochlear implants. Vienna, Austria: Manz, 1994:198-203.
4. Eddington DK. Speech recognition in deaf subjects with
multichannel intracochlear electrodes. Ann NY Acad Sci 1983;83:241-58.
5. Berger-Vachon C, Collet L, Djedou B, Morgon A. Model for
understanding the influence of some parameters in cochlear implantation. Ann
Otol Rhinol Laryngol 1992;101:42-5.
6. Dillier N, WaiKong L, Hans B. A high spectral transmission coding strategy for multi-electrode cochlear implant. In: Hochmair-Desoyer IJ, Hochmair ES, eds. Advances in cochlear implants. Vienna, Austria: Manz, 1994:152-7.
7. Doering WH, Schneider L. Electrical vs. acoustical speech
patterns of German phonemes using the Nucleus CI-system. In: Fraysse B, ed.
Cochlear implant, acquisition and controversies. Toulouse, France: Service ORL,
1989:243-53.
8. Blamey PJ, Clark GM. Place coding of vowels formants for
cochlear implant patients. J Acoust Soc Am 1990;88:667-73.
9. Gallego S, Perrin E, Berger-Vachon C, Collet L, Truy E. Recognition of vowels by cochlear implant using a fuzzy logic. Int AMSE Modelling & Simulation Conference, Lyon, France, July 4-6, 1994. AMSE Press, 1994;9:103-16.