
Psycho-Physical and Electrophysiological Constraints on the Coding of Electrical Stimulation in Cochlear Implant Users

by Stéphane GALLEGO
Université Lyon I - Doctorate, 1999
  


IV/ Objective evaluation of phonetic discrimination with the Digisonic® cochlear implant

To evaluate objectively the intelligibility of the signal delivered to the internal part of the Digisonic cochlear implant, three types of recognition were carried out. First, we evaluated the recognition obtained by computer. This is only indicative of the maximum attainable intelligibility, since this mode of phoneme recognition is very different from that performed by cochlear-implant subjects. Second, we measured the intelligibility of the cochlear-implant signal in normal-hearing subjects and in subjects with moderate hearing loss, by rendering the signal acoustically with an algorithm. Finally, we measured the intelligibility obtained in cochlear-implant subjects.

a/ Vowel recognition by discriminant analysis

Before measuring the performance of cochlear-implant subjects, it is worth estimating the intelligibility of the signal emitted by the cochlear implant. To do so, we used discriminant analysis (Rouanet and Le Roux, 1993) to evaluate the separation and the percentage of discrimination of a sample of 4 vowels, then of the 16 vowels of the French language (cf. figure 8).

A first study (C. Berger-Vachon et al., 1997) had four objectives:

1- to develop a recognition technique based on discriminant analysis (a more powerful test than the Euclidean distance measure),

2- to evaluate the separation and discrimination of the vowels /a/, /i/, /u/, /3/ through the Digisonic® cochlear implant,

3- to compare the data obtained in the two modes of the cochlear implant: mode 'A' (speech mode, with at most 6 channels open per cycle) and mode 'N' (music mode, with at most 15 channels open per cycle),

4- to compare the performance of the cochlear implant's processing with that of an implant model simulated on a PC.

The results show that discriminant analysis is highly effective in classifying the vowels /a/, /i/, /u/, /3/. Mode 'A' appears to be the most effective, but it does not differ significantly from mode 'N' or from the implant simulator. The conclusions remain tentative because the number of speakers was small (2) and the recognition percentages were very close to 100.

Article 4 :

VOWEL PROCESSING THROUGH A COCHLEAR IMPLANT :
A model of speech coding

J.C. Bera, S. Gallego, L. Collet, C. Berger-Vachon
Advances in Modeling & Analysis, 1999, in press

The objective of this study was to evaluate the separation of a larger population of vowels than in the previous study (the 16 vowels of the French language), through the Digisonic® cochlear implant and through a model simulating the cochlear implant.

The analysis of the results shows very high discriminating power for the various vowel populations (0.94 to 1.00). Classification by an FFT-based model is equal to, or even worse than, that of the Digisonic® implant.

The processing performed by the implant therefore conveys enough information to distinguish all the vowels of the French language by discriminant analysis.

Vowel Processing Through a Cochlear Implant
A Model of Speech Coding

J.C. Bera *, S. Gallego **, L. Collet***, C. Berger-Vachon***

* Acoustic Centre, Ecole Centrale de Lyon, 69131 Ecully-CEDEX (France)
**MXM Laboratories, 06224 Vallauris-CEDEX (France)
***Laboratory « Perception and Auditory Mechanisms »,
ORL Dpt, E. Herriot Hospital, 69437 Lyon CEDEX 03 (France)

Abstract

The construction of an efficient code for deaf people fitted with a Cochlear Implant (CI) is still an open problem.

Classically, a spectrum analysis of the acoustical signal is made and periodically distributed at the ends of the auditory nerve of a patient. Clinically, this approach has been widely used by physicians. Also, analytical considerations need to be made on the acoustic signal.

In this paper, the results obtained with the discrimination of the French vowels taken at the output of a CI are discussed and compared with an FFT analysis (CI and FFT are two models). Vowels are well separated, two by two, by a discriminant analysis... even those which were expected to be close, using both strategies. The discussion indicates that the eigenvalues or the distances used to assess the separation may not be a very relevant item to simulate human behaviour in this situation, because they separate vowels which sound similar to the ear.

Further studies need to be carried out in order to understand better the phenomenon.
Key-words: Cochlear implants, Signal processing, Mathematical strategies, Confusion matrices

Paper presented, and selected, at CCM'98 (Contribution of Cognition to Modelling), International AMSE-Conference, Lyon-France, 6-8 July 1998.

1. Introduction

The recognition of speech by human beings has raised a lot of questions [1,2]. Many models have been established and most of the strategies start with a frequency-time representation of the signal, which is further processed [3,4,5].

This approach is widely accepted for the ear. In the inner ear, it is admitted that the cochlea performs a tonotopic decomposition of the acoustic signal which is distributed at the ends of the auditory nerve. Then, complex mechanisms occur, starting at the brainstem, to end up into an interpretation finally given by the brain.

In the case of totally deaf people, the cochlea is non-functional and it breaks the auditory chain. To beat this handicap, a cochlear implant is surgically introduced and electrodes are put in the cochlear duct. Then, electrical pulses are delivered according to a signal analysis performed by an external device called the speech processor [6].

Most of the studies conducted with patients point out that in difficult situations, the results stay below 70% [7,8,9]. In these experiments the stimulus is embedded in standard contexts such as «hVd» for vowels (where V stands for the vowel) and «aCa» for consonants (where C indicates the consonant). Most of these studies have deeply analysed the situation with the patient, but suffer from a lack of consideration from the signal processing point of view.

For instance, it is well known that the patient's background has a tremendous influence on his ability to perform the recognition and use his prosthesis efficiently. The evolution with time of the recognition scores has to be taken into account. In the end, it turns out that a deeper look into the intrinsic properties of the signal should bring interesting considerations on what is to be expected from the patient. For instance, it is difficult to imagine that if two stimuli are similar, the patient would separate them in a nonsense context. But similarity is something which is not obvious to define «mathematically speaking». This is why we take two approaches to evaluate the distance between «elementary» phonemes.

An objective assessment of the signal processing can be done directly, and compared to the classical FFT analysis.

In the language, vowels are supposed to be stationary, and their structure can be represented by a stable vector giving, with its co-ordinates, the distribution of energy in the spectrum. This is a very simple way to see the signal, and it is convenient for objective studies of the signal.

The questions which are raised in this work are:

- what is the efficiency of the coding of the French vowels by the French cochlear implant (Digisonic DX-10 of MXM) relative to the classical coding, obtained in the same conditions, with an FFT analysis?

- what is the performance of both systems when vowels which are perceptually close are taken?

- what is the significance of the confusion when human behaviour is considered?

Basically, CI and FFT lead to models of the language. These models are based on knowledge of speech, of perception and of acoustical properties.

Acoustical analyses will be developed further in this text.

2. Material and methods

2.1 Cochlear implant use

Basically it can be considered that the acoustic wave undergoes the transformations indicated in figure 1, from the acoustic wave to the brain.

[Figure 1 schematic: Acoustical Signal → Outer & Middle Ear → Inner Ear → Auditory Pathways → Brain Interpretation]

Figure 1: Classical stages in hearing.

Outer and middle ears transfer the acoustical vibrations to the inner ear.

In the inner ear, vibrations are converted into electrical stimuli which enter the auditory pathways.

The auditory pathways carry the influx to the brain, and some transformations occur at this stage, mostly in the detection of basic features (temporal and energy) in the signal.

In the brain, important and mysterious transformations take place. Generally it is considered that the features are « printed » in the temporal cortex, then matched with patterns already known by the subject, and finally interpreted according to his background and knowledge of the situation.

In the inner ear, the organ of Corti performs the transformation from acoustical vibrations into electrical stimuli. When this organ is totally non-functioning (in both ears) the chain is broken, leading to cophosis, and the subject is completely deaf.

In order to beat this handicap, scientists have developed cochlear implants intended to replace this deficient function.

The schematic structure of a cochlear implant is shown on figure 2. The description given is the DX 10 system of the French firm MXM, which was used in this work.

[Figure 2 schematic: Microphone → Speech processor → Antenna | (skin) | Antenna → Detection & distribution → Electrodes]
Figure 2: The main two parts of a cochlear implant.

In the speech processor, the signal is sampled at a 15.6 kHz rate, and a numerical analysis takes place. An FFT is performed and 64 spectrum lines are calculated and grouped into 15 frequency bands ranging (table I) from 122 to 6588 hertz (Hz). Then 15 pulses with a fixed amplitude are constructed. The duration of each pulse is proportional to the energy of the corresponding frequency band. The pulses modulate a 3 MHz carrier which is transmitted, through the skin, to the implanted part of the device.

After reception, the implanted electronic device demodulates the carrier and distributes, sequentially, to 15 electrodes the electrical energy contained in the pulses. According to their place in the cochlea, the electrodes represent different frequencies... and the brain must deal with them and «guess» the content of the message formulated by the speaker.

Band   Frequency range (Hz)
 1     122-244
 2     244-366
 3     366-488
 4     488-610
 5     610-732
 6     732-854
 7     854-976
 8     976-1098
 9     1098-1342
10     1342-1708
11     1708-2196
12     2196-2806
13     2806-3660
14     3660-4880
15     4880-6588

Table I: Distribution of the frequency bands (given in Hz) of the Digisonic DX-10 cochlear implant.
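As an illustration, the analysis stage described above (a 64-line FFT grouped into the 15 bands of Table I, with fixed-amplitude pulses whose durations are proportional to band energy) can be sketched in Python. This is a minimal reconstruction, not the DX-10 firmware; in particular the duration constant `k` is a hypothetical placeholder, since the text does not give the actual scaling:

```python
import numpy as np

# Band edges in Hz, taken from Table I (Digisonic DX-10).
BAND_EDGES = [122, 244, 366, 488, 610, 732, 854, 976, 1098,
              1342, 1708, 2196, 2806, 3660, 4880, 6588]
FS = 15600    # sampling rate of the speech processor (15.6 kHz)
N_FFT = 128   # yields the 64 spectrum lines mentioned in the text

def band_energies(frame):
    """Group the 64 FFT lines of one frame into the 15 frequency bands."""
    spectrum = np.abs(np.fft.rfft(frame, n=N_FFT))[:64] ** 2
    freqs = np.fft.rfftfreq(N_FFT, d=1.0 / FS)[:64]
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(BAND_EDGES[:-1], BAND_EDGES[1:])])

def pulse_durations(frame, k=1e-6):
    """Fixed-amplitude pulses: duration proportional to band energy
    (k is a hypothetical scaling constant, not a documented value)."""
    return k * band_energies(frame)
```

A 1 kHz tone, for instance, concentrates its energy in band 7 (854-976 Hz), the band containing the nearest FFT line.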

2.2 Acoustical material

Sixteen French vowels were taken in the following context:

« c'est /v/ ça » (it is /v/ that)

where /v/ stands for the vowel. Vowels are written in phonetic notation and indicated between slashes (table II). The use of a context is intended to minimise the « side effects » (coarticulation influence).

Two French speakers (male and female) participated in the experiment. They were in their mid-twenties.

Each vowel was uttered 10 times, leading to 10 samples for each class:

Phonetic symbol   Typical word        Category
/a/               patte (paw)         O
/ɑ/               pâte (pasta)        O
/ã/               dans (in)           N
/e/               été (summer)        O
/œ/               beurre (butter)     O
/ɛ/               baie (bay)          O
/ə/               le (the)            O
/i/               fille (girl)        O
/ɔ/               porte (door)        O
/o/               beau (beautiful)    O
/ø/               feu (fire)          O
/ɔ̃/               bon (good)          N
/u/               sou (money)         O
/y/               brûler (to burn)    O
/œ̃/               brun (brown)        N
/ɛ̃/               brin (blade)        N

Table II: French vowels used in this text; the category (Oral or Nasal) is indicated in the right column

2.3 Signal recording

The diagram of the system used to record the signal is indicated on figure 3. The acoustical signal was sampled at a 16 kHz rate by a classical 16-bit sound-blaster card, using the corresponding routines in the Windows package.

The microphone acted as a low-pass filter (cut-off frequency 8 kHz) and also performed the anti-aliasing function. The signal was then segmented into disc files in order to select only the vowels to be studied.

In nasal phonemes, the air uses the nasal track. When phonemes are oral, the nasal track is closed.

[Figure 3 schematic: Microphone → Sound Blaster card (Windows routines) → Personal computer → Disc storage]

Figure 3: Block diagram indicating the recording of the phonetic material.

2.4 FFT analysis

The FFT analysis was performed on data stored in the computer. An overlap of 50% occurred between two consecutive windows. Samples were weighted according to the Hamming formula. The duration of the analysis window (frame) was 128 points corresponding to 8 milliseconds (ms) of signal.

Sixty-four spectrum lines were calculated and arranged according to the 15 frequency bands of the Digisonic (table I).

On each utterance, 19 frames with a 50% overlap have been taken in the middle of the vowel. Each frame led to a 15-dimensional vector, where each component contained the energy of a frequency band. The final values, for a band and for an utterance, were the average of the energies calculated on the 19 frames.
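The steps above (Hamming-weighted 128-point frames, 50% overlap, 19 frames taken from the middle of the vowel, energies grouped into the 15 Digisonic bands and averaged) can be sketched as follows. This is a minimal reconstruction, not the original analysis software; centring the 19 frames on the middle of the recording is our reading of the text:

```python
import numpy as np

BAND_EDGES = [122, 244, 366, 488, 610, 732, 854, 976, 1098,
              1342, 1708, 2196, 2806, 3660, 4880, 6588]
FS = 16000   # recording rate (section 2.3)
FRAME = 128  # 8 ms analysis window
HOP = 64     # 50 % overlap between consecutive windows

def vowel_vector(signal, n_frames=19):
    """15-dimensional vowel vector: band energies averaged over 19
    half-overlapping Hamming-weighted frames from the vowel's middle."""
    span = FRAME + (n_frames - 1) * HOP
    start = (len(signal) - span) // 2          # centre the frames
    window = np.hamming(FRAME)
    freqs = np.fft.rfftfreq(FRAME, d=1.0 / FS)[:64]
    frames = []
    for k in range(n_frames):
        seg = signal[start + k * HOP : start + k * HOP + FRAME] * window
        spectrum = np.abs(np.fft.rfft(seg))[:64] ** 2
        frames.append([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                       for lo, hi in zip(BAND_EDGES[:-1], BAND_EDGES[1:])])
    return np.mean(frames, axis=0)
```

Each utterance is thus reduced to a single 15-component vector, the representation used by the discriminant analysis below.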

2.5 Pulses recording

The Digisonic performs a spectrum analysis of the speech signal, and the FFT parameters indicated in section 2.4 were taken for comparison purposes.

A pulse, representing the energy, was associated with each band.

Pulses were automatically recorded using a special device (Digistim) supplied by the manufacturer (figure 4).

The acoustical material to be analysed, previously recorded in the computer, was played at the input of the Digistim (a device supplied by MXM), and the durations of the pulses (and the time between two pulses) were detected and then stored in the computer. This was done under the control of special software developed by the manufacturer.

[Figure 4 schematic: Desktop computer (Sound Blaster) → Loudspeaker → Microphone → Digistim (MXM) → Cochlear implant]

Figure 4: Recording of the pulses.

2.6 Discriminant analysis

The discriminant analysis is a classical linear method of classification [10]. Only two-class comparisons were performed in this study. With the 16 classes (one class for each vowel), we had 16 × 15/2 = 120 comparisons.

Classically, the overlap between two classes is measured by λ, the eigenvalue of the T⁻¹E matrix, where:

T is the matrix of total covariance,

E is the covariance matrix of the centres of gravity of the classes.

It can be proved [11] that λ belongs to the [0,1] range in the 2-class case.

The largest eigenvalue was taken to assess the separation of the classes. Let us recall that λ = 0 is a « perfect » confusion and λ = 1 is a « perfect » separation.

In order to make more sensible comparisons, the separation of each pair of two classes has been also indicated, in projection on the main eigenvector associated to the largest eigenvalue.
The results given by the classical statistical formula (given below) have been shown:

S'ab = |ma − mb| / √(σa² + σb²)

where ma and mb are the means of classes a and b, and σa² and σb² are the corresponding variances.
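Under these definitions, the two separation measures can be sketched as follows. This is a minimal two-class illustration, not the software used in the study; in particular, weighting the centres of gravity by class size when building E is our assumption:

```python
import numpy as np

def separation(class_a, class_b):
    """Largest eigenvalue lambda of T^-1 E for two classes of vectors:
    T is the total covariance, E the covariance of the class centres of
    gravity.  lambda = 0 is perfect confusion, lambda = 1 perfect separation."""
    X = np.vstack([class_a, class_b])
    n_a, n_b = len(class_a), len(class_b)
    m = X.mean(axis=0)
    d_a = class_a.mean(axis=0) - m
    d_b = class_b.mean(axis=0) - m
    T = np.cov(X, rowvar=False, bias=True)   # total (biased) covariance
    E = (n_a * np.outer(d_a, d_a) + n_b * np.outer(d_b, d_b)) / (n_a + n_b)
    return float(np.max(np.real(np.linalg.eigvals(np.linalg.solve(T, E)))))

def s_ab(a, b):
    """One-dimensional separation S'ab = |ma - mb| / sqrt(sa^2 + sb^2)."""
    return abs(a.mean() - b.mean()) / np.sqrt(a.var() + b.var())
```

With T the sum of the within-class and between-class (biased) covariances, λ stays in [0, 1] as stated above: two tight, distant clouds give λ close to 1, two overlapping clouds give λ near 0.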

3. Results and discussion

Results obtained with the female voice are given (tables III and IV). Those obtained with the male voice are equivalent.

 

      /a/   /ɑ/   /ã/   /ɔ/   /e/   /ɛ/   /ø/   /i/   /ɛ̃/   /o/   /ə/   /œ/   /ɔ̃/   /u/   /y/   /œ̃/
/a/   ****  07.2  03.7  08.2  09.3  08.9  16.1  10.0  09.1  12.8  06.5  10.1  08.3  07.8  07.3  08.2
/ɑ/   0.95  ****  15.9  09.6  13.3  19.8  11.9  16.1  07.1  13.4  08.1  11.8  15.6  12.4  07.7  09.9
/ã/   0.99  0.98  ****  08.9  10.6  12.4  05.7  12.4  08.7  04.3  09.1  16.8  10.9  09.5  08.2  09.2
/ɔ/   0.99  0.99  0.96  ****  09.3  09.0  09.4  11.3  05.3  12.9  07.2  07.2  09.0  08.9  07.5  08.3
/e/   0.99  0.9   0.99  0.99  ****  07.4  06.4  12.8  05.7  10.7  06.1  08.9  08.6  08.6  08.6  06.6
/ɛ/   0.99  0.99  0.97  0.99  0.99  ****  05.3  10.1  09.0  07.1  12.8  08.4  07.7  07.3  15.8  07.2
/ø/   0.99  1.00  0.98  0.97  1.00  0.96  ****  09.7  11.7  05.4  08.3  07.5  06.6  06.3  08.6  06.3
/i/   1.00  1.00  0.99  0.99  0.99  1.00  0.99  ****  14.3  09.0  13.8  08.6  10.6  05.3  12.0  03.7
/ɛ̃/   0.96  0.97  0.99  0.98  0.99  0.99  0.99  0.99  ****  12.2  06.7  11.1  12.3  12.6  05.1  11.0
/o/   1.00  0.99  0.97  0.99  0.98  0.99  0.97  1.00  0.99  ****  08.8  07.6  05.6  05.4  08.4  06.2
/ə/   0.99  0.99  0.98  0.96  0.99  0.97  0.96  0.99  0.99  1.00  ****  08.0  11.4  10.3  05.4  09.0
/œ/   0.99  1.00  0.99  0.99  0.99  0.99  0.99  1.00  0.99  1.00  1.00  ****  06.2  07.0  09.1  07.4
/ɔ̃/   1.00  1.00  1.00  0.99  1.00  0.99  0.99  1.00  0.99  0.95  0.99  0.99  ****  08.0  09.1  08.1
/u/   1.00  1.00  0.99  0.99  0.99  0.99  1.00  0.97  1.00  0.99  0.99  0.97  0.99  ****  08.1  03.0
/y/   1.00  0.99  0.99  0.96  1.00  0.98  1.00  1.00  0.99  0.99  0.99  1.00  0.98  0.99  ****  08.8
/œ̃/   1.00  1.00  0.99  0.99  0.99  0.99  0.99  0.97  1.00  0.99  1.00  0.99  0.99  0.92  1.00  ****

Table III: Separation between the vowels using the FFT; λs are on the bottom left and S'abs on the top right.

Results were calculated with the FFT model and with the cochlear implant coding. λs are indicated on the bottom left and S'abs on the top right of the tables. It can be seen that the vowels were well separated in all the situations. Both models (FFT and implant) of vowel representation behaved equally.

Then, it was expected that some vowels would have a high overlap. This is the case for /œ̃/ and /ɛ̃/, which are not easy for normal listeners to distinguish in usual speech. In our example, the automatic separation is almost perfect.

 

      /a/   /ɑ/   /ã/   /ɔ/   /e/   /ɛ/   /ø/   /i/   /ɛ̃/   /o/   /ə/   /œ/   /ɔ̃/   /u/   /y/   /œ̃/
/a/   ****  15.0  08.4  13.9  10.9  19.7  06.7  18.6  13.6  06.8  10.9  17.6  14.7  12.2  20.1  14.8
/ɑ/   1.00  ****  14.9  05.4  29.2  15.8  11.7  29.4  07.9  24.0  09.3  11.7  33.4  26.1  24.6  08.8
/ã/   1.00  0.99  ****  14.2  14.4  12.1  07.1  22.2  08.7  05.1  16.0  10.9  07.5  13.0  17.4  08.1
/ɔ/   0.99  0.98  0.99  ****  20.5  10.2  11.2  19.9  05.2  28.1  12.5  05.7  19.6  18.2  17.3  06.1
/e/   0.99  1.00  0.99  1.00  ****  13.2  13.3  11.0  11.2  09.2  15.5  11.3  04.0  06.5  09.3  20.5
/ɛ/   0.99  0.99  0.99  0.99  1.00  ****  06.8  18.3  07.2  18.1  15.4  10.7  15.4  13.8  11.8  18.8
/ø/   0.99  0.99  0.99  0.99  0.99  0.96  ****  22.0  06.8  04.2  08.7  08.3  16.7  14.7  17.3  08.3
/i/   1.00  1.00  1.00  1.00  0.97  1.00  1.00  ****  22.3  19.8  23.9  11.8  13.1  05.8  07.2  25.4
/ɛ̃/   1.00  0.99  0.99  0.95  1.00  0.98  0.97  1.00  ****  10.2  07.0  04.2  12.8  17.6  18.0  04.7
/o/   1.00  1.00  0.97  1.00  1.00  0.99  0.96  1.00  0.97  ****  09.6  15.3  07.2  11.6  19.3  13.8
/ə/   0.99  0.98  0.99  0.99  1.00  0.99  0.99  1.00  0.97  0.98  ****  11.4  14.0  17.4  19.0  05.2
/œ/   1.00  0.99  0.99  0.98  0.99  0.99  0.98  0.99  0.97  0.98  0.98  ****  11.7  11.1  09.9  08.4
/ɔ̃/   0.99  1.00  1.00  1.00  0.97  1.00  1.00  0.97  1.00  0.99  1.00  1.00  ****  07.5  12.0  18.5
/u/   0.98  1.00  0.99  0.99  0.98  0.99  1.00  0.95  1.00  0.99  0.99  0.99  0.95  ****  05.4  17.3
/y/   1.00  1.00  0.99  1.00  0.98  0.98  0.99  0.94  1.00  1.00  0.98  0.98  0.98  0.95  ****  16.0
/œ̃/   1.00  1.00  0.99  1.00  0.99  0.99  0.99  1.00  0.93  0.99  0.94  0.99  1.00  1.00  0.99  ****

Table IV: Separation using the cochlear implant; λs are on the bottom left and S'abs on the top right.

Considering these results, two comments can be made:

1) The vowels were carefully spoken in good signal to noise ratio conditions (soundproof room); it would be interesting to see the results in more « natural » (noisy) situations.

Comparisons were made on the voice of one speaker only... in the case of several speakers, it would be useful to reconsider the situation.

2) The representation which was taken (vectors and eigenvalues in a discriminant analysis) did not appear to be a good model of human behaviour with the vowels. People are more used to coping with the variability which is included in normal speech conditions.

Further studies are needed in order to investigate this matter more deeply. For instance, a look at the structure of the representative vectors cannot be avoided. A model of human behaviour in such conditions is still to be made, and the spectral distribution presented in this work must be improved to better match human psycho-acoustical performance.

4. Conclusion

Vowel representation by a cochlear implant, using the model of a spectrum vector and a discriminant analysis (or a statistical distance), is very efficient at separating the vowels.

This efficiency was surprisingly high because several vowels (acoustical objects) which are similar perceptually were perfectly separated.

Results given by an FFT analysis were very similar to those obtained with a cochlear implant.

This work indicates that further studies are needed to set up a model more adapted to human behaviour.

Acknowledgements

The authors acknowledge the participation in this experiment of the students involved in the project: M. Doutiaux, S. Faucher, B. Fuselier, E. Laribe, D. Rousseau.

References

[1] Fant R. « Auditory patterns of speech », W. Dunn edit., Cambridge, M.I.T. Press, (1967), 111-125.

[2] Roberts L. « Speech and brain mechanisms », Princeton Univ. Press, (1959).

[3] Berger-Vachon C., Gallego S., Bera J.C., Arnoux E., Vissac C. « A model of vowels representation using a cochlear implant », Advances in Intelligent Systems, 526-532, Ed. F.C. Morabito, I.O.S. Press, Amsterdam, (1997).

[4] Perkell J., Klatt D.H. « Invariance and invariability in speech processes », Lawrence Earlbaum, Hullsdale N.J., USA, (1983).

[5] Serniclaes W., De Guchteneere R., Secqueville T., Bachelot G., Genin J., Meyer B., Chouard C.H. « Objective evaluation of vowel identification with the Digisonic cochlear implant », Audiology, 35, 23-36 (1996).

[6] Belaieff M., Dubus P., Leveau J.M., Repetto J.C., Vincent P. « Sound processing and stimulation coding of Digisonic DX-10 15-channel implant », Advances in cochlear implantation, Ed I.J. Hochmair (Innsbruck), 198-203 (1996).

[7] Robinson K., Summerfield A. « Adult auditory learning and training », Ear & Hearing, 51S- 65S (1996).

[8] Dorman M.F., Loizou P.C. « The identification of consonants and vowels by cochlear implant patients », Ear & Hearing, 162-166 (1998).

[9] Skinner M.W. et al « Speech recognition at simulated soft, conversational and raised-to-loud vocal efforts by adults with cochlear implants », J. Acoust. Soc. Am., 101, 3766-3782 (1997).

[10] Sebestyen G.S. « Decision making processes in Pattern recognition », Mc Millan (1962).

[11] Lebart L., Morineau A., Fénelon J.P. « Traitement des données statistiques », Dunod, Paris (1982).

b/ Acoustic rendering of the signals derived from the electrodogram

We showed above that the information delivered by the Digisonic® cochlear implant is highly relevant. The question now is whether the way this information is transmitted to the auditory system requires recoding. Indeed, information that is highly relevant for a computer may be poorly decoded by the auditory pathways.

It therefore seemed worthwhile to simulate the signal heard by cochlear-implant subjects, by rendering the electrodograms acoustically, in order to evaluate its intelligibility under normal processing through the auditory pathways.
