Article 12:
AMPLITUDE COMPRESSION IN COCHLEAR IMPLANTEES
ARTIFICIALLY RESTRICTS THE PERCEPTION OF TEMPORAL ASYMMETRY
C. Lorenzi, S. Gallégo, R.D. Patterson, Brit. J.
Audiology, 1998, 32, 367-374
The aim of this article is to assess the quality of the signal processing
applied by the Digisonic speech processor to the temporal envelope of the
acoustic signal.
We compared the ability of Digisonic cochlear implantees to discriminate
temporal asymmetries under two conditions: through the speech processor
or with direct stimulation.
The results show that discrimination ability is degraded by the signal
processing performed by the processor.
British Journal of Audiology, 1998, 32, 367-374
Amplitude compression in cochlear implants artificially restricts the
perception of temporal asymmetry
Christian Lorenzi¹, Stéphane Gallégo² and Roy D. Patterson³
¹Laboratoire de Psychologie Expérimentale, Institut de
Psychologie, Paris, France, ²Laboratoire de Physiologie Sensorielle,
Hôpital E. Herriot, Lyon, France and ³Centre for the Neural
Basis of Hearing, University of Cambridge, UK
(Received 6 May 1997, accepted 23 March 1998)
Abstract
This paper presents a study in which five cochlear implantees
were asked to discriminate the timbre of stimuli with temporally asymmetric
envelopes. Stimuli were damped and ramped sinusoids presented acoustically.
They were transformed by the speech processor of the implant and were presented
through one electrode. All cochlear implantees could discriminate the damped
and ramped sinusoids when the half-life was 4 ms, the carrier frequency was 400
Hz, and the period of the envelope was 50 ms. In a second experiment, timbre
discrimination performance was measured as a function of half-life for two
cochlear implantees. Both showed that timbre discrimination was possible over
the range 1-24 ms. In normal-hearing listeners, the range is 1-64 ms and in
cochlear implantees, stimulated directly without the speech processor, the
range is 1-300 ms. At long half-lives, the decrease in discrimination
performance observed with the speech processor appears to be due to the
amplitude compression applied by the device. The present results suggest that
it may be important to ensure that cochlear implants do not restrict temporal
asymmetry unduly when applying compression to control level.
Key words: timbre perception, temporal
asymmetry, cochlear implant, amplitude compression, speech processor
Introduction
Two sounds with identical magnitude spectra can have very
different sound quality or timbre. Among the acoustical cues used by listeners
with normal hearing to identify timbre, temporal envelope cues such as onset
and offset transients play an important role (for a review, see Handel, 1995).
In cochlear implantees, temporal envelope cues are coded by changes in the
amplitude and time pattern of stimulation on individual electrodes. Past
studies with single-channel cochlear implants suggested that these changes may
give rise to changes in perceived timbre but the results were not conclusive.
Dobie and Dillier (1985) asked two cochlear implantees to discriminate
triangular and trapezoidal waveforms from square waveforms. One cochlear
implantee discriminated these waveforms remarkably well by labelling them
'sharp' or 'dull'. However, this labelling was inconsistent between different
days of testing. The other cochlear implantee discriminated the triangular and
square waveforms as well as the first cochlear implantee but was unable to
label them.
Address for correspondence: C. Lorenzi, Laboratoire de
Psychologie Expérimentale, URA CNRS 316, Institut de Psychologie,
Université René Descartes, Paris V, 28 Rue Serpente, 75006
Paris, France
Changes in the rate of onset (attack) and offset (decay) of a
soundwave may be regarded as changes in the asymmetry of its temporal envelope.
In a series of experiments performed with normal-hearing listeners, Patterson
(1994a, b) showed how the effect of temporal asymmetry on the perception of
timbre could be studied systematically using damped and ramped sinusoids. The
term `damped sinusoid' referred to a segment of a sinusoid with a damped
exponential envelope that was repeated cyclically to produce a sustained sound.
The `ramped sinusoid' was
simply the damped sinusoid reversed in time. When the
half-life of the exponential is 4 ms, normal-hearing listeners hear the damped
version as a unitary source (a roll on a drummer's wood block), whereas the
ramped version is heard as a co-ordinated pair of sounds (a roll on a soft
leather table top accompanied by a continuous sinusoid). The effect is
important because the time-reversal affects the temporal envelope of the
soundwave without changing its magnitude spectrum. The results of these
experiments showed that normal-hearing listeners could discriminate the timbre
of a damped sinusoid from that of a ramped sinusoid when the half-life is in
the range 1-50 ms.
The change in perceived timbre elicited by ramped and damped
envelopes has been investigated recently in cochlear implantees (Lorenzi et
al., 1997). In this experiment, ramped and damped current pulse trains were
delivered directly to a single electrode of the implant without going
through the pre-processor of the implant. The results showed that, when the
level of the stimuli is adjusted to fit their audibility range, cochlear
implantees can distinguish between ramped and damped envelopes over a much
wider range (1-300 ms) than normal-hearing listeners. Unlike the discrimination
data of Dobie and Dillier (1985), the data of Lorenzi et al. (1997) are highly
consistent across cochlear implantees. The better-than-normal performance of
cochlear implantees indicates that asymmetry in the temporal envelope of a
sound is a powerful cue for timbre identification in these listeners. It also
suggests that, in normal-hearing listeners, cochlear compression limits the
sensitivity to temporal asymmetry. The speech processor used in most implants
includes compression intended to simulate the compression applied by the intact
cochlea. This suggests that the compression in the speech processor degrades
the perception of temporal asymmetry in implantees as it does in normal-hearing
listeners.
Method
Five cochlear implantees were asked to discriminate the timbre
of ramped and damped sinusoids when presented acoustically through the speech
processor of a Digisonic DX10 cochlear implant. The results of this experiment
were compared with those obtained previously by direct electrical stimulation
(Lorenzi et al., 1997).
Listeners
Five post-lingually profoundly deaf listeners (BM, DL, LR, RF,
SP) participated in these experiments, three of whom (BM, LR and SP) also
participated in the experiments reported by Lorenzi et al. (1997). They were
all experienced in two-interval, two-alternative forced choice (2I, 2AFC) tasks.
Clinical information about these patients is presented in Table 1. Their
audiometric thresholds at 0.5, 1, 2 and 4 kHz are presented in Table 2. They
were all implanted with a Digisonic DX10 device (MXM), which is a
transcutaneous 15-channel cochlear implant with an intracochlear electrode array
(Beliaeff et al., 1994). Stimuli were presented acoustically to cochlear
implantees and were transformed through the speech processor of the Digisonic
DX10 cochlear implant. The device performs a 128-point Fast Fourier Transform
(FFT) from 100 to 7800 Hz. The device imposes an absolute threshold of 40 dB
SPL and it applies logarithmic compression above this threshold, separately in
each channel. The compression device does not include any dynamic elements
(e.g. AGC attack and decay times). The volume control of the processor was
fixed during the testing period. Activation was limited to the most apical
electrode, which delivered monophasic (capacitively coupled) current pulses; the
remaining 14 electrodes in the array were connected together to serve as the
return path for the current. More specifically, the return path was a mixture
of 'common ground' and monopolar modes of stimulation. Radiography revealed
that electrode positioning was roughly the same for all implantees. The
activated electrode was assigned a single wide frequency band (100-7800 Hz),
which combined the 64 energy values produced by the 128-point FFT. For each
patient, the pulse duration (in µs) was adjusted from threshold (Min) to
comfort level (Max). The Min and Max values for each cochlear implantee are
presented in Table 1. The carrier was a train of monophasic (capacitively
coupled) current pulses with a rate of 400 pulses per second, which was the
maximum pulse rate provided by the speech processor. For comparison, three
listeners with normal audiometric thresholds also participated. Like the
implantees, they were highly experienced in 2I, 2AFC tasks.
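The static coding described above (a 40 dB SPL threshold, logarithmic compression, and a pulse duration bounded by each patient's Min and Max) can be pictured with a minimal sketch. It is not the Digisonic DX10 algorithm, whose exact mapping is not given here. The function name `pulse_duration_us`, the 100 dB SPL ceiling, and the choice to return Min at and below threshold are all assumptions made for illustration.

```python
def pulse_duration_us(level_db_spl, min_us, max_us,
                      threshold_db=40.0, ceiling_db=100.0):
    """Map an acoustic level (dB SPL) to a pulse duration (microseconds).

    Hypothetical static map: linear in dB, hence logarithmic in sound
    pressure, with no attack or decay time.
    ceiling_db is an assumed value; the text only states the 40 dB threshold.
    """
    # Position of the level within the assumed input range, clipped to [0, 1].
    frac = (level_db_spl - threshold_db) / (ceiling_db - threshold_db)
    frac = min(1.0, max(0.0, frac))
    # Interpolate between the patient's threshold (Min) and comfort (Max) values.
    return min_us + frac * (max_us - min_us)


# Example with the Min and Max values listed for patient BM in Table 1.
for level in (40.0, 70.0, 100.0):
    print(level, pulse_duration_us(level, min_us=12.0, max_us=55.0))
```

Under these assumptions, a large range of acoustic levels is squeezed into a few tens of microseconds of pulse duration, which is the kind of compression whose effect on envelope asymmetry the experiments examine.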
Stimuli
Table 1. Clinical data for the five cochlear implantees of the study
Patient | Age (years) | Cause of deafness | Duration of implant use (months) | Min (µs) | Max (µs)
BM | 64 | Head trauma | 7 | 12 | 55
DL | 30 | Progressive deafness | 15 | 86 | 110
LR | 60 | Unknown | 3 | 10 | 45
RF | 69 | Progressive deafness | 9 | 10 | 41
SP | 44 | Head trauma | 6 | 14.5 | 45

Table 2. Unaided air-conduction thresholds in dB HL at the left (L) and right (R) ears of the five impaired listeners of the study
Patient | 0.5 kHz L | 0.5 kHz R | 1 kHz L | 1 kHz R | 2 kHz L | 2 kHz R | 4 kHz L | 4 kHz R
BM | 115 | 115 | >120 | >120 | >120 | >120 | >120 | >120
DL | 105 | 105 | 100 | 120 | 90 | >120 | 95 | >120
LR | 115 | 105 | >120 | 115 | >120 | >120 | >120 | >120
RF | 110 | 105 | 115 | >120 | >120 | >120 | >120 | >120
SP | 105 | 90 | 100 | 100 | 90 | 100 | 95 | 115

Equation (1) shows the general form of a damped sinusoid:
$$\mathrm{damp}(t) = A \exp\!\left(\frac{c\,t}{\mathit{hl}}\right) \sin(2\pi f t), \qquad 0 < t < T \tag{1}$$
where f is the carrier frequency (400 Hz), A is the
starting amplitude, hl is the half-life of the damped sinusoid, and
c is a constant (-0.693147) that brings the envelope to A/2 in hl ms.
T is the repetition period, which is 50 ms in both experiments. The ramped
sinusoids were produced by reversing the damped sinusoids in time. The stimuli
were digitally generated by a 16-bit D/A converter at a sampling frequency of
44.1 kHz. The duration of the stimuli was 500 ms; the silent interval
between stimuli was 500 ms. The stimuli were presented in free field through a
loudspeaker positioned at 0° azimuth and 0° elevation. Listeners sat
at 1 m from the loudspeaker, and were asked to face it during the course of the
experiment. The loudspeaker was a full range (150 Hz to 20 kHz) driver. The
stimuli were presented at a moderately loud level with the damped and ramped
sinusoids having the longest half-life set to 65 dB SPL (SPL was measured with
a sound level meter placed at the listener's head position). As the half-life
decreases, the energy and the loudness of the sound decrease. To maintain the
stimuli at the same loudness, the maximum amplitude, A, was increased by the
square root of 2 each time the half-life was decreased by a factor of 2.
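The stimulus construction described in this paragraph can be sketched in a few lines of Python. This is an illustration only, not the authors' software: the function names are hypothetical, the reference half-life of 64 ms used for the amplitude rule is an assumed default (the text says only "the longest half-life"), and amplitudes are in arbitrary units rather than calibrated to dB SPL.

```python
import math

C = -0.693147  # constant c of Equation (1); brings the envelope to A/2 after one half-life


def damped_cycle(half_life_ms, amplitude=1.0, carrier_hz=400.0,
                 period_ms=50.0, sample_rate_hz=44100):
    """One repetition period of a damped sinusoid, as a list of samples."""
    n = int(round(period_ms * sample_rate_hz / 1000.0))
    samples = []
    for i in range(n):
        t_ms = 1000.0 * i / sample_rate_hz
        envelope = amplitude * math.exp(C * t_ms / half_life_ms)
        carrier = math.sin(2.0 * math.pi * carrier_hz * t_ms / 1000.0)
        samples.append(envelope * carrier)
    return samples


def ramped_cycle(half_life_ms, **kwargs):
    """The ramped sinusoid is the damped sinusoid reversed in time."""
    return damped_cycle(half_life_ms, **kwargs)[::-1]


def stimulus(cycle, duration_ms=500.0, period_ms=50.0):
    """Repeat one cycle to fill the stimulus duration."""
    return cycle * int(round(duration_ms / period_ms))


def scaled_amplitude(half_life_ms, longest_half_life_ms=64.0, base=1.0):
    """Amplitude grows by sqrt(2) for each halving of the half-life.

    longest_half_life_ms is an assumed reference value.
    """
    halvings = math.log2(longest_half_life_ms / half_life_ms)
    return base * math.sqrt(2.0) ** halvings


damped = stimulus(damped_cycle(4.0, amplitude=scaled_amplitude(4.0)))
ramped = stimulus(ramped_cycle(4.0, amplitude=scaled_amplitude(4.0)))
```

Because each stimulus contains a whole number of periods, reversing one cycle and repeating it gives the same waveform as reversing the whole damped stimulus, so the two sounds share the same magnitude spectrum.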
Segments of the acoustic stimuli are presented in the upper
panels of each section of Fig. 1; the left and right columns show damped and
ramped stimuli, respectively. The half-life of the exponential is 1 ms in the
top section, 8 ms in the middle section, and 64 ms in the bottom section. The
trains of stimulation pulses produced by the activated electrode in response to
each sound wave were recorded by a specially designed computer interface
(Digigram system, MXM). They are presented in the panels below each sound wave.
In these panels, the ordinate is in microseconds because the amplitude of the
acoustic stimulus is coded by the duration of the current pulses produced by
the Digisonic DX10 device. The trains of current pulses show that the speech
processor of the implant degrades the temporal envelope of the acoustic
stimulus. When the half-life is 1, 8, or 64 ms, the half-life of the electric
waveform is longer than that of the acoustic waveform. When the half-life is 1
ms or 64 ms, the envelope asymmetry is largely lost. However, asymmetry is
preserved when the half-life is 8 ms.
[Figure 1: panels for hl = 1 ms, hl = 8 ms and hl = 64 ms; abscissa: Time (ms)]
Fig. 1. Segments of the acoustic stimuli are presented in
the upper panels of each section. The left and right columns show damped and
ramped sinusoidal waves, respectively. The half-life of the exponential is 1 ms
in the top section, 8 ms in the middle section, and 64 ms in the bottom
section. The carrier frequency is 400 Hz. The repetition period is 50 ms. The
train of current pulses produced by the activated electrode in response to each
sound wave is presented in the panel below each sound wave. In these panels,
the ordinate (the pulse duration) is in microseconds.
Procedure
In the first experiment, two types of discrimination procedure
were used, both of which were 2I, 2AFC. In procedure A, listeners were
presented with a ramped or a damped sinusoid chosen at random in each interval.
Thus, there were four possible pairs of sounds: ramped/ramped, damped/damped,
ramped/damped and damped/ramped. They were asked to say if the two sounds were
identical or different. In procedure B, they were presented with a damped
sinusoid in one interval and a ramped sinusoid with the same half-life in the
other interval, and asked to choose the interval with the 'more drum-like sound
quality'. Thus, performance corresponds to the percentage of damped sinusoids
chosen as having the more drum-like quality. Procedure B was also used
throughout the second experiment. In the first experiment, the half-life was
fixed at 4 ms. In the second experiment, the half-life was fixed within a block
and was varied from 1 ms to 32 ms from block to block. Each block contained 50
trials in the first experiment and 30 trials in the second experiment.
In both experiments, listeners received visual feedback
concerning the accuracy of their response after each trial. Listeners sat in a
double-walled soundproof booth, in front of a keyboard connected to the
computer controlling the experiment. They received 15 min of preliminary
training before participating in each experiment.
Results
The performance of the five cochlear implantees for the first
experiment is presented in Fig. 2; the half-life was fixed at 4 ms. Each bar
is based on 50 trials; unfilled bars for procedure A ('same/different' task)
and filled bars for procedure B ('more drum-like sound quality' task). For both
tasks, performance was invariably well above chance: the mean performance of
the five cochlear implantees was 94.4% (SD 7.12%) for procedure A, and 94.8%
(SD 10.54%) for procedure B (p < 0.05 for 61% correct responses).
They all heard the difference between damped and ramped sinusoids and chose the
damped sinusoid as having a stronger drum-like quality without difficulty.
Informal testing with cochlear implantees revealed that the ramped sounds
produced the stronger tonal quality, as is the case for normal-hearing listeners
(Patterson, 1994a, b), and for cochlear implantees stimulated directly without
the use of the pre-processor (Lorenzi et al., 1997).
[Figure 2: bar chart; abscissa: Cochlear Implantees]
Fig. 2. Performance of the five cochlear implantees when
the half-life is 4 ms. Stimuli were presented acoustically and went through
the speech processor. Two tasks were used to measure discrimination
performance. In the first task (unfilled bars), implantees were asked to say if
the two sounds were identical or different. In the second task (filled bars),
implantees had to choose the interval containing the sound with the more
'drum-like' quality. Performance corresponds to the percentage of damped sounds
chosen as having the more drum-like quality.
In the second experiment, psychometric functions were measured
for cochlear implantees BM and DL using procedure B ('more drum-like sound
quality' task). The half-life was systematically varied from 1 to 32 ms. The
psychometric functions of BM and DL (solid lines with open and filled circles,
respectively) are presented in Fig. 3. In both cases, the shortest
just-discriminable half-life was between 1 and 1.5 ms and the longest
just-discriminable half-life was between 16 and 24 ms. At shorter and longer
half-lives, discrimination performance fell off abruptly. For comparison,
psychometric functions were measured for three normal-hearing listeners with
half-life varying from 0.125 to 128 ms. The dotted line with open triangles
shows their mean performance. The dashed line without symbols shows the mean of
the data obtained by Patterson (1994b) and Irino and Patterson (1996) in
similar conditions.¹ For all normal-hearing listeners,
discrimination performance was above 90% when the half-life was between 1 ms
and 10 ms, but at chance when the half-life was either below 0.125 or above 64
ms. In summary, the results of both experiments show that cochlear implantees
receiving stimuli through the Digisonic DX10 speech processor can discriminate
the timbre of ramped and damped sinusoids as well as normal-hearing listeners
(when the half-life is 4 ms, for instance). However, the results of the second
experiment indicate that discrimination is restricted to a narrower range of
half-lives in cochlear implantees.
¹When the half-life was below 1 ms, the normal-hearing
listeners of the present study showed better performance than that reported by
Patterson (1994b). This better performance may be due to the fact that
Patterson (1994b) did not provide feedback after each response, and he mixed
experimental conditions within blocks of trials. It remains, however, unclear
why this methodological difference should affect discrimination performance
under 1 ms, as opposed to performance over 10 ms.
[Figure 3: plot; abscissa: Half-life (ms)]
Fig. 3. Psychometric functions for cochlear implantees BM
(solid lines with open circles) and DL (solid lines with filled circles),
showing discrimination performance as a function of half-life. Stimuli were
presented acoustically and went through the speech processor. Implantees had to
choose the interval containing the sound with the more 'drum-like' quality. The
data from cochlear implantees are plotted along with (1) the mean of the data
obtained with three normal-hearing listeners in identical conditions (dotted
line with open triangles), and (2) the mean of the data obtained with
normal-hearing listeners in similar conditions by Patterson (1994b) and Irino
and Patterson (1996) (dashed line without symbols).
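The significance criterion quoted for the first experiment (p < 0.05 for 61% correct responses, with 50 trials per bar) can be checked with a short calculation. The sketch below assumes a one-sided test against a chance level of 0.5, which the text does not state; the function names are illustrative. It compares the normal approximation with an exact binomial tail.

```python
import math


def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))


def critical_count(n, alpha=0.05, p=0.5):
    """Smallest number of correct responses whose upper tail is below alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p) < alpha:
            return k
    return None


def normal_criterion(n, z=1.645, p=0.5):
    """Proportion correct at the one-sided 5% point of the normal approximation."""
    return p + z * math.sqrt(p * (1 - p) / n)


print(normal_criterion(50))   # close to the 61% figure quoted in the text
print(critical_count(50))     # exact binomial criterion, as a count out of 50
```

Under these assumptions the normal approximation gives a criterion of roughly 61.6% correct, in line with the quoted figure, while the exact binomial criterion is slightly stricter (32 of 50). Either way, the observed means near 94% are far above chance.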
The average psychometric functions for the cochlear implantees
and the normal-hearing listeners who participated in the second experiment are
plotted in Fig. 4 (solid lines with filled circles for cochlear implantees,
dotted lines without symbols for normal-hearing listeners). The solid lines
with open circles show the mean of the data obtained by Lorenzi et al. (1997)
with cochlear implantees stimulated without the intervention of the speech
processor. Fig. 4 shows that bypassing the speech processor improves timbre
discrimination performance, the effect being stronger at long half-lives than
at short half-lives: the longest just-discriminable half-life is increased by a
factor of 25 in the direct stimulation mode. The experiments without the speech
processor were performed after those with the speech processor. It is, however,
unlikely that practice effects explain the better performance obtained by
direct electrical stimulation as the two sets of experiments were separated by
a period of 11 months. In addition, all implantees were highly skilled in 2I,
2AFC tasks before participating in both experiments.
[Figure 4: plot; abscissa: Half-life (ms)]
Fig. 4. Average psychometric functions for the two
cochlear implantees stimulated with the intervention of the speech processor
(solid lines with filled circles) and the three normal-hearing listeners who
participated in the second experiment of this study (dotted lines without
symbols). The data are plotted along with the mean of the data obtained by
Lorenzi et al. (1997) with cochlear implantees stimulated without the
intervention of the speech processor (solid lines with open circles).
The data show that cochlear implantees using the speech
processor can consistently label and discriminate the timbre of ramped and
damped envelopes, but over a narrower range of half-lives than implantees
stimulated directly without the use of the speech processor. These data
demonstrate that the speech processor of the Digisonic DX10 device degrades the
envelope information significantly and affects the timbre of damped and ramped
sounds. The loss of fidelity shown in Fig. 1 by the current pulse functions for
a half-life of 1 ms and 64 ms suggests that performance in cochlear implantees
is limited by two components of the coding scheme of the Digisonic DX10. At
short half-lives (e.g. 1 ms), the persistence of the cochlear implant signal
and the loss of temporal asymmetry are mainly caused by the long duration of
the temporal window (8.2 ms) used for the computation of the 128-point FFT. At
long half-lives (e.g. 64 ms), the loss of temporal asymmetry appears to be
mainly due to the compression circuitry of the device.
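A rough calculation helps to picture these two limits. It follows from Equation (1) that an exponential envelope falls by 20·log10(2), about 6 dB, per half-life. The sketch below computes the level difference between the start and the end of one 50 ms period, and the ratio of the 8.2 ms analysis window to the half-life. It describes the acoustic stimulus only and makes no assumption about the processor's internal mapping; the function name is illustrative.

```python
import math

FFT_WINDOW_MS = 8.2  # duration of the analysis window quoted in the text


def envelope_drop_db(half_life_ms, period_ms=50.0):
    """Level difference (dB) between the start and the end of one envelope period."""
    return 20.0 * math.log10(2.0) * period_ms / half_life_ms


for hl in (1.0, 8.0, 64.0):
    print(f"hl = {hl:g} ms: "
          f"drop over one period = {envelope_drop_db(hl):.1f} dB, "
          f"window / half-life = {FFT_WINDOW_MS / hl:.2f}")
```

At a half-life of 64 ms the envelope changes by only about 4.7 dB across the whole period, so a compressive stage leaves very little level contrast between the onset and the offset of each cycle. At 1 ms the envelope decays by about 6 dB every millisecond, far faster than an 8.2 ms window can follow.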
Conclusions
This paper describes cochlear implantees' ability to
discriminate and label the timbre of sinusoids
with asymmetric temporal envelopes. The stimuli were presented
acoustically and were transformed by the compressive speech processor of the
implant. Stimulation was restricted to a single electrode of the implant.
When the level of the stimuli is adjusted to fit their
audibility range, the implantees are as sensitive to temporal asymmetry as
normal-hearing listeners, but over a narrower range. In other words, timbre
differences elicited by changes in the temporal envelope asymmetry of sounds
are less salient in cochlear implantees using their speech processor than in
normal-hearing listeners. A comparison between the discrimination performance
of cochlear implantees stimulated with and without the use of the pre-processor
suggests that the poorer performance of cochlear implantees stimulated
acoustically is mainly caused by the 128-point FFT at short half-lives, and by
the compression circuitry of the device at long half-lives.
Temporal asymmetry is a prominent property of speech sounds
and sounds produced by musical instruments. It is known to play a role in
speech discrimination and timbre perception. For instance, much of the
information about the nature of consonants is contained in the 10-40 ms
following an onset or preceding an offset, and some of the information is
in the abruptness of the onset or offset (for a review, see Stevens and House,
1972). Hearing-impaired listeners are therefore likely to make good use of
temporal asymmetry when it is available. The present results indicate that it
may be important to ensure that cochlear implants (and hearing aids) do not
restrict temporal asymmetry unduly when applying compression to control level.
Compression is essential for fitting the wide dynamic range of everyday sounds
(around 60 dB) into the limited dynamic range available for electrical
stimulation (less than 10 dB), and the current results should not be
interpreted to suggest that compression should be removed from the speech
processors. Rather, the results emphasize the importance of the
characteristics of the compression to use, that is, the compression
function and the attack and decay times in the case of automatic gain control.
In order to preclude the use of spectral cues, the cochlear implants were used
in a single-channel mode during the course of these experiments. This
methodological precaution restricts the generality of the current finding. Our
results will therefore need to be extended to multi-channel devices before
being applied in speech processor design.
Acknowledgements
The first and third authors were at the MRC Applied Psychology
Unit (Cambridge, UK) when this research was performed. The first author was
supported by a post-doctoral grant from the FYSSEN Fundation. The second author
was supported by a CIFRE doctoral grant from
MXM Company. We thank Prof Stuart Gatehouse , Patrick Howell,
Catherine Lever and three anonymous reviewers for comments on a previous
version of this manuscript.
References
Beliaeff M, Dubus P, Leveau J M, Repetto J C, Vincent P. Sound
processing and stimulation coding of DIGISONIC DX10 15-channel cochlear
implant. In: Hochmair ES, ed. Advances in Cochlear Implant. 1994; 198-203.
Dobie RA, Dillier N. Some aspects of temporal coding for
single-channel electrical stimulation of the cochlea. Hear Res 1985; 18:
41-55.
Handel S. Timbre perception and auditory object
identification. In: Moore BCJ, ed. Hearing. New York: Academic Press, 1995;
425-61.
Irino T, Patterson RD. Temporal asymmetry in auditory
perception and a 'delta-gamma' theory of asymmetric intensity enhancement in
the peripheral auditory system. J Acoust Soc Am 1996; 99: 2316-31.
Lorenzi C, Gallego S, Patterson RD. Discrimination of temporal
asymmetry in cochlear implantees. J Acoust Soc Am 1997; 102: 482-85.
Patterson RD. The sound of a sinusoid: Spectral models. J
Acoust Soc Am 1994a; 96: 1409-18.
Patterson RD. The sound of a sinusoid: Time-interval models. J
Acoust Soc Am 1994b; 96: 1419-28.
Stevens K, House AS. Speech perception. In: Tobias J, ed.
Foundations of modern auditory theory. New York: Academic Press, 1972; 1-62.
Conclusion
When the psychophysical constraints of each patient are assessed,
large disparities emerge in the ability to code auditory information
along its three dimensions: time, frequency and amplitude.
It therefore seems important to know the psychophysical
characteristics of each implanted subject in order to adapt the
bio-electrical interface as well as possible during fitting.
- Estimating the pitch function as a function of the stimulated
electrode allows the frequency allocation to be assigned as
faithfully as possible.
- Estimating the pitch function as a function of stimulation rate
allows the appropriate range of stimulation rates to be used to
code the laryngeal fundamental.
- Estimating the loudness functions between channels allows the
acoustic energy to be adapted to each electrode.
- Measuring temporal resolution will allow the mean stimulation
rate to be set.
Each of these parameters can easily be measured within rapid
protocols during fitting, but only with adult implanted subjects who
are very well trained. Developing such psychophysical tests for a
paediatric cochlear-implant population appears to be difficult and
too time-consuming. The possibility of using objective methods such
as electrophysiology to estimate the psychophysical constraints
appears promising.