The stimuli in this module are specifically designed to help you home in on areas where the software is likely to have difficulty. While the stimuli are divided into five fairly independent categories, the full set provides enough data for other analyses as well. For example, you will have enough tokens of the point vowels [i, u, a] to plot your own vowel space. The variety of stimuli gives you the flexibility to analyze the data as you see fit. The answers to the questions raised here are not necessarily known in advance; outcomes may vary as much as individual voices do.
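If you want to plot your vowel space, one option is to average the first two formants (F1, F2) of each point vowel and compute the area of the triangle they form. The sketch below is a minimal illustration in Python, assuming you have already measured F1 and F2 for every token in separate acoustic-analysis software; the (vowel, F1, F2) tuple layout and the function names are hypothetical, not part of this module. The area is computed with the shoelace formula, in Hz squared.

```python
from statistics import mean

# Hypothetical input format (an assumption): one (vowel, F1_Hz, F2_Hz)
# tuple per token, with formants measured in other software.
Token = tuple[str, float, float]


def vowel_means(tokens: list[Token]) -> dict[str, tuple[float, float]]:
    """Return the mean (F1, F2) for each vowel label."""
    by_vowel: dict[str, list[tuple[float, float]]] = {}
    for vowel, f1, f2 in tokens:
        by_vowel.setdefault(vowel, []).append((f1, f2))
    return {
        v: (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for v, pts in by_vowel.items()
    }


def polygon_area(points: list[tuple[float, float]]) -> float:
    """Area of a polygon from its ordered vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_vowel_area(tokens: list[Token], corners=("i", "a", "u")) -> float:
    """Area (Hz squared) of the triangle spanned by the mean point vowels."""
    means = vowel_means(tokens)
    return polygon_area([means[v] for v in corners])
```

To plot the space, the conventional orientation puts F2 on a reversed x-axis and F1 on a reversed y-axis, so that [i] appears at the top left and [a] at the bottom.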
Stimuli are presented in five main categories, with 10 examples (a.-j.) per category. The categories are: 1. Words in carrier phrases, 2. Function words, 3. Vowels in single words, 4. Liquids [l, r], and 5. Continuous speech.
Each stimulus type illustrates a general point. The words in carrier phrases appear in both neutral and informative contexts, with particular attention to potential differences in recognizing the sounds "s" and "sh" when produced by male and female talkers. The function words are spoken both in isolation and in context; performance is expected to be better in context, because the transitions to and from surrounding words provide additional information. The vowel stimuli let you examine whether the duration of a vowel affects its identification. Testing with liquids will demonstrate some of the difficulties posed by sounds whose energy is both low in frequency and low in amplitude. Finally, the continuous speech paragraph, which focuses on ambiguous word boundaries, illustrates some of the broad issues that arise in implementing speech recognition software, and it also provides baseline information on each individual's typical speech. Together, these stimuli are designed to draw attention to many issues and problems in speech recognition, so keep looking and thinking.
1. Words in carrier phrases
Phoneticians often ask subjects to read words in carrier phrases, rather than in isolation, so that the speech is more natural. For speech recognition, the transitions between words can help disambiguate them. With a very neutral carrier phrase, you can test how the software behaves when the context offers no cues for disambiguation. An informative carrier phrase may help the software identify the target word correctly by supplying information about its meaning or by cueing a common phrase. The question of interest here is whether an informative carrier phrase necessarily improves word recognition.
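To compare neutral and informative carrier phrases, you need the proportion of target words recognized correctly in each condition. The Python sketch below is one minimal way to tally this, assuming you record each trial as a condition label, the target word, and the software's transcription of the whole phrase; the Trial record, the condition labels, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    condition: str  # hypothetical label, e.g. "neutral" or "informative"
    target: str     # the word being tested
    output: str     # what the software transcribed for the whole phrase


def words(text: str) -> list[str]:
    """Lower-case word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def accuracy_by_condition(trials: list[Trial]) -> dict[str, float]:
    """Proportion of trials per condition whose target appears in the output."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for t in trials:
        totals[t.condition] += 1
        if t.target.lower() in words(t.output):
            hits[t.condition] += 1
    return {c: hits[c] / totals[c] for c in totals}
```

The same tally works for other contrasts in this module, such as function words in isolation versus in context, or liquids in different word positions, by changing the condition label. Note that the check is lenient: it credits a trial whenever the target appears anywhere in the transcribed phrase.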
2. Function words
This set again consists of words presented either with or without an informative, semantically related sentence context. In this case, however, the targets are function words, which highlights the potential role of semantic context in recognizing speech segments that carry little meaning in and of themselves. Function words express the relationships between other words and phrases in a sentence but convey little meaning on their own. Examples include conjunctions like "and" and "or," prepositions like "in" and "of," articles like "a" and "the," and personal pronouns like "I" and "them." Function words rarely introduce new information and are seldom used contrastively. They are often unstressed and usually very short. Their meaning depends heavily on the surrounding content words, and some of them have very low acoustic energy because of the sounds they are made of. As a result, function words tend to be less acoustically distinctive than content words. Nonetheless, accurate recognition of function words is crucial to speech comprehension, because they are important for understanding sentence structure.
3. Vowels in single words
For these words, there is little information for the program to rely
on. Diphthongs and tense vowels are longer than lax vowels. Will a
longer vowel lead to increased correct recognition?
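To answer this question with your own data, one simple approach is to measure each vowel's duration and compare it with whether the software recognized the word. The Python sketch below assumes you have already measured durations in milliseconds (for example, from a waveform or spectrogram) and scored each word as correct or not; the function name and input format are hypothetical. It reports the mean duration of recognized and missed words plus the point-biserial correlation, which is the Pearson correlation between duration and a 0/1 correctness score. With only ten words per category, treat the result as descriptive rather than conclusive.

```python
from statistics import mean, pstdev


def duration_vs_recognition(trials: list[tuple[float, bool]]) -> tuple[float, float, float]:
    """Summarize how vowel duration relates to recognition.

    trials: (vowel duration in ms, whether the word was recognized correctly).
    Returns (mean duration of hits, mean duration of misses, point-biserial r).
    """
    hits = [d for d, ok in trials if ok]
    misses = [d for d, ok in trials if not ok]
    if not hits or not misses:
        raise ValueError("need at least one hit and one miss")
    durations = [d for d, _ in trials]
    sd = pstdev(durations)  # population SD, as the point-biserial formula uses
    if sd == 0:
        raise ValueError("all durations are identical")
    p = len(hits) / len(trials)  # proportion recognized
    q = 1 - p
    r = (mean(hits) - mean(misses)) / sd * (p * q) ** 0.5
    return mean(hits), mean(misses), r
```

A positive r means longer vowels tended to be recognized more often in your sample; a value near zero suggests duration made little difference.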
4. Liquids [l, r]
Again, this set provides very little information for the program to
rely on. The sounds [l] and [r] have different realizations depending on
their position in a word. How does the software do with these sounds in
various word positions?
5. Continuous speech
Here, fluent speech is tested, highlighting the challenges of
interpreting a continuous signal rather than differences between male
and female talkers per se. The focus is the speech recognition (SR)
software's ability (or lack thereof) to disambiguate tricky word
junctures in the absence of pauses. Target words are underlined.
Here the question of interest is whether the software can use a rather tricky context to work out what to transcribe. When scoring these stimuli, count performance only on the underlined words and ignore the rest of the material, although you may want to note what the software transcribes, since that can help explain some of its mistakes. The bracketed words illustrate possible ambiguities that may trick the software.
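Because underlining does not survive in a plain-text transcript, scoring by script requires some other way to mark the targets. The Python sketch below assumes a hypothetical convention of wrapping each target word in underscores (for example, _word_) in the reference text, and it substitutes a generic technique, word-level alignment with the standard-library difflib.SequenceMatcher, for hand scoring. A target counts as correct only if it aligns with an identical word in the software's output; all names are illustrative.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lower-case words, keeping underscore markers on target words."""
    return re.findall(r"_?[a-z']+_?", text.lower())


def score_targets(reference: str, transcript: str) -> tuple[int, int]:
    """Return (targets transcribed correctly, total targets).

    Targets are reference words written as _word_ (a hypothetical convention).
    """
    ref_tokens = tokenize(reference)
    targets = {
        i for i, w in enumerate(ref_tokens)
        if len(w) > 2 and w.startswith("_") and w.endswith("_")
    }
    ref = [w.strip("_") for w in ref_tokens]
    hyp = [w.strip("_") for w in tokenize(transcript)]
    matcher = difflib.SequenceMatcher(None, ref, hyp, autojunk=False)
    matched: set[int] = set()
    for block in matcher.get_matching_blocks():
        # Reference positions covered by an identical run in the transcript.
        matched.update(range(block.a, block.a + block.size))
    return len(targets & matched), len(targets)
```

Applied to the invented reference "we all _scream_ for _ice_ _cream_" and the output "We all scream for I scream.", it returns (1, 3): only "scream" is credited. Alignment is a rough proxy, so check borderline cases by hand, especially where a missegmented juncture shifts neighboring words.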