UCT CS Research Document Archive

Speech Perception in Virtual Environments

Verwey, Johan (2006) Speech Perception in Virtual Environments. MSc, Department of Computer Science, University of Cape Town.



Many virtual environments, such as interactive computer games, educational software and training simulations, use speech to convey important information to the user. These applications typically present a combination of background music, sound effects, ambient sounds and dialog simultaneously to create a rich auditory environment. Because interactive virtual environments allow users to roam freely among different sound-producing objects, sound designers do not always have exact control over what sounds a user will perceive at any given time. This dissertation investigates factors that influence the perception of speech in virtual environments under adverse listening conditions.

A virtual environment was created to study hearing performance under different audio-visual conditions. The two main areas of investigation were the contribution of "spatial unmasking" and lip animation to speech perception. Spatial unmasking refers to the hearing benefit achieved when the target sound and masking sound are presented from different locations. Both auditory and visual factors influencing speech perception were considered.

The capability of modern sound hardware to produce a spatial release from masking using real-time 3D sound spatialization was compared with the pre-computed method of creating spatialized sound. It was found that spatial unmasking could be achieved using a modern consumer 3D sound card with either headphones or a surround-sound speaker display. Surprisingly, masking was less effective when real-time sound spatialization was used, and subjects achieved better hearing performance than with the pre-computed method.

Most research on the spatial unmasking of speech has been conducted in purely auditory environments. The influence of an additional visual cue was therefore investigated first, to determine whether it provided any benefit. No difference in hearing performance was observed when visible objects were presented at the same location as the auditory stimuli.

Because of inherent limitations of display devices, the auditory and visual environments are often not perfectly aligned, causing a sound-producing object to be seen at a different location from where it is heard. The effect of audio-visual integration of this conflicting spatial information on the spatial unmasking of speech in noise was investigated. No significant difference in speech perception was found between visual stimuli presented at the correct location, matching the auditory position, and visual stimuli presented at a location spatially disparate from the auditory source.

Lastly, the influence of rudimentary lip animation on speech perception was investigated. The results showed that correct lip animation contributes significantly to speech perception. It was also found that incorrect lip animation could result in worse performance than using no lip animation at all.

The main conclusions from this research are that the 3D sound capabilities of modern sound hardware can and should be used in virtual environments to present speech; that perfect alignment of the auditory and visual environments is not very important for speech perception; and that even rudimentary lip animation can enhance speech perception in virtual environments.

EPrint Type: Electronic Thesis or Dissertation
Keywords: Virtual Environment, Speech perception, sound spatialization
ID Code: 393
Deposited By: Blake, Edwin H
Deposited On: 30 April 2007