This project investigates how eyebrow and head movements (“visual prosody”) interact with speech melody (“verbal prosody”) to highlight words in discourse. How do these two modes of human communication together produce different degrees of highlighting (prominence)? Can these signals be used to mark different types of information?
The melody of speech, or intonation, plays a crucial role in spoken, face-to-face communication. By altering the speech melody, speakers can highlight certain words in a discourse and draw attention to important or new information. Facial gestures, such as eyebrow and head movements, can often serve the same function as intonation: a speaker may, for example, nod on an important word to make it more prominent. In this project, we investigate how verbal and visual signals work together in conveying different degrees of highlighting. We also study how these signals can be used to code nuances of information structure. For example, do speakers highlight a word differently depending on whether it can be inferred from the context, as when talking about a judge in a court (predictable in the context) vs. a judge in a supermarket (not so predictable)?
We are studying two types of ecologically valid speech data, news broadcasts from Swedish television and Swedish spontaneous dialogues, by analysing audio, video, and motion capture data. For the motion capture recordings, reflectors were attached to selected points on the participants' bodies, and the participants were filmed with special infrared cameras.
In addition, we will study how listeners perceive different combinations of verbal and visual cues. In these perception experiments, we will make use of computer-animated agents (“talking heads”), which can be implemented in various speech technology applications, such as dialogue systems and virtual tutors.