Human and automated audio description for media accessibility

Activity: Talk or presentation (Presentation)


Audio description provides access to audiovisual materials for people who are blind or visually impaired (BVI), offering a richer and more detailed understanding and experience of film and video. The task of the audio describer is to select relevant information from the visual scene (environments, objects, characters, their appearance, facial expressions, gestures and body movements, actions) and to verbalise it in vivid descriptions that evoke inner images for the target audience (Holsanova et al. 2020). The audio describer must consider which information is conveyed by sounds, music and dialogue (and can therefore be perceived by BVI audiences) and which information is expressed only visually (and cannot be perceived by BVI audiences) in order to decide what needs to be described from the visual scene, how and when (Holsanova 2022). In my presentation, I will illustrate some of the important activities of an audio describer and summarise the challenges. On the basis of the results from the MeMAD project (Braun et al. 2020, Starr et al. 2020), I will then compare the performance of a human audio describer with computer-generated video descriptions and illustrate what today's automatic systems can handle and what they have difficulties with, for instance visual saliency and narrative relevance, contextualisation and inferential capacity, coherence, and temporal and narrative continuity. Finally, I will suggest that a model of event segmentation (the human ability to perceive when a narrative event starts and ends and what it contains) could be used to achieve more human-like automated video description.
Period: 20 Apr 2022
Event title: AI Lund lunch seminar
Event type: Seminar
Location: Lund, Sweden

Free keywords

  • accessibility
  • audio description
  • audio-visual content
  • human and computer-generated video description
  • action-based description
  • event segmentation
  • visual saliency and narrative relevance
  • contextualisation
  • inferential capacity
  • coherence
  • pronominalisation
  • temporal and narrative continuity