VACC


The Voice Assistant Conversation Corpus

Related Paper: Siegert, Ingo; Krüger, Julia; Egorow, Olga; Nietzold, Jannik; Heinemann, Ralph; Lotz, Alicia Flores: Voice Assistant Conversation Corpus (VACC) - a multi-scenario dataset for addressee detection in human-computer-interaction using Amazon's ALEXA. In: Proceedings of the LREC 2018 Workshop LB-ILR2018 and MMC2018 Joint Workshop, 7 May 2018, Miyazaki, Japan. Paris: ELRA, pp. 51-54.

Brief Information

Collected Data

Recording Setup

Content of the Corpus

Participant's characterization


Brief Information

The Voice Assistant Conversation Corpus (VACC) is a new type of conversation corpus in the field of human-computer interaction. It was created in collaboration with the Cognitive Systems Group and the University Clinic for Psychosomatic Medicine and Psychotherapy.

The main goal of the data set is to enable studies on human-machine interaction. The aim was to vary certain boundary conditions:

Each participant performs both tasks once alone and once together with an interaction partner. This partner interacts only with the participant and not with ALEXA.

Procedure

Collected Data

  1. Audio recordings
    • from the participant
    • from the interlocutor and
    • from the overall scene
  2. Questionnaires
    • Socio-demographic information (before the experiment)
    • Experiences with technical systems in general (before the experiment)
    • Perception of ALEXA and the interlocutor (after the experiment)
    • Changes in voice and speech behavior during interaction (after the experiment)

Recording Setup

The recordings took place at the Institute of Information and Communication Engineering. They were conducted in a living-room-like setting. The aim of this setting was to let the participant settle into a natural communication atmosphere (in contrast to the distraction of laboratory surroundings).

As the voice assistant system, we used the Amazon ALEXA Echo Dot (2nd generation). We opted for a commercial system to allow fully free interaction with a currently available system.

The recordings were conducted using two high-quality neckband microphones (Sennheiser HSP 2-EW-3) to capture the voices of the participant and the interlocutor, as well as one high-quality shotgun microphone (Sennheiser ME 66) to capture the overall acoustic scene and especially the output of the voice assistant. The recordings were stored uncompressed in WAV format with a 44.1 kHz sample rate and 16-bit resolution.
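When working with the audio files, it can be useful to confirm that a recording matches the stated format before further processing. The following is a minimal sketch using only Python's standard `wave` module. The function names and the example file name are hypothetical and not part of the corpus distribution, and the number of channels per file is not stated here, so it is reported but not checked.

```python
import wave

# Format stated in the corpus description: 44.1 kHz sample rate, 16-bit resolution.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit = 2 bytes per sample


def describe_wav(path):
    """Read the header of an uncompressed (PCM) WAV file and return its properties."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


def matches_stated_format(info):
    """Return True if sample rate and resolution match 44.1 kHz / 16 bit."""
    return (
        info["sample_rate_hz"] == EXPECTED_RATE_HZ
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
    )
```

A call such as `matches_stated_format(describe_wav("participant_01.wav"))` would then flag files that were resampled or converted after download; the file name here is only a placeholder.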

Content of the Corpus

Participants: 27 German-speaking students
Distribution of sex: 13 men, 14 women
Distribution of age: mean 24 years, standard deviation 3.32 years, min. 20 years, max. 32 years
Total amount of data: 17 hours
Mean duration per dialog: 31 minutes
Annotation: utterances, type of utterances, transcripts, laughter, discourse particles
Collected questionnaires: evaluation of interaction and speaking style, experiences in interacting with voice assistants, socio-demographic data, AttrakDiff

Participa