VACC
The Voice Assistant Conversation Corpus
Related Paper: Siegert, Ingo; Krüger, Julia; Egorow, Olga; Nietzold, Jannik; Heinemann, Ralph; Lotz, Alicia Flores: Voice Assistant Conversation Corpus (VACC) - a multi-scenario dataset for addressee detection in human-computer-interaction using Amazon's ALEXA. In: Proceedings of the LREC 2018 Workshop LB-ILR2018 and MMC2018 Joint Workshop, 7 May 2018, Miyazaki, Japan. Paris: ELRA, pp. 51-54.
Brief Information
Collected Data
Recording Setup
Content of the Corpus
Participant's characterization
Related Publications
Brief Information
The Voice Assistant Conversation Corpus (VACC) is a new type of conversation corpus in the field of human-computer interaction. It was created in collaboration with the Cognitive Systems Group and the University Clinic for Psychosomatic Medicine and Psychotherapy.
The main goal of the data set is to enable studies on human-machine interaction. The aim was to vary certain boundary conditions:
- The type of dialog (formal, informal)
- The number of interaction partners (human-machine vs. human-human-machine)

The interaction was conducted using a commercial voice assistant (Amazon's ALEXA). Each participant was given two tasks, one after the other. First, dates for exercises had to be arranged, with the participant's calendar accessible only via ALEXA. Then quiz questions had to be answered with the help of ALEXA.
Both tasks were performed once alone and once together with an interaction partner. This partner interacted only with the participant, not with ALEXA.

Collected Data
- Audio recordings
  - from the participant
  - from the interlocutor
  - from the overall scene
- Questionnaires
  - Socio-demographic information (before the experiment)
  - Experiences with technical systems in general (before the experiment)
  - Perception of ALEXA and the interlocutor (after the experiment)
  - Changes in voice and speech behavior during interaction (after the experiment)
Recording Setup
The recordings took place at the Institute of Information and Communication Engineering. They were conducted in a living-room-like setting, chosen to let the participant settle into a natural communication atmosphere (in contrast to the distraction of laboratory surroundings).
As the voice assistant system, we used the Amazon ALEXA Echo Dot (2nd generation). We opted for a commercial system to allow fully free interaction with a currently available system.
The recordings were made with two high-quality neckband microphones (Sennheiser HSP 2-EW-3) to capture the voices of the participant and the interlocutor, and one high-quality shotgun microphone (Sennheiser ME 66) to capture the overall acoustic scene, especially the output of the voice assistant. The recordings were stored uncompressed in WAV format with a 44.1 kHz sample rate and 16-bit resolution.
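For anyone working with the audio files, the stated recording format can be checked with a few lines of Python. The following is a minimal sketch using only the standard-library `wave` module. The function names `describe_wav` and `matches_corpus_format` are hypothetical and not part of the corpus distribution, and the sketch assumes the files are plain PCM WAV files as described above.

```python
import wave

# Format stated in the recording setup: 44.1 kHz sample rate, 16-bit resolution.
EXPECTED_SAMPLE_RATE_HZ = 44_100
EXPECTED_BITS_PER_SAMPLE = 16


def describe_wav(path):
    """Return basic format information for an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            # getsampwidth() returns bytes per sample, so convert to bits.
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def matches_corpus_format(path):
    """Check a file against the sample rate and resolution stated above."""
    info = describe_wav(path)
    return (
        info["sample_rate_hz"] == EXPECTED_SAMPLE_RATE_HZ
        and info["bits_per_sample"] == EXPECTED_BITS_PER_SAMPLE
    )
```

Calling `describe_wav` on a recording also yields its duration in seconds, which may be useful for cross-checking the per-dialog durations reported for the corpus.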
Content of the Corpus
| Property | Value |
| --- | --- |
| Participants | 27 German-speaking students |
| Distribution of Sex | 13 men, 14 women |
| Distribution of Age | Mean: 24 years, SD: 3.32 years, Min: 20 years, Max: 32 years |
| Total amount of data | 17 hours |
| Mean duration per dialog | 31 minutes |
| Annotation | utterances, type of utterances, transcripts, laughter, discourse particles |
| Collected Questionnaires | evaluation of interaction and speaking style, experiences in interacting with voice assistants, socio-demographic data, AttrakDiff |