Opin vísindi

GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets


Title: GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets
Author: Skarphedinsson, Njall
Gudmundsson, Breki
Smari, Steinar
Larusdottir, Marta Kristin
Einarsson, Hafsteinn
Khan, Abuzar
Nyberg, Eric
Loftsson, Hrafn
Date: 2023-05-01
Language: English
Scope: 9 pages
Department: Department of Computer Science
Faculty of Industrial Engineering, Mechanical Engineering and Computer Science
ISBN: 9781959429456
Series: EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations
DOI: 10.18653/v1/2023.eacl-demo.18
URI: https://hdl.handle.net/20.500.11815/4314

Citation:

Skarphedinsson, N, Gudmundsson, B, Smari, S, Larusdottir, M K, Einarsson, H, Khan, A, Nyberg, E & Loftsson, H 2023, GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations. Association for Computational Linguistics, Dubrovnik, Croatia, pp. 152-160. https://doi.org/10.18653/v1/2023.eacl-demo.18

Abstract:

The methods used to create many of the well-known Question-Answering (QA) datasets are hard to replicate for low-resource languages. A commonality amongst these methods is hiring annotators to source answers from the internet by querying a single answer source, such as Wikipedia. Applying these methods for low-resource languages can be problematic since there is no single large answer source for these languages. Consequently, this can result in a high ratio of unanswered questions, since the amount of information in any single source is limited. To address this problem, we developed a novel crowd-sourcing platform to gather multiple-domain QA data for low-resource languages. Our platform, which consists of a mobile app and a web API, gamifies the data collection process. We successfully released the app for Icelandic (a low-resource language with about 350,000 native speakers) to build a dataset which rivals large QA datasets for high-resource languages both in terms of size and ratio of answered questions. We have made the platform open source with instructions on how to localize and deploy it to gather data for other low-resource languages.

Description:

Publisher Copyright: © 2023 Association for Computational Linguistics.
