Opin vísindi

GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets


Title: GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets
Authors: Skarphedinsson, Njall
Gudmundsson, Breki
Smari, Steinar
Larusdottir, Marta Kristin
Einarsson, Hafsteinn
Khan, Abuzar
Nyberg, Eric
Loftsson, Hrafn
Issue date: 2023-05-01
Language: English
Extent: 9 pages
Department: Department of Computer Science
Faculty of Industrial Engineering, Mechanical Engineering and Computer Science
ISBN: 9781959429456
Published in: EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations
DOI: 10.18653/v1/2023.eacl-demo.18
URI: https://hdl.handle.net/20.500.11815/4314

Citation:

Skarphedinsson, N, Gudmundsson, B, Smari, S, Larusdottir, M K, Einarsson, H, Khan, A, Nyberg, E & Loftsson, H 2023, GameQA: Gamified Mobile App Platform for Building Multiple-Domain Question-Answering Datasets. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations. Association for Computational Linguistics, Dubrovnik, Croatia, pp. 152-160. https://doi.org/10.18653/v1/2023.eacl-demo.18

Abstract:

The methods used to create many of the well-known Question-Answering (QA) datasets are hard to replicate for low-resource languages. A commonality amongst these methods is hiring annotators to source answers from the internet by querying a single answer source, such as Wikipedia. Applying these methods for low-resource languages can be problematic since there is no single large answer source for these languages. Consequently, this can result in a high ratio of unanswered questions, since the amount of information in any single source is limited. To address this problem, we developed a novel crowd-sourcing platform to gather multiple-domain QA data for low-resource languages. Our platform, which consists of a mobile app and a web API, gamifies the data collection process. We successfully released the app for Icelandic (a low-resource language with about 350,000 native speakers) to build a dataset which rivals large QA datasets for high-resource languages both in terms of size and ratio of answered questions. We have made the platform open source with instructions on how to localize and deploy it to gather data for other low-resource languages.

Notes:

Publisher Copyright: © 2023 Association for Computational Linguistics.
