The aim of this study was to examine the alignment of word frequency between the Icelandic
and English texts in the reading literacy and natural science sections of PISA 2018. If the
translation of a text in an international assessment is harder or easier than the same text
in the source language, this can affect comprehension and bias comparisons between languages.
Two texts from the reading literacy section of PISA 2018 and two from the natural science
section were analysed. The word frequency list of the Icelandic Gigaword Corpus was used,
together with an English word frequency list that is based on two corpora and is accessible
through the VocabProfile software. The words were classified by frequency into five bands.
Where the frequency band of an Icelandic word differed from that of the English word, it was
checked whether a synonym for the Icelandic word existed in the same frequency band as the
English word, and the lengths of the synonyms were compared.
The results suggest that the share of the most common words is lower in the Icelandic
translations than in the English source texts, and that the share of words in the band of
rarest words is considerably higher in the Icelandic texts than in the English ones. The
distribution of words across frequency bands also appears to be more even in English than in
the Icelandic translation. There were indications of a certain misalignment and imbalance,
in that two thirds of the Icelandic words that fell into a different frequency band from the
English words were rarer than the corresponding English words. It emerged that the number of
words in differing frequency bands could have been reduced by using an Icelandic synonym in
the same frequency band as the English word, and that the misalignment could have been
reduced further by choosing synonyms from an adjacent frequency band. The proportion of
Icelandic synonyms that were both more frequent and longer was over 30% in the four texts.
The results give reason to revise the OECD guidelines and to direct translators to take word
frequency lists into account when choosing words.
Icelandic learners’ performance in the reading and science literacy parts of PISA
declined between 2000 and 2015, and the drop in mean scores is one of the most dramatic
among participating countries. The percentage of Icelandic participants in the highest
proficiency levels has fallen, and the percentage in the lowest levels has risen.
PISA tests are written in two parallel source versions, English and French, and then
translated into other languages. The OECD guidelines for translators (2016) state that
translators should avoid simplifying or complicating the vocabulary and the syntax.
Because of the direct relationship between word understanding and text
comprehension (Laufer & Ravenhorst-Kalovski, 2010), it is of high importance that
translated words be carefully chosen. If there is a higher number of difficult words in one
language than another, the readers may have more difficulty in applying the requested
reading strategy. Such bias may affect the validity of the measurement.
Studies have demonstrated a strong relationship between the extent to which words are
known by individuals and word frequency (Baayen, Wurm & Aycock, 2007; Balota, Yap
& Cortese, 2006; Gardner, Rothkopf, Lapan & Lafferty, 1987; Meunier & Segui, 1999;
Oldfield & Wingfield, 1965).
The PISA 2018 translation and adaptation guidelines (OECD, 2016, p. 11) state
that "longer words tend to be less frequent, more technical and/or more abstract
than short words". Nonetheless, there is no requirement that translators consult word
frequency lists in an effort to match the frequency of translated words with the words in
the original version.
The purpose of this research was to compare the alignment of word frequency in
Icelandic translated texts and the original English versions of PISA 2018.
Two text parts were randomly selected from the reading literacy section and two
from the natural science section of PISA 2018. Information about the frequency of
Icelandic words was obtained from a frequency list based on the Icelandic Gigaword
Corpus (Steinþór Steingrímsson, Sigrún Helgadóttir, Eiríkur Rögnvaldsson, Starkaður
Barkarson & Jón Guðnason, 2018; Stofnun Árna Magnússonar í íslenskum fræðum,
2017). The software VocabProfile (Cobb, n.d.) was used for the English words, based on
two corpora: the New General Service List and the New Academic Word List. The
words were grouped into five frequency bands: four bands of 1,000 words each covering
the 4,000 most frequent words, and a fifth band for all less frequent words. When
translated Icelandic words did not fall into the same frequency band as the corresponding
English words, appropriate Icelandic synonyms were looked for, and the length of the
synonyms was compared.
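
The banding step described above can be sketched as follows. This is a minimal illustration rather than the study's actual procedure: the names `frequency_band` and `band_mismatches`, the rank values, and the word-pair labels are all hypothetical, and the sketch assumes that each word's rank in its frequency list is already known and that words missing from a list belong in the least frequent band.

```python
from typing import Dict, List, Optional, Tuple

BAND_SIZE = 1000  # ranks covered by each of the first four bands
LAST_BAND = 5     # catch-all band for everything beyond rank 4,000


def frequency_band(rank: Optional[int]) -> int:
    """Map a 1-based frequency rank to a band from 1 (most common) to 5.

    A rank of None means the word is absent from the list; treating it
    as band 5 is an assumption of this sketch.
    """
    if rank is None:
        return LAST_BAND
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return min((rank - 1) // BAND_SIZE + 1, LAST_BAND)


def band_mismatches(
    pairs: Dict[str, Tuple[Optional[int], Optional[int]]]
) -> List[Tuple[str, int, int]]:
    """Return (label, english_band, icelandic_band) for pairs whose bands differ."""
    mismatches = []
    for label, (en_rank, is_rank) in pairs.items():
        en_band, is_band = frequency_band(en_rank), frequency_band(is_rank)
        if en_band != is_band:
            mismatches.append((label, en_band, is_band))
    return mismatches


# Invented ranks for three hypothetical word pairs: (English rank, Icelandic rank).
example_pairs = {
    "pair_a": (350, 420),    # both in band 1
    "pair_b": (900, 2600),   # band 1 vs. band 3
    "pair_c": (3100, None),  # band 4 vs. band 5 (not in the Icelandic list)
}

print(band_mismatches(example_pairs))
```

Pairs flagged in this way are the ones for which an Icelandic synonym in the same or an adjacent band would then be sought.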
Results of the study indicate that the share of words in the highest frequency band
was lower in the Icelandic translated texts than in the English versions, and the share
of Icelandic words in the lowest frequency band was higher, in all four analysed texts.
Additionally, the English words were more evenly distributed between the five frequency
bands than the Icelandic words. Furthermore, among the words that did not belong to
the same frequency band in Icelandic as in English, two thirds of the Icelandic words
were less frequent than the corresponding English words. If the Icelandic translators
had made use of synonyms in the same frequency band as the corresponding English words,
or in an adjacent band, a better equilibrium between the languages could have been
obtained. More than 30% of the more frequent Icelandic words were longer than their less
frequent synonyms, which suggests that for Icelandic words it is not a reliable rule
that longer words tend to be less frequent than shorter words.
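
The two summary measures reported above, the share of words per frequency band and the proportion of mismatched pairs in which the translated word is rarer, could be computed along these lines. This is a hedged sketch, not the authors' analysis code: the function names `band_shares` and `rarer_share` are hypothetical, and the band values in the example are invented.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def band_shares(bands: Sequence[int], num_bands: int = 5) -> List[float]:
    """Return the share of words in each band, from band 1 to band num_bands."""
    total = len(bands)
    if total == 0:
        return [0.0] * num_bands
    counts = Counter(bands)
    return [counts.get(band, 0) / total for band in range(1, num_bands + 1)]


def rarer_share(band_pairs: Sequence[Tuple[int, int]]) -> float:
    """Among (source_band, translation_band) pairs whose bands differ,
    return the fraction in which the translation sits in a rarer band,
    that is, a band with a higher number."""
    mismatched = [(src, tgt) for src, tgt in band_pairs if src != tgt]
    if not mismatched:
        return 0.0
    return sum(tgt > src for src, tgt in mismatched) / len(mismatched)


# Invented band values for a handful of aligned word pairs.
example_band_pairs = [(1, 1), (1, 2), (2, 5), (3, 1)]

source_shares = band_shares([src for src, _ in example_band_pairs])
translation_shares = band_shares([tgt for _, tgt in example_band_pairs])
print(source_shares, translation_shares, rarer_share(example_band_pairs))
```

Comparing the two share lists shows how evenly each language's words spread across the bands, while `rarer_share` isolates the direction of the mismatch.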
Our findings suggest that the PISA translation guidelines should include a
requirement that translators make use of word frequency lists when choosing words for
their translations, so as to make the comparison between countries more valid. If the
vocabulary is more demanding in one language version than in another, the proficiency
factors being measured do not carry the same weight, which may affect the validity of the
study. The findings should contribute to the interpretation of PISA results in reading
and science literacy for 2018, at least when comparing participants who took the tests
in Icelandic and English.