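Az ember, a korpusz és a számítógép: Magyar nyelvű szóhasonlósági mérések humán és disztribúciós szemantikai kísérletben [Humans, the corpus and the computer: Hungarian word similarity measurements in a human and a distributional semantic experiment]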

Tóth, Á.: Az ember, a korpusz és a számítógép: Magyar nyelvű szóhasonlósági mérések humán és disztribúciós szemantikai kísérletben.
Argumentum (Debr.). 9, 301-310, 2013.

DEA

title:

authors:

Tóth Ágoston

year of publication:

2013

type:

journal article

genre:

Hungarian-language journal article in a Hungarian journal

journal:

Argumentum (ISSN: 1787-3606)

language:

Hungarian

MAB:

humanities, linguistics

keywords:

word similarity, distributional semantics, vector spaces, computational linguistics

abstract:

The paper reports the results of two word similarity experiments. The first is a subjective human test: similarity ratings for 31 pairs of Hungarian words were collected from 28 subjects. The test method comes from Rubenstein & Goodenough (1965) and reflects the intuition that word similarity is a continuum ranging from clear cases of synonymy to a complete lack of apparent similarity. The Hungarian results correlate very well with the data collected by Rubenstein and Goodenough (Spearman r = 0.959, p < 0.01) and also with the English replication experiments (Miller & Charles 1991; Resnik 1995). In the second experiment, a computer program collected similarity data for the same words, based on the contexts in which they typically occur. The correlation between the subjective and the corpus-based data series is r = 0.591 (p < 0.01).

grants:

K 72983; TÁMOP-4.2.4.A/2-11-1-2012-0001