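Az ember, a korpusz és a számítógép: Magyar nyelvű szóhasonlósági mérések humán és disztribúciós szemantikai kísérletben [Man, corpus and computer: Hungarian word similarity measurements in a human and a distributional semantic experiment]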

Tóth, Á.: Az ember, a korpusz és a számítógép: Magyar nyelvű szóhasonlósági mérések humán és disztribúciós szemantikai kísérletben.
Argumentum (Debr.). 9, 301-310, 2013.

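title:

Az ember, a korpusz és a számítógép: Magyar nyelvű szóhasonlósági mérések humán és disztribúciós szemantikai kísérletben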

authors:

Tóth Ágoston

published:

2013

type:

article

genre:

Journal article in a domestic (Hungarian) journal

journal:

Argumentum (ISSN: 1787-3606)

language:

Hungarian

HAC:

Humanities, Linguistics

subjects:

word similarity, distributional semantics, vector spaces, computational linguistics

abstract:

The paper reports the results of two word similarity experiments. The first is a subjective human test in which 28 subjects rated the similarity of 31 pairs of Hungarian words. The test method comes from Rubenstein & Goodenough (1965) and reflects the intuition that word similarity forms a continuum from clear cases of synonymy to a complete lack of apparent similarity. The Hungarian results correlate very strongly with the data collected by Rubenstein and Goodenough (Spearman's r = 0.959, p < 0.01) and also with the English replication experiments (Miller & Charles 1991; Resnik 1995). In the second experiment, a computer program collected similarity data for the same words, based on the contexts in which they typically occur. The correlation between the subjective and the corpus-based data series is r = 0.591 (p < 0.01).
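
The two measurements named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the window size, the raw co-occurrence counts, the cosine measure, and the function names `context_vector`, `cosine`, `average_ranks` and `spearman` are all assumptions chosen for illustration, and the toy token list is invented. The sketch builds context count vectors for two words, compares them with cosine similarity, and computes a Spearman rank correlation between two series of scores.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count the words found within `window` positions of each `target` token."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented toy corpus, for illustration only.
tokens = "the car drives on the road and the automobile drives on the street".split()
sim = cosine(context_vector(tokens, "car"), context_vector(tokens, "automobile"))
```

In an evaluation of the kind the abstract describes, `spearman` would be called with the human ratings as one series and the corpus-based similarity scores for the same word pairs as the other.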

projects:

K 72983; TÁMOP-4.2.4.A/2-11-1-2012-0001