
BERT may help in lexicographic sense delineation

Tóth, Á., Abdelzaher, E.: BERT may help in lexicographic sense delineation.
Int J Digit Humanities. 2024, 1-22, 2024.
Title:
BERT may help in lexicographic sense delineation
Authors:
  • Tóth Ágoston
  • Abdelzaher, Esra
Year of publication:
2024
Type:
journal article
Genre:
foreign-language journal article published in a foreign journal
Journal:
International Journal of Digital Humanities (ISSN: 2524-7840)
Language:
English
MAB:
humanities, linguistics
Keywords:
Digital lexicography, BERT, Hierarchical clustering, Sense delineation, Polysemy
Abstract:
This study addresses the challenge of sense delineation, which is one of the most difficult tasks for lexicographers (Kilgarriff, 1998), who need to abstract senses from corpus citations (Kilgarriff, 2007). There is initial evidence that contextualized embeddings (such as BERT word representations; Devlin et al., 2019) form distinct clusters corresponding to different word senses (Wiedemann et al., 2019; Schmidt & Hofmann, 2020), making BERT successful at the word sense disambiguation task. This study further examines this idea from a lexicographical perspective. The experiment cites dictionary examples and creates contextualized embeddings to represent example sentences using BERT. Clusters are visualized in two dimensions and are quantitatively and qualitatively processed. Results reveal that BERT's distributional representations are not only sensitive to salient syntactic variation, but they also capture the semantic diversity in word senses. The different parts of speech of the same word formed distinctive clusters with moderate to high silhouette scores. Also, literal, metaphoric and metonymic extensions of word senses appeared in different hierarchical clusters. Dissimilar semantic preferences and differences in the cognitive prominence of a target word were also mirrored in forming multiple sub-clusters of the same sense. Qualitative error analysis of the cases with negative silhouette scores showed the influence of fuzzy categorization on the distributional representation of example sentences. It also spotted example sentences which failed to specify the abstractness of the definitions or overspecified the use of a target word in a considerably long sentence and may, accordingly, be of less practical value for the dictionary user.
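The silhouette-based error analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the two-dimensional vectors below are invented stand-ins for BERT embeddings of example sentences, the labels are hypothetical sense or part-of-speech groupings, and cosine distance is an assumed choice of metric. The sketch computes a per-example silhouette score, s = (b - a) / max(a, b), and flags examples with a negative score as candidates for qualitative review.

```python
import math

# Toy stand-ins for contextualized embeddings of example sentences.
# Real BERT vectors have hundreds of dimensions; these values are invented.
VECTORS = [
    (1.0, 0.1), (0.9, 0.2), (1.0, 0.0),   # examples grouped as "noun"
    (0.1, 1.0), (0.2, 0.9),               # examples grouped as "verb"
    (0.0, 1.0),                           # grouped as "noun" but lies near the verbs
]
LABELS = ["noun", "noun", "noun", "verb", "verb", "noun"]


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_samples(vectors, labels):
    """Per-example silhouette score s = (b - a) / max(a, b).

    a is the mean distance to the other members of the example's own group,
    b is the lowest mean distance to the members of any other group.
    """
    scores = []
    for i, (vec, lab) in enumerate(zip(vectors, labels)):
        dists = {}
        for j, (other, other_lab) in enumerate(zip(vectors, labels)):
            if i != j:
                dists.setdefault(other_lab, []).append(cosine_distance(vec, other))
        own = dists.pop(lab, [])
        if not own or not dists:
            scores.append(0.0)  # common convention for singleton groups
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b))
    return scores


if __name__ == "__main__":
    scores = silhouette_samples(VECTORS, LABELS)
    for idx, (lab, score) in enumerate(zip(LABELS, scores)):
        flag = "  <- review" if score < 0 else ""
        print(f"example {idx} ({lab}): {score:+.2f}{flag}")
```

In this toy data only the last example is flagged, because it sits closer to the other group than to its own. With real embeddings the same check would point to example sentences whose distributional representation disagrees with their assigned sense.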