Probing visualizations of neural word embeddings for lexicographic use
Tóth, Á., Abdelzaher, E.: Probing visualizations of neural word embeddings for lexicographic use.
In: Electronic lexicography in the 21st century: Proceedings of the eLex 2023 conference / edited by Marek Medved, Michal Mechura, Iztok Kosem, Jelena Kallas, Carole Tiberius, Milos Jakubícek, Lexical Computing, Brno, 545-566, 2023, (ISSN 2533-5626)
authors:
Tóth Ágoston
Abdelzaher, Esra
year of publication:
2023
type:
book chapter
genre:
conference abstract
language:
English
MAB:
humanities, linguistics
keywords:
sense delineation, word embedding visualization, BERT
abstract:
Our study explores the possibility of using the distributional characteristics of headwords as exemplified in the online Oxford Learner's Dictionaries, captured by contextualized word embeddings and displayed in two dimensions to help lexicographers find sense categories, detect variations across senses and select potential example sentences. In addition to the dictionary examples, we added British National Corpus data that contained the headwords. BERT word embeddings were extracted for all occurrences of the headword, then two-dimensional representations of the resulting high-dimensional BERT embedding vectors were created using 4 algorithms: MDS, Isomap, Spectral and t-SNE. Clustering was assisted by k-means clustering and Silhouette scoring for different k values. Our investigation showed that Silhouette scores for k-means increased after dimension reduction; furthermore, spectral and t-SNE visualizations were associated with the most cohesive clusters. The highest Silhouette scores recommended a number of clusters different from the number of dictionary senses, but semantic and syntactic patterns were detectable across the recommended clusters.
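The workflow summarized in the abstract (embedding vectors, two-dimensional reduction with MDS, Isomap, spectral embedding and t-SNE, then k-means with Silhouette scoring over several k values) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, and it replaces the BERT embeddings with synthetic 768-dimensional vectors so that it runs without a language model. The function names, the neighbor counts, the perplexity and the range of k values are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS, Isomap, SpectralEmbedding, TSNE
from sklearn.metrics import silhouette_score


def make_stand_in_embeddings(n_per_sense=40, dim=768, n_senses=3, seed=0):
    """Synthetic stand-in for per-occurrence BERT vectors (assumption):
    one Gaussian blob per hypothetical sense of a headword."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(n_senses, dim))
    return np.vstack([c + rng.normal(size=(n_per_sense, dim)) for c in centers])


def best_k_by_silhouette(points, k_values=range(2, 7), seed=0):
    """Run k-means for each k and return (best k, {k: Silhouette score})."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(points)
        scores[k] = float(silhouette_score(points, labels))
    return max(scores, key=scores.get), scores


def compare_reducers(vectors, seed=0):
    """Score clusters in the original space and in each 2-D projection."""
    # n_neighbors=50 exceeds the blob size so the neighbor graph is connected;
    # this value is an illustrative assumption, not a setting from the paper.
    reducers = {
        "MDS": MDS(n_components=2, random_state=seed),
        "Isomap": Isomap(n_components=2, n_neighbors=50),
        "Spectral": SpectralEmbedding(n_components=2, n_neighbors=50, random_state=seed),
        "t-SNE": TSNE(n_components=2, perplexity=30, random_state=seed),
    }
    results = {"original": best_k_by_silhouette(vectors, seed=seed)}
    for name, reducer in reducers.items():
        projected = reducer.fit_transform(vectors)
        results[name] = best_k_by_silhouette(projected, seed=seed)
    return results


if __name__ == "__main__":
    for name, (best_k, scores) in compare_reducers(make_stand_in_embeddings()).items():
        print(f"{name}: best k = {best_k}, Silhouette = {scores[best_k]:.3f}")
```

With real data, the synthetic vectors would be replaced by the contextualized embedding of the headword in each dictionary example or corpus sentence; the k recommended by the Silhouette score can then be compared with the number of dictionary senses.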