Gender stereotypes in Estonian word embeddings

Word embeddings are an influential machine learning technique in natural language processing that represents each word in a large text corpus as a vector. Geometric relationships between the vectors capture meaningful semantic relationships between the corresponding words. The research paper “Gender stereotypes in Estonian word embeddings” replicated experiments from the earlier English-language literature on the subject in the context of the Estonian language. In particular, it showed that word2vec embeddings derived from the largest dataset of Estonian texts, the etTenTen corpus, strongly associate male first names with terms related to career and science, and female first names with terms related to arts and family.
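To make the idea of a geometric association concrete, the following is a minimal sketch of one common way to measure it: the difference between a word's mean cosine similarity to one set of attribute terms and its mean cosine similarity to another. The vectors, the labels (`name_m`, `name_f` and so on) and the helper functions are hypothetical and invented for illustration. They are not taken from the paper or from the etTenTen corpus, and the paper's exact test statistic may differ.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(word_vec, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B.

    A positive value means the word lies closer to set A,
    a negative value means it lies closer to set B.
    """
    mean_a = np.mean([cosine(word_vec, a) for a in attrs_a])
    mean_b = np.mean([cosine(word_vec, b) for b in attrs_b])
    return float(mean_a - mean_b)


# Made-up 3-dimensional toy vectors (assumption: real word2vec vectors
# typically have hundreds of dimensions and are learned from a corpus).
toy = {
    "name_m": np.array([0.9, 0.1, 0.2]),
    "name_f": np.array([0.1, 0.9, 0.2]),
    "career": np.array([1.0, 0.0, 0.1]),
    "science": np.array([0.8, 0.2, 0.0]),
    "family": np.array([0.0, 1.0, 0.1]),
    "arts": np.array([0.2, 0.8, 0.0]),
}

career_terms = [toy["career"], toy["science"]]
family_terms = [toy["family"], toy["arts"]]

score_m = association(toy["name_m"], career_terms, family_terms)
score_f = association(toy["name_f"], career_terms, family_terms)

print(f"name_m: {score_m:+.3f}")
print(f"name_f: {score_f:+.3f}")
```

In this toy setup the two scores have opposite signs by construction. With real embeddings, the same computation would be run over lists of first names and attribute terms, and the resulting scores compared across the two groups of names.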

Category: SOCIAL SCIENCES Country: ESTONIA Year: 2021

 

Severin Bratus