Professor William Cherry invites you to attend the master's thesis defense of Nancy Le next Thursday, March 31, at 1:00 p.m. in GAB 461. Cookies and coffee will be served in GAB 472 following the defense.

"An Exploration of the Word2Vec Algorithm: Creating a Vector Representation of a Language Vocabulary That Encodes Meaning and Usage Patterns in the Vector Space Structure"

Abstract:

I will discuss an algorithm called Word2Vec, developed by Mikolov et al., which uses a highly efficient shallow neural network to create vector representations of a language's vocabulary by training the network on a huge volume of text written in that language. After running the algorithm, one ends up with vector representations of the vocabulary in which vectors for similar words point in similar directions, differences among the normalized vectors relate to differences in meaning and usage among the words, and simple linear calculations such as $$\mathrm{king} - \mathrm{man} + \mathrm{woman} \approx \mathrm{queen}$$ tend to make sense. In addition to explaining the structure of the overall Word2Vec algorithm, I will explore several related mathematical concepts, such as the theory of convex optimization, applications of trees in computer science, and, time permitting, the notion of entropy in information theory. I will also discuss how the vector representations can be used in natural language processing, for example to assist with automated language translation or to help computers complete analogies.
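As a rough illustration of the analogy arithmetic mentioned above, the following Python sketch normalizes each word vector, forms king - man + woman, and picks the remaining word whose vector points in the most similar direction (highest cosine similarity). The three-dimensional vectors in `TOY_VECTORS` are hypothetical and chosen by hand, not produced by Word2Vec, and the helper names are illustrative only. Excluding the query words from the candidates follows a common convention in analogy evaluation.

```python
import math

# Hypothetical toy 3-dimensional vectors, chosen by hand for illustration.
# Real Word2Vec vectors are learned from a large corpus and typically have
# hundreds of dimensions.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.05, 0.5, 0.5],
}


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors):
    """Return the word whose vector is closest in direction to a - b + c.

    Vectors are normalized first, and the query words a, b, c are
    excluded from the candidate answers.
    """
    unit = {w: normalize(v) for w, v in vectors.items()}
    target = [x - y + z for x, y, z in zip(unit[a], unit[b], unit[c])]
    candidates = [w for w in unit if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, unit[w]))


print(analogy("king", "man", "woman", TOY_VECTORS))  # queen
```

With these toy vectors the target is much closer in direction to "queen" (cosine about 0.95) than to "apple" (about 0.47); with learned embeddings the same nearest-neighbor search is run over the entire vocabulary.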