Brief intro to Vector Space Models

Vector Space Models map linguistic items to vectors (points) in a high-dimensional space. The "items" can be terms, sentences, or whole documents, and there are various ways in which those vectors can be constructed. Here we focus on Vector Space Models of lexical semantics: vectors correspond to individual words, and their values are derived from the contexts those words appear in. This type of VSM is closely related to Distributional Semantics, the research field that studies semantic similarities between linguistic items based on their distributional properties in large samples of language data.

There are several approaches to building vector representations for words, called "word embeddings". The most straightforward one is to count the contexts every word appears in, building a so-called "co-occurrence matrix" in which each row represents a word and each column a possible context. Definitions of what counts as a context also differ widely: in the simplest case, a context is just a window of a given size around the target word, but it can also be defined through syntactic relations between words, and so on. Pointwise mutual information is often used to reweight the raw counts and quantify the association between a word and a context. Finally, dimensionality reduction can be applied to the co-occurrence matrix, typically using Singular Value Decomposition (SVD), to obtain more practical low-dimensional vectors and to smooth out noise in the original data.
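To make the count-based pipeline concrete, here is a minimal sketch in Python using only NumPy. The toy corpus, window size, and target dimensionality are made up for illustration, and positive PMI is used as one common weighting variant; this is a sketch of the idea, not a production implementation.

```python
import numpy as np
from collections import defaultdict

# Toy corpus and illustrative parameters (not tuned values)
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat chased a dog".split(),
]
window = 2   # symmetric context window size
dim = 2      # target dimensionality after SVD

# 1. Count word-context co-occurrences within the window
counts = defaultdict(float)
for sentence in corpus:
    for i, word in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if i != j:
                counts[(word, sentence[j])] += 1.0

vocab = sorted({w for s in corpus for w in s})
idx = {w: k for k, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))
for (w, c), n in counts.items():
    M[idx[w], idx[c]] = n

# 2. Reweight raw counts with positive PMI: max(0, log P(w,c) / (P(w) P(c)))
total = M.sum()
p_w = M.sum(axis=1, keepdims=True) / total
p_c = M.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((M / total) / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# 3. Truncated SVD: keep the top `dim` singular vectors as word embeddings
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :dim] * S[:dim]

print(embeddings[idx["cat"]])  # low-dimensional vector for "cat"
```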

There are other approaches to building low-dimensional word embeddings. One such approach, presented by Tomas Mikolov and colleagues, is to train a neural network to predict a word from a given context (or to predict contexts for a given word) and to use the activations of its first layer (equivalently, the learned input weights) as vectors. Although popular media sometimes label this a "deep learning" method, the neural network architecture used is actually shallow. The approach has recently attracted much attention, as it has been shown to capture complex semantic relations between words, such as "country" : "capital". However, Levy and Goldberg later showed that traditional "explicit" models exhibit many of the same properties, and that these neural embeddings essentially perform an implicit factorization of a word-context co-occurrence matrix.
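For the prediction-based route, the word2vec implementation in the gensim library gives a compact illustration. The following is a sketch assuming gensim 4.x, with a toy corpus and untuned hyperparameters, so the analogy query only shows the shape of the API rather than a result you would reproduce at this scale.

```python
# Minimal word2vec sketch (assumes gensim 4.x is installed).
from gensim.models import Word2Vec

# Placeholder corpus; in practice word2vec needs far more text to learn
# meaningful analogies such as country : capital.
sentences = [
    "paris is the capital of france".split(),
    "rome is the capital of italy".split(),
    "berlin is the capital of germany".split(),
]

# sg=1 selects the skip-gram variant (predict contexts for a given word);
# vector_size, window, min_count and epochs are illustrative settings.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

vec = model.wv["paris"]  # the learned embedding for "paris"

# Analogy-style query: which word relates to "rome" as "france" relates to "italy"?
print(model.wv.most_similar(positive=["rome", "france"], negative=["italy"]))
# With enough training data, "paris" would rank near the top of this query.
```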

Back to the main page.