This site contains a set of tools and datasets I have developed or contributed to, related to vector space models in computational linguistics, as well as some tutorials for people new to the field.

Vector space models form a theoretical framework built around the idea of representing linguistic units as vectors (points) in a high-dimensional space. "Linguistic units" can be words, phrases, sentences, or even whole documents. The techniques for mapping linguistic units to numeric vectors, as well as the results of this mapping, are also called word embeddings. They can be obtained from co-occurrence counts of linguistic units ("explicit" or "count-based" embeddings), or by training neural networks to perform a certain task, such as predicting the next unit in the input data ("implicit" or "prediction-based" embeddings). A more in-depth introduction to vector space models can be found here.
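To make the count-based ("explicit") approach concrete, here is a minimal sketch: build a symmetric word–word co-occurrence matrix from a toy corpus, then optionally reduce its dimensionality with SVD so each row becomes a dense word vector. The corpus, window size, and all names here are illustrative assumptions, not a reference implementation.

```python
from collections import Counter

import numpy as np

# Toy corpus: each sentence is a list of word tokens (illustrative only).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the cat chased the dog".split(),
]

# Vocabulary with stable integer indices.
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 word window.
window = 2
counts = Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(idx[w], idx[sent[j]])] += 1

# "Explicit" embeddings: each row of the count matrix is a word vector.
M = np.zeros((len(vocab), len(vocab)))
for (a, b), c in counts.items():
    M[a, b] = c

# Optional dimensionality reduction via truncated SVD (keep 5 dimensions).
U, S, _ = np.linalg.svd(M, full_matrices=False)
embeddings = U[:, :5] * S[:5]

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Words that appear in similar contexts end up with similar vectors.
sim = cosine(embeddings[idx["cat"]], embeddings[idx["dog"]])
```

In practice the raw counts would usually be reweighted (e.g. with PPMI) before the SVD step, and the corpus would of course be much larger.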

Some related publications: