This site contains a set of tools and datasets I have developed or contributed to in the area of vector space models in computational linguistics, as well as some tutorials for people new to the field.
Vector space models are a theoretical framework built around the idea of representing linguistic units as vectors (points) in a high-dimensional space. "Linguistic units" can be words, phrases, sentences, or even whole documents. Both the techniques for mapping linguistic units to numeric vectors and the resulting vectors themselves are called word embeddings. They can be obtained from co-occurrence counts of linguistic units (“explicit” or “count-based” embeddings), or by training neural networks on a task such as predicting the next unit in the input data (“implicit” embeddings). A more in-depth introduction to vector space models can be found here.
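As a minimal sketch of the count-based approach, the example below builds co-occurrence vectors from a tiny toy corpus and compares two words with cosine similarity. The corpus, window size, and helper names are illustrative choices, not part of any tool on this site; real count-based embeddings are built from corpora of millions of tokens and usually reweighted (e.g. with PMI) and dimensionality-reduced.

```python
import math

# Toy corpus; purely illustrative.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the cat chased the dog".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 token window.
window = 2
counts = [[0] * len(vocab) for _ in vocab]
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[w]][index[sent[j]]] += 1

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# "cat" and "dog" occur in similar contexts, so their rows of the
# co-occurrence matrix should be similar.
sim = cosine(counts[index["cat"]], counts[index["dog"]])
```

Each row of `counts` is an explicit embedding of one vocabulary word; similarity of distributions over contexts stands in for similarity of meaning.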
Some related publications:
- A. Drozd, A. Gladkova, and S. Matsuoka, “Word embeddings, analogies, and machine learning: beyond king - man + woman = queen,” in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, December 11-17, 2016, pp. 3519–3530. [pdf]
- A. Gladkova, A. Drozd, and S. Matsuoka, “Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t,” in Proceedings of the NAACL-HLT SRW, San Diego, California, June 12-17, 2016, pp. 47–54. [pdf]
- A. Gladkova and A. Drozd, “Intrinsic evaluations of word embeddings: what can we do better?,” in Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany, 2016, pp. 36–42. [pdf]
- A. Drozd, A. Gladkova, and S. Matsuoka, “Discovering Aspectual Classes of Russian Verbs in Untagged Large Corpora,” in Proceedings of the 2015 IEEE International Conference on Data Science and Data Intensive Systems (DSDIS), 2015, pp. 61–68. [ref/pdf]
- A. Drozd, A. Gladkova, and S. Matsuoka, “Python, Performance, and Natural Language Processing,” in Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing, New York, NY, USA, 2015, pp. 1:1–1:10. [ref/pdf]