More about this evaluation can be found in the related workshop paper.
Intrinsic evaluations such as word similarity or analogy evaluations have become the de facto proving ground for new word embeddings. However, these evaluations are almost always accompanied by extrinsic evaluations on tasks such as sentiment classification. This reflects the fact that word embeddings are usually intended to be used as components in a model for a downstream task of some sort.
In spite of this disconnect between intrinsic evaluations and the purpose of word embeddings, such evaluations endure as a standard. This is due, in part, to the fact that intrinsic evaluations have three properties that are difficult to achieve in extrinsic evaluations: they are fast, replicable, and unbiased.
We attempt to bring these three properties to a standard extrinsic evaluation. In this evaluation, embeddings are used as initializations in simple neural models for six popular downstream tasks. The embeddings are tested in two settings: first, keeping the embeddings fixed to measure their intrinsic qualities, and second, fine-tuning the embeddings to test their performance as initializations.
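The difference between the two settings can be illustrated with a minimal sketch. This is not the models used in this evaluation: it assumes a toy logistic-regression classifier over averaged word vectors, and the function name `train`, its parameters, and the toy data are all hypothetical. With `fine_tune=False` only the classifier is updated, so the result reflects the embeddings as supplied; with `fine_tune=True` the gradient also flows into the word vectors, which then act only as an initialization.

```python
import math
import random


def train(embeddings, data, fine_tune, epochs=50, lr=0.5, seed=0):
    """Toy classifier over averaged word vectors (illustrative only).

    embeddings: dict mapping word -> list of floats
    data: list of (tokens, label) pairs with label in {0, 1}
    Returns (weights, bias, embeddings_after_training).
    """
    rng = random.Random(seed)
    # Work on a copy so the caller's vectors are never modified.
    emb = {w: list(v) for w, v in embeddings.items()}
    dim = len(next(iter(emb.values())))
    weights = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    bias = 0.0
    for _ in range(epochs):
        for tokens, label in data:
            known = [t for t in tokens if t in emb]
            if not known:
                continue
            avg = [sum(emb[t][i] for t in known) / len(known) for i in range(dim)]
            z = sum(w * x for w, x in zip(weights, avg)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - label  # gradient of the log loss w.r.t. z
            if fine_tune:
                # Second setting: push the gradient into the word vectors too.
                for t in known:
                    for i in range(dim):
                        emb[t][i] -= lr * err * weights[i] / len(known)
            # Both settings: update the classifier parameters.
            for i in range(dim):
                weights[i] -= lr * err * avg[i]
            bias -= lr * err
    return weights, bias, emb


# Hypothetical toy vectors and data, only to exercise the two settings.
toy = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "film": [0.0, 1.0]}
data = [(["good", "film"], 1), (["bad", "film"], 0)]
_, _, fixed = train(toy, data, fine_tune=False)
_, _, tuned = train(toy, data, fine_tune=True)
```

In the fixed setting the returned vectors are identical to the input, while in the fine-tuned setting they have moved away from it.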
This evaluation can be used in two ways.
First, to compare embedding methods, you can download a standard corpus on which to train embeddings, then upload them for evaluation. The recommended corpus is the one used in "Evaluation methods for unsupervised word embeddings" (Schnabel et al. 2015).
Alternatively, to simply evaluate the quality of a set of embeddings trained on an arbitrary corpus, you can upload any set of embeddings in the correct format and indicate in the upload form that they were not trained on the standard corpus.
To ensure that the embedding file is in the correct format, run this validation script.
Please contact email@example.com in case of any questions or difficulties.