# SAT2Vec: Word2Vec Versus SAT Analogies

*July 11, 2016* | #machine-learning

Word embeddings, like [Word2Vec][Word2Vec_paper] and [GloVe][GloVe_site], have
proved to be a powerful way of representing text for machine learning
algorithms. The idea behind these methods is relatively simple: words that are
close to each other in the training text should be close to each other in the
vector space. Of course, you could achieve this by having all the words in the
exact same spot, but that wouldn't form a useful model, so there is a second
requirement: words that are not close to each other in the text should not be
close to each other in the vector space.

[Word2Vec_paper]: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
[GloVe_site]: https://nlp.stanford.edu/projects/glove/

## Analogies

This simple algorithm produces some neat features, the coolest of which is the
existence of semantic meaning of the directions in the vector space. The
canonical example of this is that the analogy `King : Man :: Queen : Woman`
holds true _mathematically_ in the vector space as follows:

    King − Man = Queen − Woman

Which is more often rewritten as:

    King − Man + Woman = Queen

This shows that one of the directions in our model is "gender"! These
analogies exist for other concepts as well, for example:

    Paris − France + Japan = Tokyo

And ([as shown by][pats_post] my friend and colleague Patrick Callier):

    Workin − Working + Going = Goin

[pats_post]: https://gab41.lab41.org/street-style-guide-vector-transformations-betta-work-2ad8d9829587

## Word2Vec is to the SAT as?

So with all these analogies embedded in the model, I started thinking back to
when analogies were most prevalent in my life: SAT college entrance exam
preparation! How would the model fare if asked to complete a few SAT
analogies?

To find out, I grabbed Google's [pretrained Word2Vec model][w2v_model], which
was trained on Google News, and then scraped 36 practice SAT analogies with
answers from various websites. Once I had the analogies, I calculated the
difference between the vectors for each pair of words. For example, for
`King : Man :: Queen : Woman`, I would calculate the vector sum for `King −
Man` and also for `Queen − Woman`. I then computed the cosine distance between
the vectors from the prompt pair of words and the potential answer pairs and
ranked them from lowest to highest. If the model performed well for an
analogy, the correct pair would be the lowest distance from the prompt pair,
otherwise it would be further away.

[w2v_model]: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing

You can find the Jupyter Notebook used to run the model [here][notebook]
([Rendered on Github][rendered]). You will need the [analogies
data][analogies_json] and [pretrained model][model_json]. The model has been
stripped down to only contain the words that appear in the analogies to save
space.

[notebook]: /files/sat2vec/Word2Vec%20SAT.ipynb
[rendered]: https://github.com/agude/agude.github.io/blob/master/files/sat2vec/Word2Vec%20SAT.ipynb
[analogies_json]: /files/sat2vec/analogies.json
[model_json]: /files/sat2vec/vectors.json

## Results

Here is an example where the model determined the right answer, that is, the correct answer
(in **bold**) is ranked first:

| authenticity : counterfeit |  Distance |
| :------------------------- | --------: |
| **reliability : erratic**  | **0.758** |
| mobility : energetic       |     0.977 |
| argument : contradictory   |     0.997 |
| reserve : reticent         |     1.009 |
| anticipation : solemn      |     1.049 |

Note that `reliability : erratic` was the word pair with the lowest
distance, that is, the model predicted that it was the correct answer. Just as
'counterfeit' implies lack of authenticity, so 'erratic' implies lack of
reliability. The model did in fact succeed in its prediction.

However, the model often failed, as it does for the prompt `paltry :
significance`:

| paltry : significance   |  Distance |
| :---------------------- | --------: |
| austere : landscape     |     0.803 |
| redundant : discussion  |     0.829 |
| **banal : originality** | **0.861** |
| oblique : familiarity   |     0.895 |
| opulent : wealth        |     0.984 |

Here the correct answer is ranked third. Overall, the model ranked the correct
answer first about 20% of the time. The distribution of answers is as follows:

[![Word2Vec Results on SAT Analogies][analogies_plot]][analogies_plot]

[analogies_plot]: /files/sat2vec/analogies_ranking.svg

So our model isn't getting into Berkeley anytime soon; maybe it should try
applying to Stanford instead? (Go Bears!)

The model's answer for all 36 analogies can be found [here][results].

[results]: /blog/sat2vec/results/

## Related Posts
- [LLMs Make Python Scripts Free](/blog/llms-make-python-free/)
- [How I Code With LLMs](/blog/how-i-write-code-with-llms/)
- [How I Write with Large Language Models](/blog/how-i-write-with-llms-revised/)