Similarity Measures
A new movie like Star Wars might appear to be close to Harry Potter in this embedding space, but how close?
To make this precise, consider the following embedding values for these movies.
A similarity measure is just a function that quantifies how similar, or close, two items are in the embedding space.
One commonly used similarity measure is the dot product.
To compute the dot product of two vectors, we multiply their corresponding components and sum the results.
Because 26 is greater than 19, Star Wars is more similar to Harry Potter than it is to Memento, which confirms our intuition.
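As a rough sketch of the dot product computation described above, here is a small Python example. The two-dimensional vectors and the names movie_a, movie_b, and movie_c are hypothetical, chosen only for illustration; they are not the embedding values used in this lesson.

```python
def dot_product(u, v):
    """Sum of the component-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-D embeddings (assumed values, not the ones from the lesson).
movie_a = [3.0, 1.0]
movie_b = [2.0, 2.0]
movie_c = [-1.0, 4.0]

print(dot_product(movie_a, movie_b))  # 3*2 + 1*2 = 8.0
print(dot_product(movie_a, movie_c))  # 3*(-1) + 1*4 = 1.0
```

With these assumed values, movie_a has a larger dot product with movie_b than with movie_c, so under this measure it would be considered more similar to movie_b.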
Cosine similarity is another popular similarity measure.
It’s similar to the dot product, but divided by the product of the norms of the two movie feature vectors.
Note that the similarity is still greatest between Harry Potter and Star Wars.
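To make the scaling concrete, here is a minimal Python sketch of cosine similarity, assuming the Euclidean norm is the one intended. The vectors are again hypothetical and are not the lesson's embedding values.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 2-D embeddings (assumed values, not the ones from the lesson).
movie_a = [3.0, 4.0]
movie_b = [4.0, 3.0]
movie_a_scaled = [30.0, 40.0]  # same direction as movie_a, ten times longer

print(cosine_similarity(movie_a, movie_b))
print(cosine_similarity(movie_a_scaled, movie_b))
```

Both calls give approximately the same value (24/25 = 0.96), because cosine similarity depends only on the angle between the vectors. The plain dot product, by contrast, would be ten times larger for the scaled vector.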
Let’s have a quick quiz to check our understanding of similarity measures.
Compute the cosine similarity between Star Wars and Shrek, and between Harry Potter and The Incredibles.
Which pair of movies is more similar?
Star Wars and Shrek are more similar because the cosine similarity between them is greater.