Word embeddings


You’ll learn how word embeddings encode text as numbers that convey meaning.
Let’s start with intuition.

How would you describe a dog?
You might mention its breed, age, size, color, owner, and friendliness.
You can easily think of at least ten different dimensions.

How would you describe a person?
Again, you can easily think of at least 20 different dimensions to describe a person.

You can probably guess where this is going.
How would you describe a word, then?

You can use dimensions again, and in math, a set of dimensions forms a vector space.

This suggests an idea: how about representing a word as a point in a vector space, with each dimension describing one of its properties?

Not only that, you also want the distance between words to reflect how similar they are.

You want this representation to capture analogies between words.
For example, the vector from man to woman is similar to the vector from king to queen.

Now you have King - Man + Woman ≈ Queen.
Isn’t it amazing to play with words in the same way you play with numbers?
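You can try this arithmetic yourself. The sketch below is a minimal example, assuming the gensim library is installed and can download a small set of pretrained GloVe vectors through its downloader; the model name is one of gensim’s published identifiers.

```python
# A minimal sketch of word-vector arithmetic, assuming gensim is installed
# and its downloader can fetch the small pretrained GloVe vectors.
import gensim.downloader as api

# Load 50-dimensional GloVe vectors trained on Wikipedia + Gigaword.
vectors = api.load("glove-wiki-gigaword-50")

# king - man + woman ≈ ?
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # Expect 'queen' near the top of the list
```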

Word embedding is a technique to encode text into meaningful vectors.

The technique lets you represent text with low-dimensional, dense vectors.
Each dimension is supposed to capture a feature of a word.
A higher-dimensional embedding captures more detailed relationships between words, but it takes more data and resources to train.
Unlike one-hot encoding, the vectors are no longer sparse.
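To see the contrast, here is a toy sketch of a sparse one-hot vector versus a dense embedding for the same word; the vocabulary size, word index, and embedding values are all invented for illustration.

```python
import numpy as np

# Toy vocabulary of 10,000 words; pretend "dog" is word number 4,271.
vocab_size = 10_000
one_hot_dog = np.zeros(vocab_size)
one_hot_dog[4271] = 1.0  # sparse: a single 1 among 9,999 zeros

# A dense embedding packs the same word into a few real-valued dimensions
# (these numbers are made up purely for illustration).
embedding_dog = np.array([0.21, -1.30, 0.95, 0.47, -0.08])

print(one_hot_dog.shape)    # (10000,)
print(embedding_dog.shape)  # (5,)
```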

The vectors capture relationships between words: similar words end up with similar encodings.
Word embedding is sometimes called a distributed representation, indicating that the meaning of a word is spread across the dimensions.
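A common way to measure whether two encodings are similar is cosine similarity. The vectors below are invented for illustration; in practice you would take them from a trained model.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means same direction, 0.0 means unrelated, -1.0 means opposite.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up embeddings: "dog" and "puppy" point in nearly the same direction,
# while "car" points somewhere else.
dog = np.array([0.8, 0.1, 0.6])
puppy = np.array([0.7, 0.2, 0.5])
car = np.array([-0.3, 0.9, 0.1])

print(cosine_similarity(dog, puppy))  # high, close to 1
print(cosine_similarity(dog, car))    # much lower
```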

Instead of specifying the values for the embedding manually, you train a neural network to learn those numbers.
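In code, those numbers usually live in an embedding layer whose weights start out random and get updated during training. Here is a minimal sketch using PyTorch’s embedding layer; the vocabulary size and dimension are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

vocab_size = 10_000  # number of distinct words
embedding_dim = 50   # dimensions per word

# The layer is just a (vocab_size x embedding_dim) table of trainable numbers.
embedding = nn.Embedding(vocab_size, embedding_dim)

# Look up the vectors for word ids 12 and 345; gradients flow through the
# lookup, so the numbers are learned along with the rest of the network.
word_ids = torch.tensor([12, 345])
vectors = embedding(word_ids)
print(vectors.shape)  # torch.Size([2, 50])
```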

How do word embeddings work this magic, converting words such as king, queen, man, and woman into vectors that convey their semantic similarities?

Word embedding is an umbrella term that covers several concrete algorithms and models, such as

word2vec by Google, which is considered a breakthrough in applying neural networks to text representation (see the training sketch after this list).

GloVe by Stanford,

FastText by Facebook, among others.
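As a taste of how these libraries are used, here is a minimal word2vec training sketch with gensim. The tiny corpus is made up, and real training needs far more text to produce meaningful vectors.

```python
from gensim.models import Word2Vec

# A made-up, far-too-small corpus; each sentence is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "the", "dog"],
    ["the", "woman", "walks", "the", "dog"],
]

# vector_size is the embedding dimension (gensim 4.x argument name).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100)

# Each word now has a 50-dimensional vector.
print(model.wv["queen"].shape)          # (50,)
print(model.wv.most_similar("king", topn=3))
```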