Revision

  1. Introduction:

  2. Process:

  3. Mathematical Model:

    $$ P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)} $$

    Here, $P(o \mid c)$ is the probability of an outside word $o$ given a center word $c$; $u_o$ is the ‘outside’ (output) vector of $o$, $v_c$ is the ‘center’ (input) vector of $c$, and the sum in the denominator normalizes over the whole vocabulary $V$. Training adjusts these vectors iteratively so that words actually observed in the context receive higher predicted probability; a sketch of this computation follows the list.

  4. Learning Outcome:
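
A minimal Python/NumPy sketch of the softmax computation above; the function name and the random example vectors are illustrative assumptions, not part of the original notes:

    import numpy as np

    def prob_outside_given_center(U, v_c, o):
        # U: |V| x d matrix whose rows are the 'outside' vectors u_w
        # v_c: d-dimensional 'center' vector of the word c
        # o: vocabulary index of the outside word
        scores = U @ v_c              # u_w^T v_c for every word w in V
        scores -= scores.max()        # shift scores for numerical stability
        exp_scores = np.exp(scores)
        return exp_scores[o] / exp_scores.sum()

    # Example with a 5-word vocabulary and 3-dimensional vectors:
    rng = np.random.default_rng(0)
    U = rng.uniform(-0.5, 0.5, (5, 3))
    v_c = rng.uniform(-0.5, 0.5, 3)
    print(prob_outside_given_center(U, v_c, o=2))

Subtracting the maximum score before exponentiating does not change the result (the shift cancels in the ratio) but avoids overflow for large dot products.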


Word2Vec Parameters and Computations

  1. Parameters:

  2. Computations:
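
As a worked count (assuming the standard skip-gram setup with two vectors per word, as in the example below), a vocabulary of size $|V|$ with $d$-dimensional vectors has

$$ 2 \times |V| \times d $$

trainable parameters; the 5-word, 3-dimensional example below therefore has $2 \times 5 \times 3 = 30$ values.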


  1. Model Characteristics:

Let’s assume we have a vocabulary of 5 words: {Word1, Word2, Word3, Word4, Word5}. The dimension of the word vectors is 3.

The ‘center’ word vectors (V) could look like this:

    | v1_1  v1_2  v1_3 |   # Word1
    | v2_1  v2_2  v2_3 |   # Word2
V = | v3_1  v3_2  v3_3 |   # Word3
    | v4_1  v4_2  v4_3 |   # Word4
    | v5_1  v5_2  v5_3 |   # Word5

And the ‘outside’ word vectors (U) could look like this:

    | u1_1  u1_2  u1_3 |   # Word1
    | u2_1  u2_2  u2_3 |   # Word2
U = | u3_1  u3_2  u3_3 |   # Word3
    | u4_1  u4_2  u4_3 |   # Word4
    | u5_1  u5_2  u5_3 |   # Word5

In these matrices, each row corresponds to a word in the vocabulary, and each column corresponds to a dimension of the word vector. The values v#_# and u#_# represent the coordinates of the word vectors in the vector space.
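
A minimal NumPy sketch of this setup (the uniform initialization range is an assumption; Word2Vec implementations vary in how they initialize):

    import numpy as np

    vocab = ["Word1", "Word2", "Word3", "Word4", "Word5"]
    d = 3                                         # dimension of each word vector

    rng = np.random.default_rng(0)
    V = rng.uniform(-0.5, 0.5, (len(vocab), d))   # 'center' vectors, one row per word
    U = rng.uniform(-0.5, 0.5, (len(vocab), d))   # 'outside' vectors, one row per word

    # Row i holds the vector for vocab[i]; e.g. V[2] plays the role of
    # | v3_1  v3_2  v3_3 | for Word3 in the matrices above.
    print(V.shape, U.shape)                       # (5, 3) (5, 3)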

Optimization: Gradient Descent

  1. Introduction:

  2. Process:
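
Since the notes leave the process blank, here is a hedged sketch of a single gradient-descent update, writing the per-pair loss as $J = -\log P(o \mid c)$ for the softmax model defined above; the function name and learning rate are illustrative assumptions:

    import numpy as np

    def sgd_step(V, U, c, o, lr=0.05):
        # One update on J = -log P(o|c) for a single (center, outside) word pair.
        v_c = V[c]
        scores = U @ v_c
        y_hat = np.exp(scores - scores.max())
        y_hat /= y_hat.sum()            # predicted distribution over the vocabulary
        y_hat[o] -= 1.0                 # y_hat - y, where y is one-hot at the true word o
        grad_v_c = U.T @ y_hat          # dJ/dv_c
        grad_U = np.outer(y_hat, v_c)   # dJ/dU, one row per outside vector
        V[c] -= lr * grad_v_c           # step against the gradient
        U -= lr * grad_U                # updates U in place

Repeating this step over many (center, outside) pairs drawn from a corpus is what pushes the probabilities in the model section upward for true context words.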
