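Models

CBOW (Continuous Bag of Words)

Use the context to predict the probability of the current (center) word.

- The context words' vectors are $$\upsilon_{c-m}, \dots, \upsilon_{c-1}, \upsilon_{c+1}, \dots, \upsilon_{c+m}$$, where $$m$$ is the window size and $$c$$ is the position of the center word. The center word itself is not part of the context, so there are $$2m$$ context vectors.
- Context vector: $$\hat{\upsilon}=\frac{\upsilon_{c-m}+\dots+\upsilon_{c-1}+\upsilon_{c+1}+\dots+\upsilon_{c+m}}{2m}$$
- Score vector: $$z_i = u_i^{\top}\hat{\upsilon}$$, where $$u_i$$ is the output vector representation of word $$\omega_i$$. Each score is a dot product, so $$z$$ has one entry per vocabulary word.
- Turn the scores into probabilities: $$\hat{y}=\mathrm{softmax}(z)$$
- We want the predicted probabilities $$\hat{y}$$ to match the true probabilities $$y$$, and we use the cross entropy $$H(\hat{y},y) = -\sum_i y_i \log \hat{y}_i$$ to measure the distance between these two distributions. When $$y$$ is the one-hot vector of the center word, this reduces to $$-\log \hat{y}_c$$.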
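
The CBOW forward pass described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not a reference implementation: the matrix names `V` (input vectors, one row per word) and `U` (output vectors $$u_i$$, one row per word), the helper `cbow_forward`, the toy vocabulary size, and the random initialization are all assumptions made for the example. It also assumes $$y$$ is a one-hot vector for the center word, so the cross entropy is $$-\log \hat{y}_c$$.

```python
import numpy as np


def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()


def cbow_forward(V, U, context_ids, target_id):
    """One CBOW forward pass (hypothetical helper for illustration).

    V: (vocab_size, dim) input vectors, one row per word.
    U: (vocab_size, dim) output vectors u_i, one row per word.
    context_ids: indices of the 2m context words (center word excluded).
    target_id: index of the center word to predict.
    """
    v_hat = V[context_ids].mean(axis=0)   # context vector: average of the 2m vectors
    z = U @ v_hat                         # scores: z_i = u_i . v_hat
    y_hat = softmax(z)                    # predicted distribution over the vocabulary
    loss = -np.log(y_hat[target_id])      # cross entropy against a one-hot y
    return y_hat, loss


# Toy setup (assumed values): 10 words, 4-dimensional vectors, window size m = 2.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
V = rng.normal(scale=0.1, size=(vocab_size, dim))
U = rng.normal(scale=0.1, size=(vocab_size, dim))

context_ids = [1, 2, 4, 5]   # the 2m = 4 words around the center word
target_id = 3                # the center word
y_hat, loss = cbow_forward(V, U, context_ids, target_id)
```

Training would then adjust `V` and `U` to lower `loss`, for example by gradient descent; that step is left out here because the text above only covers the forward computation.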