Cosine similarity

$$\cos(\theta(\mathbf{x},\mathbf{y})) = \frac{\langle \mathbf{x},\mathbf{y}\rangle}{\|\mathbf{x}\|_2 \, \|\mathbf{y}\|_2}$$

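Cosine similarity lies in $[-1, 1]$. A value closer to 1 means the two vectors are more similar (they point in nearly the same direction), and a value farther from 1 means they are more different.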
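The formula above can be sketched in plain Python as follows. This is a minimal illustration, assuming vectors are given as equal-length sequences of floats. The function name `cosine_similarity` is our own choice, not a specific library's API. Zero vectors are rejected because the denominator would vanish.

```python
import math
from collections.abc import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Return <x, y> / (||x||_2 * ||y||_2) for two nonzero vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))   # inner product <x, y>
    norm_x = math.sqrt(sum(a * a for a in x))  # Euclidean norm ||x||_2
    norm_y = math.sqrt(sum(b * b for b in y))  # Euclidean norm ||y||_2
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # same direction: 1 up to rounding
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))    # orthogonal: 0.0
print(cosine_similarity([1.0, 1.0], [-1.0, -1.0]))  # opposite: -1 up to rounding
```

Because both norms appear in the denominator, rescaling either vector by a positive constant leaves the result unchanged. Only the angle between the vectors matters.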

Note: cosine similarity can be thought of as a natural ‘inverse’ of the squared Euclidean distance $\|\mathbf{x}-\mathbf{y}\|_2^2$.

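Suppose $\mathbf{x},\mathbf{y}$ are unit vectors, i.e. $\|\mathbf{x}\|_2 = \|\mathbf{y}\|_2 = 1$, so that $\langle \mathbf{x},\mathbf{y}\rangle = \cos(\theta(\mathbf{x},\mathbf{y}))$. Then

$$\|\mathbf{x}-\mathbf{y}\|_2^2 = \langle \mathbf{x}-\mathbf{y},\mathbf{x}-\mathbf{y}\rangle = \|\mathbf{x}\|_2^2 + \|\mathbf{y}\|_2^2 - 2\langle \mathbf{x},\mathbf{y}\rangle = 2 - 2\cos(\theta(\mathbf{x},\mathbf{y}))$$

so, for unit vectors, the squared Euclidean distance decreases exactly as the cosine similarity increases.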
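The unit-vector identity $\|\mathbf{x}-\mathbf{y}\|_2^2 = 2 - 2\cos(\theta(\mathbf{x},\mathbf{y}))$ can be checked numerically. What follows is a minimal sketch, assuming random Gaussian vectors rescaled to unit length. The helper names (`dot`, `normalize`, `max_identity_error`) and the fixed seed are illustrative choices, not part of the original notes. The function returns the largest discrepancy between the two sides over many random pairs, which should be on the order of floating-point rounding error.

```python
import math
import random


def dot(x: list[float], y: list[float]) -> float:
    """Inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def normalize(v: list[float]) -> list[float]:
    """Scale a nonzero vector to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def max_identity_error(dim: int = 5, trials: int = 1000, seed: int = 0) -> float:
    """Largest |(||x - y||^2) - (2 - 2 cos(theta))| over random unit-vector pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        y = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        diff = [a - b for a, b in zip(x, y)]
        sq_dist = dot(diff, diff)                          # ||x - y||_2^2
        cos_xy = dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))  # full cosine formula
        worst = max(worst, abs(sq_dist - (2.0 - 2.0 * cos_xy)))
    return worst


print(max_identity_error())  # a tiny number, at the level of rounding error
```

Dropping the `normalize` calls would make the check fail in general, since the identity relies on $\|\mathbf{x}\|_2 = \|\mathbf{y}\|_2 = 1$.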