Johnson-Lindenstrauss Lemma (1984)

Lemma

For any set of $n$ data points $\mathbf{q}_1, \ldots, \mathbf{q}_n \in \mathbb{R}^d$, there exists a linear map $\mathbf{\Pi}: \mathbb{R}^d \rightarrow \mathbb{R}^k$ with $k = O(\frac{\log n}{\epsilon^2})$ such that for all $i, j$, $(1-\epsilon)\|\mathbf{q}_i - \mathbf{q}_j\|_2 \leq \|\mathbf{\Pi}\mathbf{q}_i - \mathbf{\Pi}\mathbf{q}_j\|_2 \leq (1+\epsilon)\|\mathbf{q}_i - \mathbf{q}_j\|_2$.

Equivalently, (1βˆ’Ο΅)||πͺiβˆ’πͺj||22≀||𝚷πͺiβˆ’πš·πͺj||22≀(1+Ο΅)||πͺiβˆ’πͺj||22(1-\epsilon)||\mathbf{q}_i-\mathbf{q}_j||_2^2 \leq ||\mathbf{\Pi q}_i-\mathbf{\Pi q}_j||_2^2 \leq (1+\epsilon)||\mathbf{q}_i-\mathbf{q}_j||_2^2 because for small Ο΅\epsilon, (1+Ο΅)2=1+O(Ο΅)(1+\epsilon)^2 = 1+O(\epsilon) and (1βˆ’Ο΅)2=1βˆ’O(Ο΅)(1-\epsilon)^2 = 1-O(\epsilon).

Equivalently, (1βˆ’Ο΅)||𝚷πͺiβˆ’πš·πͺj||22≀||πͺiβˆ’πͺj||22≀(1+Ο΅)||𝚷πͺiβˆ’πš·πͺj||22(1-\epsilon)||\mathbf{\Pi q}_i-\mathbf{\Pi q}_j||_2^2 \leq ||\mathbf{q}_i-\mathbf{q}_j||_2^2 \leq (1+\epsilon)||\mathbf{\Pi q}_i-\mathbf{\Pi q}_j||_2^2 because for small Ο΅\epsilon, 11+Ο΅=1βˆ’O(Ο΅)\frac{1}{1+\epsilon}=1-O(\epsilon) and 11βˆ’Ο΅=1+O(Ο΅)\frac{1}{1-\epsilon}=1+O(\epsilon).

Proof using Distributional Johnson-Lindenstrauss Lemma:

We have a set of vectors $\mathbf{q}_1, \ldots, \mathbf{q}_n$. Fix $i, j \in \{1, \ldots, n\}$.

Let 𝐱=πͺiβˆ’πͺj\mathbf{x}=\mathbf{q}_i - \mathbf{q}_j. By linearity, 𝚷𝐱=𝚷(πͺiβˆ’πͺj)=𝚷πͺiβˆ’πš·πͺj\mathbf{Ξ x} = \mathbf{Ξ }(\mathbf{q}_i βˆ’\mathbf{q}_j) = \mathbf{Ξ q}_i βˆ’\mathbf{Ξ q}_j.

By the distributional JL lemma, a random $\mathbf{\Pi}$ with $k = O(\frac{\log(1/\delta)}{\epsilon^2})$ rows satisfies, with probability at least $1-\delta$, $(1-\epsilon)\|\mathbf{q}_i - \mathbf{q}_j\|_2 \leq \|\mathbf{\Pi}\mathbf{q}_i - \mathbf{\Pi}\mathbf{q}_j\|_2 \leq (1+\epsilon)\|\mathbf{q}_i - \mathbf{q}_j\|_2$.

Finally, set $\delta = \frac{1}{10n^2}$. Since there are fewer than $n^2$ pairs $i, j$, a union bound gives that the above holds simultaneously for all $i, j$ with probability at least $9/10$, as long as we compress to:

$k = O(\frac{\log(1/\delta)}{\epsilon^2}) = O(\frac{\log n}{\epsilon^2})$ dimensions.
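The union-bound arithmetic above can be spelled out numerically. The snippet below (an illustrative check, using $\delta = \frac{1}{10n^2}$ as one concrete choice that achieves the $9/10$ success probability) confirms that the total failure probability over all pairs stays below $1/10$, and that $\log(1/\delta)$ grows like $\log n$.

```python
import numpy as np

n, eps = 100, 0.25
delta = 1 / (10 * n**2)          # per-pair failure probability

num_pairs = n * (n - 1) // 2     # fewer than n^2 pairs i, j
union_bound = num_pairs * delta  # total failure probability <= sum of per-pair ones

# log(1/delta) = log(10 n^2) = O(log n), so k = O(log n / eps^2).
k_bound = np.log(1 / delta) / eps**2

print(f"total failure probability <= {union_bound:.4f}")
print(f"k on the order of {k_bound:.0f} for eps = {eps}")
```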


This is a type of Euclidean Dimensionality Reduction.

Reference:

  1. W. B. Johnson, J. Lindenstrauss, and G. Schechtman, β€œExtensions of Lipschitz maps into Banach spaces,” Israel J. Math., vol. 54, no. 2, pp. 129–138, Jun. 1986, doi: 10.1007/BF02764938.
  2. S. Dasgupta and A. Gupta, β€œAn elementary proof of a theorem of Johnson and Lindenstrauss,” Random Struct. Alg., vol. 22, no. 1, pp. 60–65, Jan. 2003, doi: 10.1002/rsa.10073.
  3. https://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma
  4. See full lecture notes, section 10: Dimension Reduction and the JL Lemma.
  5. https://yao-lab.github.io/2020.csic5011/slides/Lecture04_RP.pdf, slide 17
  6. https://simons.berkeley.edu/sites/default/files/docs/721/dubhashislides.pdf
  7. https://www.cs.cmu.edu/afs/cs/academic/class/15456-s14/Handouts/Har-Peled-Chap19.pdf
  8. https://people.eecs.berkeley.edu/~satishr/cs270/sp17/rough-notes/measure-concentration.pdf