MinHash (Broder, 1997)
Choose
random hash functions
.
For
,
let
.
Claim: For all
,
Proof:
#incomplete
See Hashing.
References:
- A. Z. Broder, “On the resemblance and containment of documents,” in
Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat.
No.97TB100171), Salerno, Italy: IEEE Comput. Soc, 1998, pp. 21–29.
doi: 10.1109/SEQUEN.1997.666900.
- https://en.wikipedia.org/wiki/MinHash
- https://www.chrismusco.com/amlds2023/notes/lecture05.html
- https://people.cs.umass.edu/~cmusco/CS514F21/slides/lecture9/lecture9Annotated.pdf
- https://ekzhu.com/datasketch/minhash.html
- https://users.cs.utah.edu/~jeffp/teaching/cs5955/L5-Minhash.pdf
- https://aksakalli.github.io/2016/03/01/jaccard-similarity-with-minhash.html