The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient (originally given the French name coefficient de communauté by Paul Jaccard), is a statistic used for gauging the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
$$
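In short, it comes down to one sentence: the size of the intersection of the two sets divided by the size of their union.
The Java implementation is as follows: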
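```java
import java.util.Set;
import java.util.stream.Collectors;
// Assumption: SetUtils is the Apache Commons Collections 4 class
import org.apache.commons.collections4.SetUtils;

public static float jaccard(String a, String b) {
    // Two null inputs are treated as identical: similarity 1
    if (a == null && b == null) {
        return 1f;
    }
    if (a == null || b == null) {
        return 0f;
    }
    // Distinct characters of each string
    Set<Integer> aChar = a.chars().boxed().collect(Collectors.toSet());
    Set<Integer> bChar = b.chars().boxed().collect(Collectors.toSet());
    // Size of the intersection
    int intersection = SetUtils.intersection(aChar, bChar).size();
    if (intersection == 0) return 0f;
    // Size of the union
    int union = SetUtils.union(aChar, bChar).size();
    return ((float) intersection) / (float) union;
}
```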
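If pulling in a `SetUtils` dependency is not desirable, the same calculation can be sketched with the JDK alone. This is only an illustration under stated assumptions: the class name `JaccardSketch` is hypothetical, and the null handling simply mirrors the method above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
final class JaccardSketch {

    static float jaccard(String a, String b) {
        // Same null conventions as the listing above
        if (a == null && b == null) return 1f;
        if (a == null || b == null) return 0f;

        // Distinct characters (UTF-16 code units) of each string
        Set<Integer> aChar = a.chars().boxed().collect(Collectors.toSet());
        Set<Integer> bChar = b.chars().boxed().collect(Collectors.toSet());

        // Intersection: copy one set, keep only elements also in the other
        Set<Integer> intersection = new HashSet<>(aChar);
        intersection.retainAll(bChar);
        if (intersection.isEmpty()) return 0f;

        // Union: copy one set, add every element of the other
        Set<Integer> union = new HashSet<>(aChar);
        union.addAll(bChar);

        return (float) intersection.size() / union.size();
    }
}
```

For example, "night" and "nacht" share the characters n, h and t out of seven distinct characters in total, so `JaccardSketch.jaccard("night", "nacht")` evaluates to 3/7.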
The Sørensen–Dice coefficient is a statistic used to gauge the similarity of two samples. It was independently developed by the botanists Thorvald Sørensen[1] and Lee Raymond Dice,[2] who published in 1948 and 1945 respectively.
Note that it is twice the size of the intersection divided by the sum of the sizes of the two sets, not by the size of their union.
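Written out as a formula, the difference from the Jaccard index is easier to see. The symbols here are notation chosen for this note rather than taken from the post: $A$ and $B$ are the two character sets, $DSC$ is the Sørensen–Dice coefficient, and $J$ is the Jaccard index of the same two sets.

$$
DSC(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}
$$

Because $|A| + |B| = |A \cup B| + |A \cap B|$, dividing numerator and denominator by $|A \cup B|$ relates the two measures:

$$
DSC(A, B) = \frac{2\,|A \cap B|}{|A \cup B| + |A \cap B|} = \frac{2J}{1 + J}
$$

As a worked example, "night" and "nacht" each have 5 distinct characters and share 3 of them (n, h, t), with 7 distinct characters overall. That gives $J = 3/7 \approx 0.43$ and $DSC = 6/10 = 0.6$, which agrees with $2J/(1+J) = (6/7)/(10/7) = 0.6$.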
The Java implementation is as follows:
```java
public static float SorensenDice(String a, String b) {
    if (a == null && b == null) {
        return 1f;
    }
    if (a == null || b == null) {
        return 0F;
    }
    Set<Integer> aChars = a.chars().boxed().collect(Collectors.toSet());
    Set<Integer> bChars = b.chars().boxed().collect(Collectors.toSet());
    // Size of the intersection
    int intersect = SetUtils.intersection(aChars, bChars).size();
    if (intersect == 0) {
        return 0F;
    }
    // Denominator: simply add the sizes of the two sets
    int aSize = aChars.size();
    int bSize = bChars.size();
    return (2 * (float) intersect) / ((float) (aSize + bSize));
}
```
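```java
public static float Levenshtein(String a, String b) {
    if (a == null && b == null) {
        return 1f;
    }
    if (a == null || b == null) {
        return 0F;
    }
    int maxLen = Math.max(a.length(), b.length());
    // Two empty strings are identical; this also avoids dividing by zero below
    if (maxLen == 0) {
        return 1f;
    }
    int editDistance = editDis(a, b);
    return 1 - ((float) editDistance / maxLen);
}

private static int editDis(String a, String b) {
    int aLen = a.length();
    int bLen = b.length();
    // If one string is empty, the distance is the length of the other
    if (aLen == 0) return bLen;
    if (bLen == 0) return aLen;
```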
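The `editDis` listing stops after its base cases. As a minimal sketch of how the remainder could look, the version below assumes the standard dynamic-programming formulation of Levenshtein distance with unit costs for insertion, deletion and substitution. The class and method names (`LevenshteinSketch`, `editDistance`, `similarity`) are hypothetical, and this is not necessarily how the original post completed the method.

```java
// Hypothetical class name, used only for this sketch.
final class LevenshteinSketch {

    // Edit distance via dynamic programming, keeping only two rows of the table.
    static int editDistance(String a, String b) {
        int aLen = a.length();
        int bLen = b.length();
        // If one string is empty, the distance is the length of the other
        if (aLen == 0) return bLen;
        if (bLen == 0) return aLen;

        int[] prev = new int[bLen + 1]; // distances for the prefix a[0..i-1)
        int[] curr = new int[bLen + 1]; // distances for the prefix a[0..i)
        for (int j = 0; j <= bLen; j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }

        for (int i = 1; i <= aLen; i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= bLen; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Swap the rows for the next iteration
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[bLen];
    }

    // Normalises the distance to a similarity in [0, 1].
    static float similarity(String a, String b) {
        if (a == null && b == null) return 1f;
        if (a == null || b == null) return 0f;
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1f; // two empty strings are identical
        return 1 - ((float) editDistance(a, b) / maxLen);
    }
}
```

With this sketch, "kitten" and "sitting" are 3 edits apart, so their similarity is 1 - 3/7. Keeping two rows instead of the full table reduces memory from O(n·m) to O(m) without changing the result.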
```java
public static float cos(String a, String b) {
    if (a == null || b == null) {
        return 0F;
    }
    Set<Integer> aChar = a.chars().boxed().collect(Collectors.toSet());
    Set<Integer> bChar = b.chars().boxed().collect(Collectors.toSet());
```
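The `cos` listing breaks off after the two character sets are built, so the rest of the original method is unknown. One plausible way to finish it, offered as an assumption and not as the author's code, is to treat each set as a binary vector (1 if the character is present, 0 otherwise). The dot product is then the size of the intersection and each vector's norm is the square root of its set size. The class name `CosineSketch` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
final class CosineSketch {

    static float cos(String a, String b) {
        if (a == null || b == null) {
            return 0f;
        }
        Set<Integer> aChar = a.chars().boxed().collect(Collectors.toSet());
        Set<Integer> bChar = b.chars().boxed().collect(Collectors.toSet());

        // An empty string has a zero vector; define the similarity as 0
        if (aChar.isEmpty() || bChar.isEmpty()) {
            return 0f;
        }

        // Dot product of two binary vectors = number of shared characters
        Set<Integer> common = new HashSet<>(aChar);
        common.retainAll(bChar);

        // Norm of a binary vector = sqrt(number of distinct characters)
        double norms = Math.sqrt((double) aChar.size() * bChar.size());
        return (float) (common.size() / norms);
    }
}
```

Under this binary-vector assumption, "night" and "nacht" share 3 of their 5 distinct characters each, giving 3 / sqrt(5 · 5) = 0.6. A frequency-based variant that counts repeated characters would give different values for strings with duplicates.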