cluster_recall

er_evaluation.metrics.cluster_recall(prediction, reference)

Cluster recall for the inner join of two clusterings.

Cluster recall:

Consider two clusterings of a set of records, referred to as the predicted and reference clusterings. Let \(C\) be the set of reference (true) clusters, and let \(\hat C\) be the set of predicted clusters. Cluster recall is then defined as

\[R = \frac{\lvert C \cap \hat C \rvert}{\lvert C \rvert}\]

This is the proportion of reference (true) clusters that are predicted correctly, that is, reference clusters that also appear among the predicted clusters with exactly the same members.
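The definition above can be sketched directly with pandas and Python sets. This is an illustrative re-implementation under stated assumptions, not the library's actual code: the name `cluster_recall_sketch` is hypothetical, membership vectors are assumed to be Series indexed by unique record identifiers with cluster labels as values, and the reference clustering is assumed to be non-empty after cleaning.

```python
from collections import defaultdict

import pandas as pd


def cluster_recall_sketch(prediction: pd.Series, reference: pd.Series) -> float:
    """Hypothetical helper: computes |C ∩ Ĉ| / |C| from two membership vectors."""
    # Drop NA labels, then keep only records present in both vectors (inner join).
    prediction = prediction.dropna()
    reference = reference.dropna()
    common = prediction.index.intersection(reference.index)
    prediction, reference = prediction.loc[common], reference.loc[common]

    def as_clusters(membership: pd.Series) -> set:
        # Group record identifiers by cluster label; each cluster becomes a frozenset.
        groups = defaultdict(set)
        for record, label in membership.items():
            groups[label].add(record)
        return {frozenset(members) for members in groups.values()}

    ref_clusters = as_clusters(reference)
    pred_clusters = as_clusters(prediction)
    # Assumes at least one reference cluster; otherwise this divides by zero.
    return len(ref_clusters & pred_clusters) / len(ref_clusters)


prediction = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 5])
reference = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8],
    data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c4"],
)
result = cluster_recall_sketch(prediction, reference)
print(result)  # 0.5: reference clusters {6, 7} and {8} are recovered, out of 4
```

Cluster labels themselves are never compared; only the sets of records they group together matter, which is why integer predicted labels can be scored against string reference labels.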

Parameters:

prediction (Series) – Membership vector for the predicted clustering.
reference (Series) – Membership vector for the reference clustering.

Returns:

Cluster recall for the inner join of prediction and reference, that is, computed over the records present in both membership vectors.

Return type:

float

Examples

>>> import pandas as pd
>>> from er_evaluation.metrics import cluster_recall
>>> prediction = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 5])
>>> reference = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c4"])
>>> cluster_recall(prediction, reference)
0.5

Notes

NA values are dropped from membership vectors prior to computing the metric.
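To make the effect of NA dropping and the inner join concrete, the following sketch uses only pandas to show which records would remain for comparison. It mirrors the behavior described here rather than the library's internals, and the example data is made up for illustration.

```python
import pandas as pd

# Hypothetical membership vectors: record 3 has no predicted label,
# record 5 has no reference label, and records 1 and 5 appear in only one vector.
prediction = pd.Series(index=[1, 2, 3, 4], data=[1, 1, None, 2])
reference = pd.Series(index=[2, 3, 4, 5], data=["c1", "c1", "c2", None])

# Drop records with NA labels from each vector.
pred_clean = prediction.dropna()
ref_clean = reference.dropna()

# Inner join: keep only the records that remain in both vectors.
shared = pred_clean.index.intersection(ref_clean.index)
print(list(shared))  # [2, 4]
```

Under these assumptions, only records 2 and 4 would contribute to the metric; the other records are ignored rather than counted as errors.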