cluster_precision

er_evaluation.metrics.cluster_precision(prediction, reference)

Cluster precision for the inner join of two clusterings.

Cluster precision:

Consider two clusterings of a set of records, referred to as the predicted and reference clusterings. Let \(C\) be the set of reference (true) clusters, and let \(\hat C\) be the set of predicted clusters. Cluster precision is then defined as

\[P = \frac{\lvert C \cap \hat C \rvert}{\lvert \hat C \rvert}\]

This is the proportion of predicted clusters that are correct, where a predicted cluster counts as correct only if it contains exactly the same records as some reference cluster.
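The definition above can be sketched directly with pandas. This is an illustrative re-derivation of the formula, not the library's implementation: the helper names `clusters_of` and `cluster_precision_sketch` are hypothetical, and returning NaN when there are no predicted clusters is an assumption made only for this sketch.

```python
import pandas as pd


def clusters_of(membership: pd.Series) -> set:
    """Turn a membership vector into a set of clusters (frozensets of record ids)."""
    groups = membership.groupby(membership).groups  # label -> Index of record ids
    return {frozenset(ids) for ids in groups.values()}


def cluster_precision_sketch(prediction: pd.Series, reference: pd.Series) -> float:
    """Hypothetical sketch of P = |C ∩ Ĉ| / |Ĉ| on the records common to both inputs."""
    # Keep only records present in both vectors (inner join) with non-missing labels.
    joined = pd.concat(
        {"prediction": prediction, "reference": reference}, axis=1, join="inner"
    ).dropna()

    predicted = clusters_of(joined["prediction"])
    true = clusters_of(joined["reference"])

    if not predicted:
        # Assumption for this sketch only: precision is undefined without predictions.
        return float("nan")

    # A predicted cluster is correct only if the identical set of records
    # also appears as a reference cluster.
    return len(predicted & true) / len(predicted)


prediction = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 5])
reference = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8],
    data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c4"],
)
result = cluster_precision_sketch(prediction, reference)
```

With these inputs the predicted clusters are {1, 2}, {3, 5}, {4}, {6, 7} and {8}. Only {6, 7} and {8} also appear among the reference clusters, so the sketch gives 2 / 5 = 0.4.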

Parameters:

prediction (Series) – Membership vector for the predicted clustering.
reference (Series) – Membership vector for the reference clustering.

Returns:

Cluster precision for the inner join of prediction and reference.

Return type:

float

Examples

>>> import pandas as pd
>>> from er_evaluation.metrics import cluster_precision
>>> prediction = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,5])
>>> reference = pd.Series(index=[1,2,3,4,5,6,7,8], data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c4"])
>>> cluster_precision(prediction, reference)
0.4

Notes

NA values are dropped from membership vectors prior to computing the metric.
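To make the note concrete, the following sketch shows which records would remain for comparison. It assumes that "inner join" means keeping the records whose index appears in both membership vectors once NA values are dropped. It uses only pandas and does not call the library, and the variable names are chosen for illustration.

```python
import pandas as pd

# Record 3 has no predicted label; record 5 has no reference label.
prediction = pd.Series(index=[1, 2, 3, 4], data=[1, 1, None, 2])
reference = pd.Series(index=[2, 3, 4, 5], data=["c1", "c1", "c2", None])

# Drop missing memberships first, then keep records present in both vectors.
pred_clean = prediction.dropna()
ref_clean = reference.dropna()
common_records = sorted(pred_clean.index.intersection(ref_clean.index))

print(common_records)  # [2, 4]
```

Under this assumption only records 2 and 4 enter the computation. Records that are missing from either vector, or that carry an NA label, contribute to neither the numerator nor the denominator.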