b_cubed_precision

er_evaluation.metrics.b_cubed_precision(prediction, reference)

B-cubed precision for the inner join of two clusterings (that is, computed only on the mentions present in both), with equal weight placed on each ground-truth cluster.

Mathematically, this is defined as

\[P_{B^3} = \frac{1}{\lvert \mathcal{C}\rvert}\sum_{c \in \mathcal{C}} \frac{1}{\lvert c \rvert} \sum_{r \in c} \frac{\lvert c(r) \cap \hat c(r)\rvert }{\lvert \hat c(r) \rvert}\]

where

\(\mathcal{C}\) is the set of ground truth clusters,
\(c\) is a ground truth cluster,
\(r\) is a mention in \(c\),
\(c(r)\) is the cluster associated with \(r\) in the ground truth clustering,
\(\hat c(r)\) is the cluster associated with \(r\) in the predicted clustering.
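The formula can be traced step by step in a short pandas sketch. This is an illustrative re-implementation under stated assumptions, not the library's own code: the name `b_cubed_precision_sketch` is hypothetical, and both membership vectors are assumed to have unique index values.

```python
import pandas as pd


def b_cubed_precision_sketch(prediction: pd.Series, reference: pd.Series) -> float:
    """Illustrative version of the formula above (hypothetical helper, not the library's code)."""
    # Keep only the mentions that appear in both membership vectors (inner join).
    joined = pd.concat(
        {"prediction": prediction.dropna(), "reference": reference.dropna()},
        axis=1,
        join="inner",
    )
    # |c(r) ∩ c_hat(r)|: mentions sharing both the reference and predicted cluster of r.
    overlap = joined.groupby(["prediction", "reference"])["prediction"].transform("size")
    # |c_hat(r)|: size of the predicted cluster that contains r.
    predicted_size = joined.groupby("prediction")["prediction"].transform("size")
    per_mention = overlap / predicted_size
    # Inner sum: average over the mentions of each reference cluster.
    per_cluster = per_mention.groupby(joined["reference"]).mean()
    # Outer sum: average over reference clusters, each weighted equally.
    return float(per_cluster.mean())


prediction = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 4])
reference = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8],
    data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c4"],
)
score = b_cubed_precision_sketch(prediction, reference)
```

Averaging within each reference cluster before averaging across clusters is what gives every ground-truth cluster equal weight, regardless of its size.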

Parameters:

prediction (Series) – Membership vector for the predicted clustering.
reference (Series) – Membership vector for the reference clustering.

Returns:

B-cubed precision for the inner join of prediction and reference.

Return type:

float

Examples

>>> import pandas as pd
>>> from er_evaluation.metrics import b_cubed_precision
>>> prediction = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,4])
>>> reference = pd.Series(index=[1,2,3,4,5,6,7,8], data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c4"])
>>> b_cubed_precision(prediction, reference)
0.6458333333333334
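
This value can be checked by hand against the definition. The predicted clusters are {1, 2}, {3, 5}, {4} and {6, 7, 8}, so the per-mention ratios \(\lvert c(r) \cap \hat c(r)\rvert / \lvert \hat c(r) \rvert\) are 1, 1, 1/2 for the mentions of c1; 1, 1/2 for c2; 2/3, 2/3 for c3; and 1/3 for c4. Averaging within each ground truth cluster and then across the four clusters gives

\[P_{B^3} = \frac{1}{4}\left(\frac{5}{6} + \frac{3}{4} + \frac{2}{3} + \frac{1}{3}\right) = \frac{31}{48} \approx 0.6458\]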

Notes

NA values are dropped from membership vectors prior to computing the metric.
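
As a rough illustration of what dropping NA values and taking the inner join mean for the set of mentions that are scored, the sketch below reproduces that preprocessing with plain pandas. It is an assumption about equivalent behavior, not the library's internal code, and the toy data is made up for the example.

```python
import pandas as pd

# Mention 3 has no predicted cluster (NA); mentions 1 and 5 appear in only one vector.
prediction = pd.Series(index=[1, 2, 3, 4], data=[1, 1, None, 2])
reference = pd.Series(index=[2, 3, 4, 5], data=["c1", "c1", "c2", "c2"])

# Drop NA entries, then keep only the index values present in both vectors.
joined = pd.concat(
    {"prediction": prediction.dropna(), "reference": reference.dropna()},
    axis=1,
    join="inner",
)
kept = sorted(joined.index.tolist())
print(kept)  # [2, 4]
```

Only mentions 2 and 4 would contribute to the metric in this toy case.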