b_cubed_recall

er_evaluation.metrics.b_cubed_recall(prediction, reference)

B-cubed recall for the inner join of two clusterings (that is, restricted to the elements present in both membership vectors), with equal weight placed on each ground truth cluster.

Mathematically, this is defined as

\[R_{B^3} = \frac{1}{\lvert \mathcal{C}\rvert}\sum_{c \in \mathcal{C}} \frac{1}{\lvert c \rvert} \sum_{r \in c} \frac{\lvert c(r) \cap \hat c(r)\rvert }{\lvert c(r) \rvert}\]

where

  • \(\mathcal{C}\) is the set of ground truth clusters,

  • \(c\) is a ground truth cluster,

  • \(r\) is a mention in \(c\),

  • \(c(r)\) is the cluster associated with \(r\) in the ground truth clustering, and

  • \(\hat c(r)\) is the cluster associated with \(r\) in the predicted clustering.
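The definition above can be sketched directly in pandas. This is an illustrative re-implementation only, not the library's actual code: the name `b_cubed_recall_sketch` is hypothetical, and the preprocessing (dropping NA labels, then inner-joining on the index) is an assumption based on the description on this page.

```python
import pandas as pd


def b_cubed_recall_sketch(prediction: pd.Series, reference: pd.Series) -> float:
    """Hypothetical helper that evaluates the formula above term by term."""
    # Assumed preprocessing: drop NA labels, then keep only mentions that
    # appear in both membership vectors (inner join on the index).
    joined = pd.concat(
        {"prediction": prediction.dropna(), "reference": reference.dropna()},
        axis=1,
        join="inner",
    )
    # |c(r)|: size of the ground truth cluster containing each mention r.
    ref_size = joined.groupby("reference")["prediction"].transform("size")
    # |c(r) ∩ c_hat(r)|: number of mentions sharing both labels with r.
    overlap = joined.groupby(["reference", "prediction"])["prediction"].transform("size")
    per_mention = overlap / ref_size
    # Inner mean over the mentions of each ground truth cluster,
    # then outer mean over clusters (equal weight per cluster).
    return float(per_mention.groupby(joined["reference"]).mean().mean())
```

Averaging within each cluster before averaging across clusters is what gives every ground truth cluster the same weight regardless of its size.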

Parameters:
  • prediction (Series) – Membership vector for the predicted clustering.

  • reference (Series) – Membership vector for the reference clustering.

Returns:

B-cubed recall for the inner join of prediction and reference.

Return type:

float

Examples

>>> import pandas as pd
>>> from er_evaluation.metrics import b_cubed_recall
>>> prediction = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,4])
>>> reference = pd.Series(index=[1,2,3,4,5,6,7,8], data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c4"])
>>> b_cubed_recall(prediction, reference)
0.7638888888888888
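
The value above can be checked by hand from the definition. In cluster c1, mentions 1 and 2 each recover 2 of 3 members and mention 3 recovers 1 of 3, for an average of 5/9. In c2, each mention recovers 1 of 2 members, for an average of 1/2. Clusters c3 and c4 are fully contained in predicted cluster 4, so each contributes 1. Averaging over the four ground truth clusters gives

\[R_{B^3} = \frac{1}{4}\left(\frac{5}{9} + \frac{1}{2} + 1 + 1\right) = \frac{55}{72} \approx 0.7639\]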

Notes

NA values are dropped from membership vectors prior to computing the metric.
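
To illustrate which mentions end up contributing to the metric, the snippet below mimics the preprocessing with plain pandas. It is a sketch under the assumption that NA labels are removed first and the two vectors are then inner-joined on their index; the variable names are illustrative and the library's internal steps may differ.

```python
import pandas as pd

# Mention 3 has no predicted label; mention 4 has no reference label.
prediction = pd.Series(index=[1, 2, 3, 4], data=[1, 1, None, 2])
reference = pd.Series(index=[2, 3, 4, 5], data=["c1", "c1", None, "c2"])

# Assumed preprocessing: drop NA labels, then keep the shared index only.
aligned = pd.concat(
    {"prediction": prediction.dropna(), "reference": reference.dropna()},
    axis=1,
    join="inner",
)
retained = list(aligned.index)  # only mention 2 is labeled in both vectors
```

Under this assumption, mentions that are missing or unlabeled in either vector have no effect on the result.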