cluster_f

er_evaluation.metrics.cluster_f(prediction, reference, beta=1.0)

Cluster F score for the inner join of two clusterings.

Cluster F score:

Cluster F score is defined as the weighted harmonic mean of cluster precision and cluster recall:

\[F_\beta = \frac{(1 + \beta^2)PR}{ \beta^2 P+R}\]

The \(\beta\) parameter controls the relative weight of precision and recall. When \(\beta = 1\), the F score is the harmonic mean of precision and recall. When \(\beta < 1\), the F score is weighted towards precision. When \(\beta > 1\), the F score is weighted towards recall.
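As a minimal sketch of the formula above, the following standalone function evaluates \(F_\beta\) from a precision and recall value. The name `f_beta` is hypothetical and not part of `er_evaluation`, and returning 0.0 when the denominator is zero is an assumption made here for illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (illustrative helper)."""
    denominator = beta**2 * precision + recall
    if denominator == 0:
        # Assumption: treat the undefined 0/0 case as a score of 0.
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


# With P = 0.4 and R = 0.5, beta < 1 pulls the score towards precision
# and beta > 1 pulls it towards recall.
balanced = f_beta(0.4, 0.5)            # harmonic mean of 0.4 and 0.5
precision_heavy = f_beta(0.4, 0.5, beta=0.5)
recall_heavy = f_beta(0.4, 0.5, beta=2.0)
print(precision_heavy, balanced, recall_heavy)
```

The three values increase from the precision-weighted score to the recall-weighted score because recall is the larger of the two inputs in this example.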

Parameters:
  • prediction (Series) – Membership vector for the predicted clustering.

  • reference (Series) – Membership vector for the reference clustering.

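  • beta (float) – Relative weight of recall versus precision in the F score. Values below 1 favor precision and values above 1 favor recall. Defaults to 1.0.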

Returns:

Cluster F score for the inner join of prediction and reference.

Return type:

float

Examples

>>> import pandas as pd
>>> from er_evaluation.metrics import cluster_f
>>> prediction = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,5])
>>> reference = pd.Series(index=[1,2,3,4,5,6,7,8], data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c4"])
>>> cluster_f(prediction, reference)
0.4444444444444445

Notes

NA values are dropped from membership vectors prior to computing the metric.
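The steps described on this page (dropping NA values, taking the inner join, then combining precision and recall) can be sketched from scratch with pandas. This is not the library's implementation. The names `clusters_of` and `cluster_f_sketch` are hypothetical, and the sketch assumes that cluster precision is the share of predicted clusters that exactly match a reference cluster and that cluster recall is the share of reference clusters that are exactly matched. Returning 0.0 when no cluster matches is also an assumption.

```python
import pandas as pd


def clusters_of(membership):
    """Return the clusters of a membership vector as a set of frozensets."""
    groups = membership.groupby(membership).groups  # cluster id -> element labels
    return {frozenset(elements) for elements in groups.values()}


def cluster_f_sketch(prediction, reference, beta=1.0):
    # Drop NA memberships, then keep only elements present in both vectors.
    joined = pd.concat(
        {"prediction": prediction.dropna(), "reference": reference.dropna()},
        axis=1,
        join="inner",
    )
    predicted = clusters_of(joined["prediction"])
    expected = clusters_of(joined["reference"])
    matched = len(predicted & expected)  # clusters identical in both clusterings
    if matched == 0:
        # Assumption: score 0 when nothing matches (also covers an empty join).
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(expected)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


# Same data as the documented example, plus element 9 with an NA prediction.
prediction = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8, 9], data=[1, 1, 2, 3, 2, 4, 4, 5, None])
reference = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8, 9],
    data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"],
)
print(cluster_f_sketch(prediction, reference))
```

Element 9 has no predicted cluster, so it is removed before the join and the comparison runs on elements 1 to 8 only. Under the assumptions above, two of five predicted clusters and two of four reference clusters match, giving a score of about 0.444, in line with the documented example.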