error_indicator

er_evaluation.error_indicator(prediction, sample)

Error indicator metric.

Given a predicted disambiguation (prediction) and a sample of true clusters (sample), both represented as membership vectors, this function indicates, for each true cluster, whether it fails to match a predicted cluster. The result is a pandas Series indexed by true cluster identifier, with value 0 when the true cluster matches a predicted cluster and 1 when it does not. In the example below, only c4 matches a predicted cluster.

Parameters:
  • prediction (Series) – Membership vector representing a predicted disambiguation.

  • sample (Series) – Membership vector representing a set of true clusters.

Returns:

Pandas Series indexed by true cluster identifiers (unique values in sample) and with values corresponding to the error indicator.

Return type:

Series

Examples

>>> prediction = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,5])
>>> sample = pd.Series(index=[1,2,3,4,5,8], data=["c1", "c1", "c1", "c2", "c2", "c4"])
>>> error_indicator(prediction, sample)
reference
c1    1
c2    1
c4    0
Name: error_indicator, dtype: int64
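
To make the output above easier to check by hand, here is a minimal sketch of how such an indicator could be computed with plain pandas. It is not the library's implementation: the helper name error_indicator_sketch is hypothetical, and it assumes that "matches" means the true cluster contains exactly the same records as some predicted cluster.

```python
import pandas as pd


def error_indicator_sketch(prediction: pd.Series, sample: pd.Series) -> pd.Series:
    """Hypothetical re-implementation for illustration only."""
    # Keep only sampled records that are also present in the prediction.
    sample = sample[sample.index.isin(prediction.index)]

    # Represent every predicted cluster as a set of record identifiers.
    predicted_clusters = {
        frozenset(records)
        for records in prediction.groupby(prediction).groups.values()
    }

    # A true cluster gets 0 if its record set equals a predicted cluster,
    # and 1 otherwise (assumed meaning of "matches").
    values = {
        cluster_id: int(frozenset(records) not in predicted_clusters)
        for cluster_id, records in sample.groupby(sample).groups.items()
    }
    return pd.Series(values, name="error_indicator", dtype="int64")


prediction = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 5])
sample = pd.Series(index=[1, 2, 3, 4, 5, 8], data=["c1", "c1", "c1", "c2", "c2", "c4"])
result = error_indicator_sketch(prediction, sample)
```

Under these assumptions, c1 = {1, 2, 3} and c2 = {4, 5} have no identical predicted cluster and receive 1, while c4 = {8} coincides with predicted cluster 5 and receives 0.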

Notes

The sample is restricted to the set of records which are present in the prediction.
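
The restriction described in this note can be pictured with a short pandas sketch. How the library performs it internally is not shown in this page, so the index-based filter below is an assumption used only for illustration.

```python
import pandas as pd

prediction = pd.Series(index=[1, 2, 3], data=[1, 1, 2])
# Record 9 appears in the sample but not in the prediction.
sample = pd.Series(index=[1, 2, 9], data=["c1", "c1", "c2"])

# Assumed form of the restriction: drop sampled records missing from the prediction.
restricted = sample[sample.index.isin(prediction.index)]
print(restricted.to_dict())  # {1: 'c1', 2: 'c1'}
```

Here the true cluster c2 disappears entirely, because its only record is absent from the prediction.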