error_metrics

er_evaluation.error_metrics(prediction, sample)

Compute the canonical set of error metrics from the record error table.

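Error metrics included:
  • expected_extra
  • expected_relative_extra
  • expected_missing
  • expected_relative_missing
  • error_indicator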

Parameters:
  • prediction (Series) – Membership vector representing a predicted disambiguation.

  • sample (Series) – Membership vector representing a set of true clusters.

Returns:

DataFrame indexed by the cluster identifiers found in sample, with one column per error metric.

Return type:

DataFrame

Examples
>>> import pandas as pd
>>> from er_evaluation import error_metrics
>>> prediction = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,4])
>>> sample = pd.Series(index=[1,2,3,4,5,6,7,8], data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"])
>>> error_metrics(prediction, sample)
           expected_extra  expected_relative_extra  expected_missing  expected_relative_missing  error_indicator
reference
c1               0.333333                 0.166667          1.333333                   0.444444                1
c2               0.500000                 0.250000          1.000000                   0.500000                1
c3               1.000000                 0.333333          0.000000                   0.000000                0
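
To read the two membership vectors used above as clusterings, it can help to group record identifiers by cluster identifier. The sketch below is illustrative only: clusters_from_membership is a hypothetical helper written for this example, not a function documented here as part of er_evaluation.

```python
import pandas as pd

# Same membership vectors as in the example: index = record id, value = cluster id.
prediction = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 4])
sample = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8],
    data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"],
)


def clusters_from_membership(membership):
    """Hypothetical helper: map each cluster id to the list of its record ids."""
    clusters = {}
    for record_id, cluster_id in membership.items():
        clusters.setdefault(cluster_id, []).append(record_id)
    return clusters


predicted_clusters = clusters_from_membership(prediction)
true_clusters = clusters_from_membership(sample)
```

Viewed this way, records 3 and 5 share a predicted cluster although they belong to different true clusters (c1 and c2), while the records of c3 are predicted together as one cluster.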

Notes

The sample is restricted to the set of records which are present in the prediction.
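
As a rough illustration of this restriction, the sketch below drops sampled records that have no entry in the prediction. It assumes the restriction amounts to filtering on the record index; restrict_to_prediction is a hypothetical helper and record 9 is made-up data, so the library's internal implementation may differ.

```python
import pandas as pd

# Record 9 appears in the sample but not in the prediction (made-up data).
prediction = pd.Series(index=[1, 2, 3], data=[1, 1, 2])
sample = pd.Series(index=[1, 2, 3, 9], data=["c1", "c1", "c2", "c2"])


def restrict_to_prediction(prediction, sample):
    """Hypothetical helper: keep only sampled records that the prediction covers."""
    return sample[sample.index.isin(prediction.index)]


restricted = restrict_to_prediction(prediction, sample)
```

Under this assumption, record 9 is removed before any metric is computed, so cluster c2 is evaluated on record 3 alone.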