homonymy_rate_estimator

er_evaluation.homonymy_rate_estimator(sample, weights, prediction=None, names=None)

Estimate the homonymy rate from a weighted sample of clusters, returning the estimate together with an estimate of its standard deviation.

Homonymy rate:

The homonymy rate is the proportion of clusters that share a name with at least one other cluster.
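To make the definition concrete, here is a minimal sketch of the plain, unweighted homonymy rate computed over a complete clustering. It is not the library's estimator: it ignores sampling weights and returns no standard deviation. The function name `plain_homonymy_rate` and the toy data are assumptions made for illustration, and plain dictionaries stand in for the pandas Series used by the real function.

```python
from collections import defaultdict


def plain_homonymy_rate(membership, names):
    """Hypothetical helper, not part of er_evaluation.

    membership: dict mapping each element to its cluster identifier.
    names: dict mapping each element to its name.
    Returns the fraction of clusters that share a name with another cluster.
    """
    # Collect, for every name, the set of clusters in which it appears.
    clusters_by_name = defaultdict(set)
    for element, cluster in membership.items():
        clusters_by_name[names[element]].add(cluster)

    # A cluster is homonymous if one of its names also appears elsewhere.
    homonymous = set()
    for clusters in clusters_by_name.values():
        if len(clusters) > 1:
            homonymous |= clusters

    all_clusters = set(membership.values())
    return len(homonymous) / len(all_clusters)


# Toy data (assumed): clusters c1 and c2 both contain the name "J. Smith".
membership = {"r1": "c1", "r2": "c1", "r3": "c2", "r4": "c3"}
names = {"r1": "J. Smith", "r2": "John Smith", "r3": "J. Smith", "r4": "A. Lee"}

rate = plain_homonymy_rate(membership, names)
print(rate)
```

In this toy example two of the three clusters share a name, so the rate is 2/3. The estimator documented here targets the same quantity but works from a sample of clusters and their sampling weights.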

Parameters:
  • sample (pd.Series) – Membership vector indexed by cluster elements, with each value giving the identifier of the cluster the element belongs to.

  • weights (pd.Series or str) – Series indexed by cluster identifier, with values giving the cluster sampling weights (e.g., inverse sampling probabilities). Can also be the string "uniform" for uniform sampling weights, or "cluster_size" for inverse cluster size sampling weights.

  • prediction (pd.Series, optional) – Membership vector indexed by cluster elements, with each value giving the identifier of the associated cluster. Defaults to None.

  • names (pd.Series, optional) – Series containing the name associated with each cluster element. Used for the name variation and homonymy rate estimates. Defaults to None.

Returns:

Homonymy rate estimate and an estimate of its standard deviation.

Return type:

tuple