cluster_precision_estimator

er_evaluation.estimators.cluster_precision_estimator(prediction, sample, weights)

Cluster precision design estimator.

Given a predicted disambiguation (prediction), a set of ground truth clusters (sample), and a set of cluster sampling weights (weights; e.g., inverse probability weights for each cluster), this returns a cluster precision estimate together with its estimated standard deviation.

Parameters:
  • prediction (Series) – Membership vector indexed by cluster elements, with each value giving the identifier of the cluster that the element belongs to. This should cover the entire target population for which cluster precision is being computed.

  • sample (Series) – Membership vector for the sampled ground truth clusters, indexed by cluster elements, with each value giving the identifier of the cluster that the element belongs to.

  • weights (Series) – Pandas Series indexed by cluster identifier, with values giving the cluster sampling weights (e.g., inverse sampling probabilities). Can also be the string “uniform” for uniform sampling weights, or “cluster_size” for inverse cluster size sampling weights.

Returns:

Cluster precision estimate and standard deviation estimate.

Return type:

tuple

Examples

>>> import pandas as pd
>>> from er_evaluation.estimators import cluster_precision_estimator
>>> prediction = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,4])
>>> sample = pd.Series(index=[1,2,3,4,5,6,7,8], data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"])
>>> cluster_precision_estimator(prediction, sample, weights="uniform")
(0.26171875, 0.23593232610221093)
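
The weights argument can also be an explicit Series rather than one of the string options. The following is a minimal sketch of building inverse probability weights indexed by cluster identifier. The sampling probabilities are made-up values for illustration only, and the name sampling_probabilities is hypothetical, not part of the library.

```python
import pandas as pd

# Ground truth sample from the example above: element -> true cluster identifier.
sample = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8],
    data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"],
)

# Hypothetical probability with which each cluster was sampled (made-up values).
sampling_probabilities = pd.Series({"c1": 0.5, "c2": 0.25, "c3": 0.5})

# Inverse probability weights, indexed by cluster identifier.
weights = 1 / sampling_probabilities

# Sanity check: the weights cover exactly the clusters present in the sample.
assert set(weights.index) == set(sample.unique())

# The Series would then be passed in place of the string option:
# cluster_precision_estimator(prediction, sample, weights=weights)
```

Indexing the weights by cluster identifier, not by element, matches the description of the weights parameter above.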

Notes

  • This estimator requires prediction to cover the entire population of interest from which sampled clusters were obtained. Do not subset prediction in any way.
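
A simple pre-check can catch one common form of accidental subsetting. The sketch below uses a hypothetical helper, uncovered_elements, which is not part of the library. It only verifies that every sampled element appears in prediction; it cannot confirm that prediction spans the whole target population.

```python
import pandas as pd

prediction = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 4])
sample = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8],
    data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"],
)


def uncovered_elements(prediction, sample):
    """Return the sampled elements that are missing from prediction's index."""
    return sample.index.difference(prediction.index)


missing = uncovered_elements(prediction, sample)
if len(missing) > 0:
    raise ValueError(f"prediction is missing sampled elements: {list(missing)}")
```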