cluster_f_estimator
- er_evaluation.cluster_f_estimator(prediction, sample, weights, beta=1.0)
Cluster F-score design estimator.
Given a predicted disambiguation (prediction), a set of ground-truth clusters (sample), and a set of cluster sampling weights (weights; e.g., inverse probability weights for each cluster), this returns a cluster F-score estimate together with its estimated standard deviation.
- Parameters:
prediction (Series) – Membership vector indexed by cluster elements, with values giving the identifier of the cluster each element belongs to. This should cover the entire target population for which the cluster F-score is being computed.
sample (Series) – Membership vector indexed by cluster elements, with values giving the identifier of the cluster each element belongs to.
weights (Series) – Pandas Series indexed by cluster identifier and with values corresponding to cluster sampling weights (e.g., inverse sampling probabilities). Can also be the string “uniform” for uniform sampling weights, or “cluster_size” for inverse cluster size sampling weights.
beta (float) – F-score weight. The default, beta=1.0, weights precision and recall equally.
- Returns:
Cluster F-score estimate and standard deviation estimate.
- Return type:
tuple
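As a rough illustration of what the beta parameter controls, here is a minimal sketch of the textbook F-beta formula. It assumes the estimator combines precision and recall in the standard way; `f_beta` is a hypothetical helper written for this illustration, not a function of er_evaluation.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score (illustrative helper, not part of er_evaluation)."""
    denominator = beta**2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


# beta = 1 gives the harmonic mean of precision and recall.
balanced = f_beta(0.5, 1.0)

# beta > 1 pulls the score toward recall, beta < 1 toward precision.
recall_heavy = f_beta(0.5, 1.0, beta=2.0)
precision_heavy = f_beta(0.5, 1.0, beta=0.5)
```

With precision 0.5 and recall 1.0, the score rises as beta grows because recall is the larger of the two quantities.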
Examples
>>> import pandas as pd
>>> from er_evaluation import cluster_f_estimator
>>> prediction = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,4])
>>> sample = pd.Series(index=[1,2,3,4,5,6,7,8], data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"])
>>> cluster_f_estimator(prediction, sample, weights="uniform")
(0.29446064139941686, 0.2760765154789527)
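The weights argument can also be a Series indexed by cluster identifier. The sketch below shows one way such a Series might be built. The sampling probabilities are made up for illustration, and the hand-built inverse cluster size weights are only assumed to resemble what the "cluster_size" option computes internally.

```python
import pandas as pd

# Ground-truth sample from the example above.
sample = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8],
    data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"],
)

# Hypothetical probabilities with which each cluster was sampled.
sampling_prob = pd.Series({"c1": 0.5, "c2": 0.25, "c3": 0.5})

# Inverse probability weights, indexed by cluster identifier.
weights = 1.0 / sampling_prob

# Inverse cluster size weights built by hand
# (assumed, not confirmed, to mirror weights="cluster_size").
cluster_sizes = sample.value_counts()
size_weights = 1.0 / cluster_sizes
```

Either Series could then be passed as the weights argument in place of the string "uniform".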
Notes
This estimator requires prediction to cover the entire population of interest from which sampled clusters were obtained. Do not subset prediction in any way.
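A quick sanity check along these lines can catch an accidentally subsetted prediction before calling the estimator. This is a minimal sketch, not part of er_evaluation: it only verifies that every sampled element has a predicted cluster, which is necessary but not sufficient for prediction to cover the whole population.

```python
import pandas as pd

prediction = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 4])
sample = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8],
    data=["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"],
)

# Sampled elements that have no entry in the full prediction.
missing = sample.index.difference(prediction.index)

# A prediction that was (incorrectly) subsetted to a few elements.
prediction_subset = prediction.loc[[1, 2, 3]]
missing_subset = sample.index.difference(prediction_subset.index)

if len(missing_subset) > 0:
    print(f"{len(missing_subset)} sampled elements are missing from the prediction")
```

An empty `missing` index does not prove that the prediction is complete, but a non-empty one shows that it has been subsetted.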