pairwise_precision_estimator
- er_evaluation.pairwise_precision_estimator(prediction, sample, weights)
Design estimator for pairwise precision.
Given a predicted disambiguation (prediction), a sample of ground truth clusters (sample), and a set of cluster sampling weights (weights; e.g., inverse probability weights for each cluster), this returns a pairwise precision estimate together with an estimate of its standard deviation.
Note
This is the precision estimator corresponding to cluster block sampling in [1].
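To make the estimated quantity concrete, the sketch below computes a naive pairwise precision: among pairs of labeled elements that the prediction links together, the fraction that share a ground truth cluster. It is not the design estimator implemented by this function. It ignores sampling weights and any elements outside the sample, so its value need not match the estimator's output. The helper name naive_pairwise_precision and the toy data are hypothetical, and plain dictionaries stand in for pandas Series.

```python
from itertools import combinations

# Hypothetical toy data: element -> cluster identifier.
prediction = {1: 1, 2: 1, 3: 2, 4: 3, 5: 2, 6: 4, 7: 4, 8: 4}
sample = {1: "c1", 2: "c1", 3: "c1", 4: "c2", 5: "c2", 8: "c4"}


def naive_pairwise_precision(prediction, reference):
    """Fraction of predicted links among labeled elements that are true links.

    Illustrative only: no sampling weights, no variance estimate.
    """
    # Keep only elements that have a ground truth label.
    labeled = sorted(e for e in prediction if e in reference)

    # Pairs of labeled elements placed in the same predicted cluster.
    predicted_pairs = [
        (a, b)
        for a, b in combinations(labeled, 2)
        if prediction[a] == prediction[b]
    ]
    if not predicted_pairs:
        return float("nan")  # precision is undefined without predicted links

    # Predicted pairs that also share a ground truth cluster.
    true_pairs = [(a, b) for a, b in predicted_pairs if reference[a] == reference[b]]
    return len(true_pairs) / len(predicted_pairs)


print(naive_pairwise_precision(prediction, sample))  # 0.5
```

Here the labeled predicted pairs are (1, 2) and (3, 5), and only (1, 2) is a true match, giving 0.5.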
- Parameters:
prediction (Series) – Membership vector for the predicted disambiguation, indexed by cluster elements and with values corresponding to the associated predicted cluster identifier.
sample (Series) – Membership vector for the sampled ground truth clusters, indexed by cluster elements and with values corresponding to the associated ground truth cluster identifier.
weights (Series) – Pandas Series indexed by cluster identifier and with values corresponding to cluster sampling weights (e.g., inverse sampling probabilities). Can also be the string “uniform” for uniform sampling weights, or “cluster_size” for inverse cluster size sampling weights.
- Returns:
Precision estimate and standard deviation estimate.
- Return type:
tuple
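As a rough illustration of the weights parameter described above, the following sketch builds explicit weights keyed by cluster identifier from a small ground truth sample. It assumes that "uniform" corresponds to a constant weight per cluster and that "cluster_size" corresponds to a weight of 1/size per cluster. This is a reading of the parameter description, not a confirmed account of the library's internals. Plain dictionaries and toy data are used here for illustration.

```python
from collections import Counter

# Hypothetical ground truth sample: element -> true cluster identifier.
sample = {1: "c1", 2: "c1", 3: "c1", 4: "c2", 5: "c2", 8: "c4"}

# Number of elements in each sampled cluster.
cluster_sizes = Counter(sample.values())

# Uniform weights: every sampled cluster counts equally.
uniform_weights = {cluster: 1.0 for cluster in cluster_sizes}

# Inverse cluster size weights (assumed meaning of "cluster_size"):
# appropriate if clusters were sampled with probability proportional to size.
inverse_size_weights = {
    cluster: 1.0 / size for cluster, size in cluster_sizes.items()
}

print(uniform_weights)
print(inverse_size_weights)
```

Either dictionary can be converted with pd.Series(...) to obtain a Series indexed by cluster identifier, which is the form the weights parameter expects.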
Examples
>>> import pandas as pd
>>> from er_evaluation import pairwise_precision_estimator
>>> prediction = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,4])
>>> sample = pd.Series(index=[1,2,3,4,5,8], data=["c1", "c1", "c1", "c2", "c2", "c4"])
>>> weights = pd.Series(1, index=sample.unique())  # Uniform cluster weights
>>> pairwise_precision_estimator(prediction, sample, weights)
(0.3888888888888889, 0.2545875386086578)
References
[1] Binette, Olivier, Sokhna A. York, Emma Hickerson, Youngsoo Baek, Sarvo Madhavan, and Christina Jones (2022). Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org. arXiv e-prints, arXiv:2210.01230.