load_rldata500_disambiguations
- er_evaluation.datasets.load_rldata500_disambiguations()
Load reference and predicted disambiguations for the RLdata500 dataset.
The reference disambiguation is the series of true unique identifiers for RLdata500.
Predicted disambiguations are a set of four toy disambiguations meant to showcase and test features of this package. The four predicted disambiguations are:
name: Disambiguation based on exact matching first name and last name.
name_by: Disambiguation based on exact matching first name, last name, and birth year.
name_bm: Disambiguation based on exact matching first name, last name, and birth month.
name_bd: Disambiguation based on exact matching first name, last name, and birth day.
These are returned in a dictionary keyed by the names above.
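To illustrate what "exact matching" means here, the following minimal sketch groups records that agree exactly on a chosen set of fields. The toy records, the field names (`fname`, `lname`, `by`), and the helper `exact_match_clusters` are hypothetical and not part of er_evaluation; the package may construct its predictions differently.

```python
# Hypothetical toy records (not taken from RLdata500).
records = [
    {"id": 0, "fname": "ANNA", "lname": "MEYER", "by": 1950},
    {"id": 1, "fname": "ANNA", "lname": "MEYER", "by": 1950},
    {"id": 2, "fname": "ANNA", "lname": "MEYER", "by": 1972},
    {"id": 3, "fname": "KARL", "lname": "BAUER", "by": 1961},
]


def exact_match_clusters(records, keys):
    """Map each record id to an integer cluster label.

    Records that agree exactly on every field in `keys` share a label.
    Illustrative helper only; not an er_evaluation function.
    """
    labels = {}      # field-value tuple -> cluster label
    membership = {}  # record id -> cluster label
    for rec in records:
        key = tuple(rec[k] for k in keys)
        # The first time a key is seen, it gets the next unused label.
        label = labels.setdefault(key, len(labels))
        membership[rec["id"]] = label
    return membership


# Analogue of "name": first name and last name only.
name = exact_match_clusters(records, ["fname", "lname"])
# Analogue of "name_by": first name, last name, and birth year.
name_by = exact_match_clusters(records, ["fname", "lname", "by"])
```

In this toy data, adding the birth year splits record 2 away from records 0 and 1, which shows how a stricter matching rule yields smaller clusters.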
- Returns:
Tuple of the form (predictions, reference), where reference is the ground truth disambiguation and predictions is a dictionary with four toy disambiguations.
- Return type:
tuple
Examples
Load ground truth and the set of four toy predictions:
>>> from er_evaluation.datasets import load_rldata500_disambiguations
>>> predictions, reference = load_rldata500_disambiguations()
Compute pairwise precision for each prediction:
>>> from er_evaluation.metrics import pairwise_precision
>>> pairwise_precision(predictions["name"], reference)
0.4523809523809524
>>> pairwise_precision(predictions["name_by"], reference)
1.0
>>> pairwise_precision(predictions["name_bm"], reference)
0.7619047619047619
>>> pairwise_precision(predictions["name_bd"], reference)
1.0
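For intuition about these values, pairwise precision is commonly defined as the fraction of record pairs linked by the prediction that are also linked in the reference. The sketch below computes it from scratch on hypothetical membership dictionaries. The helpers `linked_pairs` and `toy_pairwise_precision` are illustrative and are not the implementation in er_evaluation.metrics, which may handle edge cases (such as records missing from one disambiguation, or a prediction with no links) differently.

```python
from itertools import combinations


def linked_pairs(membership):
    """Return the set of record-id pairs that share a cluster label."""
    clusters = {}
    for record_id, label in membership.items():
        clusters.setdefault(label, []).append(record_id)
    pairs = set()
    for members in clusters.values():
        # Sort so each pair has one canonical (smaller, larger) form.
        pairs.update(combinations(sorted(members), 2))
    return pairs


def toy_pairwise_precision(prediction, reference):
    """Fraction of predicted links that are also reference links."""
    predicted = linked_pairs(prediction)
    if not predicted:
        # Assumption: treat precision as undefined when nothing is linked.
        return float("nan")
    return len(predicted & linked_pairs(reference)) / len(predicted)


# Hypothetical memberships: record id -> cluster label.
toy_reference = {0: "a", 1: "a", 2: "b", 3: "c"}
toy_prediction = {0: 0, 1: 0, 2: 0, 3: 1}

# The prediction links (0, 1), (0, 2), and (1, 2); only (0, 1) is a true link.
precision = toy_pairwise_precision(toy_prediction, toy_reference)
```

Under this definition, a prediction that over-merges records (as matching on name alone tends to do) produces many false links and so a lower precision, while a stricter rule can reach 1.0 without necessarily finding every true link.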