load_rldata10000_disambiguations

er_evaluation.datasets.load_rldata10000_disambiguations()

Load reference and predicted disambiguations for the RLdata10000 dataset.

The reference disambiguation is the series of true unique identifiers for RLdata10000.

Predicted disambiguations are a set of four toy disambiguations meant to showcase and test features of this package. The four predicted disambiguations are:

name: Disambiguation based on exact matching on first name and last name.
name_by: Disambiguation based on exact matching on first name, last name, and birth year.
name_bm: Disambiguation based on exact matching on first name, last name, and birth month.
name_bd: Disambiguation based on exact matching on first name, last name, and birth day.

These are returned in a dictionary keyed by the names listed above.
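To make "exact matching" concrete, here is a minimal sketch of how such a toy disambiguation could be built. It is not the package's implementation: the records, the field names (first, last, birth_year), and the helper exact_match_disambiguation are all hypothetical, and a plain dict stands in for whatever structure the package returns.

```python
# Hypothetical toy records; the field names are illustrative only and
# are not taken from the RLdata10000 schema.
records = {
    1: {"first": "ANNA", "last": "MUELLER", "birth_year": 1990},
    2: {"first": "ANNA", "last": "MUELLER", "birth_year": 1985},
    3: {"first": "ANNA", "last": "MUELLER", "birth_year": 1990},
    4: {"first": "JENS", "last": "BAUER", "birth_year": 1971},
}


def exact_match_disambiguation(records, keys):
    """Map each record id to a cluster label made of the chosen fields.

    Two records land in the same cluster exactly when they agree on
    every field listed in `keys`.
    """
    return {rid: tuple(rec[k] for k in keys) for rid, rec in records.items()}


# Analogue of "name": match on first and last name only.
name = exact_match_disambiguation(records, ["first", "last"])
# Analogue of "name_by": additionally require the same birth year.
name_by = exact_match_disambiguation(records, ["first", "last", "birth_year"])
```

In this sketch, adding the birth year splits record 2 away from records 1 and 3, which illustrates why a stricter matching key tends to produce fewer false links.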

Returns: tuple of the form (predictions, reference), where reference is the ground truth disambiguation and predictions is a dictionary with four toy disambiguations.
Return type: tuple

Examples

Load ground truth and the set of four toy predictions:

>>> from er_evaluation.datasets import load_rldata10000_disambiguations
>>> predictions, reference = load_rldata10000_disambiguations()

Compute pairwise precision for each prediction:

>>> from er_evaluation.metrics import pairwise_precision
>>> pairwise_precision(predictions["name"], reference)
0.04653923780125846

>>> pairwise_precision(predictions["name_by"], reference)
0.7028571428571428

>>> pairwise_precision(predictions["name_bm"], reference)
0.3076086956521739

>>> pairwise_precision(predictions["name_bd"], reference)
0.501937984496124
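
For readers unfamiliar with the metric, pairwise precision is the fraction of record pairs linked by the prediction that are also linked in the reference. The following is a from-scratch sketch of that definition on hand-made data. The helpers linked_pairs and toy_pairwise_precision are hypothetical names, plain dicts stand in for the package's data structures, and this is not how er_evaluation.metrics.pairwise_precision is implemented.

```python
from itertools import combinations


def linked_pairs(membership):
    """Return the set of unordered record pairs that share a cluster label."""
    clusters = {}
    for record_id, label in membership.items():
        clusters.setdefault(label, []).append(record_id)
    pairs = set()
    for members in clusters.values():
        pairs.update(frozenset(pair) for pair in combinations(members, 2))
    return pairs


def toy_pairwise_precision(prediction, reference):
    """Share of predicted links that are also links in the reference."""
    predicted = linked_pairs(prediction)
    true = linked_pairs(reference)
    if not predicted:
        # No predicted links: precision is undefined in this sketch.
        return float("nan")
    return len(predicted & true) / len(predicted)


# Hand-made example: the prediction links records 1, 2, and 3,
# but only records 1 and 3 truly belong together.
prediction = {1: "a", 2: "a", 3: "a", 4: "b"}
reference = {1: "x", 2: "y", 3: "x", 4: "z"}

precision = toy_pairwise_precision(prediction, reference)
```

Here the prediction contains three links, of which one is correct, so the sketch yields a precision of one third.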