load_rldata500_disambiguations

er_evaluation.datasets.load_rldata500_disambiguations()

Load reference and predicted disambiguations for the RLdata500 dataset.

The reference disambiguation is the series of true unique identifiers for RLdata500.

Predicted disambiguations are a set of four toy disambiguations meant to showcase and test features of this package. The four predicted disambiguations are:

name: Disambiguation based on exact matching of first name and last name.
name_by: Disambiguation based on exact matching of first name, last name, and birth year.
name_bm: Disambiguation based on exact matching of first name, last name, and birth month.
name_bd: Disambiguation based on exact matching of first name, last name, and birth day.

These are returned in a dictionary keyed by the four names above.
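As a rough illustration of what an exact-matching disambiguation looks like, the sketch below assigns each record a cluster label built from the chosen fields, so records that agree on all of those fields land in the same cluster. The records, the field names, and the helper `exact_match_disambiguation` are hypothetical and are not taken from RLdata500 or from this package; the package may construct its toy predictions differently.

```python
# Hypothetical toy records (NOT from RLdata500): record id -> fields.
records = {
    0: {"first": "ANNA", "last": "MEYER", "birth_year": 1970},
    1: {"first": "ANNA", "last": "MEYER", "birth_year": 1970},
    2: {"first": "ANNA", "last": "MEYER", "birth_year": 1985},
    3: {"first": "KARL", "last": "BAUER", "birth_year": 1962},
}


def exact_match_disambiguation(records, fields):
    """Map each record id to a cluster label made of the chosen field values.

    Hypothetical helper for illustration only: two records share a label
    exactly when they agree on every field in `fields`.
    """
    return {
        rid: "|".join(str(rec[f]) for f in fields)
        for rid, rec in records.items()
    }


# Analogue of "name": match on first and last name only.
name = exact_match_disambiguation(records, ["first", "last"])
# Analogue of "name_by": additionally require the same birth year.
name_by = exact_match_disambiguation(records, ["first", "last", "birth_year"])
```

In this toy data, matching on name alone merges records 0, 1, and 2, while adding birth year splits record 2 into its own cluster. Adding fields can only split clusters, never merge them.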

Returns: tuple of the form (predictions, reference), where reference is the ground truth disambiguation and predictions is a dictionary with four toy disambiguations.
Return type: tuple

Examples

Load ground truth and the set of four toy predictions:

>>> from er_evaluation.datasets import load_rldata500_disambiguations
>>> predictions, reference = load_rldata500_disambiguations()

Compute pairwise precision for each prediction:

>>> from er_evaluation.metrics import pairwise_precision
>>> pairwise_precision(predictions["name"], reference)
0.4523809523809524

>>> pairwise_precision(predictions["name_by"], reference)
1.0

>>> pairwise_precision(predictions["name_bm"], reference)
0.7619047619047619

>>> pairwise_precision(predictions["name_bd"], reference)
1.0
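To make these numbers easier to interpret, here is a minimal standalone sketch of pairwise precision under its usual definition: the fraction of record pairs linked by the prediction that are also linked in the reference. The function names and toy data are hypothetical, and the package's own `pairwise_precision` may differ in details such as input types or how records missing from one side are handled.

```python
from itertools import combinations


def linked_pairs(membership):
    """Return the set of unordered record pairs that share a cluster label."""
    return {
        frozenset((a, b))
        for a, b in combinations(membership, 2)
        if membership[a] == membership[b]
    }


def pairwise_precision_sketch(prediction, reference):
    """Fraction of predicted links that are also links in the reference.

    Illustrative only; assumes both mappings cover the same record ids.
    """
    predicted = linked_pairs(prediction)
    if not predicted:
        # Assumption: treat precision as undefined when nothing is linked.
        return float("nan")
    return len(predicted & linked_pairs(reference)) / len(predicted)


# Hypothetical toy data: record id -> cluster label.
toy_reference = {0: "a", 1: "a", 2: "b", 3: "c"}
toy_prediction = {0: "x", 1: "x", 2: "x", 3: "y"}

precision = pairwise_precision_sketch(toy_prediction, toy_reference)
```

Here the prediction links three pairs, (0, 1), (0, 2), and (1, 2), of which only (0, 1) is a true link, so the precision is one third. A value of 1.0, as for `name_by` and `name_bd` above, means every pair linked by the prediction is also linked in the reference.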