load_rldata500_disambiguations

er_evaluation.load_rldata500_disambiguations()

Load reference and predicted disambiguations for the RLdata500 dataset.

The reference disambiguation is the series of true unique identifiers for RLdata500.

Predicted disambiguations are a set of four toy disambiguations meant to showcase and test features of this package. The four predicted disambiguations are:

  • name: Disambiguation based on exact matching of first name and last name.

  • name_by: Disambiguation based on exact matching of first name, last name, and birth year.

  • name_bm: Disambiguation based on exact matching of first name, last name, and birth month.

  • name_bd: Disambiguation based on exact matching of first name, last name, and birth day.

The four predicted disambiguations are returned in a dictionary keyed by the names above.
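
To make "exact matching" concrete, the following is a minimal sketch of how such a toy disambiguation could be constructed. It is not the code used to build the bundled predictions: the records, the field names (fname, lname, by), and the helper exact_match_disambiguation are all hypothetical, and plain dictionaries stand in for whatever data type the package actually returns.

```python
# Hypothetical toy records for illustration; these are NOT rows from RLdata500.
records = [
    {"id": 0, "fname": "ANNA", "lname": "MUELLER", "by": 1950},
    {"id": 1, "fname": "ANNA", "lname": "MUELLER", "by": 1950},
    {"id": 2, "fname": "ANNA", "lname": "MUELLER", "by": 1972},
    {"id": 3, "fname": "KARL", "lname": "WEBER", "by": 1961},
]


def exact_match_disambiguation(records, fields):
    """Map each record id to a cluster label built from the given fields.

    Two records land in the same cluster exactly when they agree on
    every field in `fields`. (Hypothetical helper, not a package API.)
    """
    return {r["id"]: "|".join(str(r[f]) for f in fields) for r in records}


# Analogue of "name": match on first and last name only.
name = exact_match_disambiguation(records, ["fname", "lname"])

# Analogue of "name_by": additionally require the same birth year.
name_by = exact_match_disambiguation(records, ["fname", "lname", "by"])
```

In this sketch, adding the birth year splits record 2 away from records 0 and 1. Stricter matching keys tend to produce fewer spurious links, which is consistent with the higher precision of name_by relative to name in the examples on this page.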

Returns:

Tuple of the form (predictions, reference), where reference is the ground truth disambiguation and predictions is a dictionary containing the four toy disambiguations.

Return type:

tuple

Examples

Load ground truth and the set of four toy predictions:

>>> from er_evaluation import load_rldata500_disambiguations
>>> predictions, reference = load_rldata500_disambiguations()

Compute pairwise precision for each prediction:

>>> from er_evaluation.metrics import pairwise_precision
>>> pairwise_precision(predictions["name"], reference)
0.4523809523809524
>>> pairwise_precision(predictions["name_by"], reference)
1.0
>>> pairwise_precision(predictions["name_bm"], reference)
0.7619047619047619
>>> pairwise_precision(predictions["name_bd"], reference)
1.0
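
For readers unfamiliar with the metric, pairwise precision is commonly defined as the fraction of predicted record pairs (pairs placed in the same predicted cluster) that are also in the same reference cluster. The sketch below computes it from scratch on hypothetical data. It assumes disambiguations are plain dictionaries mapping record ids to cluster labels, and it returns 1.0 when no pairs are predicted; both are assumptions of this sketch, not a description of how er_evaluation implements pairwise_precision.

```python
from itertools import combinations


def linked_pairs(membership):
    """Return the set of unordered record pairs that share a cluster label."""
    clusters = {}
    for record_id, label in membership.items():
        clusters.setdefault(label, []).append(record_id)
    pairs = set()
    for members in clusters.values():
        pairs.update(frozenset(p) for p in combinations(members, 2))
    return pairs


def pairwise_precision_sketch(prediction, reference):
    """Share of predicted pairs that are also linked in the reference."""
    predicted = linked_pairs(prediction)
    if not predicted:
        return 1.0  # assumed convention for this sketch only
    return len(predicted & linked_pairs(reference)) / len(predicted)


# Hypothetical memberships: record id -> cluster label.
toy_reference = {0: "a", 1: "a", 2: "b", 3: "c"}
toy_prediction = {0: "x", 1: "x", 2: "x", 3: "y"}

# The prediction links three pairs, of which only (0, 1) is a true match.
toy_precision = pairwise_precision_sketch(toy_prediction, toy_reference)
```

Here the prediction merges record 2 with records 0 and 1, so only one of its three predicted pairs is correct. A prediction whose clusters match the reference exactly would score 1.0.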