load_rldata10000_disambiguations

er_evaluation.datasets.load_rldata10000_disambiguations()

Load reference and predicted disambiguations for the RLdata10000 dataset.

The reference disambiguation is the series of true unique identifiers for RLdata10000.

The predicted disambiguations are four toy disambiguations meant to showcase and test features of this package:

  • name: Disambiguation based on exact matching of first name and last name.

  • name_by: Disambiguation based on exact matching of first name, last name, and birth year.

  • name_bm: Disambiguation based on exact matching of first name, last name, and birth month.

  • name_bd: Disambiguation based on exact matching of first name, last name, and birth day.

These are returned in a dictionary keyed by the names above.
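
To illustrate what exact matching means here, the following is a minimal sketch in plain Python. It is not the code the package uses to build these predictions. The toy records, the field names (`first`, `last`, `by`, `bm`, `bd`), and the helper `exact_match_clusters` are all hypothetical and do not reflect the actual RLdata10000 schema. Records that agree on every chosen field receive the same cluster label.

```python
# Hypothetical toy records; field names are illustrative, not the RLdata10000 schema.
records = [
    {"id": 1, "first": "ANNA", "last": "MUELLER", "by": 1970, "bm": 3, "bd": 12},
    {"id": 2, "first": "ANNA", "last": "MUELLER", "by": 1970, "bm": 7, "bd": 12},
    {"id": 3, "first": "ANNA", "last": "MUELLER", "by": 1985, "bm": 3, "bd": 4},
    {"id": 4, "first": "PETER", "last": "SCHMIDT", "by": 1970, "bm": 3, "bd": 12},
]


def exact_match_clusters(records, fields):
    """Map each record id to a label shared by all records that agree on `fields`."""
    labels = {}    # key tuple -> integer cluster label
    clusters = {}  # record id -> integer cluster label
    for rec in records:
        key = tuple(rec[f] for f in fields)
        # The first time a key is seen, it is assigned the next unused label.
        clusters[rec["id"]] = labels.setdefault(key, len(labels))
    return clusters


# Same keys as the dictionary described above, built from the toy records.
toy_predictions = {
    "name": exact_match_clusters(records, ["first", "last"]),
    "name_by": exact_match_clusters(records, ["first", "last", "by"]),
    "name_bm": exact_match_clusters(records, ["first", "last", "bm"]),
    "name_bd": exact_match_clusters(records, ["first", "last", "bd"]),
}
```

In this toy data, matching on name alone puts records 1, 2, and 3 in one cluster, while adding a birth-date field splits that cluster further.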

Returns:

tuple of the form (predictions, reference), where reference is the ground truth disambiguation and predictions is a dictionary with four toy disambiguations.

Return type:

tuple

Examples

Load ground truth and the set of four toy predictions:

>>> from er_evaluation.datasets import load_rldata10000_disambiguations
>>> predictions, reference = load_rldata10000_disambiguations()

Compute pairwise precision for each prediction:

>>> from er_evaluation.metrics import pairwise_precision
>>> pairwise_precision(predictions["name"], reference)
0.04653923780125846
>>> pairwise_precision(predictions["name_by"], reference)
0.7028571428571428
>>> pairwise_precision(predictions["name_bm"], reference)
0.3076086956521739
>>> pairwise_precision(predictions["name_bd"], reference)
0.501937984496124
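
For intuition about what these numbers measure, pairwise precision is the fraction of record pairs linked by the prediction that are also linked by the reference. The sketch below computes it naively over all pairs. It is only an illustration under stated assumptions: `pairwise_precision_sketch` and the toy dictionaries are hypothetical, disambiguations are represented as plain dictionaries from record id to cluster label, and the package's own implementation may differ in its input types and in how it handles edge cases.

```python
from itertools import combinations


def pairwise_precision_sketch(prediction, reference):
    """Fraction of pairs linked by `prediction` that are also linked by `reference`."""
    # Assumption: only records present in both disambiguations are compared.
    ids = sorted(set(prediction) & set(reference))
    predicted_pairs = 0
    correct_pairs = 0
    for a, b in combinations(ids, 2):
        if prediction[a] == prediction[b]:
            predicted_pairs += 1
            if reference[a] == reference[b]:
                correct_pairs += 1
    # The ratio is undefined when no pair is predicted; this sketch returns NaN then.
    return correct_pairs / predicted_pairs if predicted_pairs else float("nan")


# Hypothetical toy data: the reference links only records 1 and 2.
toy_reference = {1: "a", 2: "a", 3: "b", 4: "c"}
# The prediction links records 1, 2, and 3, so one of its three pairs is correct.
toy_prediction = {1: 0, 2: 0, 3: 0, 4: 1}

toy_precision = pairwise_precision_sketch(toy_prediction, toy_reference)
```

Read this way, the low value for `name` above indicates that most pairs sharing a first and last name are not true matches, and adding a birth-date field removes many of those false links.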