load_rldata500

er_evaluation.datasets.load_rldata500()

Load RLdata500 dataset.

Dataset with 500 rows, including 50 noisy duplicate records, from the RecordLinkage R package.

Unique identifiers for each row can be obtained from er_evaluation.datasets.load_rldata500_disambiguations().
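The following is a minimal sketch of loading the data and checking that the documented columns are present. It assumes pandas is installed. If er_evaluation cannot be imported, the sketch falls back to a small hypothetical stand-in frame so it still runs. That frame contains made-up values, not RLdata500 records, and the helper name `load_records` is also hypothetical.

```python
import pandas as pd

# Columns documented for RLdata500 on this page.
EXPECTED_COLUMNS = ["fname_c1", "fname_c2", "lname_c1", "lname_c2", "by", "bm", "bd"]


def load_records() -> pd.DataFrame:
    """Load RLdata500, or a hypothetical stand-in frame if er_evaluation is unavailable."""
    try:
        from er_evaluation.datasets import load_rldata500

        return load_rldata500()
    except ImportError:
        # Hypothetical stand-in rows (made-up values), NOT real RLdata500 records.
        return pd.DataFrame(
            {
                "fname_c1": ["ANNA", "ANNA", "PETER"],
                "fname_c2": [None, None, "KARL"],
                "lname_c1": ["MUELLER", "MULLER", "SCHMIDT"],
                "lname_c2": [None, None, None],
                "by": [1970, 1970, 1985],
                "bm": [3, 3, 11],
                "bd": [14, 14, 2],
            }
        )


records = load_records()
# List any documented column that is absent from the loaded frame.
missing = [col for col in EXPECTED_COLUMNS if col not in records.columns]
print(len(records), "rows; missing columns:", missing)
```

With er_evaluation installed, `records` is the real dataset and, per the description above, should contain 500 rows.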

Columns are:

  • fname_c1: First name, first component.

  • fname_c2: First name, second component.

  • lname_c1: Last name, first component.

  • lname_c2: Last name, second component.

  • by: Year of birth.

  • bm: Month of birth.

  • bd: Day of birth.
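Because names and birth dates are split across several columns, they are often recombined before records are compared. The sketch below shows one way to do this. It assumes pandas. The helper names `join_parts` and `add_derived_fields`, the derived column names, and the sample rows are all hypothetical. Missing second name components are skipped, and impossible dates are turned into `NaT` instead of raising an error.

```python
import pandas as pd


def join_parts(row: pd.Series) -> str:
    # Join the non-missing name components with a single space.
    return " ".join(str(part) for part in row if pd.notna(part))


def add_derived_fields(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with hypothetical full-name and birth-date columns added."""
    out = df.copy()
    out["first_name"] = out[["fname_c1", "fname_c2"]].apply(join_parts, axis=1)
    out["last_name"] = out[["lname_c1", "lname_c2"]].apply(join_parts, axis=1)
    # pd.to_datetime can assemble dates from year/month/day columns;
    # errors="coerce" maps invalid dates (e.g. 30 February) to NaT.
    parts = out[["by", "bm", "bd"]].rename(columns={"by": "year", "bm": "month", "bd": "day"})
    out["birth_date"] = pd.to_datetime(parts, errors="coerce")
    return out


# Hypothetical sample rows using the documented column names (made-up values).
sample = pd.DataFrame(
    {
        "fname_c1": ["CARSTEN", "GERD"],
        "fname_c2": [None, "HANS"],
        "lname_c1": ["MEIER", "BAUER"],
        "lname_c2": [None, None],
        "by": [1949, 1968],
        "bm": [7, 2],
        "bd": [22, 30],
    }
)
print(add_derived_fields(sample)[["first_name", "last_name", "birth_date"]])
```

Coercing invalid dates keeps every row in the frame. This can matter when the birth fields contain noise.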

Returns:

RLdata500 dataset.

Return type:

DataFrame