load_rldata10000

er_evaluation.datasets.load_rldata10000()

Load RLdata10000 dataset.

Dataset with 10000 rows, including 1000 noisy duplicate records, from the RecordLinkage R package.

Unique identifiers for each row can be obtained from er_evaluation.datasets.load_rldata10000_disambiguations().

Columns are:

  • fname_c1: First name, first component.

  • fname_c2: First name, second component.

  • lname_c1: Last name, first component.

  • lname_c2: Last name, second component.

  • by: Year of birth.

  • bm: Month of birth.

  • bd: Day of birth.
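Because names and birth dates are split across several columns, it is often convenient to assemble them before comparing records. The sketch below is illustrative only: it builds a small hypothetical stand-in DataFrame with the documented column names (the values are invented, not taken from RLdata10000), and `add_derived_columns` is a hypothetical helper, not part of er_evaluation. It assumes the second name components may be missing.

```python
import pandas as pd

# Hypothetical stand-in rows using the documented column names.
# Real data would come from er_evaluation.datasets.load_rldata10000().
records = pd.DataFrame(
    {
        "fname_c1": ["ANNA", "JAN"],
        "fname_c2": [None, "PETER"],  # second components may be missing
        "lname_c1": ["MUELLER", "SCHMIDT"],
        "lname_c2": [None, None],
        "by": [1950, 1972],
        "bm": [3, 11],
        "bd": [14, 2],
    }
)


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a full name and a birth date column."""
    out = df.copy()
    name_cols = ["fname_c1", "fname_c2", "lname_c1", "lname_c2"]
    # Join the non-missing name components with single spaces.
    out["full_name"] = out[name_cols].apply(
        lambda row: " ".join(str(v) for v in row if pd.notna(v)), axis=1
    )
    # pd.to_datetime can assemble dates from year/month/day columns;
    # invalid combinations become NaT instead of raising.
    parts = out[["by", "bm", "bd"]].rename(
        columns={"by": "year", "bm": "month", "bd": "day"}
    )
    out["birth_date"] = pd.to_datetime(parts, errors="coerce")
    return out


print(add_derived_columns(records)[["full_name", "birth_date"]])
```

Using `errors="coerce"` keeps the sketch robust to implausible year/month/day combinations, which can occur in noisy duplicate records.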

Returns:

RLdata10000 dataset.

Return type:

DataFrame