load_rldata10000
- er_evaluation.datasets.load_rldata10000()
Load RLdata10000 dataset.
Dataset with 10000 rows, including 1000 noisy duplicate records, from the RecordLinkage R package.
Unique identifiers for each row can be obtained from
er_evaluation.datasets.load_rldata10000_disambiguations().
Columns are:
fname_c1: First name, first component.
fname_c2: First name, second component.
lname_c1: Last name, first component.
lname_c2: Last name, second component.
by: Year of birth.
bm: Month of birth.
bd: Day of birth.
- Returns:
RLdata10000 dataset.
- Return type:
DataFrame
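Example
A minimal sketch of loading the dataset and checking that the columns listed above are present. The names EXPECTED_COLUMNS, missing_columns and load_and_check are hypothetical helpers introduced for illustration and are not part of er_evaluation; only load_rldata10000() comes from the documentation above.

```python
# Column names as listed in the documentation above.
EXPECTED_COLUMNS = ["fname_c1", "fname_c2", "lname_c1", "lname_c2", "by", "bm", "bd"]


def missing_columns(columns):
    """Return the documented column names absent from `columns`, in documented order.

    Hypothetical helper, not part of er_evaluation.
    """
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_and_check():
    """Load RLdata10000 and verify that the documented columns are present.

    Hypothetical helper. Requires the er_evaluation package, which is
    imported here so that missing_columns() stays usable without it.
    """
    from er_evaluation.datasets import load_rldata10000

    df = load_rldata10000()
    missing = missing_columns(df.columns)
    if missing:
        raise ValueError(f"Missing documented columns: {missing}")
    return df
```

With er_evaluation installed, calling load_and_check() returns the DataFrame, or raises ValueError if any documented column is absent.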