er_evaluation.datasets

Example datasets and disambiguations

The datasets module contains toy datasets used to test and demonstrate the functionality of the ER-Evaluation package.

For example, the load_pv_disambiguations() function returns the tuple (predictions, reference). Here, predictions is a dictionary containing PatentsView’s disambiguation history, keyed by pandas datetime objects, and reference is Binette’s 2022 inventors benchmark, which contains 401 disambiguated inventors.
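The shape of this return value can be illustrated with a minimal sketch. It does not call the package: the dictionaries below are small hypothetical stand-ins for the real data, and the helper cluster_counts() is an illustrative name, not part of ER-Evaluation. The sketch assumes that each disambiguation behaves like a mapping from mention ID to cluster ID.

```python
from datetime import datetime


def cluster_counts(predictions):
    """Count distinct clusters in each disambiguation, in chronological order.

    `predictions` maps a date to a disambiguation, and each disambiguation is
    assumed to behave like a mapping from mention ID to cluster ID.
    """
    return {
        date: len(set(dict(predictions[date]).values()))
        for date in sorted(predictions)
    }


# Hypothetical stand-in data with the same (predictions, reference) layout.
# With the package installed, the real objects would come from:
#     from er_evaluation.datasets import load_pv_disambiguations
#     predictions, reference = load_pv_disambiguations()
predictions = {
    datetime(2021, 12, 30): {"m1": "a", "m2": "a", "m3": "b"},
    datetime(2022, 6, 30): {"m1": "a", "m2": "c", "m3": "b"},
}
reference = {"m1": "x", "m2": "y", "m3": "z"}

counts = cluster_counts(predictions)
for date, n_clusters in counts.items():
    print(date.date(), n_clusters)

# Number of distinct entities in the stand-in reference disambiguation.
n_reference_clusters = len(set(reference.values()))
print("reference clusters:", n_reference_clusters)
```

Keying the history by date makes it straightforward to iterate over successive disambiguations in order and compare each one against the same reference.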

The load_pv_data() function returns a dataframe containing features for a small set of inventor mentions.

The load_rldata10000_disambiguations() function returns the ground truth disambiguation and toy predicted disambiguations, and load_rldata10000() returns the full RLdata10000 dataframe.

Functions

load_pv_data()

Load PatentsView dataset.

load_pv_disambiguations()

Load reference disambiguation and predicted disambiguations for the PatentsView dataset.

load_rldata500()

Load RLdata500 dataset.

load_rldata500_disambiguations()

Load reference and predicted disambiguations for the RLdata500 dataset.

load_rldata10000()

Load RLdata10000 dataset.

load_rldata10000_disambiguations()

Load reference and predicted disambiguations for the RLdata10000 dataset.