er_evaluation.datasets

Example datasets and disambiguations

The datasets module contains toy datasets used to test and demonstrate the functionality of the ER-Evaluation package.

For example, the load_pv_disambiguations() function returns the tuple (predictions, reference), where predictions is a dictionary containing PatentsView’s disambiguation history (indexed by pandas Datetime objects), and where reference is Binette’s 2022 inventors benchmark that contains 401 disambiguated inventors.
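As a minimal sketch of working with that return value, the snippet below unpacks the tuple and summarizes it. It assumes the er_evaluation package is installed and that the dictionary keys sort chronologically, as the description above suggests. The helper name summarize_pv is hypothetical and used only for illustration; it is not part of the package.

```python
# Sketch only: inspect the PatentsView toy disambiguations.
# Assumption: er_evaluation is installed; if not, the example degrades gracefully.
try:
    from er_evaluation.datasets import load_pv_disambiguations
except ImportError:
    load_pv_disambiguations = None  # package not available in this environment


def summarize_pv():
    """Hypothetical helper: return (snapshot count, latest key, reference length), or None."""
    if load_pv_disambiguations is None:
        return None
    # The loader is documented to return the tuple (predictions, reference).
    predictions, reference = load_pv_disambiguations()
    # Keys are described as pandas datetime objects, so sorting orders them by date.
    dates = sorted(predictions.keys())
    latest = dates[-1] if dates else None
    return len(dates), latest, len(reference)


summary = summarize_pv()
if summary is None:
    print("er_evaluation is not installed; nothing to show.")
else:
    n_snapshots, latest, n_reference = summary
    print(f"{n_snapshots} predicted disambiguations; latest is dated {latest}")
    print(f"reference disambiguation has {n_reference} entries")
```

The length of reference reported here counts entries in the returned object, which need not equal the 401 inventors mentioned above if the object stores one entry per inventor mention.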

The load_pv_data() function returns a dataframe containing features for a small set of inventor mentions.

The load_rldata10000_disambiguations() function returns the ground truth disambiguation and toy predicted disambiguations, and load_rldata10000() returns the full RLdata10000 dataframe.

Functions

`load_pv_data`()	Load PatentsView dataset.
`load_pv_disambiguations`()	Load reference disambiguation and predicted disambiguations for the PatentsView dataset.
`load_rldata500`()	Load RLdata500 dataset.
`load_rldata500_disambiguations`()	Load reference and predicted disambiguations for the RLdata500 dataset.
`load_rldata10000`()	Load RLdata10000 dataset.
`load_rldata10000_disambiguations`()	Load reference and predicted disambiguations for the RLdata10000 dataset.