er_evaluation.datasets#
Example datasets and disambiguations#
The datasets module contains toy datasets used to test and demonstrate the functionality of the ER-Evaluation package.
For example, the load_pv_disambiguations() function returns the tuple (predictions, reference), where predictions is a dictionary containing PatentsView’s disambiguation history (indexed by pandas Datetime objects), and where reference is Binette’s 2022 inventors benchmark that contains 401 disambiguated inventors.
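As a rough illustration of the (predictions, reference) shape described above, the sketch below builds a toy stand-in using only the standard library. The mention IDs, cluster labels, and dates are invented, the helpers latest_snapshot() and count_clusters() are hypothetical names that are not part of the package, and plain dictionaries are an assumption standing in for whatever objects the package actually returns.

```python
from datetime import datetime

# Toy stand-in for the tuple described above. With the package installed,
# the call described in the text would be:
#     predictions, reference = load_pv_disambiguations()
# Everything below (IDs, labels, dates) is invented for illustration only.
predictions = {
    # One disambiguation per date: mention ID -> predicted cluster label.
    datetime(2021, 12, 30): {"m1": "a", "m2": "a", "m3": "b"},
    datetime(2022, 6, 30): {"m1": "a", "m2": "c", "m3": "b"},
}
# Reference disambiguation: mention ID -> benchmark inventor label.
reference = {"m1": "inv1", "m2": "inv1", "m3": "inv2"}


def latest_snapshot(history):
    """Return (date, disambiguation) for the most recent date in the history."""
    most_recent = max(history)
    return most_recent, history[most_recent]


def count_clusters(membership):
    """Count the distinct cluster labels in a mention -> cluster mapping."""
    return len(set(membership.values()))


date, latest = latest_snapshot(predictions)
print(date.date(), count_clusters(latest), count_clusters(reference))
```

Keying the history by date makes it straightforward to pick one snapshot and compare it against the reference, which is the kind of workflow the toy datasets are meant to demonstrate.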
The load_pv_data() function returns a dataframe containing features for a small set of inventor mentions.
The load_rldata10000_disambiguations() and load_rldata10000() functions return, respectively, the ground-truth disambiguation together with toy predicted disambiguations, and the full RLdata10000 dataframe.
Functions#
load_pv_data() | Load PatentsView dataset.
load_pv_disambiguations() | Load reference disambiguation and predicted disambiguations for the PatentsView dataset.
load_rldata500() | Load RLdata500 dataset.
load_rldata500_disambiguations() | Load reference and predicted disambiguations for the RLdata500 dataset.
load_rldata10000() | Load RLdata10000 dataset.
load_rldata10000_disambiguations() | Load reference and predicted disambiguations for the RLdata10000 dataset.