All Functions

Functions

compress_memberships(*memberships)

Compress membership vectors to int values, preserving index compatibility.

clusters_to_graph(clusters)

Transform clusters dictionary into Graph.

clusters_to_membership(clusters)

Transform clusters dictionary into membership vector.

clusters_to_pairs(clusters)

Transform clusters dictionary into pairs list.

graph_to_clusters(graph)

Transform Graph into clusters dictionary.

graph_to_membership(graph)

Transform Graph into membership vector.

graph_to_pairs(graph)

Transform Graph into pairs list.

isclusters(obj)

Check if given object is a clusters dictionary.

isgraph(obj)

Check if given object is an igraph Graph.

ismembership(obj)

Check if given object is a membership vector.

ispairs(obj)

Check if given object is a pairs list.

membership_to_clusters(membership)

Transform membership vector into clusters dictionary.

membership_to_graph(membership)

Transform membership vector into Graph.

membership_to_pairs(membership)

Transform membership vector into pairs list.

pairs_to_clusters(pairs, indices)

Transform pairs list into clusters dictionary.

pairs_to_graph(pairs, indices)

Transform pairs list into Graph.

pairs_to_membership(pairs, indices)

Transform pairs list into membership vector.
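The conversion functions above move between representations of the same clustering. The following is a minimal, library-independent sketch of three of them, using plain dictionaries and lists. The helper names `to_clusters` and `to_pairs` are hypothetical and are not the package's implementation, which may use other container types.

```python
from itertools import combinations

# Membership vector: maps each record identifier to a cluster identifier.
membership = {"r1": "a", "r2": "a", "r3": "b", "r4": "a"}


def to_clusters(membership):
    """Group record identifiers by cluster identifier (clusters dictionary)."""
    clusters = {}
    for record, cluster in membership.items():
        clusters.setdefault(cluster, []).append(record)
    return clusters


def to_pairs(membership):
    """List every unordered pair of records that share a cluster (pairs list)."""
    pairs = []
    for records in to_clusters(membership).values():
        pairs.extend(combinations(sorted(records), 2))
    return sorted(pairs)


clusters = to_clusters(membership)
pairs = to_pairs(membership)
```

In this sketch the singleton record `r3` does not appear in any pair, so a pairs list alone cannot recover it. That is presumably why the `pairs_to_*` functions also take an `indices` argument.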

load_pv_data()

Load PatentsView dataset.

load_pv_disambiguations()

Load reference disambiguation and predicted disambiguations for the PatentsView dataset.

load_rldata500()

Load RLdata500 dataset.

load_rldata500_disambiguations()

Load reference and predicted disambiguations for the RLdata500 dataset.

load_rldata10000()

Load RLdata10000 dataset.

load_rldata10000_disambiguations()

Load reference and predicted disambiguations for the RLdata10000 dataset.

count_extra(prediction, sample)

Count the number of extraneous elements to sampled clusters.

count_missing(prediction, sample)

Count the number of missing elements to sampled clusters.

error_indicator(prediction, sample)

Error indicator metric.

error_metrics(prediction, sample)

Compute canonical set of error metrics from record error table.

expected_extra(prediction, sample)

Expected number of extraneous elements to records in sampled clusters.

expected_missing(prediction, sample)

Expected number of missing elements to records in sampled clusters.

expected_relative_extra(prediction, sample)

Expected relative number of extraneous elements to records in sampled clusters.

expected_relative_missing(prediction, sample)

Expected relative number of missing elements to records in sampled clusters.

expected_size_difference(prediction, sample)

Expected size difference between predicted and sampled clusters.

splitting_entropy(prediction, sample[, alpha])

Splitting entropy of true clusters.

cluster_sizes_from_table(error_table)

Compute cluster sizes from record error table.

error_indicator_from_table(error_table)

Compute error indicator from record error table.

error_metrics_from_table(error_table)

Compute canonical set of error metrics from record error table.

expected_extra_from_table(error_table)

Compute expected extra elements from record error table.

expected_missing_from_table(error_table)

Compute expected missing elements from record error table.

expected_relative_extra_from_table(error_table)

Compute expected relative extra elements from record error table.

expected_relative_missing_from_table(error_table)

Compute expected relative missing elements from record error table.

expected_size_difference_from_table(error_table)

Compute expected size difference from record error table.

fit_dt_regressor(X, y[, numerical_features, ...])

Fits a decision tree regressor model with optional preprocessing for numerical and categorical features.

pred_cluster_sizes_from_table(error_table)

Compute predicted cluster sizes from record error table.

record_error_table(prediction, sample)

Compute record error table.

b_cubed_precision_estimator(prediction, ...)

B-cubed precision design estimator.

b_cubed_recall_estimator(prediction, sample, ...)

B-cubed recall design estimator.

cluster_f_estimator(prediction, sample, weights)

Cluster F-score design estimator.

cluster_precision_estimator(prediction, ...)

Cluster precision design estimator.

cluster_recall_estimator(prediction, sample, ...)

Cluster recall design estimator.

estimates_table(predictions, samples_weights)

Create table of estimates applied to all combinations of predictions and (sample, weights) pairs.

pairwise_f_estimator(prediction, sample, weights)

Design estimator for pairwise F-score.

pairwise_precision_estimator(prediction, ...)

Design estimator for pairwise precision.

pairwise_recall_estimator(prediction, ...)

Design estimator for pairwise recall.

avg_cluster_size_estimator(sample, weights)

Compute the average cluster size estimator for the given sample and weights.

homonymy_rate_estimator(sample, weights[, ...])

Compute the homonymy rate estimator for the given sample, weights, prediction, and names.

matching_rate_estimator(sample, weights[, ...])

Compute the matching rate estimator for the given sample, weights, prediction, and names.

name_variation_estimator(sample, weights[, ...])

Compute the name variation estimator for the given sample, weights, prediction, and names.

summary_estimates_table(sample, weights, ...)

Generate a summary estimates table for the given sample, weights, predictions, and names.

adjusted_rand_score(prediction, reference)

Compute the adjusted Rand index.

b_cubed_f(prediction, reference[, beta])

B-cubed F score for the inner join of two clusterings.

b_cubed_precision(prediction, reference)

B-cubed precision for the inner join of two clusterings, with equal weight placed on each ground truth cluster.

b_cubed_recall(prediction, reference)

B-cubed recall for the inner join of two clusterings, with equal weight placed on each ground truth cluster.

cluster_completeness(prediction, reference)

Cluster completeness score (based on conditional entropy).

cluster_f(prediction, reference[, beta])

Cluster F score for the inner join of two clusterings.

cluster_homogeneity(prediction, reference)

Cluster homogeneity score (based on conditional entropy).

cluster_precision(prediction, reference)

Cluster precision for the inner join of two clusterings.

cluster_recall(prediction, reference)

Cluster recall for the inner join of two clusterings.
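As an illustration of cluster-level metrics on an inner join, the sketch below assumes that cluster precision is the share of predicted clusters that exactly match a reference cluster, and that cluster recall is the share of reference clusters that are exactly recovered, both computed only on records present in both clusterings. The function names are hypothetical and this is not the package's code.

```python
def cluster_sets(membership):
    """Return the clustering as a set of frozensets of record identifiers."""
    clusters = {}
    for record, cluster in membership.items():
        clusters.setdefault(cluster, set()).add(record)
    return {frozenset(records) for records in clusters.values()}


def cluster_precision_recall(prediction, reference):
    """Exact-match cluster precision and recall on the inner join."""
    # Inner join: keep only records present in both membership vectors.
    common = prediction.keys() & reference.keys()
    if not common:
        raise ValueError("The two clusterings share no records.")
    pred = cluster_sets({r: prediction[r] for r in common})
    ref = cluster_sets({r: reference[r] for r in common})
    exact_matches = len(pred & ref)
    return exact_matches / len(pred), exact_matches / len(ref)


prediction = {1: "x", 2: "x", 3: "y", 4: "z"}
reference = {1: "a", 2: "a", 3: "b", 4: "b"}
precision, recall = cluster_precision_recall(prediction, reference)
```

Here only the cluster `{1, 2}` appears in both clusterings, so one of three predicted clusters and one of two reference clusters match.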

cluster_v_measure(prediction, reference[, beta])

Compute the V-measure.

metrics_table(predictions, references[, metrics])

Apply a set of metrics to all combinations of prediction and reference membership vectors.

pairwise_f(prediction, reference[, beta])

Pairwise F score for the inner join of two clusterings.

pairwise_precision(prediction, reference)

Pairwise precision for the inner join of two clusterings.

pairwise_recall(prediction, reference)

Pairwise recall for the inner join of two clusterings.
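The pairwise metrics can be sketched in the same spirit. The example below uses the standard definitions (precision is the share of predicted links that are true links, recall is the share of true links that are predicted), restricted to records present in both clusterings. Function names are hypothetical, and returning NaN when there are no links is a choice made for this sketch only.

```python
from itertools import combinations


def linked_pairs(membership):
    """Set of unordered record pairs that share a cluster."""
    clusters = {}
    for record, cluster in membership.items():
        clusters.setdefault(cluster, []).append(record)
    return {
        frozenset(pair)
        for records in clusters.values()
        for pair in combinations(records, 2)
    }


def pairwise_precision_recall(prediction, reference):
    """Pairwise precision and recall on the inner join of two clusterings."""
    # Inner join: keep only records present in both membership vectors.
    common = prediction.keys() & reference.keys()
    pred = linked_pairs({r: prediction[r] for r in common})
    ref = linked_pairs({r: reference[r] for r in common})
    true_positives = len(pred & ref)
    # NaN for an empty denominator is an assumption of this sketch.
    precision = true_positives / len(pred) if pred else float("nan")
    recall = true_positives / len(ref) if ref else float("nan")
    return precision, recall


prediction = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y"}
reference = {1: "a", 2: "a", 3: "b", 4: "c", 6: "c"}
precision, recall = pairwise_precision_recall(prediction, reference)
```

Records 5 and 6 each appear in only one clustering, so they drop out of the inner join before any pair is counted.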

rand_score(prediction, reference)

Compute the Rand index.

add_ests_to_summaries(fig, predictions, ...)

compare_plots(*figs[, names, marker, ...])

Combine multiple figures into one.

plot_cluster_errors(prediction, reference[, ...])

Scatter plot of two cluster-wise error metrics.

plot_cluster_sizes_distribution(membership)

Plot the cluster size distribution.

plot_comparison(predictions[, metrics, ...])

Plot metrics computed for all prediction pairs.

make_dt_regressor_plot(error_metrics, ...[, ...])

Fit a decision tree regressor to the data and create an interactive sunburst chart visualization of the resulting tree.

plot_dt_regressor_sunburst(dt_regressor, X, ...)

Creates a sunburst plot of a decision tree regressor.

plot_dt_regressor_tree(dt_regressor, ...)

Creates a tree plot of a decision tree regressor.

plot_dt_regressor_treemap(dt_regressor, X, ...)

Creates a treemap plot of a decision tree regressor.

plot_entropy_curve(membership[, q_range, ...])

Plot the Hill number entropy curve.

plot_estimates(predictions, sample_weights)

Plot representative performance estimates.

plot_metrics(predictions, reference[, ...])

Plot performance metrics.

plot_performance_disparities(prediction, ...)

Plot largest performance disparities among predefined subgroups.

plot_summaries(predictions[, names, type, ...])

Plot summary statistics.

expand_grid(**kwargs)

Create DataFrame from all combinations of elements.
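A grid expansion of this kind can be sketched with the standard library alone. The helper `expand_grid_rows` below is hypothetical: it returns a list of row dictionaries rather than a DataFrame, and the keyword names in the example are illustrative.

```python
from itertools import product


def expand_grid_rows(**kwargs):
    """Return one row (dict) per combination of the supplied values."""
    names = list(kwargs)
    return [dict(zip(names, combo)) for combo in product(*kwargs.values())]


rows = expand_grid_rows(threshold=[0.5, 0.9], method=["a", "b"])
```

Passing `rows` to `pandas.DataFrame` would give a tabular result with one column per keyword.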

load_module_parquet(module, filename)

Load parquet file from a submodule using pyarrow engine.

load_module_tsv(module, filename[, dtype])

Load tsv file from a submodule.

relevant_prediction_subset(prediction, sample)

Return predicted clusters which intersect sampled clusters.
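One plausible reading of this operation, sketched with plain dictionaries: keep every predicted cluster that contains at least one sampled record, and drop the rest. The helper name `relevant_subset` and the dictionary representation are assumptions of this example, not the package's implementation.

```python
def relevant_subset(prediction, sample):
    """Keep records whose predicted cluster contains a sampled record."""
    # Predicted cluster identifiers touched by at least one sampled record.
    relevant_clusters = {prediction[r] for r in sample if r in prediction}
    return {r: c for r, c in prediction.items() if c in relevant_clusters}


prediction = {1: "x", 2: "x", 3: "y", 4: "z"}
sample = {1: "a", 4: "b"}
subset = relevant_subset(prediction, sample)
```

Record 2 is retained even though it was not sampled, because it shares predicted cluster `x` with sampled record 1.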

sample_clusters(membership[, weights, ...])

Sample clusters from a membership vector.

average_cluster_size(membership)

Compute the average cluster size.

cluster_hill_number(membership[, alpha])

Compute Hill number of a given order.
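For reference, the Hill number of order alpha of a probability vector p is (sum of p_i^alpha)^(1/(1-alpha)), with the limit at alpha = 1 equal to the exponential of the Shannon entropy. The sketch below implements that textbook formula on a generic probability vector. Which distribution the package derives from the membership vector is not stated in this listing, and the default of 1.0 is an assumption of this sketch.

```python
import math


def hill_number(probabilities, alpha=1.0):
    """Hill number of order alpha for a discrete probability vector."""
    p = [x for x in probabilities if x > 0]
    if math.isclose(alpha, 1.0):
        # Limit as alpha -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** alpha for x in p) ** (1.0 / (1.0 - alpha))


uniform = [0.25, 0.25, 0.25, 0.25]
orders = {alpha: hill_number(uniform, alpha) for alpha in (0, 1, 2)}
```

For a uniform distribution the Hill number equals the number of categories at every order, which makes it a convenient sanity check.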

cluster_sizes(membership)

Compute the size of each cluster.

cluster_sizes_distribution(membership)

Compute the cluster size distribution.

homonymy_rate(membership, names)

Compute the homonymy rate of a given clustering with a set of associated names.

matching_rate(membership)

Compute the matching rate for a given clustering.

name_variation_rate(membership, names)

Compute the name variation rate of a given clustering with a set of associated names.

number_of_clusters(membership)

Number of clusters in a given clustering.

number_of_links(membership)

Number of pairwise links associated with a given clustering.

summary_statistics(membership[, names])

Compute canonical set of summary statistics.
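Several of the summary statistics above follow directly from cluster sizes. The sketch below computes them for a small membership vector held in a plain dictionary. The matching rate is taken here to mean the share of records in clusters of size two or more, which is an assumption about the definition. The variable names are illustrative and this is not the package's code.

```python
from collections import Counter

membership = {"r1": "a", "r2": "a", "r3": "a", "r4": "b", "r5": "c", "r6": "c"}

# Size of each cluster, keyed by cluster identifier.
sizes = Counter(membership.values())

n_clusters = len(sizes)
avg_size = len(membership) / n_clusters
# A cluster of size n contributes n * (n - 1) / 2 pairwise links.
n_links = sum(n * (n - 1) // 2 for n in sizes.values())
# Assumed definition: proportion of records in clusters of size >= 2.
match_rate = sum(n for n in sizes.values() if n >= 2) / len(membership)
```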

Classes

MembershipVector([data, dropna])

Series wrapper to validate membership vector format and log potential issues.

Class Inheritance Diagram

Inheritance diagram of er_evaluation.data_structures._data_structures.MembershipVector