summary_statistics
- er_evaluation.summary.summary_statistics(membership, names=None)
Compute a canonical set of summary statistics.
This includes:
- Number of clusters
- Average cluster size
- Matching rate
- Hill numbers of order 0, 1, and 2
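To make these definitions concrete, the following is a minimal sketch that recomputes the same quantities with plain pandas and NumPy. It is not the library's implementation: `sketch_summary` is a hypothetical helper. It assumes that the matching rate is the share of elements belonging to a cluster of size greater than one, and that the Hill numbers are taken over the distribution of cluster sizes. Both assumptions are inferred from the output shown in the Examples section, where `H0` is 3 for a clustering with 4 clusters.

```python
import numpy as np
import pandas as pd


def sketch_summary(membership: pd.Series) -> dict:
    """Hypothetical re-derivation of the statistics; not the library's code."""
    # Size of each cluster (number of elements sharing a cluster label).
    cluster_sizes = membership.value_counts()
    # Assumption: Hill numbers are computed over the distribution of cluster
    # sizes, i.e. the share of clusters that have each distinct size.
    p = cluster_sizes.value_counts(normalize=True).to_numpy()
    return {
        "number_of_clusters": int(cluster_sizes.size),
        "average_cluster_size": float(cluster_sizes.mean()),
        # Assumption: share of elements that sit in a cluster of size > 1.
        "matching_rate": float(
            cluster_sizes[cluster_sizes > 1].sum() / len(membership)
        ),
        "H0": int(p.size),                            # number of distinct sizes
        "H1": float(np.exp(-(p * np.log(p)).sum())),  # exp of Shannon entropy
        "H2": float(1.0 / (p ** 2).sum()),            # inverse Simpson index
    }


membership = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 4])
stats = sketch_summary(membership)
print(stats)
```

On this input the cluster sizes are 2, 2, 1 and 3, so three distinct sizes occur with shares 0.5, 0.25 and 0.25, which is where `H0 = 3` comes from.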
If names are provided for each cluster element, the following two statistics are also returned:
- Homonymy rate (proportion of clusters where at least one name is shared with another cluster)
- Name variation rate (proportion of clusters with name variation within them)
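The two name-based rates can be illustrated with a short pandas sketch. This is an assumption-laden reading of the definitions above, not the library's code: `sketch_name_rates`, the dictionary keys, and the toy names are all hypothetical, and "name variation" is taken to mean that a cluster contains more than one distinct name.

```python
import pandas as pd


def sketch_name_rates(membership: pd.Series, names: pd.Series) -> dict:
    """Hypothetical illustration of the two name-based rates."""
    # Align cluster labels and names on their shared index.
    df = pd.DataFrame({"cluster": membership, "name": names})
    n_clusters = df["cluster"].nunique()

    # Names that appear in more than one cluster.
    clusters_per_name = df.groupby("name")["cluster"].nunique()
    shared_names = clusters_per_name[clusters_per_name > 1].index
    # Clusters that contain at least one such shared name.
    homonymous = df[df["name"].isin(shared_names)]["cluster"].nunique()

    # Assumption: a cluster has name variation if it holds > 1 distinct name.
    names_per_cluster = df.groupby("cluster")["name"].nunique()

    return {
        "homonymy_rate": homonymous / n_clusters,
        "name_variation_rate": float((names_per_cluster > 1).mean()),
    }


index = [1, 2, 3, 4, 5, 6, 7, 8]
membership = pd.Series(index=index, data=[1, 1, 2, 3, 2, 4, 4, 4])
# Toy names, invented for this illustration only.
names = pd.Series(
    index=index,
    data=["Ann Lee", "A. Lee", "Bob Ray", "Ann Lee",
          "Robert Ray", "Cy Fox", "Cy Fox", "C. Fox"],
)
rates = sketch_name_rates(membership, names)
print(rates)  # homonymy_rate 0.5, name_variation_rate 0.75 for this toy data
```

Here "Ann Lee" appears in clusters 1 and 3, so two of the four clusters are homonymous, while clusters 1, 2 and 4 each contain two distinct names.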
- Parameters:
membership (Series) – Membership vector representation of a clustering.
names (Series) – Names associated with each cluster element. Defaults to None.
- Returns:
Dictionary of summary statistics.
- Return type:
dict
Examples
>>> import pandas as pd
>>> from er_evaluation.summary import summary_statistics
>>> membership = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,4])
>>> summary_statistics(membership)
{'number_of_clusters': 4, 'average_cluster_size': 2.0, 'matching_rate': 0.875, 'H0': 3, 'H1': 2.82842712474619, 'H2': 2.6666666666666665}