summary_statistics

er_evaluation.summary_statistics(membership, names=None)

Compute a canonical set of summary statistics.

This includes:

Number of clusters
Average cluster size
Matching rate (proportion of elements belonging to clusters of size at least 2)
Hill numbers of order 0, 1, and 2 of the cluster size distribution
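The sketch below reimplements these four core statistics with pandas and NumPy so that their definitions are concrete. It is not the library's code: `core_summary_statistics` is a hypothetical helper, and two definitions are assumptions inferred from the example output rather than taken from the source, namely that the matching rate is the fraction of elements in clusters of size at least 2, and that the Hill numbers are computed on the distribution of cluster sizes (the proportion of clusters having each size).

```python
import numpy as np
import pandas as pd


def core_summary_statistics(membership: pd.Series) -> dict:
    """Hypothetical sketch of the core statistics (not er_evaluation's code)."""
    # Cluster sizes: number of elements sharing each cluster identifier.
    cluster_sizes = membership.value_counts()
    n_elements = len(membership)
    n_clusters = len(cluster_sizes)

    # Assumed definition: fraction of elements in clusters of size >= 2.
    matching_rate = float(cluster_sizes[cluster_sizes >= 2].sum() / n_elements)

    # Assumed: Hill numbers of the cluster size distribution, i.e. the
    # proportion of clusters having each observed size.
    p = cluster_sizes.value_counts(normalize=True).to_numpy()

    return {
        "number_of_clusters": n_clusters,
        "average_cluster_size": n_elements / n_clusters,
        "matching_rate": matching_rate,
        # Order 0: number of distinct cluster sizes.
        "H0": len(p),
        # Order 1: exponential of the Shannon entropy.
        "H1": float(np.exp(-np.sum(p * np.log(p)))),
        # Order 2: inverse of the Simpson concentration.
        "H2": float(1.0 / np.sum(p**2)),
    }
```

On the membership vector used in the example below, this sketch reproduces the documented values up to floating-point rounding in the last digits.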

If names are provided for the elements of each cluster, the following two statistics are also included:

Homonymy rate (proportion of clusters where at least one name is shared with another cluster)
Name variation rate (proportion of clusters containing more than one distinct name)
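The two name-based rates can be sketched in the same spirit. `name_statistics` is a hypothetical helper, not part of the library; it follows the definitions in the list above literally, treating a name as shared when it appears in more than one cluster. How the library handles missing or empty names is not covered by this sketch and may differ.

```python
import pandas as pd


def name_statistics(membership: pd.Series, names: pd.Series) -> dict:
    """Hypothetical sketch of the two name-based rates (not er_evaluation's code)."""
    # Align cluster identifiers and names on the shared element index.
    df = pd.DataFrame({"cluster": membership, "name": names})

    # Names that occur in more than one distinct cluster.
    clusters_per_name = df.groupby("name")["cluster"].nunique()
    shared_names = clusters_per_name[clusters_per_name > 1].index

    # Homonymy: at least one element of the cluster carries a shared name.
    homonymous = df["name"].isin(shared_names).groupby(df["cluster"]).any()
    # Name variation: the cluster contains more than one distinct name.
    varied = df.groupby("cluster")["name"].nunique() > 1

    return {
        "homonymy_rate": float(homonymous.mean()),
        "name_variation_rate": float(varied.mean()),
    }


# Usage: clusters 1 and 2 share "John Smith"; only cluster 1 mixes two names.
example_membership = pd.Series([1, 1, 2, 2, 3], index=list("abcde"))
example_names = pd.Series(
    ["John Smith", "J. Smith", "John Smith", "John Smith", "Alice Wu"],
    index=list("abcde"),
)
print(name_statistics(example_membership, example_names))
# homonymy_rate is 2/3, name_variation_rate is 1/3
```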

Parameters:

membership (Series) – Membership vector representation of a clustering.
names (Series) – Names associated with each element. Defaults to None.

Returns:

Dictionary of summary statistics.

Return type:

dict

Examples

>>> import pandas as pd
>>> from er_evaluation import summary_statistics
>>> membership = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,4])
>>> summary_statistics(membership)
{'number_of_clusters': 4, 'average_cluster_size': 2.0, 'matching_rate': 0.875, 'H0': 3, 'H1': 2.82842712474619, 'H2': 2.6666666666666665}
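The values in this output can be checked by hand, assuming (as the numbers indicate) that the Hill numbers are taken over the distribution of cluster sizes. The four clusters have sizes 2, 2, 1 and 3, so the proportions of clusters of size 1, 2 and 3 are p = (1/4, 1/2, 1/4), the average size is 8/4 = 2, and 7 of the 8 elements lie in clusters of size at least 2:

```latex
\begin{aligned}
\text{matching rate} &= \tfrac{7}{8} = 0.875,\\
H_0 &= \#\{k : p_k > 0\} = 3,\\
H_1 &= \exp\Big(-\sum_k p_k \ln p_k\Big)
     = \exp\big(\tfrac{1}{2}\ln 4 + \tfrac{1}{2}\ln 2\big) = 2^{3/2} \approx 2.8284,\\
H_2 &= \Big(\sum_k p_k^2\Big)^{-1}
     = \big(\tfrac{1}{16} + \tfrac{1}{4} + \tfrac{1}{16}\big)^{-1} = \tfrac{8}{3} \approx 2.6667.
\end{aligned}
```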