name_variation_rate

er_evaluation.name_variation_rate(membership, names)

Compute the name variation rate of a given clustering with a set of associated names.

Name variation rate:

The name variation rate is the proportion of clusters whose elements are associated with more than one distinct name.

Parameters:
  • membership (Series) – Membership vector representation of a clustering.

  • names (Series) – Series indexed by cluster elements, where each value is the name associated with that element. The index of names must exactly match the index of membership.

Returns:

Name variation rate.

Return type:

float

Examples

>>> import pandas as pd
>>> from er_evaluation import name_variation_rate
>>> membership = pd.Series(index=[1,2,3,4,5,6,7,8], data=[1,1,2,3,2,4,4,4])
>>> names = pd.Series(index=[1,2,3,4,5,6,7,8], data=["n1", "n2", "n3", "n4", "n3", "n1", "n2", "n8"])
>>> name_variation_rate(membership, names)
0.5
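
To see where the 0.5 comes from, the definition can be sketched directly in pandas. This is only an illustration, not the library's implementation: the helper name name_variation_rate_sketch is hypothetical, and it assumes that a cluster "has name variation" when its elements carry more than one distinct name.

```python
import pandas as pd


def name_variation_rate_sketch(membership, names):
    """Hypothetical re-derivation of the metric; not er_evaluation's own code."""
    # Group the names by cluster label (aligned on the shared index)
    # and count the distinct names found in each cluster.
    distinct_names = names.groupby(membership).nunique()
    # Assumption: a cluster has name variation if it holds more than one distinct name.
    return float((distinct_names > 1).mean())


membership = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[1, 1, 2, 3, 2, 4, 4, 4])
names = pd.Series(
    index=[1, 2, 3, 4, 5, 6, 7, 8],
    data=["n1", "n2", "n3", "n4", "n3", "n1", "n2", "n8"],
)

rate = name_variation_rate_sketch(membership, names)
print(rate)  # 0.5
```

Clusters 1 and 4 each contain several distinct names ({n1, n2} and {n1, n2, n8}), while clusters 2 and 3 each contain a single name, so two of the four clusters show variation.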