er_evaluation.data_structures
Clustering Data Structures
The data_structures module contains utilities that define clustering data structures (graph, membership vector, clusters dictionary, and pairwise links list) and transform between them.
Definitions
A clustering of a set of elements \(E\) is a partition of \(E\) into a set of disjoint clusters \(C\). For example, the following diagram represents a clustering of the elements \(\{0,1,2,3,4,5\}\) into the three clusters "c1", "c2", and "c3":
┌───────┐ ┌─────┐ ┌───┐
│ 0   1 │ │  3  │ │   │
│       │ │     │ │ 5 │
│   2   │ │  4  │ │   │
└───────┘ └─────┘ └───┘
    c1      c2     c3
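To make the definition concrete, the sketch below checks whether a mapping from cluster names to element lists is a partition of \(E\). A partition requires the clusters to be non-empty, pairwise disjoint, and to cover \(E\) together. The helper is_partition is hypothetical and is not part of er_evaluation.data_structures.

```python
def is_partition(clusters: dict, elements: set) -> bool:
    """Return True if the clusters are non-empty, pairwise disjoint, and cover `elements`.

    Hypothetical helper for illustration; not an er_evaluation function.
    """
    seen: set = set()
    for members in clusters.values():
        members = set(members)
        # Reject empty clusters and clusters overlapping an earlier one.
        if not members or seen & members:
            return False
        seen |= members
    # Every element of E must belong to some cluster.
    return seen == set(elements)


# The clustering from the diagram above.
print(is_partition({"c1": [0, 1, 2], "c2": [3, 4], "c3": [5]}, {0, 1, 2, 3, 4, 5}))  # True
# Element 1 appears in two clusters, so this is not a partition.
print(is_partition({"c1": [0, 1], "c2": [1, 2]}, {0, 1, 2}))  # False
```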
We use the following data structures to represent clusterings:
- Membership vector
A membership vector is a pandas Series indexed by the elements of \(E\) and with values corresponding to cluster identifiers. That is, the membership vector maps elements to clusters. Example:

>>> import pandas as pd
>>> pd.Series(["c1", "c1", "c1", "c2", "c2", "c3"], index=[0,1,2,3,4,5])
0    c1
1    c1
2    c1
3    c2
4    c2
5    c3
dtype: object
Note that using integer indices and values for membership vectors will lead to significantly faster computation. See er_evaluation.data_structures.compress_memberships().
- Clusters dictionary
A clusters dictionary is a Python dict with keys corresponding to cluster identifiers and values listing the elements of each cluster. Example:

{'c1': array([0, 1, 2]), 'c2': array([3, 4]), 'c3': array([5])}
- Pairwise links list
A pairwise links list is an array of pairwise links between elements of the clustering, where each element of a cluster is linked to every other element of the same cluster. Singleton clusters, such as c3, therefore contribute no links. Note that clusters are unnamed in pairwise links lists. Example:
array([[0, 1], [0, 2], [1, 2], [3, 4]])
- Graph
A graph is an igraph Graph object with vertices representing clustering elements and with edges between all elements belonging to the same cluster. Note that clusters are unnamed in graphs. Example:

1───2   4
│   │   │   6
└─3─┘   5
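The conversions between these representations can be sketched in plain pandas and NumPy. The functions below are hypothetical stand-ins written for illustration (note the _sketch suffix), not the module's implementations. The inverse direction (pairs to memberships) uses union-find to compute connected components, and it assumes the elements are mutually comparable, for example integers. Because a graph's edges are exactly the within-cluster pairs, the pairs array also serves as the graph's edge list.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def membership_to_clusters_sketch(membership: pd.Series) -> dict:
    """Membership vector -> clusters dictionary (cluster id -> array of its elements)."""
    clusters: dict = {}
    for element, cluster in membership.items():
        clusters.setdefault(cluster, []).append(element)
    return {cluster: np.array(elements) for cluster, elements in clusters.items()}


def membership_to_pairs_sketch(membership: pd.Series) -> np.ndarray:
    """Membership vector -> pairwise links list (each within-cluster pair listed once)."""
    pairs = [
        pair
        for elements in membership_to_clusters_sketch(membership).values()
        for pair in combinations(sorted(elements.tolist()), 2)
    ]
    # reshape keeps a (0, 2) shape when every cluster is a singleton.
    return np.array(pairs, dtype=int).reshape(-1, 2)


def pairs_to_membership_sketch(pairs: np.ndarray) -> pd.Series:
    """Pairwise links list -> membership vector via union-find (connected components).

    Pairs lists carry no cluster names, so each cluster is named after its smallest element.
    """
    parent: dict = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs.tolist():
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller root so every root stays the minimum of its cluster.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    elements = sorted(parent)
    return pd.Series([find(e) for e in elements], index=elements)


# Running example from the Definitions section.
membership = pd.Series(["c1", "c1", "c1", "c2", "c2", "c3"], index=[0, 1, 2, 3, 4, 5])
clusters = membership_to_clusters_sketch(membership)
pairs = membership_to_pairs_sketch(membership)
recovered = pairs_to_membership_sketch(pairs)
```

Here pairs.tolist() is [[0, 1], [0, 2], [1, 2], [3, 4]], which matches the pairwise links list example. The recovered membership vector covers only elements 0 to 4, because singleton element 5 has no links and a pairs list alone cannot record it. If python-igraph is installed and the elements are numbered 0 to n-1, passing pairs.tolist() as the edges argument of igraph.Graph(n=..., edges=...) should yield the graph representation. That is an untested suggestion, not how the module builds graphs.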
Functions
- Compress membership vectors to int values, preserving index compatibility.
- Transform a clusters dictionary into a Graph.
- Transform a clusters dictionary into a membership vector.
- Transform a clusters dictionary into a pairs list.
- Transform a Graph into a clusters dictionary.
- Transform a Graph into a membership vector.
- Transform a Graph into a pairs list.
- Check whether the given object is a clusters dictionary.
- Check whether the given object is an igraph Graph.
- Check whether the given object is a membership vector.
- Check whether the given object is a pairs list.
- Transform a membership vector into a clusters dictionary.
- Transform a membership vector into a Graph.
- Transform a membership vector into a pairs list.
- Transform a pairs list into a clusters dictionary.
- Transform a pairs list into a Graph.
- Transform a pairs list into a membership vector.
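The first entry above compresses membership vectors to integers. One plausible approach is sketched below using pandas.Index.union, Index.get_indexer, and pandas.factorize. The name compress_memberships_sketch is hypothetical. Reading "index compatibility" as "each element gets the same integer in every vector" is also an assumption, not the library's documented behavior. The sketch assumes each input has a unique index.

```python
import pandas as pd


def compress_memberships_sketch(*memberships: pd.Series) -> list:
    """Re-encode membership vectors with integer elements and integer cluster ids.

    Hypothetical sketch: elements are numbered over the union of all indices, so an
    element gets the same integer in every returned vector and the vectors stay comparable.
    """
    elements = memberships[0].index
    for membership in memberships[1:]:
        elements = elements.union(membership.index)  # shared, de-duplicated element set

    compressed = []
    for membership in memberships:
        # Position of each element in the shared element set.
        new_index = elements.get_indexer(membership.index)
        # Cluster ids become 0, 1, 2, ... in order of first appearance, per vector.
        cluster_codes, _ = pd.factorize(membership)
        compressed.append(pd.Series(cluster_codes, index=new_index))
    return compressed


m1 = pd.Series(["x", "x", "y"], index=["a", "b", "c"])
m2 = pd.Series(["p", "q", "q"], index=["b", "c", "d"])
c1, c2 = compress_memberships_sketch(m1, m2)
# c1: index [0, 1, 2], values [0, 0, 1]; c2: index [1, 2, 3], values [0, 1, 1]
```

Elements "b" and "c" map to 1 and 2 in both outputs, so the two compressed vectors can still be aligned element by element.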
Classes
- Series wrapper to validate the membership vector format and log potential issues.
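The class above wraps a Series to validate membership vectors. As a rough illustration only, the function below runs checks of that kind and reports problems through the standard logging module. Its name and the specific checks are assumptions, not the wrapper's actual behavior. The checks are: a Series type, a unique index, no missing cluster identifiers, and a hint when the dtypes are not integers.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def check_membership_sketch(membership) -> bool:
    """Return True if `membership` looks like a valid membership vector; log any issues.

    Hypothetical sketch; the real wrapper class may apply different checks.
    """
    if not isinstance(membership, pd.Series):
        logger.warning("Expected a pandas Series, got %s.", type(membership).__name__)
        return False
    ok = True
    if membership.index.has_duplicates:
        # Each element must map to exactly one cluster.
        logger.warning("Membership vector index contains duplicate elements.")
        ok = False
    if membership.isna().any():
        logger.warning("Some elements have missing (NA) cluster identifiers.")
        ok = False
    if not (
        pd.api.types.is_integer_dtype(membership.index)
        and pd.api.types.is_integer_dtype(membership)
    ):
        # Not an error: integer indices and values are simply faster to process.
        logger.info("Non-integer index or values; consider compressing to integers.")
    return ok
```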