er_evaluation.data_structures

Clustering Data Structures

The data_structures module defines the data structures used to represent clusterings (graph, membership vector, clusters dictionary, and pairwise links list) and provides functions to convert between them.

Definitions

A clustering of a set of elements \(E\) is a partition of \(E\) into a set of disjoint clusters \(C\). For example, the following diagram represents a clustering of elements \(\{0,1,2,3,4,5\}\) into the three clusters “c1”, “c2”, and “c3”:

┌───────┐  ┌─────┐  ┌───┐
│ 0   1 │  │  3  │  │   │
│       │  │     │  │ 5 │
│   2   │  │  4  │  │   │
└───────┘  └─────┘  └───┘
   c1        c2      c3
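
The two conditions in this definition, that clusters are pairwise disjoint and that together they cover \(E\), can be checked mechanically. The following is a minimal plain-Python sketch with clusters given as a dict of element lists; `is_partition` is a hypothetical helper written for illustration and is not part of er_evaluation.

```python
from itertools import combinations


def is_partition(clusters, elements):
    """Hypothetical helper: True if `clusters` (dict of name -> iterable of
    elements) partitions `elements`, i.e. clusters are non-empty, pairwise
    disjoint, and their union is exactly `elements`."""
    sets = [set(members) for members in clusters.values()]
    if any(len(s) == 0 for s in sets):
        return False
    # Pairwise disjoint: no element is shared by two clusters.
    if any(a & b for a, b in combinations(sets, 2)):
        return False
    # Covering: the union of all clusters is the whole element set.
    return set().union(*sets) == set(elements)


E = {0, 1, 2, 3, 4, 5}
C = {"c1": [0, 1, 2], "c2": [3, 4], "c3": [5]}
print(is_partition(C, E))  # True
print(is_partition({"c1": [0, 1, 2], "c2": [2, 3, 4], "c3": [5]}, E))  # False: 2 is in two clusters
```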

We use the following data structures to represent clusterings:

Membership vector

A membership vector is a pandas Series indexed by the elements of \(E\) and with values corresponding to cluster identifiers. That is, the membership vector maps elements to clusters. Example:

>>> import pandas as pd
>>> pd.Series(["c1", "c1", "c1", "c2", "c2", "c3"], index=[0,1,2,3,4,5])
0    c1
1    c1
2    c1
3    c2
4    c2
5    c3
dtype: object

Note that using integer indices and values for membership vectors will lead to significantly faster computation. See er_evaluation.data_structures.compress_memberships().
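
compress_memberships() is listed below as compressing membership vectors to int values while preserving index compatibility. One way to get that effect, sketched here under our own assumptions, is to give element labels a single integer encoding shared by all vectors (so a given element maps to the same integer everywhere) and to encode cluster identifiers with pandas.factorize. `compress_memberships_sketch` is hypothetical and may differ from the library's actual behavior.

```python
import pandas as pd


def compress_memberships_sketch(*memberships):
    """Hypothetical sketch: re-encode element labels and cluster identifiers as ints.

    Element labels share one integer encoding across all inputs, so the
    outputs stay index-compatible; cluster identifiers are encoded per vector.
    """
    # Every element label, in order of first appearance across all inputs.
    all_elements = pd.unique(pd.concat([m.index.to_series() for m in memberships]))
    element_code = {label: code for code, label in enumerate(all_elements)}

    compressed = []
    for m in memberships:
        # factorize maps distinct cluster identifiers to 0, 1, 2, ... (NaN -> -1)
        cluster_codes, _ = pd.factorize(m)
        new_index = [element_code[label] for label in m.index]
        compressed.append(pd.Series(cluster_codes, index=new_index))
    return compressed


m1 = pd.Series(["c1", "c1", "c1", "c2", "c2", "c3"], index=["a", "b", "c", "d", "e", "f"])
m2 = pd.Series(["x", "y", "y"], index=["f", "a", "b"])
cm1, cm2 = compress_memberships_sketch(m1, m2)
print(cm1.tolist(), cm1.index.tolist())  # [0, 0, 0, 1, 1, 2] [0, 1, 2, 3, 4, 5]
print(cm2.tolist(), cm2.index.tolist())  # [0, 1, 1] [5, 0, 1]
```

Element "f" is encoded as 5 in both outputs, so the compressed vectors can still be aligned on their indices.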

Clusters dictionary

A clusters dictionary is a Python dict whose keys are cluster identifiers and whose values are arrays of the elements in each cluster. Example:

{'c1': array([0, 1, 2]), 'c2': array([3, 4]), 'c3': array([5])}
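
Because each element belongs to exactly one cluster, a clusters dictionary is simply the membership vector grouped by its values. A minimal pandas sketch of both directions follows; `to_clusters` and `to_membership` are hypothetical stand-ins for membership_to_clusters() and clusters_to_membership(), not their actual implementations.

```python
import pandas as pd


def to_clusters(membership):
    """Hypothetical helper: group element labels by cluster identifier."""
    return {cluster: group.index.to_numpy() for cluster, group in membership.groupby(membership)}


def to_membership(clusters):
    """Hypothetical helper: map every element back to its cluster identifier."""
    mapping = {element: cluster for cluster, elements in clusters.items() for element in elements}
    return pd.Series(mapping).sort_index()


membership = pd.Series(["c1", "c1", "c1", "c2", "c2", "c3"], index=[0, 1, 2, 3, 4, 5])
clusters = to_clusters(membership)
print(clusters)
# {'c1': array([0, 1, 2]), 'c2': array([3, 4]), 'c3': array([5])}
print(to_membership(clusters).to_dict() == membership.to_dict())  # True
```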

Pairwise links list

A pairwise links list is an array of pairwise links between elements of the clustering, where each element of a cluster is linked to every other element of the same cluster. Note that clusters are unnamed in pairwise links lists, and that elements of singleton clusters (such as 5 in c3) appear in no link at all. Example:

array([[0, 1],
       [0, 2],
       [1, 2],
       [3, 4]])
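
A cluster of \(k\) elements contributes \(k(k-1)/2\) links, so the example above has \(3 + 1 + 0 = 4\) rows. A minimal sketch of generating a pairs list from a clusters dictionary with itertools.combinations follows; `to_pairs` is a hypothetical helper, not the library's clusters_to_pairs().

```python
from itertools import combinations

import numpy as np


def to_pairs(clusters):
    """Hypothetical helper: list every within-cluster link exactly once."""
    pairs = [pair for elements in clusters.values() for pair in combinations(elements, 2)]
    # reshape keeps a two-column shape even when there are no links at all.
    return np.array(pairs, dtype=int).reshape(-1, 2)


clusters = {"c1": np.array([0, 1, 2]), "c2": np.array([3, 4]), "c3": np.array([5])}
print(to_pairs(clusters))
# [[0 1]
#  [0 2]
#  [1 2]
#  [3 4]]
```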

Graph

A graph is an igraph Graph object with vertices representing clustering elements and with edges between all elements belonging to the same cluster. Note that clusters are unnamed in graphs. Example:

0───1       3
│   │       │       5
└─2─┘       4
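
In this representation the clusters are the connected components of the graph, and the same idea converts a pairs list back into a membership vector. Elements of singleton clusters have no links, so the complete list of elements must be supplied separately, which appears to be the role of the `indices` argument in pairs_to_membership(pairs, indices). The sketch below finds components with a small union-find instead of igraph to avoid the dependency; `pairs_to_membership_sketch` is hypothetical, and numbering clusters 0, 1, 2, ... in order of first appearance is an arbitrary choice.

```python
import pandas as pd


def pairs_to_membership_sketch(pairs, indices):
    """Hypothetical helper: label each element with its connected component.

    Union-find over the links in `pairs`; `indices` lists every element,
    so isolated elements still receive their own cluster.
    """
    parent = {i: i for i in indices}

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    roots = [find(i) for i in indices]
    # Number clusters 0, 1, 2, ... in order of first appearance.
    codes, _ = pd.factorize(pd.Series(roots))
    return pd.Series(codes, index=list(indices))


pairs = [(0, 1), (0, 2), (1, 2), (3, 4)]
print(pairs_to_membership_sketch(pairs, indices=[0, 1, 2, 3, 4, 5]).tolist())
# [0, 0, 0, 1, 1, 2]
```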

Functions

compress_memberships(*memberships)

Compress membership vectors to int values, preserving index compatibility.

clusters_to_graph(clusters)

Transform clusters dictionary into Graph.

clusters_to_membership(clusters)

Transform clusters dictionary into membership vector.

clusters_to_pairs(clusters)

Transform clusters dictionary into pairs list.

graph_to_clusters(graph)

Transform Graph into clusters dictionary.

graph_to_membership(graph)

Transform Graph into membership vector.

graph_to_pairs(graph)

Transform Graph into pairs list.

isclusters(obj)

Check if given object is a clusters dictionary.

isgraph(obj)

Check if given object is an igraph Graph.

ismembership(obj)

Check if given object is a membership vector.

ispairs(obj)

Check if given object is a pairs list.

membership_to_clusters(membership)

Transform membership vector into clusters dictionary.

membership_to_graph(membership)

Transform membership vector into Graph.

membership_to_pairs(membership)

Transform membership vector into pairs list.

pairs_to_clusters(pairs, indices)

Transform pairs list into clusters dictionary.

pairs_to_graph(pairs, indices)

Transform pairs list into Graph.

pairs_to_membership(pairs, indices)

Transform pairs list into membership vector.
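
The ispairs() and isclusters() checks are documented only by their one-line summaries. To illustrate what such structural checks might test, here is a sketch based solely on the definitions above: a pairs list as a two-column array, and a clusters dictionary as a dict of one-dimensional arrays whose clusters share no elements. `looks_like_pairs` and `looks_like_clusters` are hypothetical, and the real functions may check more or less.

```python
import numpy as np


def looks_like_pairs(obj):
    """Hypothetical check: obj converts to a 2-D array with exactly two columns."""
    try:
        arr = np.asarray(obj)
    except ValueError:  # e.g. ragged nested lists
        return False
    return arr.ndim == 2 and arr.shape[1] == 2


def looks_like_clusters(obj):
    """Hypothetical check: a dict of 1-D array-likes with no element in two clusters."""
    if not isinstance(obj, dict):
        return False
    seen = set()
    for elements in obj.values():
        arr = np.asarray(elements)
        if arr.ndim != 1:
            return False
        members = set(arr.tolist())
        if seen & members:  # element already assigned to another cluster
            return False
        seen |= members
    return True


print(looks_like_pairs([[0, 1], [0, 2], [1, 2], [3, 4]]))  # True
print(looks_like_pairs([0, 1, 2]))  # False: 1-D
print(looks_like_clusters({"c1": [0, 1, 2], "c2": [3, 4], "c3": [5]}))  # True
print(looks_like_clusters({"c1": [0, 1], "c2": [1, 2]}))  # False: 1 is shared
```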

Classes

MembershipVector([data, dropna])

Series wrapper to validate membership vector format and log potential issues.
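
MembershipVector is described only as a Series wrapper that validates the membership vector format and logs potential issues, with a dropna option in its signature. The sketch below imitates that idea with a plain function rather than a Series subclass. Which conditions count as issues (repeated element labels, missing cluster identifiers) and what dropna does here are our assumptions, not documented behavior of the class; `validate_membership` is hypothetical.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def validate_membership(data, dropna=False):
    """Hypothetical sketch: build a Series, log suspected format problems,
    and optionally drop elements that have no cluster identifier."""
    series = pd.Series(data)
    issues = []
    if not series.index.is_unique:
        # A repeated element label could place one element in two clusters.
        issues.append("duplicated element labels in the index")
    n_missing = int(series.isna().sum())
    if n_missing:
        issues.append(f"{n_missing} element(s) without a cluster identifier")
    for issue in issues:
        logger.warning("Membership vector issue: %s", issue)
    if dropna:
        series = series.dropna()
    return series, issues


series, issues = validate_membership(pd.Series(["c1", None, "c2"], index=[0, 0, 1]))
print(issues)
# ['duplicated element labels in the index', '1 element(s) without a cluster identifier']
```

Logging instead of raising lets questionable inputs flow through while still leaving a trace of what looked wrong.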

Class Inheritance Diagram

Inheritance diagram of er_evaluation.data_structures._data_structures.MembershipVector