Function#

KMeans#

supar.utils.fn.kmeans(x: List[int], k: int, max_it: int = 32) → Tuple[List[float], List[List[int]]]

KMeans algorithm for clustering the sentences by length.

Parameters:

x (List[int]) – The list of sentence lengths.
k (int) – The target number of clusters. This is an upper bound: the final number of clusters can be less than or equal to k.
max_it (int) – Maximum number of iterations. If the centroids do not converge within max_it iterations, the algorithm stops early. Default: 32.

Returns:

A pair of lists. The first contains the average sentence length of each cluster (its centroid). The second contains the clusters themselves, each a list of indices into x.

Return type:

Tuple[List[float], List[List[int]]]

Examples

>>> import torch
>>> from supar.utils.fn import kmeans
>>> x = torch.randint(10, 20, (10,)).tolist()
>>> x
[15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
>>> centroids, clusters = kmeans(x, 3)
>>> centroids
[10.5, 14.0, 17.799999237060547]
>>> clusters
[[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
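
To make the inputs and outputs above more concrete, the following is a minimal pure-Python sketch of Lloyd-style k-means on scalar lengths. It is not supar's implementation: the name kmeans_1d, the evenly spaced initialization, and the handling of empty clusters are all assumptions chosen for illustration. It only mirrors the documented contract, namely that it returns the centroids together with clusters of indices, and that it may return fewer than k clusters.

```python
from typing import List, Tuple


def kmeans_1d(x: List[int], k: int, max_it: int = 32) -> Tuple[List[float], List[List[int]]]:
    """Illustrative 1-D k-means (hypothetical helper, not supar's code)."""
    if not x or k < 1:
        return [], []

    # There cannot be more clusters than distinct values.
    values = sorted(set(x))
    k = min(k, len(values))

    # Assumed initialization: k evenly spaced distinct values from the data.
    step = max(k - 1, 1)
    centroids = [float(values[i * (len(values) - 1) // step]) for i in range(k)]

    clusters: List[List[int]] = []
    for _ in range(max(1, max_it)):
        # Assignment step: each data point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for i, v in enumerate(x):
            nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(i)

        # Drop empty clusters, so the result may have fewer than k clusters.
        clusters = [c for c in clusters if c]

        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [sum(x[i] for i in c) / len(c) for c in clusters]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids

    return centroids, clusters
```

Called as kmeans_1d(x, 3) on the list x shown in the example above, this sketch happens to produce the same three clusters. The library's own result can differ on other inputs, because its initialization and tie-breaking are not specified here.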

Stripe#

supar.utils.fn.stripe(x: Tensor, n: int, w: int, offset: Tuple = (0, 0), horizontal: bool = True) → Tensor

Returns a parallelogram stripe of the tensor.