Function

KMeans

supar.utils.fn.kmeans(x: List[int], k: int, max_it: int = 32) -> Tuple[List[float], List[List[int]]]

KMeans algorithm for clustering the sentences by length.

Parameters:
  • x (List[int]) – The list of sentence lengths.

  • k (int) – The approximate number of clusters. The final number of clusters can be less than or equal to k.

  • max_it (int) – Maximum number of iterations. If the centroids do not converge within this many iterations, the algorithm stops early.

Returns:

The first list contains the average sentence length of each cluster. The second is the list of clusters, each holding the indices of its data points.

Return type:

Tuple[List[float], List[List[int]]]

Examples

>>> x = torch.randint(10, 20, (10,)).tolist()
>>> x
[15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
>>> centroids, clusters = kmeans(x, 3)
>>> centroids
[10.5, 14.0, 17.799999237060547]
>>> clusters
[[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
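
The clustering described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the name `kmeans_1d` is hypothetical, the deterministic initialization from evenly spaced distinct lengths is an assumption made so the sketch is reproducible, and `k >= 1` is assumed. It shows why the final number of clusters can be smaller than `k` (there cannot be more clusters than distinct lengths, and empty clusters are dropped) and how `max_it` bounds the loop.

```python
from typing import List, Tuple


def kmeans_1d(x: List[int], k: int, max_it: int = 32) -> Tuple[List[float], List[List[int]]]:
    """Illustrative 1-D k-means over sentence lengths (hypothetical helper, assumes k >= 1)."""
    uniq = sorted(set(x))
    # There cannot be more clusters than distinct lengths.
    k = min(k, len(uniq))
    # Assumption: deterministic init from evenly spaced distinct lengths.
    step = max(k - 1, 1)
    centroids = [float(uniq[i * (len(uniq) - 1) // step]) for i in range(k)]
    clusters: List[List[int]] = []
    for _ in range(max_it):
        # Assignment step: each index goes to its nearest centroid.
        buckets: List[List[int]] = [[] for _ in centroids]
        for idx, length in enumerate(x):
            nearest = min(range(len(centroids)), key=lambda c: abs(length - centroids[c]))
            buckets[nearest].append(idx)
        # Empty clusters are dropped, so fewer than k clusters may remain.
        clusters = [b for b in buckets if b]
        # Update step: each centroid becomes the mean length of its cluster.
        updated = [sum(x[i] for i in c) / len(c) for c in clusters]
        if updated == centroids:
            # Converged: the assignment no longer changes the centroids.
            break
        centroids = updated
    return centroids, clusters
```

Called as `kmeans_1d([15, 10, 17, 11, 18, 13, 17, 19, 18, 14], 3)`, the sketch groups the indices as `[[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]`, the same clusters as in the example above. A different initialization could converge to a different grouping.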

Stripe

supar.utils.fn.stripe(x: Tensor, n: int, w: int, offset: Tuple = (0, 0), horizontal: bool = True) -> Tensor

Returns a parallelogram stripe of the tensor.

Parameters:
  • x (Tensor) – the input tensor with 2 or more dims.

  • n (int) – the length of the stripe.

  • w (int) – the width of the stripe.

  • offset (tuple) – the offset of the first two dims.

  • horizontal (bool) – If True, returns a horizontal stripe; otherwise, returns a vertical stripe.

Returns:

A parallelogram stripe of the tensor.

Examples

>>> x = torch.arange(25).view(5, 5)
>>> x
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]])
>>> stripe(x, 2, 3)
tensor([[0, 1, 2],
        [6, 7, 8]])
>>> stripe(x, 2, 3, (1, 1))
tensor([[ 6,  7,  8],
        [12, 13, 14]])
>>> stripe(x, 2, 3, (1, 1), 0)
tensor([[ 6, 11, 16],
        [12, 17, 22]])
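
The indexing pattern behind these examples can be written out explicitly. The sketch below is an index-based reading inferred from the outputs above, not the library's implementation: `stripe_ref` is a hypothetical name, it works on 2-D nested lists rather than tensors, and it raises `IndexError` when the stripe runs past the edge of the input. Each of the `n` rows starts one step further along the diagonal from `offset`, and its `w` entries extend to the right (horizontal) or downward (vertical).

```python
from typing import List, Tuple


def stripe_ref(
    x: List[List[int]],
    n: int,
    w: int,
    offset: Tuple[int, int] = (0, 0),
    horizontal: bool = True,
) -> List[List[int]]:
    """Pure-Python reading of the stripe indexing (hypothetical helper, 2-D input only)."""
    r0, c0 = offset
    out = []
    for i in range(n):
        row = []
        for j in range(w):
            # Row i starts at the diagonal cell (r0 + i, c0 + i).
            # Horizontal stripes walk along columns, vertical ones along rows.
            r = r0 + i + (0 if horizontal else j)
            c = c0 + i + (j if horizontal else 0)
            row.append(x[r][c])
        out.append(row)
    return out


# Same 5 x 5 grid as torch.arange(25).view(5, 5), as nested lists.
grid = [[5 * r + c for c in range(5)] for r in range(5)]
print(stripe_ref(grid, 2, 3))                 # [[0, 1, 2], [6, 7, 8]]
print(stripe_ref(grid, 2, 3, (1, 1)))         # [[6, 7, 8], [12, 13, 14]]
print(stripe_ref(grid, 2, 3, (1, 1), False))  # [[6, 11, 16], [12, 17, 22]]
```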