Function
KMeans
- supar.utils.fn.kmeans(x: List[int], k: int, max_it: int = 32) -> Tuple[List[float], List[List[int]]]
KMeans algorithm for clustering the sentences by length.
- Parameters:
x (List[int]) – The list of sentence lengths.
k (int) – The approximate number of clusters. The final number of clusters can be less than or equal to k.
max_it (int) – The maximum number of iterations. If the centroids have not converged after max_it iterations, the algorithm stops early. Default: 32.
- Returns:
A tuple of two lists. The first holds the average sentence length of each cluster (the centroids). The second holds the clusters, each given as a list of indices into x.
- Return type:
Tuple[List[float], List[List[int]]]
Examples
>>> x = torch.randint(10, 20, (10,)).tolist()
>>> x
[15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
>>> centroids, clusters = kmeans(x, 3)
>>> centroids
[10.5, 14.0, 17.799999237060547]
>>> clusters
[[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
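The clustering described above can be sketched in plain Python as an alternating assign/update loop over one-dimensional data. This is a minimal illustration, not supar's implementation: the name `kmeans_1d` is hypothetical, and the deterministic initialization (centroids spread evenly over the sorted distinct lengths) is an assumption made so the result is reproducible.

```python
from typing import List, Tuple


def kmeans_1d(x: List[int], k: int, max_it: int = 32) -> Tuple[List[float], List[List[int]]]:
    """Illustrative 1-D k-means over sentence lengths (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Never ask for more clusters than there are distinct lengths.
    uniq = sorted(set(x))
    k = min(k, len(uniq))
    # Assumption: initialize centroids evenly across the sorted distinct lengths.
    if k == 1:
        centroids = [float(uniq[0])]
    else:
        centroids = [float(uniq[i * (len(uniq) - 1) // (k - 1)]) for i in range(k)]
    clusters: List[List[int]] = []
    for _ in range(max_it):
        # Assignment step: attach each index to its nearest centroid.
        clusters = [[] for _ in centroids]
        for i, length in enumerate(x):
            nearest = min(range(len(centroids)), key=lambda c: abs(length - centroids[c]))
            clusters[nearest].append(i)
        # Drop empty clusters, so the final count may be smaller than k.
        clusters = [c for c in clusters if c]
        # Update step: each centroid becomes the mean length of its members.
        updated = [sum(x[i] for i in c) / len(c) for c in clusters]
        if updated == centroids:
            break  # converged before reaching max_it
        centroids = updated
    return centroids, clusters


if __name__ == "__main__":
    lengths = [15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
    print(kmeans_1d(lengths, 3))
```

On the lengths from the example above, this sketch groups the indices the same way as the documented output. The centroid values may differ in the last digits because the sketch uses Python floats.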
Stripe
- supar.utils.fn.stripe(x: Tensor, n: int, w: int, offset: Tuple = (0, 0), horizontal: bool = True) -> Tensor
Returns a parallelogram stripe of the tensor.
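- Parameters:
x (Tensor) – The input tensor with 2 or more dimensions.
n (int) – The length of the stripe.
w (int) – The width of the stripe.
offset (Tuple) – The offset into the first two dimensions. Default: (0, 0).
horizontal (bool) – True to return a horizontal stripe; False to return a vertical one. Default: True.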
- Returns:
A parallelogram stripe of the tensor.
Examples
>>> x = torch.arange(25).view(5, 5)
>>> x
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]])
>>> stripe(x, 2, 3)
tensor([[0, 1, 2],
        [6, 7, 8]])
>>> stripe(x, 2, 3, (1, 1))
tensor([[ 6,  7,  8],
        [12, 13, 14]])
>>> stripe(x, 2, 3, (1, 1), 0)
tensor([[ 6, 11, 16],
        [12, 17, 22]])
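To make the parallelogram shape concrete, the following sketch spells out which source cell each output cell reads. The index mapping is inferred from the examples above rather than taken from the library, and it only handles a 2-D nested list. The names `stripe_indices` and `stripe_2d` are hypothetical helpers, not part of supar.

```python
from typing import List, Sequence, Tuple


def stripe_indices(n: int, w: int, offset: Tuple[int, int] = (0, 0),
                   horizontal: bool = True) -> List[List[Tuple[int, int]]]:
    """Return the (row, col) source index for every cell of an n x w stripe."""
    r0, c0 = offset
    if horizontal:
        # Output row i starts on the diagonal at (r0 + i, c0 + i) and runs rightwards.
        return [[(r0 + i, c0 + i + j) for j in range(w)] for i in range(n)]
    # Output row i starts on the same diagonal cell but runs downwards.
    return [[(r0 + i + j, c0 + i) for j in range(w)] for i in range(n)]


def stripe_2d(x: Sequence[Sequence[int]], n: int, w: int,
              offset: Tuple[int, int] = (0, 0), horizontal: bool = True) -> List[List[int]]:
    """Gather a stripe from a 2-D nested list using the index mapping above."""
    return [[x[r][c] for r, c in row] for row in stripe_indices(n, w, offset, horizontal)]


if __name__ == "__main__":
    grid = [[5 * r + c for c in range(5)] for r in range(5)]  # same values as torch.arange(25).view(5, 5)
    print(stripe_2d(grid, 2, 3))                 # [[0, 1, 2], [6, 7, 8]]
    print(stripe_2d(grid, 2, 3, (1, 1)))         # [[6, 7, 8], [12, 13, 14]]
    print(stripe_2d(grid, 2, 3, (1, 1), False))  # [[6, 11, 16], [12, 17, 22]]
```

Each successive output row starts one step further along the main diagonal, which is what turns the selected region into a parallelogram rather than a rectangle.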