Function

Function#

Tarjan#

supar.structs.fn.tarjan(sequence: Iterable[int]) → Iterable[int][source]#

Tarjan algorithm for finding Strongly Connected Components (SCCs) of a graph.

Parameters:: sequence (list) – List of head indices.
Yields:: A list of indices making up a SCC. All self-loops are ignored.

Examples

>>> next(tarjan([2, 5, 0, 3, 1]))  # (1 -> 5 -> 2 -> 1) is a cycle
[2, 5, 1]

ChuLiuEdmods#

supar.structs.fn.chuliu_edmonds(s: Tensor) → Tensor[source]#

ChuLiu/Edmonds algorithm for non-projective decoding McDonald et al. (2005).

Some code is borrowed from tdozat’s implementation. Descriptions of notations and formulas can be found in McDonald et al. (2005).

Notes

The algorithm does not guarantee to parse a single-root tree.

Parameters:: s (Tensor) – [seq_len, seq_len]. Scores of all dependent-head pairs.
Returns:: A tensor with shape [seq_len] for the resulting non-projective parse tree.
Return type:: Tensor

MST#

supar.structs.fn.mst(scores: Tensor, mask: BoolTensor, multiroot: bool = False) → Tensor[source]#

MST algorithm for decoding non-projective trees. This is a wrapper for ChuLiu/Edmonds algorithm.

The algorithm first runs ChuLiu/Edmonds to parse a tree and then have a check of multi-roots, If multiroot=True and there indeed exist multi-roots, the algorithm seeks to find best single-root trees by iterating all possible single-root trees parsed by ChuLiu/Edmonds. Otherwise the resulting trees are directly taken as the final outputs.

Parameters:

scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all dependent-head pairs.
mask (BoolTensor) – [batch_size, seq_len]. The mask to avoid parsing over padding tokens. The first column serving as pseudo words for roots should be False.
multiroot (bool) – Ensures to parse a single-root tree If False.

Returns:

A tensor with shape [batch_size, seq_len] for the resulting non-projective parse trees.

Return type:

Tensor