Tree#

MatrixTree#

class supar.structs.tree.MatrixTree(scores: torch.Tensor, lens: Optional[torch.LongTensor] = None, multiroot: bool = False)[source]#

MatrixTree for calculating partitions and marginals of non-projective dependency trees in \(O(n^3)\) by an adaptation of Kirchhoff's Matrix-Tree Theorem Koo et al. (2007).

Parameters
  • scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible dependent-head pairs.

  • lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.

  • multiroot (bool) – If False, requires the tree to contain only a single root. Default: False.

Examples

>>> import torch
>>> from supar import MatrixTree
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> s1 = MatrixTree(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s2 = MatrixTree(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s1.max
tensor([0.7174, 3.7910], grad_fn=<SumBackward1>)
>>> s1.argmax
tensor([[0, 0, 1, 1, 0],
        [0, 4, 1, 0, 3]])
>>> s1.log_partition
tensor([2.0229, 6.0558], grad_fn=<CopyBackwards>)
>>> s1.log_prob(arcs)
tensor([-3.2209, -2.5756], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([1.9711, 3.4497], grad_fn=<SubBackward0>)
>>> s1.kl(s2)
tensor([1.3354, 2.6914], grad_fn=<AddBackward0>)
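
The log_partition above is what the Matrix-Tree Theorem computes: the determinant of a Laplacian minor sums the weights of all spanning arborescences. The multiroot case can be sketched in plain PyTorch and checked by brute-force enumeration (a standalone illustration at toy size, not the library's internals):

```python
import itertools

import torch

torch.manual_seed(0)
n = 3  # words; node 0 is the pseudo-root
scores = torch.randn(n + 1, n + 1)  # scores[d, h]: score of head h for dependent d

# Matrix-Tree: log-partition over all arborescences rooted at node 0
A = scores.exp().t().clone()        # A[h, d] = weight of the arc h -> d
A.fill_diagonal_(0)
L = torch.diag(A.sum(0)) - A        # in-degree Laplacian
logZ = torch.logdet(L[1:, 1:])      # drop the root row/column (Tutte's theorem)

# Brute force: enumerate every head assignment and keep only the trees
def is_tree(heads):
    for d in range(1, n + 1):       # every word must reach the root acyclically
        seen, a = set(), d
        while a != 0:
            if a in seen:
                return False
            seen.add(a)
            a = heads[a]
    return True

logps = [sum(scores[d, heads[d]] for d in range(1, n + 1))
         for heads in ((0,) + hs for hs in itertools.product(range(n + 1), repeat=n))
         if is_tree(heads)]
brute = torch.logsumexp(torch.stack(logps), 0)
assert torch.allclose(logZ, brute, atol=1e-4)
```
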
property max#

Computes the max score of the distribution \(p(y)\).

property argmax#

Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).

kmax(k: int) → torch.Tensor[source]#

Computes the k-max of the distribution \(p(y)\).

sample()[source]#

Obtains a structured sample from the distribution \(y \sim p(y)\). TODO: multi-sampling.

property entropy#

Computes entropy \(H[p]\) of the distribution \(p(y)\).

cross_entropy(other: supar.structs.tree.MatrixTree) → torch.Tensor[source]#

Computes cross-entropy \(H[p,q]\) of self and another distribution.

Parameters

other (StructuredDistribution) – Comparison distribution.

kl(other: supar.structs.tree.MatrixTree) → torch.Tensor[source]#

Computes KL-divergence \(KL[p \parallel q]=H[p,q]-H[p]\) of self and another distribution.

Parameters

other (StructuredDistribution) – Comparison distribution.
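
The identity \(KL[p \parallel q]=H[p,q]-H[p]\) used above can be sanity-checked on an ordinary categorical distribution:

```python
import torch
from torch.distributions import Categorical, kl_divergence

torch.manual_seed(0)
p = Categorical(logits=torch.randn(5))
q = Categorical(logits=torch.randn(5))

# Categorical normalizes its logits into log-probabilities
cross_entropy = -(p.probs * q.logits).sum()   # H[p, q]
kl = cross_entropy - p.entropy()              # H[p, q] - H[p]
assert torch.allclose(kl, kl_divergence(p, q), atol=1e-6)
```
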

DependencyCRF#

class supar.structs.tree.DependencyCRF(scores: torch.Tensor, lens: Optional[torch.LongTensor] = None, multiroot: bool = False)[source]#

First-order TreeCRF for projective dependency trees Eisner 2000, Zhang et al. (2020a).

Parameters
  • scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible dependent-head pairs.

  • lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.

  • multiroot (bool) – If False, requires the tree to contain only a single root. Default: False.

Examples

>>> import torch
>>> from supar import DependencyCRF
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> s1 = DependencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s2 = DependencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s1.max
tensor([3.6346, 1.7194], grad_fn=<IndexBackward>)
>>> s1.argmax
tensor([[0, 2, 3, 0, 0],
        [0, 0, 3, 1, 1]])
>>> s1.log_partition
tensor([4.1007, 3.3383], grad_fn=<IndexBackward>)
>>> s1.log_prob(arcs)
tensor([-1.3866, -5.5352], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([0.9979, 2.6056], grad_fn=<IndexBackward>)
>>> s1.kl(s2)
tensor([1.6631, 2.6558], grad_fn=<IndexBackward>)
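
Gradients of log_partition recover arc marginals, the standard way CRF marginals are obtained. A brute-force sketch over projective trees (toy sizes only, and for simplicity without the single-root constraint; the class itself uses an \(O(n^3)\) inside algorithm):

```python
import itertools

import torch

n = 3
torch.manual_seed(1)
scores = torch.randn(n + 1, n + 1, requires_grad=True)  # scores[d, h]

def is_tree(heads):
    for d in range(1, n + 1):          # acyclic, every word reaches the root
        seen, a = set(), d
        while a != 0:
            if a in seen:
                return False
            seen.add(a)
            a = heads[a]
    return True

def is_projective(heads):
    # every word strictly between a head and its dependent must descend from that head
    for d in range(1, n + 1):
        h = heads[d]
        for w in range(min(h, d) + 1, max(h, d)):
            a = w
            while a not in (0, h):
                a = heads[a]
            if a != h:
                return False
    return True

logps = [sum(scores[d, heads[d]] for d in range(1, n + 1))
         for heads in ((0,) + hs for hs in itertools.product(range(n + 1), repeat=n))
         if is_tree(heads) and is_projective(heads)]
logZ = torch.logsumexp(torch.stack(logps), 0)

logZ.backward()                        # d logZ / d scores[d, h] = P(head of d is h)
marginals = scores.grad
assert torch.allclose(marginals[1:].sum(-1), torch.ones(n), atol=1e-5)
```
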
property argmax#

Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).

topk(k: int) → torch.LongTensor[source]#

Computes the k-argmax of the distribution \(p(y)\).

Dependency2oCRF#

class supar.structs.tree.Dependency2oCRF(scores: Tuple[torch.Tensor, torch.Tensor], lens: Optional[torch.LongTensor] = None, multiroot: bool = False)[source]#

Second-order TreeCRF for projective dependency trees McDonald & Pereira 2006, Zhang et al. (2020a).

Parameters
  • scores (tuple(Tensor, Tensor)) – Scores of all possible dependent-head pairs ([batch_size, seq_len, seq_len]) and dependent-head-sibling triples ([batch_size, seq_len, seq_len, seq_len]).

  • lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.

  • multiroot (bool) – If False, requires the tree to contain only a single root. Default: False.

Examples

>>> import torch
>>> from supar import Dependency2oCRF
>>> from supar.utils import CoNLL
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> sibs = torch.tensor([CoNLL.get_sibs(i) for i in arcs[:, 1:].tolist()])
>>> s1 = Dependency2oCRF((torch.randn(batch_size, seq_len, seq_len),
                          torch.randn(batch_size, seq_len, seq_len, seq_len)),
                         lens)
>>> s2 = Dependency2oCRF((torch.randn(batch_size, seq_len, seq_len),
                          torch.randn(batch_size, seq_len, seq_len, seq_len)),
                         lens)
>>> s1.max
tensor([0.7574, 3.3634], grad_fn=<IndexBackward>)
>>> s1.argmax
tensor([[0, 3, 3, 0, 0],
        [0, 4, 4, 4, 0]])
>>> s1.log_partition
tensor([1.9906, 4.3599], grad_fn=<IndexBackward>)
>>> s1.log_prob((arcs, sibs))
tensor([-0.6975, -6.2845], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([1.6436, 2.1717], grad_fn=<IndexBackward>)
>>> s1.kl(s2)
tensor([0.4929, 2.0759], grad_fn=<IndexBackward>)
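
The sibs tensor above indexes scores of (dependent, head, sibling) triples. For illustration, a hypothetical helper (its output format is an assumption and need not match CoNLL.get_sibs) that extracts adjacent same-side siblings from a head sequence:

```python
def adjacent_sibs(heads):
    """heads[d] is the head of word d (word indices start at 1; heads[0] is unused)."""
    triples = []
    for h in range(len(heads)):
        deps = [d for d in range(1, len(heads)) if heads[d] == h]
        left = [d for d in deps if d < h][::-1]   # dependents, innermost first
        right = [d for d in deps if d > h]
        for side in (left, right):
            for s, d in zip(side, side[1:]):      # adjacent pairs on the same side
                triples.append((d, h, s))         # (dependent, head, inner sibling)
    return triples

# head 1 governs words 2, 3 and 4: two adjacent-sibling triples on its right
assert adjacent_sibs([0, 0, 1, 1, 1]) == [(3, 1, 2), (4, 1, 3)]
```
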
property argmax#

Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).

topk(k: int) → torch.LongTensor[source]#

Computes the k-argmax of the distribution \(p(y)\).

ConstituencyCRF#

class supar.structs.tree.ConstituencyCRF(scores: torch.Tensor, lens: Optional[torch.LongTensor] = None, label: bool = False)[source]#

Constituency TreeCRF Zhang et al. 2020b, Stern et al. (2017).

Parameters
  • scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all constituents.

  • lens (LongTensor) – [batch_size]. Sentence lengths for masking.

  • label (bool) – If True, scores carry an extra label dimension ([batch_size, seq_len, seq_len, n_labels]), as in the example below. Default: False.

Examples

>>> import torch
>>> from supar import ConstituencyCRF
>>> batch_size, seq_len, n_labels = 2, 5, 4
>>> lens = torch.tensor([3, 4])
>>> charts = torch.tensor([[[-1,  0, -1,  0, -1],
                            [-1, -1,  0,  0, -1],
                            [-1, -1, -1,  0, -1],
                            [-1, -1, -1, -1, -1],
                            [-1, -1, -1, -1, -1]],
                           [[-1,  0,  0, -1,  0],
                            [-1, -1,  0, -1, -1],
                            [-1, -1, -1,  0,  0],
                            [-1, -1, -1, -1,  0],
                            [-1, -1, -1, -1, -1]]])
>>> s1 = ConstituencyCRF(torch.randn(batch_size, seq_len, seq_len, n_labels), lens, True)
>>> s2 = ConstituencyCRF(torch.randn(batch_size, seq_len, seq_len, n_labels), lens, True)
>>> s1.max
tensor([3.7036, 7.2569], grad_fn=<IndexBackward0>)
>>> s1.argmax
[[[0, 1, 2], [0, 3, 0], [1, 2, 1], [1, 3, 0], [2, 3, 3]],
 [[0, 1, 1], [0, 4, 2], [1, 2, 3], [1, 4, 1], [2, 3, 2], [2, 4, 3], [3, 4, 3]]]
>>> s1.log_partition
tensor([ 8.5394, 12.9940], grad_fn=<IndexBackward0>)
>>> s1.log_prob(charts)
tensor([ -8.5209, -14.1160], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([6.8868, 9.3996], grad_fn=<IndexBackward0>)
>>> s1.kl(s2)
tensor([4.0039, 4.1037], grad_fn=<IndexBackward0>)
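
log_partition here sums span scores over all binary bracketings via the inside algorithm. A minimal label-free sketch, verified against explicit enumeration (it assumes every span of a tree, single tokens included, is scored; a sketch of the idea, not the class's batched implementation):

```python
import torch

def inside(s, i, j):
    # total log-weight of all binary trees over span (i, j)
    if j == i + 1:
        return s[i][j]
    return s[i][j] + torch.logsumexp(
        torch.stack([inside(s, i, k) + inside(s, k, j) for k in range(i + 1, j)]), 0)

def trees(i, j):
    # every binary bracketing of (i, j), as a list of its spans
    if j == i + 1:
        return [[(i, j)]]
    return [[(i, j)] + l + r
            for k in range(i + 1, j)
            for l in trees(i, k)
            for r in trees(k, j)]

torch.manual_seed(0)
n = 4
s = torch.randn(n + 1, n + 1)                 # s[i][j]: score of span (i, j)
logZ = inside(s, 0, n)
brute = torch.logsumexp(torch.stack(
    [sum(s[i][j] for i, j in t) for t in trees(0, n)]), 0)
assert torch.allclose(logZ, brute, atol=1e-5)
```
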
property argmax#

Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).

topk(k: int) → List[List[Tuple]][source]#

Computes the k-argmax of the distribution \(p(y)\).

BiLexicalizedConstituencyCRF#

class supar.structs.tree.BiLexicalizedConstituencyCRF(scores: List[torch.Tensor], lens: Optional[torch.LongTensor] = None)[source]#

Grammarless Eisner-Satta Algorithm Eisner & Satta 1999, Yang et al. (2021).

Code is revised from Songlin Yang’s implementation.

Parameters
  • scores (list[Tensor]) – A list of three tensors: dependency scores ([batch_size, seq_len, seq_len]), constituent scores ([batch_size, seq_len, seq_len]) and head word scores ([batch_size, seq_len, seq_len, seq_len]), as in the example below.

  • lens (LongTensor) – [batch_size]. Sentence lengths for masking.

Examples

>>> import torch
>>> from supar import BiLexicalizedConstituencyCRF
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> deps = torch.tensor([[0, 0, 1, 1, 0], [0, 3, 1, 0, 3]])
>>> cons = torch.tensor([[[0, 1, 1, 1, 0],
                          [0, 0, 1, 0, 0],
                          [0, 0, 0, 1, 0],
                          [0, 0, 0, 0, 0],
                          [0, 0, 0, 0, 0]],
                         [[0, 1, 1, 1, 1],
                          [0, 0, 1, 0, 0],
                          [0, 0, 0, 1, 0],
                          [0, 0, 0, 0, 1],
                          [0, 0, 0, 0, 0]]]).bool()
>>> heads = torch.tensor([[[0, 1, 1, 1, 0],
                           [0, 0, 2, 0, 0],
                           [0, 0, 0, 3, 0],
                           [0, 0, 0, 0, 0],
                           [0, 0, 0, 0, 0]],
                          [[0, 1, 1, 3, 3],
                           [0, 0, 2, 0, 0],
                           [0, 0, 0, 3, 0],
                           [0, 0, 0, 0, 4],
                           [0, 0, 0, 0, 0]]])
>>> s1 = BiLexicalizedConstituencyCRF((torch.randn(batch_size, seq_len, seq_len),
                                       torch.randn(batch_size, seq_len, seq_len),
                                       torch.randn(batch_size, seq_len, seq_len, seq_len)),
                                      lens)
>>> s2 = BiLexicalizedConstituencyCRF((torch.randn(batch_size, seq_len, seq_len),
                                       torch.randn(batch_size, seq_len, seq_len),
                                       torch.randn(batch_size, seq_len, seq_len, seq_len)),
                                      lens)
>>> s1.max
tensor([0.5792, 2.1737], grad_fn=<MaxBackward0>)
>>> s1.argmax[0]
tensor([[0, 3, 1, 0, 0],
        [0, 4, 1, 1, 0]])
>>> s1.argmax[1]
[[[0, 3], [0, 2], [0, 1], [1, 2], [2, 3]], [[0, 4], [0, 3], [0, 2], [0, 1], [1, 2], [2, 3], [3, 4]]]
>>> s1.log_partition
tensor([1.1923, 3.2343], grad_fn=<LogsumexpBackward>)
>>> s1.log_prob((deps, cons, heads))
tensor([-1.9123, -3.6127], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([1.3376, 2.2996], grad_fn=<SelectBackward>)
>>> s1.kl(s2)
tensor([1.0617, 2.7839], grad_fn=<SelectBackward>)
property argmax#

Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).

topk(k: int) → Tuple[torch.LongTensor, List[List[Tuple]]][source]#

Computes the k-argmax of the distribution \(p(y)\).