# Tree#

## MatrixTree#

class supar.structs.tree.MatrixTree(scores: Tensor, lens: LongTensor | None = None, multiroot: bool = False)[source]#

MatrixTree for calculating partitions and marginals of non-projective dependency trees in $$O(n^3)$$ by an adaptation of Kirchhoff's Matrix-Tree Theorem (Koo et al., 2007).

Parameters:
• scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible dependent-head pairs.

• lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.

• multiroot (bool) – If False, requires the tree to contain only a single root. Default: False.

Examples

>>> import torch
>>> from supar import MatrixTree
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> s1 = MatrixTree(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s2 = MatrixTree(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s1.max
>>> s1.argmax
tensor([[0, 0, 1, 1, 0],
        [0, 4, 1, 0, 3]])
>>> s1.log_partition
>>> s1.log_prob(arcs)
>>> s1.entropy
>>> s1.kl(s2)
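The log-partition above rests on the Matrix-Tree theorem. As a minimal single-sentence sketch of the idea (an illustration only, not the library's batched internals, and corresponding to the unconstrained `multiroot=True` case): exponentiate the arc scores, build the graph Laplacian, and take the log-determinant of the minor obtained by deleting the root row and column.

```python
import torch

torch.manual_seed(0)
n = 3                          # root at position 0 plus two words
scores = torch.randn(n, n)     # scores[d, h]: score of arc h -> d
weights = scores.exp()
weights.fill_diagonal_(0)      # no self-arcs
# Laplacian: L[d, d] = total weight into d, off-diagonal L[d, h] = -weight(h -> d)
laplacian = torch.diag(weights.sum(1)) - weights
# Deleting the root row/column and taking the determinant sums the products of
# arc weights over all spanning arborescences (directed Matrix-Tree theorem).
log_partition = laplacian[1:, 1:].logdet()
```

For two words the result can be checked by hand: the three possible trees are {0→1, 0→2}, {0→1, 1→2} and {0→2, 2→1}, and the determinant equals the sum of their weight products.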

property max#

Computes the max score of the distribution $$p(y)$$.

property argmax#

Computes $$\arg\max_y p(y)$$ of the distribution $$p(y)$$.

kmax(k: int) Tensor[source]#

Computes the k-max of the distribution $$p(y)$$.

sample()[source]#

Obtains a structured sample from the distribution $$y \sim p(y)$$. TODO: multi-sampling.

property entropy#

Computes entropy $$H[p]$$ of the distribution $$p(y)$$.

cross_entropy(other: MatrixTree)[source]#

Computes cross-entropy $$H[p,q]$$ of self and another distribution.

Parameters:

other (StructuredDistribution) – Comparison distribution.

kl(other: MatrixTree)[source]#

Computes KL-divergence $$KL[p \parallel q]=H[p,q]-H[p]$$ of self and another distribution.

Parameters:

other (StructuredDistribution) – Comparison distribution.
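Arc marginals can also be read off as gradients of the log-partition with respect to the scores. A single-sentence sketch using a plain Laplacian log-determinant (illustrative only, again the unconstrained `multiroot=True` case rather than the library's batched code):

```python
import torch

torch.manual_seed(0)
n = 4                                          # root plus three words
scores = torch.randn(n, n, requires_grad=True)
weights = scores.exp()
# Self-arc weights cancel out on the Laplacian diagonal, so no masking is needed.
laplacian = torch.diag(weights.sum(1)) - weights
log_partition = laplacian[1:, 1:].logdet()
log_partition.backward()
marginals = scores.grad                        # marginals[d, h] = p(arc h -> d)
```

Every non-root token attaches to exactly one head in each tree, so each of its rows of marginals sums to one, while the root row (position 0 as a dependent) stays zero.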

## DependencyCRF#

class supar.structs.tree.DependencyCRF(scores: Tensor, lens: LongTensor | None = None, multiroot: bool = False)[source]#

First-order TreeCRF for projective dependency trees (Eisner 2000; Zhang et al. 2020a).

Parameters:
• scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible dependent-head pairs.

• lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.

• multiroot (bool) – If False, requires the tree to contain only a single root. Default: False.

Examples

>>> import torch
>>> from supar import DependencyCRF
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> s1 = DependencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s2 = DependencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s1.max
>>> s1.argmax
tensor([[0, 2, 3, 0, 0],
        [0, 0, 3, 1, 1]])
>>> s1.log_partition
>>> s1.log_prob(arcs)
>>> s1.entropy
>>> s1.kl(s2)

property argmax#

Computes $$\arg\max_y p(y)$$ of the distribution $$p(y)$$.

topk(k: int) LongTensor[source]#

Computes the k-argmax of the distribution $$p(y)$$.
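Unlike MatrixTree, DependencyCRF assigns mass only to projective trees. A quick way to see what that excludes (a standalone helper for illustration, not part of supar): an arc is projective if every token strictly between its endpoints attaches inside the arc's span.

```python
def is_projective(heads):
    """heads[d] is the head position of token d for d = 1..n;
    heads[0] is a placeholder for the root symbol."""
    n = len(heads) - 1
    for d in range(1, n + 1):
        lo, hi = sorted((heads[d], d))
        # every token strictly inside the arc span must attach inside it
        if any(heads[k] < lo or heads[k] > hi for k in range(lo + 1, hi)):
            return False
    return True
```

Both head sequences in the example above are projective; crossing arcs such as (1, 3) and (2, 4) are not.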

## Dependency2oCRF#

class supar.structs.tree.Dependency2oCRF(scores: Tuple[Tensor, Tensor], lens: LongTensor | None = None, multiroot: bool = False)[source]#

Second-order TreeCRF for projective dependency trees (McDonald & Pereira 2006; Zhang et al. 2020a).

Parameters:
• scores (tuple(Tensor, Tensor)) – Scores of all possible dependent-head pairs ([batch_size, seq_len, seq_len]) and dependent-head-sibling triples ([batch_size, seq_len, seq_len, seq_len]).

• lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.

• multiroot (bool) – If False, requires the tree to contain only a single root. Default: False.

Examples

>>> import torch
>>> from supar import Dependency2oCRF
>>> from supar.utils import CoNLL
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> sibs = torch.tensor([CoNLL.get_sibs(i) for i in arcs[:, 1:].tolist()])
>>> s1 = Dependency2oCRF((torch.randn(batch_size, seq_len, seq_len),
                          torch.randn(batch_size, seq_len, seq_len, seq_len)),
                         lens)
>>> s2 = Dependency2oCRF((torch.randn(batch_size, seq_len, seq_len),
                          torch.randn(batch_size, seq_len, seq_len, seq_len)),
                         lens)
>>> s1.max
>>> s1.argmax
tensor([[0, 3, 3, 0, 0],
        [0, 4, 4, 4, 0]])
>>> s1.log_partition
>>> s1.log_prob((arcs, sibs))
>>> s1.entropy
>>> s1.kl(s2)

property argmax#

Computes $$\arg\max_y p(y)$$ of the distribution $$p(y)$$.

topk(k: int) LongTensor[source]#

Computes the k-argmax of the distribution $$p(y)$$.
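The second-order factor scores adjacent-sibling triples (d, h, s): a dependent d, its head h, and d's nearest sibling between h and d. CoNLL.get_sibs builds these from gold heads; the standalone helper below is a rough sketch of that idea for illustration only, not the library's implementation.

```python
def adjacent_sibs(heads):
    """heads[d] is the head of token d for d = 1..n (heads[0] unused).
    Returns (d, h, s) triples where s is d's nearest sibling lying
    strictly between h and d; dependents without one are skipped."""
    triples = []
    for d in range(1, len(heads)):
        h = heads[d]
        lo, hi = sorted((h, d))
        # siblings of d strictly between h and d
        between = [s for s in range(lo + 1, hi) if heads[s] == h]
        if between:
            # pick the sibling closest to d
            s = max(between) if d > h else min(between)
            triples.append((d, h, s))
    return triples
```

For heads [0, 3, 3, 0, 3, 3] (a five-word sentence headed at position 3), the outermost left and right children of 3 each have an inner sibling, yielding (1, 3, 2) and (5, 3, 4).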

## ConstituencyCRF#

class supar.structs.tree.ConstituencyCRF(scores: Tensor, lens: LongTensor | None = None, label: bool = False)[source]#

Constituency TreeCRF (Zhang et al. 2020b; Stern et al. 2017).

Parameters:
• scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all constituents; if label is True, a trailing label dimension is expected ([batch_size, seq_len, seq_len, n_labels]).

• lens (LongTensor) – [batch_size]. Sentence lengths for masking.

• label (bool) – If True, scores and targets additionally carry constituent labels. Default: False.

Examples

>>> import torch
>>> from supar import ConstituencyCRF
>>> batch_size, seq_len, n_labels = 2, 5, 4
>>> lens = torch.tensor([3, 4])
>>> charts = torch.tensor([[[-1,  0, -1,  0, -1],
                            [-1, -1,  0,  0, -1],
                            [-1, -1, -1,  0, -1],
                            [-1, -1, -1, -1, -1],
                            [-1, -1, -1, -1, -1]],
                           [[-1,  0,  0, -1,  0],
                            [-1, -1,  0, -1, -1],
                            [-1, -1, -1,  0,  0],
                            [-1, -1, -1, -1,  0],
                            [-1, -1, -1, -1, -1]]])
>>> s1 = ConstituencyCRF(torch.randn(batch_size, seq_len, seq_len, n_labels), lens, True)
>>> s2 = ConstituencyCRF(torch.randn(batch_size, seq_len, seq_len, n_labels), lens, True)
>>> s1.max
>>> s1.argmax
[[[0, 1, 2], [0, 3, 0], [1, 2, 1], [1, 3, 0], [2, 3, 3]],
 [[0, 1, 1], [0, 4, 2], [1, 2, 3], [1, 4, 1], [2, 3, 2], [2, 4, 3], [3, 4, 3]]]
>>> s1.log_partition
>>> s1.log_prob(charts)
>>> s1.entropy
>>> s1.kl(s2)

property argmax#

Computes $$\arg\max_y p(y)$$ of the distribution $$p(y)$$.

topk(k: int) List[List[Tuple]][source]#

Computes the k-argmax of the distribution $$p(y)$$.
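The partition function here is computed by the inside algorithm over the span chart. A minimal unbatched sketch of the recursion (illustrative only; the library's version is vectorized over batches and spans): with all span scores set to zero, the log-partition simply counts binary bracketings, i.e., Catalan numbers.

```python
import math

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def inside(n, span_score):
    """Log-partition over binary trees of a length-n sentence.
    span_score(i, j) scores the span covering words i..j-1."""
    chart = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        chart[i][i + 1] = span_score(i, i + 1)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            total = -math.inf
            for k in range(i + 1, j):          # split point
                total = log_add(total, chart[i][k] + chart[k][j])
            chart[i][j] = total + span_score(i, j)
    return chart[0][n]
```

`math.exp(inside(4, lambda i, j: 0.0))` gives 5.0, the number of binary trees over four words (the Catalan number C_3).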

## BiLexicalizedConstituencyCRF#

class supar.structs.tree.BiLexicalizedConstituencyCRF(scores: List[Tensor], lens: LongTensor | None = None)[source]#

Grammarless Eisner-Satta Algorithm (Eisner & Satta 1999; Yang et al. 2021).

Code is revised from Songlin Yang’s implementation.

Parameters:
• scores (List[Tensor]) – Scores of dependencies ([batch_size, seq_len, seq_len]), constituents ([batch_size, seq_len, seq_len]) and headed spans ([batch_size, seq_len, seq_len, seq_len]).

• lens (LongTensor) – [batch_size]. Sentence lengths for masking.

Examples

>>> import torch
>>> from supar import BiLexicalizedConstituencyCRF
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> deps = torch.tensor([[0, 0, 1, 1, 0], [0, 3, 1, 0, 3]])
>>> cons = torch.tensor([[[0, 1, 1, 1, 0],
                          [0, 0, 1, 0, 0],
                          [0, 0, 0, 1, 0],
                          [0, 0, 0, 0, 0],
                          [0, 0, 0, 0, 0]],
                         [[0, 1, 1, 1, 1],
                          [0, 0, 1, 0, 0],
                          [0, 0, 0, 1, 0],
                          [0, 0, 0, 0, 1],
                          [0, 0, 0, 0, 0]]]).bool()
>>> heads = torch.tensor([[[0, 1, 1, 1, 0],
                           [0, 0, 2, 0, 0],
                           [0, 0, 0, 3, 0],
                           [0, 0, 0, 0, 0],
                           [0, 0, 0, 0, 0]],
                          [[0, 1, 1, 3, 3],
                           [0, 0, 2, 0, 0],
                           [0, 0, 0, 3, 0],
                           [0, 0, 0, 0, 4],
                           [0, 0, 0, 0, 0]]])
>>> s1 = BiLexicalizedConstituencyCRF((torch.randn(batch_size, seq_len, seq_len),
                                       torch.randn(batch_size, seq_len, seq_len),
                                       torch.randn(batch_size, seq_len, seq_len, seq_len)),
                                      lens)
>>> s2 = BiLexicalizedConstituencyCRF((torch.randn(batch_size, seq_len, seq_len),
                                       torch.randn(batch_size, seq_len, seq_len),
                                       torch.randn(batch_size, seq_len, seq_len, seq_len)),
                                      lens)
>>> s1.max
>>> s1.argmax[0]
tensor([[0, 3, 1, 0, 0],
        [0, 4, 1, 1, 0]])
>>> s1.argmax[1]
[[[0, 3], [0, 2], [0, 1], [1, 2], [2, 3]], [[0, 4], [0, 3], [0, 2], [0, 1], [1, 2], [2, 3], [3, 4]]]
>>> s1.log_partition

property argmax#

Computes $$\arg\max_y p(y)$$ of the distribution $$p(y)$$.

topk(k: int)[source]#

Computes the k-argmax of the distribution $$p(y)$$.