Tree#
MatrixTree#
- class supar.structs.tree.MatrixTree(scores: Tensor, lens: LongTensor | None = None, multiroot: bool = False)[source]#
MatrixTree for calculating partitions and marginals of non-projective dependency trees in \(O(n^3)\) by an adaptation of Kirchhoff's Matrix-Tree Theorem (Koo et al. 2007).
- Parameters:
scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible dependent-head pairs.
lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.
multiroot (bool) – If False, requires the tree to contain only a single root. Default: False.
Examples
>>> import torch
>>> from supar import MatrixTree
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> s1 = MatrixTree(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s2 = MatrixTree(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s1.max
tensor([0.7174, 3.7910], grad_fn=<SumBackward1>)
>>> s1.argmax
tensor([[0, 0, 1, 1, 0],
        [0, 4, 1, 0, 3]])
>>> s1.log_partition
tensor([2.0229, 6.0558], grad_fn=<CopyBackwards>)
>>> s1.log_prob(arcs)
tensor([-3.2209, -2.5756], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([1.9711, 3.4497], grad_fn=<SubBackward0>)
>>> s1.kl(s2)
tensor([1.3354, 2.6914], grad_fn=<AddBackward0>)
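As a companion to the \(O(n^3)\) claim above, here is a minimal sketch of the single-root Matrix-Tree computation of the log-partition (Koo et al. 2007); it is illustrative only, not supar's internal implementation, and assumes an unpadded sentence where s[m, h] scores the arc from head h to dependent m, with index 0 reserved for the root:

import torch

def mtt_log_partition(s: torch.Tensor) -> torch.Tensor:
    # s: [seq_len, seq_len]; s[m, h] scores head h -> dependent m, 0 is the root
    n = s.size(0) - 1                  # number of real words
    A = s[1:, 1:].t().exp()           # A[h, m]: weight of arc h -> m over the words
    A = A * (1 - torch.eye(n))        # drop self-loops
    L = torch.diag(A.sum(0)) - A      # Laplacian: L[m, m] = sum_h A[h, m]
    L[0] = s[1:, 0].exp()             # overwrite the first row with root-arc weights
    return L.logdet()                 # log Z = log det of the modified Laplacian

For a full-length sentence, mtt_log_partition(s) should agree with MatrixTree(s.unsqueeze(0), torch.tensor([n])).log_partition, up to masking conventions.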
- property max#
Computes the max score of the distribution \(p(y)\).
- property argmax#
Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).
- sample()[source]#
Obtains a structured sample from the distribution \(y \sim p(y)\). TODO: multi-sampling.
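Given s1 from the Examples above, a drawn sample can be scored directly; this assumes the sample shares the head-sequence layout of argmax:

y = s1.sample()   # one dependency tree per sentence, same layout as argmax
s1.log_prob(y)    # log-probabilities of the sampled trees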
- property entropy#
Computes entropy \(H[p]\) of the distribution \(p(y)\).
- cross_entropy(other: MatrixTree) → Tensor[source]#
Computes cross-entropy \(H[p,q]\) of self and another distribution.
- Parameters:
other (StructuredDistribution) – Comparison distribution.
- kl(other: MatrixTree) → Tensor[source]#
Computes KL-divergence \(KL[p \parallel q]=H[p,q]-H[p]\) of self and another distribution.
- Parameters:
other (StructuredDistribution) – Comparison distribution.
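With s1 and s2 from the Examples above, the identity \(KL[p \parallel q]=H[p,q]-H[p]\) can be checked numerically:

# both sides should agree up to floating-point error
torch.allclose(s1.kl(s2), s1.cross_entropy(s2) - s1.entropy, atol=1e-4)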
DependencyCRF#
- class supar.structs.tree.DependencyCRF(scores: Tensor, lens: LongTensor | None = None, multiroot: bool = False)[source]#
First-order TreeCRF for projective dependency trees (Eisner 2000; Zhang et al. 2020a).
- Parameters:
scores (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible dependent-head pairs.
lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.
multiroot (bool) – If False, requires the tree to contain only a single root. Default: False.
Examples
>>> import torch
>>> from supar import DependencyCRF
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> s1 = DependencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s2 = DependencyCRF(torch.randn(batch_size, seq_len, seq_len), lens)
>>> s1.max
tensor([3.6346, 1.7194], grad_fn=<IndexBackward>)
>>> s1.argmax
tensor([[0, 2, 3, 0, 0],
        [0, 0, 3, 1, 1]])
>>> s1.log_partition
tensor([4.1007, 3.3383], grad_fn=<IndexBackward>)
>>> s1.log_prob(arcs)
tensor([-1.3866, -5.5352], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([0.9979, 2.6056], grad_fn=<IndexBackward>)
>>> s1.kl(s2)
tensor([1.6631, 2.6558], grad_fn=<IndexBackward>)
- property argmax#
Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).
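Because a first-order tree scores as the sum of its dependent-head arc scores, log_prob can be reproduced by hand. A minimal sketch, assuming the mask covers word positions 1..lens[b]:

import torch
from supar import DependencyCRF

batch_size, seq_len = 2, 5
lens = torch.tensor([3, 4])
arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
scores = torch.randn(batch_size, seq_len, seq_len)
dist = DependencyCRF(scores, lens)

# gather each dependent's chosen head score and zero out root/padding positions
positions = torch.arange(seq_len)
mask = (positions > 0) & (positions <= lens.unsqueeze(-1))
tree_score = (scores.gather(-1, arcs.unsqueeze(-1)).squeeze(-1) * mask).sum(-1)
# tree_score - dist.log_partition should match dist.log_prob(arcs)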
Dependency2oCRF#
- class supar.structs.tree.Dependency2oCRF(scores: Tuple[Tensor, Tensor], lens: LongTensor | None = None, multiroot: bool = False)[source]#
Second-order TreeCRF for projective dependency trees (McDonald & Pereira 2006; Zhang et al. 2020a).
- Parameters:
scores (tuple(Tensor, Tensor)) – Scores of all possible dependent-head pairs ([batch_size, seq_len, seq_len]) and dependent-head-sibling triples ([batch_size, seq_len, seq_len, seq_len]).
lens (LongTensor) – [batch_size]. Sentence lengths for masking, regardless of root positions. Default: None.
multiroot (bool) – If False, requires the tree to contain only a single root. Default: False.
Examples
>>> import torch
>>> from supar import Dependency2oCRF
>>> from supar.utils import CoNLL
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> arcs = torch.tensor([[0, 2, 0, 4, 2], [0, 3, 1, 0, 3]])
>>> sibs = torch.tensor([CoNLL.get_sibs(i) for i in arcs[:, 1:].tolist()])
>>> s1 = Dependency2oCRF((torch.randn(batch_size, seq_len, seq_len),
...                       torch.randn(batch_size, seq_len, seq_len, seq_len)), lens)
>>> s2 = Dependency2oCRF((torch.randn(batch_size, seq_len, seq_len),
...                       torch.randn(batch_size, seq_len, seq_len, seq_len)), lens)
>>> s1.max
tensor([0.7574, 3.3634], grad_fn=<IndexBackward>)
>>> s1.argmax
tensor([[0, 3, 3, 0, 0],
        [0, 4, 4, 4, 0]])
>>> s1.log_partition
tensor([1.9906, 4.3599], grad_fn=<IndexBackward>)
>>> s1.log_prob((arcs, sibs))
tensor([-0.6975, -6.2845], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([1.6436, 2.1717], grad_fn=<IndexBackward>)
>>> s1.kl(s2)
tensor([0.4929, 2.0759], grad_fn=<IndexBackward>)
- property argmax#
Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).
ConstituencyCRF#
- class supar.structs.tree.ConstituencyCRF(scores: Tensor, lens: LongTensor | None = None, label: bool = False)[source]#
Constituency TreeCRF (Zhang et al. 2020b; Stern et al. 2017).
- Parameters:
scores (Tensor) – [batch_size, seq_len, seq_len] if label is False, otherwise [batch_size, seq_len, seq_len, n_labels]. Scores of all constituents.
lens (LongTensor) – [batch_size]. Sentence lengths for masking.
label (bool) – If True, each constituent additionally carries a label score, as in the Examples below. Default: False.
Examples
>>> import torch
>>> from supar import ConstituencyCRF
>>> batch_size, seq_len, n_labels = 2, 5, 4
>>> lens = torch.tensor([3, 4])
>>> charts = torch.tensor([[[-1,  0, -1,  0, -1],
...                         [-1, -1,  0,  0, -1],
...                         [-1, -1, -1,  0, -1],
...                         [-1, -1, -1, -1, -1],
...                         [-1, -1, -1, -1, -1]],
...                        [[-1,  0,  0, -1,  0],
...                         [-1, -1,  0, -1, -1],
...                         [-1, -1, -1,  0,  0],
...                         [-1, -1, -1, -1,  0],
...                         [-1, -1, -1, -1, -1]]])
>>> s1 = ConstituencyCRF(torch.randn(batch_size, seq_len, seq_len, n_labels), lens, True)
>>> s2 = ConstituencyCRF(torch.randn(batch_size, seq_len, seq_len, n_labels), lens, True)
>>> s1.max
tensor([3.7036, 7.2569], grad_fn=<IndexBackward0>)
>>> s1.argmax
[[[0, 1, 2], [0, 3, 0], [1, 2, 1], [1, 3, 0], [2, 3, 3]],
 [[0, 1, 1], [0, 4, 2], [1, 2, 3], [1, 4, 1], [2, 3, 2], [2, 4, 3], [3, 4, 3]]]
>>> s1.log_partition
tensor([ 8.5394, 12.9940], grad_fn=<IndexBackward0>)
>>> s1.log_prob(charts)
tensor([ -8.5209, -14.1160], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([6.8868, 9.3996], grad_fn=<IndexBackward0>)
>>> s1.kl(s2)
tensor([4.0039, 4.1037], grad_fn=<IndexBackward0>)
- property argmax#
Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).
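The span triples produced by argmax never cross, so they can be folded into a nested tree. A hypothetical helper (to_tree is not part of supar; labels are left as indices):

def to_tree(spans, words):
    """Nest non-crossing [i, j, label] spans into (label, children) tuples,
    where word k occupies the fencepost span (k, k+1)."""
    spans = sorted(spans, key=lambda s: (s[0], -s[1]))  # outer spans open first

    def build(i, j, idx):
        children, k = [], i
        while k < j:
            if idx[0] < len(spans) and spans[idx[0]][0] == k:
                ci, cj, cl = spans[idx[0]]
                idx[0] += 1
                children.append((cl, build(ci, cj, idx)))
                k = cj
            else:
                children.append(words[k])
                k += 1
        return children

    i, j, label = spans[0]  # after sorting, the first span covers the sentence
    return (label, build(i, j, [1]))

Applied to the first sentence of the Examples, to_tree(s1.argmax[0], ['w1', 'w2', 'w3']) yields (0, [(2, ['w1']), (0, [(1, ['w2']), (3, ['w3'])])]).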
BiLexicalizedConstituencyCRF#
- class supar.structs.tree.BiLexicalizedConstituencyCRF(scores: List[Tensor], lens: LongTensor | None = None)[source]#
Grammarless Eisner-Satta algorithm (Eisner & Satta 1999; Yang et al. 2021).
Code is revised from Songlin Yang’s implementation.
- Parameters:
scores (list[Tensor]) – Scores of dependencies ([batch_size, seq_len, seq_len]), constituents ([batch_size, seq_len, seq_len]) and headed spans ([batch_size, seq_len, seq_len, seq_len]).
lens (LongTensor) – [batch_size]. Sentence lengths for masking.
Examples
>>> import torch
>>> from supar import BiLexicalizedConstituencyCRF
>>> batch_size, seq_len = 2, 5
>>> lens = torch.tensor([3, 4])
>>> deps = torch.tensor([[0, 0, 1, 1, 0], [0, 3, 1, 0, 3]])
>>> cons = torch.tensor([[[0, 1, 1, 1, 0],
...                       [0, 0, 1, 0, 0],
...                       [0, 0, 0, 1, 0],
...                       [0, 0, 0, 0, 0],
...                       [0, 0, 0, 0, 0]],
...                      [[0, 1, 1, 1, 1],
...                       [0, 0, 1, 0, 0],
...                       [0, 0, 0, 1, 0],
...                       [0, 0, 0, 0, 1],
...                       [0, 0, 0, 0, 0]]]).bool()
>>> heads = torch.tensor([[[0, 1, 1, 1, 0],
...                        [0, 0, 2, 0, 0],
...                        [0, 0, 0, 3, 0],
...                        [0, 0, 0, 0, 0],
...                        [0, 0, 0, 0, 0]],
...                       [[0, 1, 1, 3, 3],
...                        [0, 0, 2, 0, 0],
...                        [0, 0, 0, 3, 0],
...                        [0, 0, 0, 0, 4],
...                        [0, 0, 0, 0, 0]]])
>>> s1 = BiLexicalizedConstituencyCRF((torch.randn(batch_size, seq_len, seq_len),
...                                    torch.randn(batch_size, seq_len, seq_len),
...                                    torch.randn(batch_size, seq_len, seq_len, seq_len)), lens)
>>> s2 = BiLexicalizedConstituencyCRF((torch.randn(batch_size, seq_len, seq_len),
...                                    torch.randn(batch_size, seq_len, seq_len),
...                                    torch.randn(batch_size, seq_len, seq_len, seq_len)), lens)
>>> s1.max
tensor([0.5792, 2.1737], grad_fn=<MaxBackward0>)
>>> s1.argmax[0]
tensor([[0, 3, 1, 0, 0],
        [0, 4, 1, 1, 0]])
>>> s1.argmax[1]
[[[0, 3], [0, 2], [0, 1], [1, 2], [2, 3]],
 [[0, 4], [0, 3], [0, 2], [0, 1], [1, 2], [2, 3], [3, 4]]]
>>> s1.log_partition
tensor([1.1923, 3.2343], grad_fn=<LogsumexpBackward>)
>>> s1.log_prob((deps, cons, heads))
tensor([-1.9123, -3.6127], grad_fn=<SubBackward0>)
>>> s1.entropy
tensor([1.3376, 2.2996], grad_fn=<SelectBackward>)
>>> s1.kl(s2)
tensor([1.0617, 2.7839], grad_fn=<SelectBackward>)
- property argmax#
Computes \(\arg\max_y p(y)\) of the distribution \(p(y)\).
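The joint argmax exposes both halves of the lexicalized tree at once; with s1 from the Examples above:

heads, spans = s1.argmax
# heads: [batch_size, seq_len] dependency heads, as for DependencyCRF
# spans: per-sentence lists of [i, j] constituent spans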