Biaffine#
BiaffineSemanticDependencyParser#
- class supar.models.sdp.biaffine.BiaffineSemanticDependencyParser(*args, **kwargs)[source]#
The implementation of the Biaffine Semantic Dependency Parser (Dozat & Manning, 2018).
- MODEL#
alias of BiaffineSemanticDependencyModel
- train(train: str | Iterable, dev: str | Iterable, test: str | Iterable, epochs: int = 1000, patience: int = 100, batch_size: int = 5000, update_steps: int = 1, buckets: int = 32, workers: int = 0, amp: bool = False, cache: bool = False, verbose: bool = True, **kwargs)[source]#
- Parameters:
train/dev/test (Union[str, Iterable]) – Filenames of the train/dev/test datasets.
epochs (int) – The number of training iterations. Default: 1000.
patience (int) – The number of consecutive iterations after which training is stopped early if no improvement is observed. Default: 100.
batch_size (int) – The number of tokens in each batch. Default: 5000.
update_steps (int) – Gradient accumulation steps. Default: 1.
buckets (int) – The number of buckets that sentences are assigned to. Default: 32.
workers (int) – The number of subprocesses used for data loading. 0 means only the main process. Default: 0.
clip (float) – Clips gradients of an iterable of parameters at the specified value. Default: 5.0.
amp (bool) – Specifies whether to use automatic mixed precision. Default: False.
cache (bool) – If True, caches the data first, suggested for huge files (e.g., > 1M sentences). Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
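The interaction between batch_size and update_steps can be illustrated with a framework-agnostic sketch: losses from update_steps consecutive batches are averaged before each optimizer update, so the effective batch size is batch_size × update_steps. This is only an illustration of the general gradient-accumulation pattern, not supar's actual trainer code.

```python
def accumulate_train(batch_losses, update_steps):
    """Sketch of gradient accumulation: losses from `update_steps`
    consecutive batches are averaged before one optimizer update."""
    updates = []   # one entry per simulated optimizer step
    pending = []   # losses accumulated since the last update
    for loss in batch_losses:
        pending.append(loss / update_steps)  # scale so the sum is a mean
        if len(pending) == update_steps:
            updates.append(sum(pending))     # "optimizer.step()" happens here
            pending = []
    return updates

# Four batch losses with update_steps=2 yield two updates,
# each the mean of two consecutive losses.
print(accumulate_train([4.0, 2.0, 6.0, 2.0], update_steps=2))  # [3.0, 4.0]
```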
- evaluate(data: str | Iterable, batch_size: int = 5000, buckets: int = 8, workers: int = 0, amp: bool = False, cache: bool = False, verbose: bool = True, **kwargs)[source]#
- Parameters:
data (Union[str, Iterable]) – The data for evaluation. Both a filename and a list of instances are allowed.
batch_size (int) – The number of tokens in each batch. Default: 5000.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
workers (int) – The number of subprocesses used for data loading. 0 means only the main process. Default: 0.
amp (bool) – Specifies whether to use automatic mixed precision. Default: False.
cache (bool) – If True, caches the data first, suggested for huge files (e.g., > 1M sentences). Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
- Returns:
The evaluation results.
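The buckets argument exists because batching by token count wastes computation if short and long sentences are padded together. A toy sketch of length-based bucketing is below; supar's actual bucketing strategy is more sophisticated, and the function name here is hypothetical.

```python
def assign_buckets(lengths, n_buckets):
    """Toy bucketing: sort sentence indices by length and split them
    into n_buckets contiguous groups, so each group (and hence each
    batch drawn from it) contains sentences of similar length and
    little padding is needed."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    size = -(-len(order) // n_buckets)  # ceil division
    return [order[i:i + size] for i in range(0, len(order), size)]

# Short sentences (lengths 5, 6, 7) end up in one bucket,
# long ones (28, 30, 31) in the other.
print(assign_buckets([5, 30, 7, 28, 6, 31], n_buckets=2))
```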
- predict(data: str | Iterable, pred: str | None = None, lang: str | None = None, prob: bool = False, batch_size: int = 5000, buckets: int = 8, workers: int = 0, amp: bool = False, cache: bool = False, verbose: bool = True, **kwargs)[source]#
- Parameters:
data (Union[str, Iterable]) – The data for prediction: either a filename (if it ends with .txt, the parser reads plain text and makes predictions line by line) or a list of instances.
pred (str) – If specified, the predicted results will be saved to the file. Default: None.
lang (str) – Language code (e.g., en) or language name (e.g., English) for the text to tokenize. None if tokenization is not required. Default: None.
prob (bool) – If True, outputs the probabilities. Default: False.
batch_size (int) – The number of tokens in each batch. Default: 5000.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
workers (int) – The number of subprocesses used for data loading. 0 means only the main process. Default: 0.
amp (bool) – Specifies whether to use automatic mixed precision. Default: False.
cache (bool) – If True, caches the data first, suggested for huge files (e.g., > 1M sentences). Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
- Returns:
A Dataset object containing all predictions if cache=False, otherwise None.
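Conceptually, predictions come from the two score tensors the model produces (edge scores and label scores, documented under forward below): an arc exists when its "edge" score beats its "no-edge" score, and the arc's label is the argmax over label scores. The greedy decoding sketch below uses plain nested lists in place of tensors and is only an illustration of that idea, not supar's decoding code.

```python
def decode(s_edge, s_label):
    """Greedy SDP decoding sketch. s_edge[dep][head] holds a
    (no_edge, edge) score pair; s_label[dep][head] holds one score
    per label. Returns (head, dep, label) triples."""
    arcs = []
    n = len(s_edge)
    for dep in range(n):
        for head in range(n):
            no_edge, edge = s_edge[dep][head]
            if edge > no_edge:  # the arc head -> dep is predicted
                scores = s_label[dep][head]
                label = max(range(len(scores)), key=scores.__getitem__)
                arcs.append((head, dep, label))
    return arcs

s_edge = [[[1, 0], [0, 2]], [[0, 3], [5, 0]]]
s_label = [[[0, 1], [2, 0]], [[1, 3], [0, 0]]]
print(decode(s_edge, s_label))  # [(1, 0, 0), (0, 1, 1)]
```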
- classmethod build(path, min_freq=7, fix_len=20, **kwargs)[source]#
Build a brand-new Parser, including initialization of all data fields and model parameters.
- Parameters:
path (str) – The path of the model to be saved.
min_freq (int) – The minimum frequency needed to include a token in the vocabulary. Default: 7.
fix_len (int) – The max length of all subword pieces. The excess part of each piece will be truncated. Required if using CharLSTM/BERT. Default: 20.
kwargs (Dict) – A dict holding the unconsumed arguments.
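The effect of min_freq can be sketched with a minimal vocabulary builder: tokens seen fewer than min_freq times are excluded and will later map to the unknown token. The function name, the special-token layout, and the alphabetical ordering here are illustrative assumptions, not supar's actual field-building code.

```python
from collections import Counter

def build_vocab(tokens, min_freq=7, specials=("<pad>", "<unk>")):
    """Sketch of min_freq filtering when building a word vocabulary:
    tokens seen fewer than min_freq times are left out and fall back
    to the unknown token (assumed index 1; pad assumed index 0)."""
    counts = Counter(tokens)
    kept = [w for w, c in sorted(counts.items()) if c >= min_freq]
    return list(specials) + kept

# "rare" occurs only twice, so it is excluded from the vocabulary.
vocab = build_vocab(["the"] * 9 + ["walk"] * 7 + ["rare"] * 2, min_freq=7)
print(vocab)  # ['<pad>', '<unk>', 'the', 'walk']
```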
BiaffineSemanticDependencyModel#
- class supar.models.sdp.biaffine.BiaffineSemanticDependencyModel(n_words, n_labels, n_tags=None, n_chars=None, n_lemmas=None, encoder='lstm', feat=['tag', 'char', 'lemma'], n_embed=100, n_pretrained=125, n_feat_embed=100, n_char_embed=50, n_char_hidden=400, char_pad_index=0, char_dropout=0.33, elmo='original_5b', elmo_bos_eos=(True, False), bert=None, n_bert_layers=4, mix_dropout=0.0, bert_pooling='mean', bert_pad_index=0, finetune=False, n_plm_embed=0, embed_dropout=0.2, n_encoder_hidden=1200, n_encoder_layers=3, encoder_dropout=0.33, n_edge_mlp=600, n_label_mlp=600, edge_mlp_dropout=0.25, label_mlp_dropout=0.33, interpolation=0.1, pad_index=0, unk_index=1, **kwargs)[source]#
The implementation of the Biaffine Semantic Dependency Parser (Dozat & Manning, 2018).
- Parameters:
n_words (int) – The size of the word vocabulary.
n_labels (int) – The number of labels in the treebank.
n_tags (int) – The number of POS tags, required if POS tag embeddings are used. Default: None.
n_chars (int) – The number of characters, required if character-level representations are used. Default: None.
n_lemmas (int) – The number of lemmas, required if lemma embeddings are used. Default: None.
encoder (str) – Encoder to use. 'lstm': BiLSTM encoder. 'bert': BERT-like pretrained language model (for finetuning), e.g., 'bert-base-cased'. Default: 'lstm'.
feat (List[str]) – Additional features to use, required if encoder='lstm'. 'tag': POS tag embeddings. 'char': Character-level representations extracted by CharLSTM. 'lemma': Lemma embeddings. 'bert': BERT representations; other pretrained language models like RoBERTa are also feasible. Default: ['tag', 'char', 'lemma'].
n_embed (int) – The size of word embeddings. Default: 100.
n_pretrained (int) – The size of pretrained word representations. Default: 125.
n_feat_embed (int) – The size of feature representations. Default: 100.
n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM, required if using CharLSTM. Default: 50.
n_char_hidden (int) – The size of hidden states of CharLSTM, required if using CharLSTM. Default: 400.
char_pad_index (int) – The index of the padding token in the character vocabulary, required if using CharLSTM. Default: 0.
elmo (str) – Name of the pretrained ELMo registered in ELMoEmbedding.OPTION. Default: 'original_5b'.
elmo_bos_eos (Tuple[bool]) – A tuple of two boolean values indicating whether to keep the start/end boundaries of ELMo outputs. Default: (True, False).
bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased'. This is required if encoder='bert' or using BERT features. The full list can be found in transformers. Default: None.
n_bert_layers (int) – Specifies how many last layers to use, required if encoder='bert' or using BERT features. The final outputs would be the weighted sum of the hidden states of these layers. Default: 4.
mix_dropout (float) – The dropout ratio of BERT layers, required if encoder='bert' or using BERT features. Default: .0.
bert_pooling (str) – Pooling way to get token embeddings. 'first': take the first subtoken. 'last': take the last subtoken. 'mean': take a mean over all. Default: 'mean'.
bert_pad_index (int) – The index of the padding token in the BERT vocabulary, required if encoder='bert' or using BERT features. Default: 0.
finetune (bool) – If False, freezes all parameters, required if using pretrained layers. Default: False.
n_plm_embed (int) – The size of PLM embeddings. If 0, uses the size of the pretrained embedding model. Default: 0.
embed_dropout (float) – The dropout ratio of input embeddings. Default: .2.
n_encoder_hidden (int) – The size of encoder hidden states. Default: 1200.
n_encoder_layers (int) – The number of encoder layers. Default: 3.
encoder_dropout (float) – The dropout ratio of encoder layer. Default: .33.
n_edge_mlp (int) – Edge MLP size. Default: 600.
n_label_mlp (int) – Label MLP size. Default: 600.
edge_mlp_dropout (float) – The dropout ratio of edge MLP layers. Default: .25.
label_mlp_dropout (float) – The dropout ratio of label MLP layers. Default: .33.
interpolation (float) – Constant to even out the label/edge loss. Default: .1.
pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.
unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.
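At the heart of the model, each (dependent, head) pair is scored with a biaffine form over MLP-transformed encoder states, roughly s = x_depᵀ U x_head (plus bias terms), which produces the s_edge and s_label tensors documented under forward below. A minimal pure-Python sketch of the core product for a single pair, with the bias terms omitted:

```python
def biaffine_score(x, y, U):
    """Score one (dependent, head) pair with a biaffine form
    s = x^T U y, the core operation behind s_edge/s_label.
    x, y: feature vectors (lists); U: a weight matrix (list of lists).
    The full biaffine layer also adds bias terms, omitted here."""
    return sum(x[i] * U[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

# With an identity weight matrix the score reduces to the
# dot product x . y: 1*3 + 2*4 = 11.
print(biaffine_score([1.0, 2.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))  # 11.0
```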
- forward(words, feats=None)[source]#
- Parameters:
words (LongTensor) – [batch_size, seq_len]. Word indices.
feats (List[LongTensor]) – A list of feat indices. The size is [batch_size, seq_len, fix_len] if feat is 'char' or 'bert', or [batch_size, seq_len] otherwise. Default: None.
- Returns:
The first tensor of shape [batch_size, seq_len, seq_len, 2] holds scores of all possible edges. The second of shape [batch_size, seq_len, seq_len, n_labels] holds scores of all possible labels on each edge.
- Return type:
Tensor, Tensor
- loss(s_edge, s_label, labels, mask)[source]#
- Parameters:
s_edge (Tensor) – [batch_size, seq_len, seq_len, 2]. Scores of all possible edges.
s_label (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each edge.
labels (LongTensor) – [batch_size, seq_len, seq_len]. The tensor of gold-standard labels.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
- Returns:
The training loss.
- Return type:
Tensor
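The interpolation constant documented above combines the two loss terms into one scalar. A minimal arithmetic sketch of one plausible weighting convention (check the source for the exact one used):

```python
def sdp_loss(edge_loss, label_loss, interpolation=0.1):
    """Sketch of interpolating the edge and label losses into the
    single training loss: a small `interpolation` down-weights the
    label term relative to the edge term. Assumed convention."""
    return interpolation * label_loss + (1 - interpolation) * edge_loss

# 0.1 * 4.0 + 0.9 * 2.0 ≈ 2.2
print(sdp_loss(edge_loss=2.0, label_loss=4.0))
```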