LSTM Layers
CharLSTM
- class supar.modules.lstm.CharLSTM(n_chars: int, n_embed: int, n_hidden: int, n_out: int = 0, pad_index: int = 0, dropout: float = 0)
CharLSTM generates character-level embeddings for tokens. It uses an LSTM layer to summarize the characters of each token into a single embedding.
- Parameters
n_chars (int) – The number of characters.
n_embed (int) – The size of each embedding vector fed as input to the LSTM.
n_hidden (int) – The size of each LSTM hidden state.
n_out (int) – The size of each output vector. Default: 0. If 0, it equals the size of the hidden states.
pad_index (int) – The index of the padding token in the vocabulary. Default: 0.
dropout (float) – The dropout ratio of CharLSTM hidden states. Default: 0.
- forward(x: torch.Tensor) → torch.Tensor
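The mechanism described above can be illustrated with a minimal PyTorch sketch of a character-level LSTM embedder. `CharLSTMSketch` is a hypothetical name, not supar's class, and its internals are assumptions: a bidirectional LSTM with `n_hidden // 2` units per direction whose final forward and backward states are concatenated, followed by a linear projection only when `n_out` differs from `n_hidden`. The input layout `[batch_size, seq_len, fix_len]` (one row of padded character indices per token) is also assumed.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class CharLSTMSketch(nn.Module):
    """Hypothetical, simplified stand-in for CharLSTM (not the supar implementation)."""

    def __init__(self, n_chars, n_embed, n_hidden, n_out=0, pad_index=0, dropout=0.0):
        super().__init__()
        assert n_hidden % 2 == 0, "assumption: n_hidden is split across two directions"
        self.pad_index = pad_index
        # n_out == 0 falls back to the hidden size, as documented
        self.n_out = n_out or n_hidden
        self.embed = nn.Embedding(n_chars, n_embed, padding_idx=pad_index)
        # Assumption: a BiLSTM with n_hidden // 2 units per direction
        self.lstm = nn.LSTM(n_embed, n_hidden // 2, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.proj = nn.Linear(n_hidden, self.n_out) if self.n_out != n_hidden else nn.Identity()

    def forward(self, x):
        # x: [batch_size, seq_len, fix_len] character indices, right-padded with pad_index
        lens = x.ne(self.pad_index).sum(-1)  # characters per token: [batch_size, seq_len]
        token_mask = lens.gt(0)              # True for real tokens, False for padded slots
        chars = self.embed(x[token_mask])    # [n_tokens, fix_len, n_embed]
        packed = pack_padded_sequence(chars, lens[token_mask].cpu(),
                                      batch_first=True, enforce_sorted=False)
        _, (h, _) = self.lstm(packed)        # h: [2, n_tokens, n_hidden // 2]
        # Concatenate each token's final forward and backward states
        h = torch.cat(torch.unbind(h), dim=-1)  # [n_tokens, n_hidden]
        h = self.proj(self.dropout(h))          # [n_tokens, n_out]
        # Scatter token embeddings back into the padded [batch_size, seq_len, n_out] layout
        out = h.new_zeros(*lens.shape, self.n_out)
        out[token_mask] = h
        return out


torch.manual_seed(0)
model = CharLSTMSketch(n_chars=50, n_embed=8, n_hidden=16, n_out=10)
# One sentence with three token slots (fix_len=4); the third slot is pure padding
x = torch.tensor([[[3, 4, 5, 0], [6, 7, 0, 0], [0, 0, 0, 0]]])
emb = model(x)
print(emb.shape)  # torch.Size([1, 3, 10])
```

In this sketch, token slots made entirely of `pad_index` are never passed to the LSTM and stay zero vectors in the output.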
VariationalLSTM
- class supar.modules.lstm.VariationalLSTM(input_size: int, hidden_size: int, num_layers: int = 1, bidirectional: bool = False, dropout: float = 0.0)
VariationalLSTM (Gal & Ghahramani, 2016) is a variant of the vanilla bidirectional LSTM, adopted by the Biaffine Parser, that differs only in its dropout strategy. It drops nodes in the LSTM layers (both input and recurrent connections) and applies the same dropout mask at every recurrent timestep.
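The shared-mask idea can be sketched for a single layer and direction in PyTorch. This is an illustrative assumption, not supar's code: `variational_lstm_layer` is a hypothetical helper, it runs an `nn.LSTMCell` over a padded `[seq_len, batch_size, input_size]` tensor rather than a `PackedSequence`, and it uses inverted-dropout scaling by `1 / (1 - p)`. One input mask and one recurrent mask are sampled per sequence and reused at every timestep.

```python
import torch
import torch.nn as nn


def variational_lstm_layer(cell: nn.LSTMCell, x: torch.Tensor, p: float, training: bool = True):
    """Hypothetical one-layer, one-direction LSTM with variational (shared-mask) dropout.

    x: [seq_len, batch_size, input_size]; padded input is used instead of a
    PackedSequence to keep the sketch short.
    """
    seq_len, batch_size, input_size = x.shape
    h = x.new_zeros(batch_size, cell.hidden_size)
    c = x.new_zeros(batch_size, cell.hidden_size)
    in_mask = hid_mask = None
    if training and p > 0:
        # Sample one mask for the inputs and one for the recurrent state, once per
        # sequence, scaled by 1 / (1 - p) (inverted dropout)
        in_mask = x.new_empty(batch_size, input_size).bernoulli_(1 - p) / (1 - p)
        hid_mask = x.new_empty(batch_size, cell.hidden_size).bernoulli_(1 - p) / (1 - p)
    outputs = []
    for t in range(seq_len):
        # The same masks are applied at every timestep t
        x_t = x[t] if in_mask is None else x[t] * in_mask
        h_prev = h if hid_mask is None else h * hid_mask
        h, c = cell(x_t, (h_prev, c))
        outputs.append(h)
    return torch.stack(outputs), (h, c)


torch.manual_seed(0)
cell = nn.LSTMCell(input_size=4, hidden_size=6)
x = torch.randn(5, 2, 4)
out, (h_n, c_n) = variational_lstm_layer(cell, x, p=0.5)
print(out.shape)  # torch.Size([5, 2, 6])
# With dropout disabled (evaluation), the layer is deterministic
eval_a, _ = variational_lstm_layer(cell, x, p=0.5, training=False)
eval_b, _ = variational_lstm_layer(cell, x, p=0.5, training=False)
```

Standard dropout would resample a fresh mask at each step. Reusing one mask is the distinguishing choice of the variational scheme.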
APIs are roughly the same as LSTM, except that only PackedSequence is allowed as input.
- Parameters
input_size (int) – The number of expected features in the input.
hidden_size (int) – The number of features in the hidden state h.
num_layers (int) – The number of recurrent layers. Default: 1.
bidirectional (bool) – If True, becomes a bidirectional LSTM. Default: False.
dropout (float) – If non-zero, introduces a SharedDropout layer on the outputs of each LSTM layer except the last layer. Default: 0.
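Because only a PackedSequence is accepted, a padded batch has to be packed first. The sketch below does this with PyTorch's `pack_padded_sequence`. It uses `torch.nn.LSTM` as a stand-in, since the APIs are described as roughly the same, so the example runs with PyTorch alone. The shapes and lengths are illustrative assumptions.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
# Three sequences padded to length 4, with 3 features per timestep
padded = torch.randn(3, 4, 3)
lengths = torch.tensor([4, 2, 3])
# enforce_sorted=False lets PyTorch sort by length internally
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
print(packed.batch_sizes)  # tensor([3, 3, 2, 1])

# torch.nn.LSTM stands in for VariationalLSTM; both accept a PackedSequence
lstm = torch.nn.LSTM(input_size=3, hidden_size=5, batch_first=True)
out_packed, (h, c) = lstm(packed)
# Unpack to a padded tensor; positions past each length are filled with zeros
out, out_lengths = pad_packed_sequence(out_packed, batch_first=True)
print(out.shape)    # torch.Size([3, 4, 5])
print(out_lengths)  # tensor([4, 2, 3])
```

`batch_sizes` records how many sequences are still active at each timestep, which is what lets the recurrence skip padding.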
- forward(sequence: torch.nn.utils.rnn.PackedSequence, hx: Optional[Tuple[torch.Tensor, torch.Tensor]] = None) → Tuple[torch.nn.utils.rnn.PackedSequence, Tuple[torch.Tensor, torch.Tensor]]
- Parameters
sequence (PackedSequence) – A packed variable length sequence.
hx (Tensor, Tensor) – A tuple composed of two tensors h and c. h of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial hidden state for each element in the batch. c of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial cell state for each element in the batch. If hx is not provided, both h and c default to zero. Default: None.
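The expected hx layout, and the `view` used to separate layers and directions of the returned states, can be checked with a short sketch. It again uses `torch.nn.LSTM` as a stand-in for VariationalLSTM, under the assumption that both share this state layout, which is what the docs here describe. The sizes are arbitrary illustrative values.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
num_layers, num_directions, batch_size = 2, 2, 3
input_size, hidden_size = 4, 5
lstm = torch.nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                     bidirectional=True, batch_first=True)

x = torch.randn(batch_size, 6, input_size)
packed = pack_padded_sequence(x, torch.tensor([6, 3, 5]), batch_first=True, enforce_sorted=False)

# Explicit initial states: [num_layers*num_directions, batch_size, hidden_size]
h0 = torch.zeros(num_layers * num_directions, batch_size, hidden_size)
c0 = torch.zeros_like(h0)
_, (h, c) = lstm(packed, (h0, c0))

# Omitting hx is equivalent to passing zeros
_, (h_default, c_default) = lstm(packed)

# Split the final states into [num_layers, num_directions, batch_size, hidden_size]
h_sep = h.view(num_layers, num_directions, batch_size, hidden_size)
c_sep = c.view(num_layers, num_directions, batch_size, hidden_size)
last_fwd, last_bwd = h_sep[-1, 0], h_sep[-1, 1]  # top layer, each [batch_size, hidden_size]
print(h.shape)  # torch.Size([4, 3, 5])
```

The first dimension of h and c is ordered layer by layer, with the directions of each layer adjacent, which is why a plain `view` recovers them.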
- Returns
The first is a packed variable length sequence. The second is a tuple of tensors h and c. h of shape [num_layers*num_directions, batch_size, hidden_size] holds the hidden state for t=seq_len. Like output, the layers can be separated using h.view(num_layers, num_directions, batch_size, hidden_size), and similarly for c. c of shape [num_layers*num_directions, batch_size, hidden_size] holds the cell state for t=seq_len.
- Return type