Vocab

class supar.utils.vocab.Vocab(counter: Counter, min_freq: int = 1, specials: Tuple = (), unk_index: int = 0)

Defines a vocabulary object that will be used to numericalize a field.

Parameters:
  • counter (Counter) – Counter object holding the frequencies of each value found in the data.

  • min_freq (int) – The minimum frequency needed to include a token in the vocabulary. Default: 1.

  • specials (Tuple[str]) – The tuple of special tokens (e.g., pad, unk, bos and eos) that will be prepended to the vocabulary. Default: ().

  • unk_index (int) – The index of the unk token. Default: 0.
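
To make the interaction between these parameters concrete, here is a minimal standalone sketch that does not use SuPar. It counts tokens in a toy corpus, applies a frequency threshold and prepends the special tokens. The corpus, the `<pad>` and `<unk>` strings and the ordering of the kept tokens are assumptions made for illustration; the library may order tokens differently.

```python
from collections import Counter

# Hypothetical toy corpus, used only for illustration.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat"]]

# Frequencies of each value found in the data (the `counter` argument).
counter = Counter(token for sent in sentences for token in sent)

min_freq = 2                    # tokens seen fewer than 2 times are dropped
specials = ("<pad>", "<unk>")   # assumed special-token strings

# Keep tokens that meet the threshold, in first-seen order (an assumption).
kept = [token for token, freq in counter.items() if freq >= min_freq]

# Special tokens are prepended, so they occupy the lowest indices.
tokens = list(specials) + kept

# With this layout, the unk token sits at index 1.
unk_index = tokens.index("<unk>")
```

In this sketch "dog" and "a" each occur once and are therefore excluded, and `unk_index` has to be 1 rather than the default 0 because `<pad>` comes first in `specials`.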

itos

A list of token strings indexed by their numerical identifiers.

stoi

A defaultdict object mapping token strings to numerical identifiers.
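
The relationship between the two attributes can be sketched without SuPar as a list paired with a `defaultdict`. The token list and the choice to make the default value equal to `unk_index` are assumptions for illustration, not a description of the library's internals.

```python
from collections import defaultdict

# Hypothetical vocabulary contents: two special tokens, then three words.
itos = ["<pad>", "<unk>", "the", "cat", "sat"]
unk_index = 1

# Assumption: unseen tokens fall back to unk_index.
stoi = defaultdict(lambda: unk_index)
stoi.update({token: index for index, token in enumerate(itos)})

# Numericalize a sequence; "dog" is not in the vocabulary.
ids = [stoi[token] for token in ["the", "dog", "sat"]]

# Map the identifiers back to token strings.
decoded = [itos[index] for index in ids]
```

Here `ids` is `[2, 1, 4]` and `decoded` is `["the", "<unk>", "sat"]`. Note that indexing a `defaultdict` with a missing key stores that key as a side effect, so `"dog"` becomes an entry of `stoi` after the lookup; `stoi.get(token, unk_index)` avoids this.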