- class supar.utils.data.Dataset(transform: supar.utils.transform.Transform, data: Union[str, Iterable], cache: bool = False, binarize: bool = False, bin: Optional[str] = None, max_len: Optional[int] = None, **kwargs)
Dataset that is compatible with torch.utils.data.Dataset, serving as a wrapper for manipulating all data fields with the operating behaviours defined in Transform. The data fields of all the instantiated sentences can be accessed as attributes of the dataset.
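The attribute-style access described above can be pictured with a small stand-alone sketch. This is not supar's implementation: ToySentence and ToyDataset are hypothetical names, and returning a list (rather than some other iterable) is an assumption made for readability.

```python
from typing import Any, Iterable, List


class ToySentence:
    """Hypothetical stand-in for a loaded sentence with named fields."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)


class ToyDataset:
    """Hypothetical stand-in illustrating dataset-level field access."""

    def __init__(self, sentences: Iterable[ToySentence]) -> None:
        self.sentences: List[ToySentence] = list(sentences)

    def __len__(self) -> int:
        return len(self.sentences)

    def __getitem__(self, index: int) -> ToySentence:
        return self.sentences[index]

    def __getattr__(self, name: str) -> List[Any]:
        # __getattr__ runs only when normal lookup fails, so `sentences`
        # itself is never routed through here once it has been set.
        if name.startswith("__") or "sentences" not in self.__dict__:
            raise AttributeError(name)
        # Collect the requested field from every sentence, in order.
        return [getattr(sentence, name) for sentence in self.sentences]


toy = ToyDataset([
    ToySentence(words=["She", "runs"], tags=["PRON", "VERB"]),
    ToySentence(words=["Hi"], tags=["INTJ"]),
])
all_words = toy.words  # one entry per sentence
```

Asking the toy dataset for a field name that the sentences do not carry raises AttributeError, which keeps typos from silently yielding empty results.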
data (Union[str, Iterable]) – A filename or a list of instances that will be passed into transform.load().
cache (bool) – If True, tries to use the previously cached binarized data for fast loading, in which case sentences are loaded on the fly according to the metadata. If False, all sentences are loaded directly into memory. Default: False.
binarize (bool) – If True, binarizes the dataset when it is built. Only works if cache=True. Default: False.
bin (str) – Path for saving binarized files, required if cache=True. Default: None.
max_len (int) – Sentences exceeding this length will be discarded. Default: None.
kwargs (Dict) – Together with data, kwargs will be passed into transform.load() to control the loading behaviour.
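How data, kwargs and max_len interact can be sketched as follows. This is an illustration only: ToyTransform, its lower option and build_sentences are hypothetical names, and treating a sentence of exactly max_len tokens as kept is an assumption, since the parameter description does not say whether the boundary is inclusive.

```python
from typing import Any, Iterable, Iterator, List, Optional, Union


class ToyTransform:
    """Hypothetical transform whose load() turns raw text into token lists."""

    def load(self, data: Union[str, Iterable[str]], lower: bool = False) -> Iterator[List[str]]:
        if isinstance(data, str):
            # A string is treated as a filename with one sentence per line.
            with open(data, encoding="utf-8") as f:
                lines = [line.strip() for line in f if line.strip()]
        else:
            # Anything else is treated as an iterable of sentence strings.
            lines = list(data)
        for line in lines:
            tokens = line.split()
            yield [t.lower() for t in tokens] if lower else tokens


def build_sentences(
    transform: ToyTransform,
    data: Union[str, Iterable[str]],
    max_len: Optional[int] = None,
    **kwargs: Any,
) -> List[List[str]]:
    # `data` and the extra keyword arguments are forwarded to load() together.
    sentences = list(transform.load(data, **kwargs))
    if max_len is not None:
        # Assumption: only sentences strictly longer than max_len are dropped.
        sentences = [s for s in sentences if len(s) <= max_len]
    return sentences


raw = ["The cat sat", "A very long sentence with many tokens", "Hello"]
kept = build_sentences(ToyTransform(), raw, max_len=3, lower=True)
```

Here lower=True reaches load() through kwargs, while max_len is consumed by the builder and never passed on to the transform.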
A list of sentences loaded from the data. Each sentence includes fields obeying the data format defined in transform. If cache=True, each is a pointer to the sentence stored in the cache file.
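The idea of keeping pointers into a cache file rather than the sentences themselves can be sketched with the standard library alone. The functions and class below (binarize, LazySentences) are hypothetical, and the on-disk layout shown here (one pickle record per sentence, located by offset and length) is an assumption for illustration, not the format supar actually writes.

```python
import os
import pickle
import tempfile
from typing import Any, List, Tuple


def binarize(sentences: List[Any], path: str) -> List[Tuple[int, int]]:
    """Write each sentence as its own pickle record; return (offset, length) metadata."""
    meta = []
    with open(path, "wb") as f:
        for sentence in sentences:
            blob = pickle.dumps(sentence)
            meta.append((f.tell(), len(blob)))
            f.write(blob)
    return meta


class LazySentences:
    """Holds only metadata; each item is read from the cache file on access."""

    def __init__(self, path: str, meta: List[Tuple[int, int]]) -> None:
        self.path = path
        self.meta = meta

    def __len__(self) -> int:
        return len(self.meta)

    def __getitem__(self, index: int) -> Any:
        offset, length = self.meta[index]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return pickle.loads(f.read(length))


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "toy.bin")
    meta = binarize([["She", "runs"], ["Hi"]], cache_path)
    lazy = LazySentences(cache_path, meta)
    second = lazy[1]  # read from disk only at this point
```

Only the small metadata list stays in memory, which is the trade-off the cache option describes: faster start-up and lower memory use in exchange for a disk read per access.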