Data
Dataset
- class supar.utils.data.Dataset(transform: Transform, data: str | Iterable, cache: bool = False, binarize: bool = False, bin: str | None = None, max_len: int | None = None, **kwargs)
 Dataset that is compatible with torch.utils.data.Dataset, serving as a wrapper for manipulating all data fields with the operating behaviours defined in Transform. The data fields of all the instantiated sentences can be accessed as attributes of the dataset.
 - Parameters:
   - transform (Transform) – An instance of Transform or its derivations. The instance holds a series of loading and processing behaviours specific to the data format.
   - data (Union[str, Iterable]) – A filename or a list of instances that will be passed into transform.load().
   - cache (bool) – If True, tries to use the previously cached binarized data for fast loading. In this case, sentences are loaded on-the-fly according to the metadata. If False, all sentences are loaded directly into memory. Default: False.
   - binarize (bool) – If True, binarizes the dataset when it is built. Only works if cache=True. Default: False.
   - bin (str) – Path to binarized files, required if cache=True. Default: None.
   - max_len (int) – Sentences exceeding this length will be discarded. Default: None.
   - kwargs (Dict) – Together with data, kwargs will be passed into transform.load() to control the loading behaviour.
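The behaviour described by these parameters can be sketched with small stand-in classes. This is an illustration only, not supar's implementation: ToySentence, ToyTransform, ToyDataset and the lower keyword are hypothetical names invented for the example. Treating max_len as an inclusive bound is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class ToySentence:
    # Hypothetical stand-in for a sentence holding two data fields.
    words: List[str]
    tags: List[str]

    def __len__(self) -> int:
        return len(self.words)


class ToyTransform:
    # Hypothetical stand-in for a Transform: it only knows how to load data.
    def load(self, data: Iterable[str], **kwargs) -> Iterator[ToySentence]:
        lower = kwargs.get("lower", False)  # illustrative kwarg, not a supar option
        for line in data:
            pairs = [token.rsplit("/", 1) for token in line.split()]
            words = [w.lower() if lower else w for w, _ in pairs]
            yield ToySentence(words, [t for _, t in pairs])


class ToyDataset:
    def __init__(self, transform, data, max_len: Optional[int] = None, **kwargs):
        self.transform = transform
        # data and kwargs are forwarded to transform.load(), as described above.
        # Assumption: a sentence of exactly max_len tokens is kept.
        self.sentences = [
            s for s in transform.load(data, **kwargs)
            if max_len is None or len(s) <= max_len
        ]

    def __len__(self) -> int:
        return len(self.sentences)

    def __getitem__(self, index: int) -> ToySentence:
        return self.sentences[index]

    def __getattr__(self, name: str):
        # Reached only when normal lookup fails: collect the field from every sentence.
        if name.startswith("__") or "sentences" not in self.__dict__:
            raise AttributeError(name)
        return [getattr(s, name) for s in self.sentences]


raw = ["She/PRP sleeps/VBZ", "The/DT old/JJ dog/NN barks/VBZ loudly/RB"]
dataset = ToyDataset(ToyTransform(), raw, max_len=3, lower=True)
print(len(dataset))   # 1
print(dataset.words)  # [['she', 'sleeps']]
```

The five-token sentence is dropped by max_len=3, and the words field of the remaining sentence is reachable directly on the dataset object.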
- sentences
 A list of sentences loaded from the data. Each sentence includes fields obeying the data format defined in transform. If cache=True, each is a pointer to the sentence stored in the cache file.
 - Type: List[Sentence]
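One way to picture "a pointer to the sentence stored in the cache file" is a sequence that keeps only byte offsets in memory and reads each sentence on access. The sketch below uses pickle records and hypothetical names (binarize_to_file, CachedSentences). The on-disk format supar actually uses is not described in this section, so treat this purely as an assumption-laden illustration of the idea.

```python
import os
import pickle
import tempfile
from typing import Any, List, Tuple


def binarize_to_file(sentences: List[Any], path: str) -> List[Tuple[int, int]]:
    """Write each sentence as its own pickle record and return (offset, length) metadata."""
    meta = []
    with open(path, "wb") as f:
        for sentence in sentences:
            payload = pickle.dumps(sentence)
            meta.append((f.tell(), len(payload)))
            f.write(payload)
    return meta


class CachedSentences:
    """Sequence whose items are read from the cache file only when indexed."""

    def __init__(self, path: str, meta: List[Tuple[int, int]]):
        self.path = path
        self.meta = meta  # only the pointers live in memory

    def __len__(self) -> int:
        return len(self.meta)

    def __getitem__(self, index: int) -> Any:
        offset, length = self.meta[index]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return pickle.loads(f.read(length))


with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, "toy.bin")
    meta = binarize_to_file([["She", "sleeps"], ["Dogs", "bark", "loudly"]], bin_path)
    cached = CachedSentences(bin_path, meta)
    count = len(cached)
    second = cached[1]  # read from disk at this point, not before

print(count, second)  # 2 ['Dogs', 'bark', 'loudly']
```

The trade-off matches the cache parameter's description: memory use stays proportional to the metadata rather than to the full corpus, at the cost of a file read per access.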