Data
Dataset
- class supar.utils.data.Dataset(transform: Transform, data: str | Iterable, cache: bool = False, binarize: bool = False, bin: str | None = None, max_len: int | None = None, **kwargs)
- A dataset compatible with torch.utils.data.Dataset, serving as a wrapper for manipulating all data fields with the operating behaviours defined in Transform. The data fields of all the instantiated sentences can be accessed as attributes of the dataset.
- Parameters:
- transform (Transform) – An instance of Transform or one of its subclasses. The instance holds a series of loading and processing behaviours for a specific data format.
- data (Union[str, Iterable]) – A filename or a list of instances that will be passed into transform.load().
- cache (bool) – If True, tries to use previously cached binarized data for fast loading. In this mode, sentences are loaded on the fly according to the metadata. If False, all sentences are loaded directly into memory. Default: False.
- binarize (bool) – If True, binarizes the dataset when it is built. Only takes effect if cache=True. Default: False.
- bin (str) – Path to the binarized files, required if cache=True. Default: None.
- max_len (int) – Sentences longer than max_len will be discarded. Default: None.
- kwargs (Dict) – Together with data, kwargs will be passed into transform.load() to control the loading behaviour.
 
- sentences
- A list of sentences loaded from the data. Each sentence includes fields obeying the data format defined in transform. If cache=True, each is a pointer to the sentence stored in the cache file.
- Type:
- List[Sentence]
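
The behaviour described above (forwarding data and kwargs to transform.load(), discarding sentences longer than max_len, and exposing each field as a dataset attribute) can be illustrated with a small pure-Python sketch. This is not SuPar's implementation: ToySentence, ToyTransform, ToyDataset, the word/tag input format, and the lower keyword are all hypothetical names invented for illustration, and caching and binarization are left out entirely.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


class ToySentence:
    """Hypothetical stand-in for a sentence: a bag of named fields."""

    def __init__(self, **fields: Any) -> None:
        self.fields: Dict[str, Any] = fields

    def __len__(self) -> int:
        # In this sketch, sentence length is the number of words.
        return len(self.fields["words"])


class ToyTransform:
    """Hypothetical stand-in for a Transform that knows one data format."""

    def load(self, data: Iterable[str], lower: bool = False) -> Iterator[ToySentence]:
        # Each item is a whitespace-separated sequence of word/tag pairs.
        for line in data:
            pairs = [token.rsplit("/", 1) for token in line.split()]
            words = [w.lower() if lower else w for w, _ in pairs]
            tags = [t for _, t in pairs]
            yield ToySentence(words=words, tags=tags)


class ToyDataset:
    """Wraps loaded sentences and exposes their fields as attributes."""

    def __init__(self, transform: ToyTransform, data: Iterable[str],
                 max_len: Optional[int] = None, **kwargs: Any) -> None:
        self.transform = transform
        # `data` and `kwargs` are forwarded to `transform.load()`.
        loaded = transform.load(data, **kwargs)
        # Sentences longer than `max_len` are discarded.
        self.sentences: List[ToySentence] = [
            s for s in loaded if max_len is None or len(s) <= max_len
        ]

    def __len__(self) -> int:
        return len(self.sentences)

    def __getitem__(self, index: int) -> ToySentence:
        return self.sentences[index]

    def __getattr__(self, name: str) -> List[Any]:
        # Called only when normal lookup fails: gather field `name`
        # from every sentence. The guard avoids infinite recursion
        # if `sentences` has not been set yet.
        if name.startswith("__") or "sentences" not in self.__dict__:
            raise AttributeError(name)
        try:
            return [s.fields[name] for s in self.sentences]
        except KeyError:
            raise AttributeError(name) from None


raw = ["The/DT cat/NN sleeps/VBZ", "A/DT very/RB long/JJ sentence/NN here/RB"]
dataset = ToyDataset(ToyTransform(), raw, max_len=3, lower=True)
print(len(dataset))   # 1: the five-word sentence exceeds max_len
print(dataset.words)  # [['the', 'cat', 'sleeps']]
print(dataset.tags)   # [['DT', 'NN', 'VBZ']]
```

In this sketch the field lookup goes through `__getattr__`, so any field produced by the transform becomes available on the dataset without being declared in advance; the real class may achieve the same effect differently.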