PreprocessingMetadata enhancement #2

@hlibbabii

Description

  • Rename PreprocessingMetadata -> PreppedTokenMetadata
  • Represent the word_boundaries field as a list of the number of subtokens in each token, e.g.
    [1, 3, 1, 2] instead of [0, 1, 4, 5, 7]
  • Remove the non-processible tokens field. Return non-processible tokens as a separate object
  • Provide a method for returning the metadata for the last n tokens:
>>> metadata.for_last_tokens(n: int)
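
A rough sketch of what the proposed representation and method could look like. It is not the actual implementation. The field name `n_subtokens_per_token` and the helpers `boundaries_to_counts` / `counts_to_boundaries` are hypothetical names used only for illustration. Only `PreppedTokenMetadata`, `word_boundaries` and `for_last_tokens` come from the proposal above.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List


def boundaries_to_counts(word_boundaries: List[int]) -> List[int]:
    """Convert cumulative boundaries, e.g. [0, 1, 4, 5, 7], to per-token subtoken counts."""
    return [end - start for start, end in zip(word_boundaries, word_boundaries[1:])]


def counts_to_boundaries(n_subtokens_per_token: List[int]) -> List[int]:
    """Inverse conversion, kept only to show that both forms carry the same information."""
    return [0] + list(accumulate(n_subtokens_per_token))


@dataclass
class PreppedTokenMetadata:
    # Hypothetical field name: number of subtokens in each full token.
    n_subtokens_per_token: List[int]

    def for_last_tokens(self, n: int) -> "PreppedTokenMetadata":
        """Return the metadata restricted to the last n full tokens."""
        if n < 0:
            raise ValueError("n must be non-negative")
        if n == 0:
            # A slice [-0:] would return the whole list, so handle 0 explicitly.
            return PreppedTokenMetadata([])
        return PreppedTokenMetadata(self.n_subtokens_per_token[-n:])
```

With per-token counts, taking the metadata for the last tokens is a plain slice. With cumulative boundaries, the slice would also need to be re-based so that it starts at 0.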
