
Better splits; don't count everything as a word #9

@camilstaps

Description


At the moment, we only split the text on spaces, but we should also split on the maqaf (־) and the sof pasuq (׃), and possibly on other signs.
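A minimal sketch of the extended splitting, in Python (the project's language is not stated in this issue, so that choice is an assumption). `SEPARATORS` and `split_words` are hypothetical names. The character class covers whitespace, maqaf (U+05BE) and sof pasuq (U+05C3); further signs could be added to it as they are identified.

```python
from __future__ import annotations

import re

# Hypothetical separator set: any run of whitespace, maqaf (U+05BE)
# or sof pasuq (U+05C3). Further signs can be added to the class.
SEPARATORS = re.compile(r"[\s\u05BE\u05C3]+")


def split_words(text: str) -> list[str]:
    """Split a verse into candidate words, dropping empty pieces
    (e.g. the one left after a trailing sof pasuq)."""
    return [piece for piece in SEPARATORS.split(text) if piece]
```

Filtering out empty pieces means a separator at the start or end of the verse does not produce an empty "word".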

Furthermore, some things should not be considered a word:

  • פ and ס (the petuḥah and setumah paragraph markers) at the end of a verse
  • ׀ (paseq; its exact meaning is still unclear to me)
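A possible filter for these tokens, sketched in Python under two assumptions: פ and ס are only treated as non-words when they are the last token(s) of a verse, as described above, and ׀ appears as a separate token (surrounded by spaces) rather than attached to a word. `drop_non_words` is a hypothetical name.

```python
from __future__ import annotations

# פ (petuḥah) and ס (setumah) paragraph markers.
PARAGRAPH_MARKERS = {"\u05E4", "\u05E1"}
# ׀ (paseq); assumed to stand on its own between spaces.
PASEQ = "\u05C0"


def drop_non_words(tokens: list[str]) -> list[str]:
    """Remove tokens that should not be counted as words."""
    # Paseq is dropped wherever it occurs as a standalone token.
    words = [t for t in tokens if t != PASEQ]
    # Paragraph markers are only dropped at the end of the verse,
    # so a standalone פ or ס elsewhere in the verse is left alone.
    while words and words[-1] in PARAGRAPH_MARKERS:
        words.pop()
    return words
```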

Finally, ketiv-qere should be handled better. For example, Ps. 119:161 contains [וּמִדְּבָרֶיךָ כ] (וּ֝מִדְּבָרְךָ֗ ק), which splitting on spaces breaks into four words, with the brackets and parentheses attached to them. In this context, the characters [, ], ( and ) and the markers כ (ketiv) and ק (qere) should not be considered part of a word.
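One way to strip the ketiv-qere markup before splitting, sketched in Python. It assumes the notation seen in the Ps. 119:161 example (a bracketed ketiv ending in כ, followed by a parenthesised qere ending in ק) and keeps both readings as plain text; whether both should actually be counted as words is left open. `KETIV`, `QERE` and `strip_ketiv_qere` are hypothetical names.

```python
import re

# Assumed notation: "[ketiv כ]" and "(qere ק)"; כ is U+05DB, ק is U+05E7.
KETIV = re.compile(r"\[([^\]]*?)\s*\u05DB\]")
QERE = re.compile(r"\(([^)]*?)\s*\u05E7\)")


def strip_ketiv_qere(text: str) -> str:
    """Remove the brackets, parentheses and כ/ק markers,
    keeping the ketiv and qere readings as plain text."""
    text = KETIV.sub(r"\1", text)
    return QERE.sub(r"\1", text)
```

After this step, splitting the example on spaces yields two words (the ketiv and the qere) instead of four. If only the qere should be counted, the `KETIV` matches could be replaced with an empty string instead of `\1`.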

