At the moment, we only split the text on spaces, but we should also split on ־ (maqaf) and ׃ (sof pasuq), and possibly on more signs.
Furthermore, some things should not be considered a word:
- פ and ס at the end of a verse (the petuha and setuma paragraph markers)
- ׀ (paseq; still unclear to me what this actually means)
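
A rough sketch of the splitting described above, in Python (the language is an assumption, as are the names `split_words`, `SPLIT_RE` and `PARAGRAPH_MARKERS`). It treats ־, ׃ and ׀ as separators next to whitespace, and drops a standalone פ or ס only when it is the last token of the verse. Treating ׀ as a separator rather than as a token to filter out is a guess, since its role is still unclear.

```python
import re

MAQAF = "\u05BE"      # ־ joins two words
SOF_PASUQ = "\u05C3"  # ׃ closes a verse
PASEQ = "\u05C0"      # ׀ assumed here to be a pure separator

# Assumption: a lone pe or samekh after the verse is a paragraph marker.
PARAGRAPH_MARKERS = {"\u05E4", "\u05E1"}

# Split on runs of whitespace and the separator signs above.
SPLIT_RE = re.compile(rf"[\s{MAQAF}{SOF_PASUQ}{PASEQ}]+")


def split_words(verse: str) -> list[str]:
    """Split one verse into words, dropping separators and markers."""
    tokens = [t for t in SPLIT_RE.split(verse) if t]
    # Only strip markers from the end of the verse, as described above.
    while tokens and tokens[-1] in PARAGRAPH_MARKERS:
        tokens.pop()
    return tokens
```

More signs can be added to the character class in `SPLIT_RE` once it is clear which ones should separate words.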
Ketiv-qere should also be handled better. For example, in Ps. 119:161, [וּמִדְּבָרֶיךָ כ] (וּ֝מִדְּבָרְךָ֗ ק) currently breaks into four words, with the brackets ending up attached to them. In this context, the characters [, ], (, ) and the markers כ and ק should not be considered part of a word.
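
One possible way to handle this, sketched in Python (the language and the names `KETIV_RE`, `QERE_RE` and `unwrap_ketiv_qere` are assumptions): strip the brackets and the trailing כ / ק marker before splitting, so that only the ketiv and qere words themselves remain. The sketch assumes the notation always looks like the example, with the marker as the last item inside the brackets.

```python
import re

KAF = "\u05DB"  # כ marks the ketiv inside [...]
QOF = "\u05E7"  # ק marks the qere inside (...)

# Capture the word(s) before the marker, inside the matching brackets.
KETIV_RE = re.compile(rf"\[([^\]]+?)\s+{KAF}\]")
QERE_RE = re.compile(rf"\(([^)]+?)\s+{QOF}\)")


def unwrap_ketiv_qere(text: str) -> str:
    """Remove ketiv-qere brackets and markers, keeping both readings."""
    text = KETIV_RE.sub(r"\1", text)
    return QERE_RE.sub(r"\1", text)
```

Applied to the Ps. 119:161 example and then split on spaces, this should leave two words instead of four. Whether to keep the ketiv, the qere, or both as words is a separate decision that this sketch does not make.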