Find footnotes and anchors and report found sequences #20
cpeel merged 1 commit into DistributedProofreaders:master
Example outputs: one range found; no ranges found; multiple ranges found.
Finds footnote anchors (like [1], if not at the start of a line) and footnotes in a few formats, and creates a report of the sequences found. A sequence is an unbroken increasing numerical series (like 1,2,3,4,5). If the sequence breaks, a new sequence is started. Two reports are generated, one for anchors and one for footnotes.

Limitations:
- Currently only understands positive integer values.
- Understands [n], Footnote n:, and [Footnote n:; will not detect any other formats that may be in use by PPers.
- Reports found sequences only; no attempt is made to compare the sequence of anchors to the sequence of footnotes. This is intentional, as it's common for a single footnote to have numerous anchors that all point to it.
The ppwb at https://www.pgdp.org/~cpeel/ppwb/ has this copy of |
Should the check ignore the Transcriber's Note, if it exists? I guess one question is, is it easy to do if the TN is at the beginning of the file instead of the end.

Results from one of my test runs:

The standalone was a TN for footnote 243, because there was a correction made to the footnote.

If that's outside the scope of this PR, that's fine. I'll go ahead and approve so that if y'all are also happy with it as-is, you can merge any time.
This code isn't aware of what a TN is (and since there's no strictly standard format for what it looks like, it wouldn't be trivial to look for anything more clever than searching for those two words and the lines after them until e.g. a 4-newline break). It only looks for what I said earlier: anchors that look like
How to run:

If this has been installed to the TEST sandbox, use via web as you would normally.

If using on the command line, you'll need Golang and aspell installed (and aspell will need to be in the expected location, `/usr/bin/aspell`). Prep a text file to test, then:

After the run, open `report.html` in a browser and search for "footnote check" to find the result.

What is recognized as a footnote or anchor:

Footnote anchors of the form `[n]` are recognized, so long as:
- `n` is a positive integer

Footnotes are recognized as any of the below, but only if they're the first text on the line. Again, only positive integers are supported.
- `[n]`
- `Footnote n:`
- `[Footnote n:`

Possible outcomes in the report:

If no anchors or footnotes are found, the report will say nothing was found, and should be in light gray text.

Otherwise, sequences will be detected and shown. A sequence is any series of numbers that is going up by exactly 1 with each anchor detected. Below are examples of what is seen and the resulting series reported (separated by `-->`):

First anchors are reported, and then the footnotes are reported.
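For readers who want to see the rules above in code, here is a minimal sketch in Go. It is not the code from this PR: the regular expressions, the function names (`scan`, `splitSequences`, `describe`), and the `first-last` output format are all assumptions made for illustration. It treats a `[n]` that is the first text on a line as a footnote rather than an anchor, skips non-positive numbers, and starts a new sequence whenever a number is not exactly one more than the previous one.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Hypothetical patterns, not taken from the PR.
// footnoteRe matches "[n]", "Footnote n:" or "[Footnote n:" as the first text on a line.
var footnoteRe = regexp.MustCompile(`^\s*(?:\[(\d+)\]|\[?Footnote (\d+):)`)

// anchorRe matches "[n]" anywhere in a line; start-of-line hits are filtered in scan.
var anchorRe = regexp.MustCompile(`\[(\d+)\]`)

// positive parses s and reports whether it is a positive integer.
func positive(s string) (int, bool) {
	n, err := strconv.Atoi(s)
	return n, err == nil && n > 0
}

// scan returns the anchor numbers and footnote numbers in document order.
func scan(text string) (anchors, footnotes []int) {
	for _, line := range strings.Split(text, "\n") {
		if m := footnoteRe.FindStringSubmatch(line); m != nil {
			// Only one of the two capture groups is non-empty.
			if n, ok := positive(m[1] + m[2]); ok {
				footnotes = append(footnotes, n)
			}
		}
		for _, loc := range anchorRe.FindAllStringSubmatchIndex(line, -1) {
			// A "[n]" that is the first text on its line is a footnote, not an anchor.
			if strings.TrimSpace(line[:loc[0]]) == "" {
				continue
			}
			if n, ok := positive(line[loc[2]:loc[3]]); ok {
				anchors = append(anchors, n)
			}
		}
	}
	return anchors, footnotes
}

// splitSequences groups nums into runs in which each value is exactly
// one more than the value before it.
func splitSequences(nums []int) [][]int {
	var runs [][]int
	for i, n := range nums {
		if i == 0 || n != nums[i-1]+1 {
			runs = append(runs, []int{n})
			continue
		}
		last := len(runs) - 1
		runs[last] = append(runs[last], n)
	}
	return runs
}

// describe renders runs as "first-last" ranges (an assumed format, not the real report's).
func describe(runs [][]int) string {
	if len(runs) == 0 {
		return "none found"
	}
	parts := make([]string, len(runs))
	for i, r := range runs {
		if len(r) == 1 {
			parts[i] = strconv.Itoa(r[0])
		} else {
			parts[i] = fmt.Sprintf("%d-%d", r[0], r[len(r)-1])
		}
	}
	return strings.Join(parts, ", ")
}

func main() {
	text := "Intro.[1] More text.[2]\nNext chapter.[1]\n\n" +
		"[1] First note.\nFootnote 2: Second note.\n[Footnote 1: Restarted note.]"
	anchors, footnotes := scan(text)
	fmt.Println("anchors:", describe(splitSequences(anchors)))
	fmt.Println("footnotes:", describe(splitSequences(footnotes)))
}
```

Keeping the anchor and footnote lists separate mirrors the stated limitation that the two sequences are reported side by side and never compared with each other.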