Find footnotes and anchors and report found sequences #20

Merged
cpeel merged 1 commit into DistributedProofreaders:master from tangledhelix:fncount
Feb 8, 2026

@tangledhelix
Member

Finds footnote anchors (like [1], when not at the start of a line) and footnotes in a few formats, then reports the sequences found. A sequence is an unbroken numerical series that increases by 1 at each step (like 1,2,3,4,5); when the series breaks, a new sequence starts. Two reports are generated, one for anchors and one for footnotes.

Limitations:

  • Currently only understands positive integer values
  • Understands only [n], Footnote n:, and [Footnote n:; it will not detect any other formats that may be in use by PPers.
  • Reports found sequences - no attempt is made to compare the sequence of anchors to the sequence of footnotes. This is intentional as it's common for a single footnote to have numerous anchors that all point to it.

How to run:

If this has been installed to the TEST sandbox, use via web as you would normally.

If using on the command line, you'll need Golang and aspell installed (and aspell will need to be in the expected location, /usr/bin/aspell). Prep a text file to test, then:

go build pptext.go
./pptext -i testfile.txt

After the run, open report.html in a browser and search for "footnote check" to find the result.

What is recognized as a footnote or anchor:

Footnote anchors of the form [n] are recognized, so long as:

  • n is a positive integer
  • it's not the first text on a line (because then it's a footnote, not an anchor)

Footnotes are recognized as any of the below, but only if they're the first text on the line. Again, only positive integers are supported.

  • [n]
  • Footnote n:
  • [Footnote n:
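The recognition rules above could be sketched roughly as follows. This is only an illustration, not the code in pptext.go: the names `footnoteRe`, `anchorRe`, and `scanLine` are hypothetical, and ignoring leading whitespace when deciding what counts as "first text on the line" is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Hypothetical patterns; the real pptext.go may use different ones.
var (
	// A footnote must be the first text on the line:
	// [n], Footnote n:, or [Footnote n: (n a positive integer).
	footnoteRe = regexp.MustCompile(`^(?:\[([1-9][0-9]*)\]|\[?Footnote ([1-9][0-9]*):)`)
	// An anchor is [n] anywhere else on the line.
	anchorRe = regexp.MustCompile(`\[([1-9][0-9]*)\]`)
)

// scanLine returns the footnote number that starts the line (0 if none)
// and the anchor numbers found in the remainder of the line.
func scanLine(line string) (footnote int, anchors []int) {
	// Assumption: leading spaces and tabs do not count as "text".
	rest := strings.TrimLeft(line, " \t")
	if m := footnoteRe.FindStringSubmatch(rest); m != nil {
		num := m[1] // set when the [n] form matched
		if num == "" {
			num = m[2] // set when a Footnote n: form matched
		}
		if n, err := strconv.Atoi(num); err == nil {
			footnote = n
		}
		// Only the text after the footnote marker is searched for anchors.
		rest = rest[len(m[0]):]
	}
	for _, m := range anchorRe.FindAllStringSubmatch(rest, -1) {
		if n, err := strconv.Atoi(m[1]); err == nil {
			anchors = append(anchors, n)
		}
	}
	return footnote, anchors
}

func main() {
	lines := []string{
		"as noted earlier[1] and again later[2].",
		"[3] Text of a footnote.",
		"Footnote 7: Text of a footnote.",
		"[Footnote 12: Text that itself cites[4] something.",
	}
	for _, line := range lines {
		fn, an := scanLine(line)
		fmt.Printf("%q -> footnote=%d anchors=%v\n", line, fn, an)
	}
}
```

A footnote number of 0 is safe to use as the "none" value here because only positive integers are recognized.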

Possible outcomes in the report:

If no anchors or footnotes are found, the report will say nothing was found, and should be in light gray text.

Otherwise, sequences will be detected and shown. A sequence is any series of numbers that goes up by exactly 1 with each anchor detected. Below are examples of what is seen and the resulting series reported (separated by -->):

1, 2, 3, 4, 5, 6, 7, 8 --> 1-8                (total 8)
1, 2, 3, 4, 5, 1, 2, 3 --> 1-5, 1-3           (total 8)
1, 1, 1, 1, 1, 1       --> 1, 1, 1, 1, 1, 1   (total 6)
1, 2, 3, 2, 3, 2, 2    --> 1-3, 2-3, 2, 2     (total 7)
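
The grouping rule behind these examples can be sketched as below. This is a minimal illustration, not the code in pptext.go; `run`, `groupRuns`, and `formatRuns` are hypothetical names. A number extends the current run only when it is exactly one more than the run's last value; anything else starts a new run.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// run is a maximal stretch of numbers that each increase by exactly 1.
type run struct{ first, last int }

// groupRuns splits nums, in order of appearance, into runs.
func groupRuns(nums []int) []run {
	var runs []run
	for _, n := range nums {
		if k := len(runs); k > 0 && n == runs[k-1].last+1 {
			runs[k-1].last = n // n continues the current run
			continue
		}
		runs = append(runs, run{n, n}) // n breaks the series: start a new run
	}
	return runs
}

// formatRuns renders runs the way the examples above show them:
// "a-b" for a range, a bare number for a run of length one.
func formatRuns(runs []run) string {
	parts := make([]string, len(runs))
	for i, r := range runs {
		if r.first == r.last {
			parts[i] = strconv.Itoa(r.first)
		} else {
			parts[i] = fmt.Sprintf("%d-%d", r.first, r.last)
		}
	}
	return strings.Join(parts, ", ")
}

func main() {
	inputs := [][]int{
		{1, 2, 3, 4, 5, 6, 7, 8},
		{1, 2, 3, 4, 5, 1, 2, 3},
		{1, 1, 1, 1, 1, 1},
		{1, 2, 3, 2, 3, 2, 2},
	}
	for _, in := range inputs {
		fmt.Printf("%v --> %s (total %d)\n", in, formatRuns(groupRuns(in)), len(in))
	}
}
```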

Anchors are reported first, then footnotes.

@tangledhelix
Member Author

Example outputs:

One range found:

----- footnote check ---------------------------------------------------------

found footnote anchors: 1–8 (count: 8)

found footnotes: 1–8 (count: 8)

No ranges found:

----- footnote check ---------------------------------------------------------

no footnotes or anchors found.

Multiple ranges found:

----- footnote check ---------------------------------------------------------

found footnote anchors:
    1–2
    1–2
    1–4
(total count: 8)

found footnotes:
    1–2
    1–2
    1–4
(total count: 8)
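
For illustration, the two layouts shown above (one line for a single range, an indented list plus a total for several) could be produced by something like the following. This is a sketch under assumptions, not the code in pptext.go: `span` and `formatSection` are hypothetical names, and the "nothing found" case is assumed to be handled by the caller before this function is reached.

```go
package main

import (
	"fmt"
	"strings"
)

// span is one detected sequence, from first to last inclusive.
type span struct{ first, last int }

// formatSection renders one report section ("footnote anchors" or
// "footnotes") in the layout shown in the example outputs above.
func formatSection(label string, spans []span) string {
	total := 0
	items := make([]string, len(spans))
	for i, s := range spans {
		total += s.last - s.first + 1
		if s.first == s.last {
			items[i] = fmt.Sprint(s.first) // a lone number is printed bare
		} else {
			items[i] = fmt.Sprintf("%d–%d", s.first, s.last)
		}
	}
	// A single range fits on one line.
	if len(items) == 1 {
		return fmt.Sprintf("found %s: %s (count: %d)", label, items[0], total)
	}
	// Several ranges are listed one per line, followed by the total.
	var b strings.Builder
	fmt.Fprintf(&b, "found %s:\n", label)
	for _, it := range items {
		fmt.Fprintf(&b, "    %s\n", it)
	}
	fmt.Fprintf(&b, "(total count: %d)", total)
	return b.String()
}

func main() {
	fmt.Println(formatSection("footnote anchors", []span{{1, 8}}))
	fmt.Println()
	fmt.Println(formatSection("footnotes", []span{{1, 2}, {1, 2}, {1, 4}}))
}
```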

@tangledhelix tangledhelix force-pushed the fncount branch 2 times, most recently from 3d73c7c to ccc497f Compare February 5, 2026 22:49
@cpeel
Member

cpeel commented Feb 7, 2026

The ppwb at https://www.pgdp.org/~cpeel/ppwb/ has this copy of pptext in it.

@srjfoo
Member

srjfoo commented Feb 7, 2026

Should the check ignore the Transcriber's Note, if it exists? I guess one question is, is it easy to if the TN is at the beginning of the file instead of the end.

Results from one of my test runs:

----- footnote check ----------- ...

found footnote anchors:
    1–9
    1–279
    243
(total count: 289)

found footnotes:
    1–9
    1–279
(total count: 288)

The standalone 243 came from a TN entry for footnote 243, because a correction was made to that footnote.

If that's outside the scope of this PR, that's fine. I'll go ahead and approve so that if y'all are also happy with it as-is, you can merge any time.

@tangledhelix
Member Author

Should the check ignore the Transcriber's Note, if it exists? I guess one question is, is it easy to if the TN is at the beginning of the file instead of the end.

This code isn't aware of what a TN is (and since there's no strictly standard format for what it looks like, it wouldn't be trivial to look for anything more clever than searching for those two words and the lines after them until e.g. a 4-newline break).

It only looks for what I said earlier: anchors that look like word[1] and footnotes, at the start of a line, like [1] or Footnote 1: or [Footnote 1:. It has no understanding of any structure beyond that, and since text files are much less structured than HTML, I'd need to get a little clever to try to detect structure. So I don't intend to try that.

@cpeel cpeel merged commit 45f542c into DistributedProofreaders:master Feb 8, 2026
@tangledhelix tangledhelix deleted the fncount branch February 8, 2026 01:07