@ericprud
For deep lexer rules like [ShEx's PN_CHARS_BASE](https://github.com/shexjs/shex.js/blob/b5cb30708d7c69550a07f1329aaf97cdb8eed737/packages/shex-parser/lib/ShExJison.jison#L255), the [emitted rule](https://github.com/shexjs/shex.js/blob/b5cb30708d7c69550a07f1329aaf97cdb8eed737/packages/shex-parser/lib/ShExJison.js#L934) has an enormous number of capture groups. Parsing a large input like [FHIR.shex](https://hl7.org/fhir/R4B/fhir.schema.shex.zip) then fails with a stack error:
```
/home/eric/checkouts/shexSpec/shex.js/packages/shex-parser/shex-parser.js:251
      throw errors[0];
      ^

RangeError: Maximum call stack size exceeded
    at String.match (<anonymous>)
    at JisonLexer.next (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/lexer/lib/lexer.js:225:37)
    at JisonLexer.lex (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/lexer/lib/lexer.js:269:22)
    at JisonLexer.lex (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/lexer/lib/lexer.js:274:25)
    at lex (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/parser/lib/parser.js:51:28)
    at JisonParser.parse (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/parser/lib/parser.js:68:30)
    at ShExJisonParser.runParser [as parse] (/home/eric/checkouts/shexSpec/shex.js/packages/shex-parser/shex-parser.js:231:22)
    at Object.<anonymous> (/home/eric/checkouts/shexSpec/shex.js/parseFhir.js:10:38)
    at Module._compile (node:internal/modules/cjs/loader:1119:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1173:10) {
  parsed: null
}
```
Eliminating the capture groups fixes the problem and also makes parsing much faster.

(I generated this grammar using ts-jison, but the same happens with jison.)
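
To illustrate the idea, here is a minimal sketch of turning capturing groups into non-capturing ones in a regex source string. `toNonCapturing` is a hypothetical helper written for this example. It is not the code in this PR and not part of jison or ts-jison. It assumes plain JavaScript regex syntax, and it leaves any group that already starts with `(?` untouched, so named groups stay capturing.

```javascript
// Hypothetical helper for illustration only; not the jison/ts-jison implementation.
// Rewrites every plain capturing group "(" as a non-capturing group "(?:".
function toNonCapturing(source) {
  let out = "";
  let inClass = false; // true while scanning inside a [...] character class
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy an escape sequence such as \( or \[ verbatim.
      out += ch + (source[i + 1] ?? "");
      i++;
    } else if (inClass) {
      // Parentheses inside a character class are literals; just look for the end.
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === "(" && source[i + 1] !== "?") {
      // A plain "(" opens a capturing group; make it non-capturing.
      out += "(?:";
    } else {
      // Everything else, including groups that already start with "(?", is kept.
      out += ch;
    }
  }
  return out;
}

// A small nested rule, standing in for a much deeper lexer rule.
const before = /^((a)|(b))/;
const after = new RegExp(toNonCapturing(before.source));

console.log(after.source);             // ^(?:(?:a)|(?:b))
console.log(before.exec("b").length);  // 4: the full match plus three groups
console.log(after.exec("b").length);   // 1: the full match only
```

Both expressions match the same text. Only the number of slots in the match result changes. The sketch does not reproduce the stack error itself, which showed up on a much larger rule and input.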