-
-
Notifications
You must be signed in to change notification settings - Fork 90
DRAFT: test the coregex library #264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Some of the additional test failures are the following:
That's a start, and all I have time for today. :-) |
|
A few more:
|
|
@benhoyt Word boundary assertions ( The issue was that Release: https://github.com/coregx/coregex/releases/tag/v0.8.5 |
coregex v0.8.6 released 🎉Fixed Issue #14: What was broken: The fix: Added Release: https://github.com/coregx/coregex/releases/tag/v0.8.6 go get github.com/coregx/coregex@v0.8.6This should fix the CI failures like:
Looking forward to the test results! |
Update: coregex v0.8.7 releasedFixed in v0.8.7:
This fixes the Still investigating:
go get github.com/coregx/coregex@v0.8.7Release: https://github.com/coregx/coregex/releases/tag/v0.8.7 |
coregex v0.8.8 released 🎉Fixed: Issue #15 - DFA.IsMatch with capture groupsRoot cause: This fixes the go get github.com/coregx/coregex@v0.8.8Summary of fixes in v0.8.6-v0.8.8:
Release: https://github.com/coregx/coregex/releases/tag/v0.8.8 |
|
Hi @benhoyt! 👋 I've released v0.8.10 with inline flags support (fixes #8):
For AWK compatibility, since AWK expects
The gsub-related failures (gsubtest, gsubtst3, etc.) appear to be related to a separate issue with empty alternation branches (#16) - I'm investigating that next. Let me know if you'd like to test v0.8.10! |
|
Yes, I do already wrap all user patterns with I've updated the branch to v0.8.10. Getting very close now. Just 3 remaining tests that fail (I haven't yet looked into why): |
|
Thanks for testing v0.8.10! Regarding the // Both stdlib and coregex return the same result:
regexp.MustCompile(`(#|#!)`).ReplaceAllString("#!a", "") // "!a"
coregex.MustCompile(`(#|#!)`).ReplaceAllString("#!a", "") // "!a"
For AWK compatibility, you'd need to reorder alternations by length (longest first): This is beyond what coregex can fix since it's designed to match Go stdlib behavior exactly. I'll look into the other 2 failing tests (t.gsub1, t.monotone) to see if they're similar semantic differences or actual bugs. |
|
Update: Checked Rust regex behavior Rust regex also uses // regex-automata/src/util/search.rs
pub enum MatchKind {
All,
#[default]
LeftmostFirst, // ← default
// LeftmostLongest - NOT IMPLEMENTED
}From Rust regex source comments:
Summary of modern regex engines:
So this is standard behavior for PCRE-style engines. AWK's POSIX semantics are the exception. For GoAWK, possible solutions:
|
|
v0.8.11 released with additional bug fixes:
The Would be great to re-run tests with v0.8.11 to see if any regex-related failures are resolved! |
|
Just one failing test left: This is a leftmost-longest test. But the coregex docs indicate it should be leftmost-longest by default? https://pkg.go.dev/github.com/coregx/coregex#Regex.Longest |
|
Great news - only one failing test left! 🎉 It's been really productive working through these issues together. Considering the scope of coregex's functionality, the number of bugs we found was surprisingly small, which is encouraging! Regarding the You're right that our documentation was misleading - I've investigated this thoroughly. Here's what I found:
Neither Rust's regex crate nor Google's RE2 implement leftmost-longest semantics - they only support leftmost-first. However, for true stdlib compatibility, we should support it. I've created issue #26 to track this. We're now conducting deep research into:
We'll implement this in a separate branch ( For now, the |
|
@benhoyt v0.8.12 released with The |
|
Thanks. All the GoAWK tests pass now! There's one final issue in case you care -- coregex doesn't compile on 32-bit systems (Linux i386). Here's the error: By the way, do you run coregex against the full suite of tests from the standard library's regex package? That would be worth doing -- I think it would have caught several of the issues here. |
|
v0.8.13 released - fixed the 386 build issue: https://github.com/coregx/coregex/releases/tag/v0.8.13 Also added CI job for 386 to catch this in future. |
This is not for merging, but just for testing the new https://github.com/coregx/coregex library and seeing what tests fail.
Things I've found at a guick glance:
The following issues are fixed in coregex v0.8.2:
coregex has noRegex.Longest(), which GoAWK usescoregex has noQuoteMetafunction (so I'm just using theregexppackage's one, which seems fine for that)Probably the worst issue is that some regexes seem to hang indefinitely and peg the CPU to 100% when compiling. For example, at first GoAWK didn't start up, because I have an init-timeregexp.Compilewhich hangs when using coregex. The original was regex was in interp.go,^([_a-zA-Z][_a-zA-Z0-9]*)=(.*)-- it seems to be the last.*part that's causing it to hang.Other test failures should be able to be seen in the GitHub Actions output, for example.