-
-
Notifications
You must be signed in to change notification settings - Fork 61
Better plot tests #1541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better plot tests #1541
Conversation
Full Graphics* output of Plot command is compared against reference
|
So there was a test failure because of a difference in the last digit of a double-precision float. Couple of possible fixes:
I'll look into this tomorrow. |
|
Pyodide failed because (of course) it can't run diff. Couple options:
I'm a slightly surprised that the Windows test succeeded - wasn't sure about /tmp and diff. Is it running under Cygwin or something? |
What about apply eval_N with 4 digits of accuracy to the graphicsbox expression? The result should match in any architecture.
|
I guess these tests should be in their own test workflow, and probably make sense to check just in Ubuntu. If other architectures fail, would be an issue of the low lying library, isn't it? |
Something else that would be useful is a way to regenerate the reference files automatically when needed. Box expressions follow some internal rules and can change with the version release. |
for example, under Pyodide
|
So I've made the following improvements:
Yes, that's essentially what I'm doing, except in native floating point as the numbers are being printed. (Specifically,
Yes, but with the changes above the small inconsequential differences between the architectures are concealed and the tests pass, so it should be possible to just run in all environments in the current test workflows, if you want. (Note that just running our tests on Ubuntu in itself doesn't guarantee our tests will see the same results someone gets when they generate the reference files on their Mac...)
Indeed. See You mentioned boxing - to be clear, these unit tests verify the immediate Graphics* output of Plot*, which is not boxed. I'm not sure how testing boxing should be done. These specific tests could be enlarged to further generate GraphicsBox* output from the Graphics* output of Plot*, but is the unit tests for Plot* the right home for unit tests for Graphics* -> GraphicsBox*? |
Excellent
OK
OK, but that wouldn't be issues outside the graphics module? If MS Windows produce a plot differing in some decimal positions relative to Ubuntu, it does not say too much about how we build graphics, but about the numpy implementation, or Sympy, or any other low-lying library...
Great
OK, yes, my bad. I say boxing because I have been dealing with it for a while, but here we are working with a previous stage, which is the Graphics(3D) sub-language. What we are testing here is the translation / evaluation of Plot expressions into Graphics(3D) expressions. |
| @@ -0,0 +1,198 @@ | |||
| System`Graphics | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If brackets and commas were keep, these files could be evaluated, and would be in proper WL. In that case, .m extension would be more appropriated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's close, but some things like how the numpy arrays underlying NumericArray are presented by using numpy's built-in string conversion that limits the number of elements printed (and also shows limited precision) aren't WL. The primary intent is human consumption for debugging, but if proves to be too limiting that it's also not readily machine-readable may be we could revisit it and find a way to satisfy both requirements.
mmatera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the comments, LGTM. Merge when you consider it is ready.
I just meant that if your suggestion to only run on Ubuntu was a way to avoid architecture-specific behavior in numpy (and maybe that's not what you meant) that it would only work if the tests were generated on the same architecture (and version, etc.) In any case I think we're on the same page that with the tests now hiding inconsequential differences they should now be robust wrt exact architecture-specific library details. |
We only have a couple plot tests that actually look at the output, and those were constructed and run laboriously, and none of them test the new vectorized plot code. This makes refactoring the code risky (and so far I've been relying mostly on end-to-end testing that compares the generated visuals in my test front-end).
Here's a different approach: the idea is to write the result expression to a file in outline tree form, and then compare the actual result with an expected reference result using diff. This provides easily constructed tests with a fairly readable indication of what has changed. See comment in test_plot__detailed.py
In addition, this change