
Conversation

@bdlucas1 (Collaborator) commented Dec 9, 2025

We only have a couple of plot tests that actually look at the output; those were constructed and run laboriously, and none of them test the new vectorized plot code. This makes refactoring the code risky (so far I've been relying mostly on end-to-end testing that compares the generated visuals in my test front-end).

Here's a different approach: the idea is to write the result expression to a file in outline tree form, and then compare the actual result with an expected reference result using diff. This provides easily constructed tests with a fairly readable indication of what has changed. See comment in test_plot__detailed.py
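A minimal sketch of that flow (not the PR's actual code; the helper name, directories, and reference-file layout here are assumptions for illustration):

```python
# Sketch only: render the Plot result as an outline tree elsewhere
# (e.g. via print_expression_tree), write that text to a scratch file,
# and diff it against a checked-in reference file.
import subprocess
from pathlib import Path

def assert_matches_reference(outline_text: str, name: str) -> None:
    ref = Path("test/data/plot") / f"{name}.txt"   # reference directory is assumed
    out = Path("/tmp") / f"{name}.txt"
    out.write_text(outline_text)
    # diff exits non-zero and prints a readable report when the trees differ
    subprocess.run(["diff", "-u", str(ref), str(out)], check=True)
```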

In addition, this change

  • Makes it possible to change at runtime whether vectorized or classic plot code is used
  • Augments print_expression_tree to print abbreviated numpy arrays (using numpy's default str implementation) so that numeric data can be checked efficiently for gross changes.
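For illustration (not part of the change itself): numpy's default str conversion already elides long arrays and prints limited precision, which is what keeps the outline files compact, e.g.:

```python
import numpy as np

a = np.linspace(0.0, 1.0, 2000)
print(str(a))
# prints something like (middle elided, limited precision):
# [0.00000000e+00 5.00250125e-04 1.00050025e-03 ... 9.98999500e-01
#  9.99499750e-01 1.00000000e+00]
```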

@bdlucas1 (Collaborator, Author) commented Dec 9, 2025

So there was a test failure because of a difference in the last digit of a double-precision float. Couple of possible fixes:

  • Have the print_expression_tree function special-case Machine`Real and round it to single precision before printing
  • Avoid trig functions where machines might differ and just use simple functions like x+y or x*y that are unlikely to show any difference on different architectures.
  • Probably both of the above.

I'll look into this tomorrow.

@bdlucas1 (Collaborator, Author) commented Dec 9, 2025

Pyodide failed because (of course) it can't run diff. Couple options:

  • don't run the tests under Pyodide
  • There's a Python diff package. Don't know what capability it has. It would create a dependency on another package at test time. I can look into it if that would be desirable.
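As an aside (not something this PR does): Python's standard-library difflib can produce the same kind of unified diff without an external diff binary or an extra dependency, which would also sidestep the Pyodide and Windows questions. A minimal sketch:

```python
import difflib
from pathlib import Path

def unified_diff(ref_file: str, out_file: str) -> str:
    # Return a unified diff between two text files ('' when they match).
    ref = Path(ref_file).read_text().splitlines(keepends=True)
    new = Path(out_file).read_text().splitlines(keepends=True)
    return "".join(difflib.unified_diff(ref, new, fromfile=ref_file, tofile=out_file))

# usage (file names hypothetical):
# diff_text = unified_diff("ref/plot_sin.txt", "/tmp/plot_sin.txt")
# assert not diff_text, diff_text
```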

I'm slightly surprised that the Windows test succeeded - wasn't sure about /tmp and diff. Is it running under Cygwin or something?

@mmatera (Contributor) commented Dec 9, 2025

So there was a test failure because of a difference in the last digit of a double-precision float. Couple of possible fixes:

What about applying eval_N with 4 digits of accuracy to the GraphicsBox expression? The result should match on any architecture.


@mmatera (Contributor) commented Dec 9, 2025

Pyodide failed because (of course) it can't run diff. Couple options:

  • don't run the tests under Pyodide
  • There's a Python diff package. Don't know what capability it has. It would create a dependency on another package at test time. I can look into it if that would be desirable.

I'm slightly surprised that the Windows test succeeded - wasn't sure about /tmp and diff. Is it running under Cygwin or something?

I guess these tests should be in their own test workflow, and it probably makes sense to check just on Ubuntu. If other architectures fail, it would be an issue with a lower-level library, wouldn't it?

@mmatera (Contributor) commented Dec 9, 2025


Something else that would be useful is a way to regenerate the reference files automatically when needed. Box expressions follow some internal rules and can change from release to release.

@bdlucas1 (Collaborator, Author) commented:

So I've made the following improvements:

  • When printing the result, approximate the numbers and conceal the difference in the printed result between numpy Integer32/64, Real32/64, etc. This eliminates all differences in printed numeric output between our test environments.
  • Error out early and print if there are any messages, which made it possible to diagnose the following failure on Pyodide:
  • Don't try to run ContourPlot tests if scikit-image isn't installed. (Do we want to install it there for our tests?)

What about applying eval_N with 4 digits of accuracy

Yes, that's essentially what I'm doing, except in native floating point as the numbers are being printed. (Specifically, str(round(x * 1e6) / 1e6) to limit absolute precision to 6 digits).
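A minimal sketch of that kind of normalization (not the PR's exact code; the numpy dtype handling shown here is an assumption):

```python
# Format numbers so that architecture-dependent last-digit differences and
# numpy dtype widths (int32 vs int64, float32 vs float64) don't show up in
# the printed outline tree.
import numpy as np

def format_number(x) -> str:
    if isinstance(x, (np.integer, int)):
        return str(int(x))                        # int32/int64 print identically
    if isinstance(x, (np.floating, float)):
        return str(round(float(x) * 1e6) / 1e6)   # limit absolute precision to 1e-6
    return str(x)

print(format_number(np.float32(0.1)))  # '0.1' rather than '0.10000000149011612'
```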

I guess these tests should be in their own test workflow, and it probably makes sense to check just on Ubuntu. If other architectures fail, it would be an issue with a lower-level library, wouldn't it?

Yes, but with the changes above the small inconsequential differences between the architectures are concealed and the tests pass, so it should be possible to just run in all environments in the current test workflows, if you want. (Note that just running our tests on Ubuntu in itself doesn't guarantee our tests will see the same results someone gets when they generate the reference files on their Mac...)

a way to regenerate the reference files automatically

Indeed. See make_ref_files near the end of test_plot_detail.py. All that's needed is to run the tests, pointing their output at the ref file directory instead of /tmp.
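For illustration only (the environment variable, directories, and test module path here are assumptions, not the PR's actual code), regeneration amounts to rerunning the same tests with their output directed at the reference directory:

```python
# Sketch: rerun the plot tests with output pointed at the reference directory
# so that the current results become the new reference files.
import os
import subprocess

def make_ref_files(ref_dir: str = "test/data/plot_refs") -> None:
    env = dict(os.environ, PLOT_TEST_OUTPUT_DIR=ref_dir)   # hypothetical env var
    subprocess.run(["pytest", "test/test_plot_detail.py"], env=env, check=True)
```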

You mentioned boxing - to be clear, these unit tests verify the immediate Graphics* output of Plot*, which is not boxed. I'm not sure how testing boxing should be done. These specific tests could be enlarged to further generate GraphicsBox* output from the Graphics* output of Plot*, but are the unit tests for Plot* the right home for unit tests for Graphics* -> GraphicsBox*?

@mmatera (Contributor) commented Dec 11, 2025

So I've made the following improvements:

  • When printing the result, approximate the numbers and conceal the difference in the printed result between numpy Integer32/64, Real32/64, etc. This eliminates all differences in printed numeric output between our test environments.

Excellent

  • Error out early and print if there are any messages, which made it possible to diagnose the following failure on Pyodide:

  • Don't try to run ContourPlot tests if scikit-image isn't installed. (Do we want to install it there for our tests?)

What about applying eval_N with 4 digits of accuracy

Yes, that's essentially what I'm doing, except in native floating point as the numbers are being printed. (Specifically, str(round(x * 1e6) / 1e6) to limit absolute precision to 6 digits).

OK

I guess these tests should be in their own test workflow, and it probably makes sense to check just on Ubuntu. If other architectures fail, it would be an issue with a lower-level library, wouldn't it?

Yes, but with the changes above the small inconsequential differences between the architectures are concealed and the tests pass, so it should be possible to just run in all environments in the current test workflows, if you want. (Note that just running our tests on Ubuntu in itself doesn't guarantee our tests will see the same results someone gets when they generate the reference files on their Mac...)

OK, but wouldn't those be issues outside the graphics module? If MS Windows produces a plot differing in some decimal places relative to Ubuntu, that doesn't say much about how we build graphics, but rather about the numpy implementation, SymPy, or some other low-level library...

a way to regenerate the reference files automatically

Indeed. See make_ref_files near the end of test_plot_detail.py. All that's needed is to run the tests, pointing their output at the ref file directory instead of /tmp.

Great

You mentioned boxing - to be clear, these unit tests verify the immediate Graphics* output of Plot*, which is not boxed. I'm not sure how testing boxing should be done. These specific tests could be enlarged to further generate GraphicsBox* output from the Graphics* output of Plot*, but are the unit tests for Plot* the right home for unit tests for Graphics* -> GraphicsBox*?

OK, yes, my bad. I said boxing because I have been dealing with it for a while, but here we are working with an earlier stage, the Graphics(3D) sub-language. What we are testing here is the translation / evaluation of Plot expressions into Graphics(3D) expressions.

Review thread on a new reference file (diff hunk @@ -0,0 +1,198 @@, beginning with System`Graphics):

Contributor:

If brackets and commas were kept, these files could be evaluated and would be proper WL. In that case, a .m extension would be more appropriate.

@bdlucas1 (Collaborator, Author) replied:

It's close, but some things aren't WL - for example, the numpy arrays underlying NumericArray are presented using numpy's built-in string conversion, which limits the number of elements printed (and also shows limited precision). The primary intent is human consumption for debugging, but if it proves too limiting that it's also not readily machine-readable, maybe we can revisit it and find a way to satisfy both requirements.

@mmatera (Contributor) left a review comment:

Apart from the comments, LGTM. Merge when you consider it ready.

@bdlucas1 (Collaborator, Author) commented:

I guess these tests should be in their own test workflow, and it probably makes sense to check just on Ubuntu. If other architectures fail, it would be an issue with a lower-level library, wouldn't it?

Yes, but with the changes above the small inconsequential differences between the architectures are concealed and the tests pass, so it should be possible to just run in all environments in the current test workflows, if you want. (Note that just running our tests on Ubuntu in itself doesn't guarantee our tests will see the same results someone gets when they generate the reference files on their Mac...)

OK, but wouldn't those be issues outside the graphics module? If MS Windows produces a plot differing in some decimal places relative to Ubuntu, that doesn't say much about how we build graphics, but rather about the numpy implementation, SymPy, or some other low-level library...

I just meant that if your suggestion to run only on Ubuntu was a way to avoid architecture-specific behavior in numpy (and maybe that's not what you meant), it would only work if the reference files were generated on the same architecture (and version, etc.). In any case, I think we're on the same page: with the tests now hiding inconsequential differences, they should be robust with respect to exact architecture-specific library details.

@bdlucas1 bdlucas1 merged commit a7acec0 into Mathics3:master Dec 11, 2025
17 checks passed