Addressing #167 #168
Conversation
Improved the evaluation code to have a robust root-mean-error metric and a simpler rmse function. Amended how constants of integration are found to be robust. The optimizer loss function now uses x_hat directly instead of trying to integrate the derivative, since all algorithms return an x_hat estimate associated with dxdt_hat anyway. SpectralDiff now does an IFFT to get x_hat instead of trapezoidal integration. Improved a few docstrings. Got tests to pass with the robust change. Nixed a few tests of the optimizer, because they're redundant with each other and with the notebooks.
:param bool pad_to_zero_dxdt: if True, extend the data with extra regions that smoothly force the derivative to
    zero before taking FFT.
:return: tuple[np.array, np.array] of\n
I decided I like a slightly shorter form for these.
filt = np.ones(k.shape); filt[discrete_cutoff:N-discrete_cutoff] = 0

# Smoothed signal
X = np.fft.fft(x)
I think this code is a bit clearer and closer to what people canonically do. Integrating to get x_hat was always a strange choice in this spectral case.
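For readers of the thread, the pattern is roughly the following. This is a sketch with made-up names (spectral_smooth_and_differentiate, cutoff_hz), not the exact code in this PR; the point is that x_hat comes straight from an IFFT of the low-passed spectrum, and dxdt_hat from multiplying that spectrum by i*2*pi*f, instead of integrating dxdt_hat afterwards.

```python
import numpy as np

def spectral_smooth_and_differentiate(x, dt, cutoff_hz):
    """Sketch only: low-pass the spectrum, then recover x_hat by inverse FFT
    and dxdt_hat by multiplying the filtered spectrum by i*2*pi*f."""
    N = len(x)
    X = np.fft.fft(x)                                   # spectrum of the noisy signal
    freqs = np.fft.fftfreq(N, d=dt)                     # frequency of each FFT bin
    X_filt = X * (np.abs(freqs) <= cutoff_hz)           # brick-wall low-pass filter
    x_hat = np.real(np.fft.ifft(X_filt))                # smoothed signal, directly by IFFT
    dxdt_hat = np.real(np.fft.ifft(2j*np.pi*freqs*X_filt))  # spectral derivative
    return x_hat, dxdt_hat
```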
if num_iterations > 1:  # We've lost a constant of integration in the above
    x_hat += utility.estimate_integration_constant(x, x_hat)  # uses least squares
    x_hat += utility.estimate_integration_constant(x, x_hat)
This now emphatically does not use least squares, because that's actually a terrible way to try to estimate the constant of integration if there are outliers. It now uses an outlier-robust fancy-shmansy thing under the hood, which further justifies having a whole separate function for this.
:param float or array[float] _t: This function supports variable step size. This parameter is either the constant
    step size if given as a single float, or sample locations if given as an array of same length as the state histories.
:param np.array A: state transition matrix, in discrete time if constant dt, in continuous time if variable dt
:param list[np.array] xhat_pre: a priori estimates of xhat from a kalman_filter forward pass
I turned these into 3D arrays rather than lists of 2D arrays a while ago and hadn't updated the docstring.
objective = 0.5*cvxpy.sum_squares(proc_resids) if huberM == float('inf') \
    else np.sqrt(2)*cvxpy.sum(cvxpy.abs(proc_resids)) if huberM < 1e-3 \
    else huber_const(huberM)*cvxpy.sum(cvxpy.huber(proc_resids, huberM))  # 1/2 l2^2, l1, or Huber
objective = 0.5*cvxpy.sum_squares(proc_resids) if proc_huberM == float('inf') \
I have to try out whether splitting up proc_huberM and meas_huberM etc. gives better results, but I'm reasonably sure we need Nelder-Mead to do a 4D search here, which means it makes a lot of queries. Thankfully, each query is fast, thanks to vectorized CVXPY code and CLARABEL taking full advantage of sparse matrices.
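To make the 4D search concrete, this is roughly what I mean. It's a sketch: the function name, the log10 parameterization, and the exact four hyperparameters (q, r, proc_huberM, meas_huberM) are assumptions here, not code from this PR, and `loss_fn` stands in for the CVXPY solve plus cost evaluation.

```python
import numpy as np
from scipy.optimize import minimize

def tune_robustdiff_hyperparameters(loss_fn, x0_log10=(0., 0., 0., 0.)):
    """Sketch of a derivative-free 4D search over log10-scaled hyperparameters,
    e.g. (log_q, log_r, log proc_huberM, log meas_huberM). `loss_fn` is assumed
    to run the convex solve and return the scalar optimizer cost for one setting."""
    result = minimize(lambda p: loss_fn(*(10.0**np.asarray(p))),
                      x0=np.asarray(x0_log10), method='Nelder-Mead')
    return 10.0**result.x  # hyperparameters back on their natural (non-log) scale
```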
{'q': (1e-10, 1e10),
 'r': (1e-10, 1e10)}),
# robustdiff: ({'order': {1, 2, 3},  # warning: order 1 hacks the loss function when tvgamma is used, tends to win but is usually suboptimal choice in terms of true RMSE
#               'log_q': [1., 4, 8, 12],  # decimal after first entry ensures this is treated as float type
I'm in the midst of experimenting here. Got derailed by #167
rms_rec_x, rms_x, rms_dxdt = evaluate.rmse(x, dt, x_hat, dxdt_hat, dxdt_truth=None, padding=padding)
cost = rms_rec_x + tvgamma*evaluate.total_variation(dxdt_hat, padding=padding)
else:  # then minimize sqrt{2*Mean(Huber((x_hat - x)/sigma))}*sigma + gamma*TV(dxdt_hat)
    cost = evaluate.robust_rme(x, x_hat, padding=padding) + tvgamma*evaluate.total_variation(dxdt_hat, padding=padding)
This case is now a bit simplified in the code here, although it's doing a fancier thing. The evaluation-code improvements are paying dividends.
[(-25, -25), (0, -1), (0, 0), (1, 1)],
[(-25, -25), (1, 1), (0, 0), (1, 1)],
[(-25, -25), (3, 3), (0, 0), (3, 3)]],
iterated_second_order: [[(-9, -10), (-25, -25), (0, -1), (0, 0)],
Finding the integration constant robustly rather than with true least squares hurts performance on these fragile examples a tiny bit. But it's worth it to avoid living dangerously in a land where the constant can be corrupted much more significantly by a single outlier.
assert params1['num_iterations'] == 5
assert params2['num_iterations'] == 1


def test_mediandiff():
It's silly to run the optimizer this many times on this many different functions. Nixed a few.
x0 = utility.estimate_integration_constant(x, x_hat, M=float('inf'))
assert 0.95 < x0 < 1.05  # The result should be close to 1.0, but not exactly due to noise


x[100] = 100  # outlier case
Added and played with an outlier case to get an intuition that my function was working as expected.
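The intuition check is basically the following (the numbers here are made up for illustration, not the exact test data): a single huge outlier drags the plain mean of x - x_hat, i.e. the M = inf / least-squares answer, off by roughly outlier/len(x), while the robust estimate should stay near the true constant.

```python
import numpy as np

np.random.seed(0)
t = np.arange(500)*0.01
x_hat = np.sin(t)                                  # estimate missing its constant
x = np.sin(t) + 1.0 + 0.05*np.random.randn(500)    # truth: offset of 1.0 plus noise
x[100] = 100                                       # one gross outlier
print(np.mean(x - x_hat))                          # ~1.2: the M=inf answer is pulled off by ~0.2
# utility.estimate_integration_constant(x, x_hat) with finite M should stay near 1.0
```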
[5.582, -0.31529832],
[7.135, -0.58327787],
[8.603, -1.71278265]])
assert np.allclose(maxtab, [[0.447, 1.58575613],  # these numbers validated by eye with --plot
Due to the run-order change, the random seed behaves a tiny bit differently here.
better in the presence of outliers"""
u = np.sin(np.arange(100)*0.1)
v = u + np.random.randn(100)
assert np.allclose(evaluate.rmse(u, v), evaluate.robust_rme(u, v, M=6))
Finally added an evaluation code test. It's mildly tricky to see how the robust_rme can equal rmse, but with big enough M, they now produce the same answer. And in the presence of outliers, the robust version is thrown off far less.
if show_error:
    _, _, rms_dxdt = rmse(x, dt, x_hat, dxdt_hat, x_truth, dxdt_truth)
    R_sqr = error_correlation(dxdt_hat, dxdt_truth)
    rms_dxdt = rmse(dxdt_truth, dxdt_hat)
rmse now returns only one thing
_, _, rms_dxdt = rmse(x, dt, x_hat, dxdt_hat, x_truth, dxdt_truth)
R_sqr = error_correlation(dxdt_hat, dxdt_truth)
rms_dxdt = rmse(dxdt_truth, dxdt_hat)
R_sqr = error_correlation(dxdt_truth, dxdt_hat)
Changed the order of inputs to all functions to have known stuff first and hatted, computed stuff second.
def rmse(x, dt, x_hat, dxdt_hat, x_truth=None, dxdt_truth=None, padding=0):
    """Evaluate x_hat based on RMSE, calculating different ones depending on whether :code:`dxdt_truth`
    and :code:`x_truth` are known.
def robust_rme(x, x_hat, padding=0, M=6):
I decided to make this a separate function, even though it reduces to rmse when M is infinity, because the two are used for different purposes in our optimization and plotting.
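For anyone reading along, the quantity being computed is roughly sqrt(2*Mean(Huber_M((x_hat - x)/sigma)))*sigma, as in the loss shown earlier in this thread. Here is a minimal sketch, assuming sigma is a robust scale estimate like the MAD (the real implementation may differ in detail). With M = infinity the Huber term is 0.5*r^2 and the whole expression collapses to the ordinary RMSE of x_hat against x, which is the reduction mentioned above.

```python
import numpy as np

def robust_rme(x, x_hat, padding=0, M=6):
    """Minimal sketch, not the library's exact implementation."""
    s = slice(padding, len(x_hat) - padding)
    r = x_hat[s] - x[s]
    sigma = 1.4826*np.median(np.abs(r - np.median(r)))   # MAD scale estimate (an assumption)
    z = np.abs(r/sigma)
    huber = np.where(z <= M, 0.5*z**2, M*(z - 0.5*M))     # elementwise Huber loss
    # For M = inf this is sqrt(mean(r**2)), i.e. the plain RMSE.
    return np.sqrt(2*np.mean(huber))*sigma
```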
x0 = utility.estimate_integration_constant(x, rec_x)
rec_x = rec_x + x0
rms_rec_x = np.linalg.norm(rec_x[s] - x[s]) / root
def rmse(dxdt_truth, dxdt_hat, padding=0):
Many fewer parameters
if padding == 'auto': padding = max(1, int(0.025*len(dxdt_hat)))
s = slice(padding, len(dxdt_hat)-padding)  # slice out data we want to measure
errors = (dxdt_hat[s] - dxdt_truth[s])
r = stats.linregress(dxdt_truth[s] - np.mean(dxdt_truth[s]), errors)
Subtracting off the mean here shouldn't be necessary; it doesn't change the correlation coefficient.
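A quick sanity check of that claim on synthetic data (the arrays are made up, just to illustrate): the correlation coefficient reported by linregress is invariant to shifting the regressor by a constant, so the centering step can be dropped.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
truth = np.sin(np.arange(200)*0.05)
errors = 0.3*truth + 0.1*rng.standard_normal(200)     # errors partly correlated with truth
r_raw = stats.linregress(truth, errors).rvalue
r_centered = stats.linregress(truth - np.mean(truth), errors).rvalue
assert np.isclose(r_raw, r_centered)                  # centering the regressor changes nothing
```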
:return: **integration constant** (float) -- initial condition that best aligns x_hat with x
"""
return minimize(lambda x0, x, xhat: np.linalg.norm(x - (x_hat+x0)),  # fn to minimize in 1st argument
We were formerly solving the least squares problem here, but the solution to the L2 problem is just the mean of x - x_hat, so now I'm just doing that in the M=infinity case. However, having the minimization problem set up was helpful for seeing how to stick the Huber in there.
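In sketch form, what I mean is something like the following. This is a rough illustration, not the exact implementation; in particular it ignores any residual scaling the real code may do. For M = infinity the best offset is just the mean of x - x_hat, and for finite M you minimize the Huber loss of the residuals over the single offset.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import huber   # huber(delta, r): 0.5*r^2 if |r| <= delta, else delta*(|r| - 0.5*delta)

def estimate_integration_constant(x, x_hat, M=1.0):
    """Sketch only: find the constant x0 that best aligns x_hat + x0 with x."""
    if M == float('inf'):
        return np.mean(x - x_hat)   # closed-form solution of the least-squares problem
    # Outlier-robust: minimize the summed Huber loss of the residuals over x0
    return minimize_scalar(lambda x0: np.sum(huber(M, x - (x_hat + x0)))).x
```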