robustdiff to address #97
#161
Conversation
…tebooks 1 and 2a (2b yet to do), added it to optimization and played with it a long time, still toying, improved the way imports are done in various __init__.py with the else clause, which I didn't know about before
pynumdiff/optimize/_optimize.py
Outdated
```diff
   {'q': (1e-10, 1e10),
-  'r': (1e-10, 1e10)})
+  'r': (1e-10, 1e10)}),
+ robustdiff: ({'order': {1, 2, 3}, # categorical
```
might also need huberM to be a hyperparameter. Maybe discrete, because we want to hit 0 on the dot.
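As a sketch of what that might look like (the values here are made up, not from the PR), the categorical-params-in-a-set convention visible in this diff would let the search land exactly on 0:

```python
# Hypothetical huberM spec, following the set convention for categorical
# params used by method_params_and_bounds in this PR. Values are illustrative.
search_space = {'order': {1, 2, 3},         # categorical: given as a set
                'huberM': {0, 0.1, 1, 10}}  # discrete, so 0 is reachable exactly
bounds = {'qr_ratio': (1e-10, 1e10)}        # continuous: given as (low, high)
print(0 in search_space['huberM'])          # True -- the search can hit 0 on the dot
```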
pynumdiff/tests/test_diff_methods.py
Outdated
```diff
   (lineardiff, {'order':3, 'gamma':5, 'window_size':11, 'solver':'CLARABEL'}), (lineardiff, [3, 5, 11], {'solver':'CLARABEL'}),
   (rbfdiff, {'sigma':0.5, 'lmbd':0.001})
   ]
+ diff_methods_and_params = [(robustdiff, {'order':3, 'qr_ratio':1e6})]
```
gotta remove this next commit. Should test all.
…the CI version is arriving at a slightly different answer
```diff
  from warnings import filterwarnings, warn
- from multiprocessing import Pool
+ from multiprocessing import Pool, Manager
+ from hashlib import sha1
```
Apparently calling hash() on identical objects in different processes isn't guaranteed to return the same value, so I had to get fancier.
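A quick illustration of the problem and the fix (the principle, not the PR's exact code): Python salts str and bytes hashes per process via PYTHONHASHSEED, while a sha1 digest of the same string is deterministic everywhere:

```python
from hashlib import sha1

def stable_key(s):
    # sha1 of the encoded string: deterministic in every process and every run
    return sha1(s.encode()).digest()

print(stable_key("1.000e+00") == stable_key("1.000e+00"))  # True, always
print(hash("1.000e+00"))  # salted per process (PYTHONHASHSEED), so two
                          # worker processes may compute different values
```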
```python
# Map from method -> (search_space, bounds_low_hi)
method_params_and_bounds = {
    spectraldiff: ({'even_extension': {True, False}, # give categorical params in a set
```
I changed the order of a bunch of listings in this PR, to better match the taxonomy paper and the readme.
| """ | ||
| point_params = {k:(v if search_space_types[k] == float else | ||
| int(np.round(v))) for k,v in zip(search_space_types, point)} # point -> dict | ||
| key = sha1((''.join(f"{v:.3e}" for v in point) + # This hash is stable across processes. Takes bytes |
Hashing unhashable types reliably was more pain than I expected. It's still not totally perfect, because scientific notation gives relative rather than absolute resolution, so it isn't a fully reliable way to tell that we've already queried somewhere very nearby. For instance, 0.00e+00 and 1.00e-05 look very different as strings even though they differ by only 1e-5, while two values that both print as 1.10e+03 can be hiding a difference far greater than 1e-5.
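Concretely (a toy demonstration of the caveat, not code from the PR): %.3e formatting separates nearby points near zero but collides distant points at large magnitudes:

```python
# Near zero: values only 1e-5 apart format differently -> different cache keys
print(f"{0.0:.3e}", f"{1e-5:.3e}")       # 0.000e+00  1.000e-05
# At large magnitude: values 0.4 apart format identically -> same cache key
print(f"{1100.0:.3e}", f"{1100.4:.3e}")  # 1.100e+03  1.100e+03
```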
```python
                       int(np.round(v))) for k,v in zip(search_space_types, point)} # point -> dict
    key = sha1((''.join(f"{v:.3e}" for v in point) + # This hash is stable across processes. Takes bytes
                ''.join(str(v) for k,v in sorted(categorical_params.items()))).encode()).digest()
    if key in cache: return cache[key] # short circuit if this hyperparam combo has already been queried, ~10% savings per #160
```
I'm not sure I actually need to store and return the value. Could maybe just shortcircuit by returning NaN, since there's a nanargmin at the end. But storing and returning a number is cheap.
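The NaN variant would look something like this sketch (hypothetical helper names, not the PR's code); nanargmin at the end skips the repeats either way:

```python
import numpy as np

def objective_nan_variant(key, cache, compute):
    # Hypothetical alternative: mark repeats instead of storing their values
    if key in cache:
        return np.nan      # repeat query contributes nothing
    cache[key] = True      # only membership matters in this variant
    return compute()

results = [1.2, np.nan, 0.7, np.nan]  # e.g. two of four queries were repeats
print(np.nanargmin(results))          # 2 -- NaNs are ignored when picking the best
```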
```diff
- params, bounds = method_params_and_bounds[func]
- params.update(search_space_updates) # for things not given, use defaults
+ search_space, bounds = method_params_and_bounds[func]
+ search_space.update(search_space_updates) # for things not given, use defaults
```
Renamed this. It's not exactly the search space; it's a bunch of directions for how to construct starting conditions. But it's close in spirit.
```diff
  _minimize = partial(scipy.optimize.minimize, _obj_fun, method=opt_method, bounds=bounds, options={'maxiter':maxiter})
  results += pool.map(_minimize, starting_points) # returns a bunch of OptimizeResult objects
+ with Manager() as manager:
+     cache = manager.dict() # cache answers to avoid expensive repeat queries
```
What a great object somebody made. Super easy to use.
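For readers unfamiliar with it, a minimal sketch of the pattern (toy function, not the PR's objective): manager.dict() returns a proxy that every Pool worker can read and write, unlike a plain dict, of which each worker would only see a private copy:

```python
from multiprocessing import Pool, Manager
from functools import partial

def square_cached(x, cache):
    if x in cache:
        return cache[x]   # repeat query: answered from the shared cache
    cache[x] = x * x      # this write is visible to all other workers
    return cache[x]

if __name__ == '__main__':
    with Manager() as manager:
        cache = manager.dict()  # proxy to a dict living in the manager process
        with Pool(2) as pool:
            print(pool.map(partial(square_cached, cache=cache), [3, 4, 3, 4]))
        print(dict(cache))  # {3: 9, 4: 16}
```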
Added robustdiff to kalman_smooth, added tests for it, added it to notebooks 1 and 2a (2b yet to do), added it to optimization and played with it a long time, still toying, improved the way imports are done in various __init__.py with the else clause, which I didn't know about before.

Still need to:
- add robustdiff to …, and rerun the 2b notebook
- add robustdiff to suggest_method and rerun notebook 3
- run robustdiff in notebook 4 to see whether this thing lives up to the name I've given it