[ENH] experimental design for model indexing framework #1583
This PR is an experimental draft showcasing:
* a light-weight packaging layer for indexing models, with example entries for `XGBClassifier` from `xgboost`, and `AutoSklearnClassifier` from `auto-sklearn`

Note: this is currently not using any backend - the idea is that the light-weight packaging layer can be easily serialized by the backend.
### Usage pattern for end users
End users would interact with the layer as follows:
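(A minimal illustrative sketch; the import path under `openml.models.classifiers` is taken from this draft, while the string-based getter `get_class` is a hypothetical name.)

```python
import openml

# variant 1: direct class-style import from the packaging layer
from openml.models.classifiers import XGBClassifier

# variant 2: retrieval by string identifier from the object namespace
# (the getter name "get_class" is hypothetical)
XGBClassifier = openml.models.get_class("XGBClassifier")
```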
Either line will:

* check whether the required dependencies (`xgboost`, `auto-sklearn`) are present
* if so, directly import the class from the package

The strings live in a unique namespace for objects - usually, for widely known packages (but not necessarily), they will correspond to class names.
This could be further extended to:
* a `deps` utility that, for a namespace ID, provides the required dependencies - the tags are already inspectable here, similar to a `pip freeze` style listing
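A hedged sketch of what such a `deps` utility might look like, reading dependencies off the tag system; the registry dict, the function name, and the tag key are assumptions for illustration:

```python
# hypothetical registry mapping namespace IDs to packaging-layer classes,
# e.g. {"XGBClassifier": <class pointing to xgboost.XGBClassifier>}
REGISTRY = {}


def deps(namespace_id):
    """Return the dependencies recorded in the tags of an indexed entry."""
    return REGISTRY[namespace_id].get_tags().get("python_dependencies", [])
```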
Versioning inside `openml` is consciously avoided in this design, as versioning already happens in the dependencies, which are tracked by the tag system.

The logic internally works through:

* a `materialize` interface that obtains the class from the package layer
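A sketch of what this internal logic could look like; `_BasePkg`, the tag attributes, and the dependency check below are assumptions for illustration, not the final interface of this draft:

```python
import importlib
from importlib.util import find_spec


class _BasePkg:
    """Light-weight pointer record for a third-party estimator class (sketch)."""

    # packaging metadata tags, e.g. {"python_dependencies": ["xgboost"]}
    _tags = {}
    # "module_name:ClassName" style pointer to the third-party class
    _obj = None

    @classmethod
    def get_tags(cls):
        """Return a dict of packaging metadata."""
        return dict(cls._tags)

    @classmethod
    def materialize(cls):
        """Check dependencies, then import and return the pointed-to class."""
        for dep in cls._tags.get("python_dependencies", []):
            if find_spec(dep) is None:
                raise ModuleNotFoundError(
                    f"{cls.__name__} requires the soft dependency {dep!r}"
                )
        module_name, class_name = cls._obj.split(":")
        return getattr(importlib.import_module(module_name), class_name)
```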
### Usage patterns for estimator contributors

Currently, an extender would interact with this by manually adding a class in `openml.models.classifiers`; in the simplest case these are thin wrappers pointing to a third-party location, as in the sketch below. But such a class could also rely on a full python definition of the class itself, with `_obj` pointing to a python file, or with the method `_materialize` implementing explicit python code.
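A hedged sketch of such a thin wrapper entry, reusing the assumed `_BasePkg` base sketched above; the tag keys and the pointer format are illustrative only:

```python
class XGBClassifier(_BasePkg):
    """Thin wrapper pointing to xgboost's XGBClassifier (sketch, names assumed)."""

    _tags = {"python_dependencies": ["xgboost"]}
    _obj = "xgboost:XGBClassifier"


# the packaging layer (or an end user) would then obtain the real class via
xgb_cls = XGBClassifier.materialize()
clf = xgb_cls()
```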
This can also be automated further in the context of a database backend:

* a `publish` method that automatically extracts tags etc., similar to `OpenMLFlow.publish`
* a `publish` method applying to an entire package, crawling it, and extracting pointer records to all `sklearn` estimators, see the sketch below
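A sketch of how such a package-level `publish` could crawl `sklearn`, using `sklearn.utils.all_estimators`; the record format below is an assumption:

```python
from sklearn.utils import all_estimators


def crawl_sklearn_classifiers():
    """Yield pointer records for all sklearn classifiers (sketch only)."""
    for name, est_cls in all_estimators(type_filter="classifier"):
        yield {
            "name": name,
            "_obj": f"{est_cls.__module__}:{est_cls.__name__}",
            "_tags": {"python_dependencies": ["sklearn"]},
        }
```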
### Intended backend interaction pattern

A database backend would be inserted as follows.
for publish / populate:
* `publish` creates a dynamic package class inheriting from the new `_BasePkg`
* `serialize` is called on it to obtain a string
* `get_tags` is called to obtain a `dict` of packaging metadata

for query:
* invert `serialize` by `zlib` decompress and `exec` to obtain a dynamic python package class
* `materialize` from that class
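A minimal sketch of this round trip under the above assumptions; the use of `inspect.getsource`, the `zlib` + `exec` inversion, and the function names are illustrative:

```python
import inspect
import zlib


def serialize(pkg_cls):
    """Compress the source of a package class (publish / populate direction)."""
    return zlib.compress(inspect.getsource(pkg_cls).encode("utf-8"))


def deserialize(blob, class_name):
    """Invert serialize: decompress and exec to recover the package class."""
    namespace = {"_BasePkg": _BasePkg}  # base class must be in scope for exec
    exec(zlib.decompress(blob).decode("utf-8"), namespace)
    return namespace[class_name]


# query direction: backend returns the blob, client reconstructs and materializes
# pkg_cls = deserialize(blob, "XGBClassifier")
# estimator_cls = pkg_cls.materialize()
```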