@fkiraly (Collaborator) commented Dec 31, 2025

This PR is an experimental draft showcasing:

  • a lightweight packaging layer for objects of any kind, using python code serialization rather than "translation"
  • a new taxon, "model", corresponding to (uninstantiated) classes
  • an example implementation for tabular classifiers, adding records for two examples: XGBClassifier from xgboost, and AutoSklearnClassifier from auto-sklearn

Note: this currently does not use any backend - the idea is that the lightweight packaging layer can be easily serialized by a backend.

Usage pattern for end users

End users would interact with the layer as follows:

import openml

clf1 = openml.get("XGBClassifier")
clf2 = openml.get("AutoSklearnClassifier")

Either line will:

  • if the required soft dependencies (e.g., xgboost, auto-sklearn) are present, directly import the class from the package;
  • if the required soft dependencies are not present, raise an informative error message.

The strings live in a unique namespace for objects; for widely known packages they will usually (but not necessarily) correspond to class names.

This could be further extended to:

  • a deps utility that, for a given namespace ID, provides the required dependencies - the tags are already inspectable here
  • tracking dependencies at a more granular level for reproducibility, e.g., via pip freeze, recording the exact versions with which a benchmark featuring a model was carried out

Versioning inside openml is consciously avoided in this design, as versioning already happens in the dependencies which are tracked by the tag system.

The logic internally works through:

  • a registry lookup mechanism
  • a unified but openml-private materialize interface that obtains the class from the packaging layer

Usage patterns for estimator contributors

Currently, an extender would interact with this by manually adding a class in openml.models.classifiers; in the simplest case, these are thin wrappers pointing to a third-party location:

from openml.models.apis import _ModelPkgClassifier

class OpenmlPkg__AutoSklearnClassifier(_ModelPkgClassifier):
    _tags = {
        "pkg_id": "AutoSklearnClassifier",
        "python_dependencies": "auto-sklearn",
    }

    _obj = "autosklearn.classification.AutoSklearnClassifier"

But this class could also rely on a full python definition of a class itself, with _obj pointing to a python file, or with the method _materialize implementing explicit python code.
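The second alternative might look as follows; the base class here is a stand-in for `openml.models.apis._ModelPkgClassifier`, and the toy classifier is purely illustrative:

```python
# Sketch: instead of pointing _obj at a third-party class, the record
# carries the full class definition inside an overridden _materialize.
class _ModelPkgClassifier:  # stand-in for the real openml base class
    _tags = {}

    @classmethod
    def _materialize(cls):
        raise NotImplementedError

class OpenmlPkg__MajorityClassifier(_ModelPkgClassifier):
    _tags = {"pkg_id": "MajorityClassifier", "python_dependencies": None}

    @classmethod
    def _materialize(cls):
        # the entire class definition lives inside the record itself
        class MajorityClassifier:
            """Predicts the most frequent label seen during fit."""
            def fit(self, X, y):
                self.majority_ = max(set(y), key=list(y).count)
                return self
            def predict(self, X):
                return [self.majority_] * len(X)
        return MajorityClassifier
```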

This can also be automated further in the context of a database backend:

  • creating a dynamic packaging class from an in-memory object
  • calling this from a publish method that automatically extracts tags etc., similar to OpenMLFlow.publish
  • or applying a publish method to an entire package, crawling it and extracting pointer records to all sklearn estimators
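The first automation step could be as simple as a `type()` call; `make_pkg_class` and the `_BasePkg` internals shown here are hypothetical names for illustration:

```python
# Sketch: build a dynamic packaging class for an in-memory object.
class _BasePkg:
    _tags = {}
    _obj = None

    @classmethod
    def _materialize(cls):
        # for an in-memory object, materializing just returns it
        return cls._obj

def make_pkg_class(obj, pkg_id, dependencies=None):
    """Create a packaging record class for `obj` at runtime."""
    tags = {"pkg_id": pkg_id, "python_dependencies": dependencies}
    return type(
        f"OpenmlPkg__{pkg_id}", (_BasePkg,), {"_tags": tags, "_obj": obj}
    )
```

A publish method could then call this, extract `_tags`, and hand both off to the backend.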

Intended backend interaction pattern

A database backend would be inserted as follows.

for publish / populate

  • publish creates a dynamic packaging class inheriting from the new _BasePkg
  • calls serialize on it to obtain a string
  • calls get_tags to obtain a dict of packaging metadata
  • optionally, adds publish process metadata (author, timestamp, etc.)
  • publishes both to the database via a DB API call
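The publish steps above could be sketched as follows, using zlib compression of the record's python source as the serialization (matching the decompress-and-exec described for queries); `serialize`, `publish`, and the dict-as-database are all assumptions of this sketch:

```python
# Sketch of the publish flow: serialize a record's source, attach tags,
# and store both via a (here: dict-backed) DB API call.
import zlib

RECORD_SOURCE = '''
class OpenmlPkg__Example:
    _tags = {"pkg_id": "Example", "python_dependencies": None}
'''

def serialize(source):
    """Compress the record's python source into bytes for storage."""
    return zlib.compress(source.encode("utf-8"))

def publish(source, tags, db):
    """Store serialized source plus metadata; `db` stands in for the DB API."""
    db[tags["pkg_id"]] = {
        "blob": serialize(source),
        "tags": tags,
        # publish-process metadata (author, timestamp, ...) could go here
    }

fake_db = {}
publish(RECORD_SOURCE, {"pkg_id": "Example", "python_dependencies": None}, fake_db)
```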

for query

  • DB API call for the lookup mechanism
  • retrieve the previously published record, also via a DB API call
  • invert serialize by zlib decompression and exec to obtain a dynamic python packaging class
  • materialize from that class
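The query-side inversion can be sketched like this; `load_record` and the record's contents are illustrative, with `dict` standing in for the real wrapped class:

```python
# Sketch of the query flow: zlib-decompress the stored blob and exec it
# to recover the dynamic packaging class, then materialize from it.
import zlib

# a previously "published" record, as it would come back from the DB
blob = zlib.compress(b'''
class OpenmlPkg__Example:
    _tags = {"pkg_id": "Example"}

    @classmethod
    def _materialize(cls):
        return dict  # stand-in for the real wrapped class
''')

def load_record(blob):
    """Invert serialize: decompress and exec to rebuild the record class."""
    namespace = {}
    exec(zlib.decompress(blob).decode("utf-8"), namespace)
    return namespace["OpenmlPkg__Example"]

record = load_record(blob)
materialized = record._materialize()
```

Note that exec-ing stored source is a trust decision: this only makes sense if records come from a trusted backend.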
