Projection kernels #689

Open

AdrianSosic wants to merge 14 commits into main from feature/projection_kernel

Conversation


@AdrianSosic AdrianSosic commented Nov 5, 2025

Adds the ProjectionKernel class and a corresponding ProjectionKernelFactory class for modeling in high-dimensional state spaces.

Key insights from testing:

  • When using the "ground truth" subspace, the method works perfectly
  • However, there is a significant drop in performance when the subspace needs to be learned
    --> Apparently, this seems to be a rather difficult learning problem, hence I've added several alternative "smart" initialization strategies.
  • Across the scenarios I've tested, PLS seemed like the overall most promising method
  • Rank correlation looks much better than predictive "accuracy" (which is still good enough for BO, I think)
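For context, a projection kernel evaluates its base kernel on linearly projected inputs, i.e. k(x, x') = k_base(Px, Px'). A minimal NumPy sketch of the idea (the RBF base kernel and all names here are illustrative, not BayBE's actual `ProjectionKernel` implementation):

```python
import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    """Plain RBF kernel matrix between the rows of `a` and `b`."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def projection_kernel(x1, x2, projection_matrix):
    """Evaluate the base kernel on inputs projected into a low-dim subspace."""
    return rbf(x1 @ projection_matrix, x2 @ projection_matrix)


rng = np.random.default_rng(0)
P = rng.standard_normal((10, 2))  # projects 10-dimensional inputs onto 2 dims
X = rng.standard_normal((5, 10))
K = projection_kernel(X, X, P)  # 5x5 kernel matrix
```

Since only the projected coordinates enter the base kernel, directions orthogonal to the subspace are ignored, which is what makes the approach attractive in high dimensions.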

TODOs:

  • Implement projection approach
  • Add example
  • Add/update hypothesis strategies

@AdrianSosic AdrianSosic self-assigned this Nov 5, 2025
@AdrianSosic AdrianSosic added the new feature New functionality label Nov 5, 2025
@AdrianSosic AdrianSosic force-pushed the feature/projection_kernel branch 3 times, most recently from 499db56 to a7c775b Compare November 18, 2025 07:44
@AdrianSosic AdrianSosic force-pushed the feature/projection_kernel branch from 0fe033f to c411fcd Compare November 26, 2025 12:40
@AdrianSosic AdrianSosic marked this pull request as ready for review November 26, 2025 16:21
Copilot AI review requested due to automatic review settings November 26, 2025 16:21

Copilot AI left a comment


Pull request overview

This PR introduces projection kernel capabilities for modeling in high-dimensional state spaces, enabling Gaussian processes to operate on learned or predefined lower-dimensional subspaces to improve sample efficiency and reduce overfitting.

Key Changes:

  • Adds ProjectionKernel class and ProjectionKernelFactory with multiple initialization strategies (MASKING, ORTHONORMAL, PLS, SPHERICAL)
  • Includes comprehensive example demonstrating projection kernel usage in high-dimensional scenarios
  • Fixes random seed handling in simulation scenarios to ensure proper randomization when using initial data
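Of the initialization strategies listed above, the ORTHONORMAL one can presumably be sketched as orthonormalizing a random Gaussian draw via QR decomposition (a sketch of the general technique, not necessarily the factory's exact code):

```python
import numpy as np


def random_orthonormal_projection(dim: int, n_projections: int, seed=None) -> np.ndarray:
    """Random projection matrix with orthonormal columns, via reduced QR."""
    rng = np.random.default_rng(seed)
    gaussian = rng.standard_normal((dim, n_projections))
    q, _ = np.linalg.qr(gaussian)  # q: (dim, n_projections), orthonormal columns
    return q


P = random_orthonormal_projection(10, 3, seed=42)
```

Orthonormal columns keep the projected geometry undistorted, which is a common default when no prior subspace information (e.g. from PLS) is available.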

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| baybe/kernels/composite.py | Implements the new ProjectionKernel class with projection matrix handling |
| baybe/kernels/_gpytorch.py | Adds GPyTorch backend implementation for projection kernels |
| baybe/kernels/__init__.py | Exports the new ProjectionKernel class |
| baybe/surrogates/gaussian_process/kernel_factory.py | Implements ProjectionKernelFactory with multiple initialization strategies and matrix generation logic |
| baybe/serialization/core.py | Adds serialization support for NumPy arrays |
| baybe/serialization/utils.py | Provides utility functions for NumPy array serialization/deserialization |
| baybe/simulation/scenarios.py | Fixes random seed logic to increment properly when using initial data |
| tests/hypothesis_strategies/kernels.py | Adds hypothesis strategies for generating projection kernels in tests |
| examples/Custom_Surrogates/projection_kernel.py | Demonstrates projection kernel usage with comparative analysis |
| docs/userguide/transfer_learning.md | Comments out bibliography filter (likely unintentional) |
| docs/references.bib | Adds citation for high-dimensional Bayesian optimization paper |
| CHANGELOG.md | Documents the new features and fixes |


Comment on lines 182 to 183:

```python
self.n_matrices,
self.n_projections,
```

Copilot AI Nov 26, 2025


Arguments passed to _make_projection_matrices are in the wrong order. The function signature expects (n_projections, n_matrices, ...) but the call passes (self.n_matrices, self.n_projections, ...).

Suggested change:

```diff
-self.n_matrices,
-self.n_projections,
+self.n_projections,
+self.n_matrices,
```

@AdrianSosic AdrianSosic force-pushed the feature/projection_kernel branch 4 times, most recently from db8f1c7 to c63f923 Compare November 28, 2025 12:34

@AVHopp AVHopp left a comment


First round of comments.


```python
import torch
from gpytorch.kernels import Kernel
from torch import Tensor
```


No lazy import? I assume this is because this file itself will only be imported lazily, or has this been overlooked?


```python
from baybe.utils.torch import DTypeFloatTorch

_ConvertibleToTensor = Any
```


I'm confused by this. First, why is `Any` convertible to a tensor? Second, why the type alias here instead of just having `projection_matrix: Any` down in the `__init__`?

```python
base_kernel: Kernel = field(validator=instance_of(Kernel))
"""The kernel applied to the projected inputs."""

projection_matrix: np.ndarray = field(
```


There seems to be no validation of the exact dimensions of the matrix here, other than that it needs to have two dimensions, correct? So my question is whether this will be auto-derived in general (and is thus not intended to be set by the user themselves) or whether this validation happens somewhere else.



As discussed, we might want to make the description here a bit clearer and tell the user that this will not be validated upon creation (as we can't), and/or throw an error at an appropriate point (see other comment on the GPyTorch kernel).
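A shape check of the kind discussed here could look roughly as follows (a hypothetical sketch; the function name and the exact point where BayBE would raise the error are assumptions):

```python
import numpy as np


def validate_projection_matrix(matrix: np.ndarray, input_dim: int) -> None:
    """Hypothetical shape check for a user-provided projection matrix.

    Raises a ValueError once the input dimension is known, since the
    dimensions cannot be validated at kernel-creation time.
    """
    if matrix.ndim != 2:
        raise ValueError("Projection matrix must be two-dimensional.")
    if matrix.shape[0] != input_dim:
        raise ValueError(
            f"Projection matrix has {matrix.shape[0]} rows, "
            f"but the inputs have dimension {input_dim}."
        )


validate_projection_matrix(np.ones((5, 2)), input_dim=5)  # passes silently
```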

```python
return base64.b64encode(pickled_df).decode("utf-8")


_unstructure_ndarray_hook = _unstructure_dataframe_hook
```


Doesn't this now mean that we use a function that explicitly uses `pd.DataFrame` in the type hint also for `np.ndarray`? That does not feel right. Either adjust the type hint of the unstructure hook or create an additional one, even if the code is identical.
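A dedicated ndarray hook with correct type hints, as suggested, could mirror the dataframe hook's pickle-plus-base64 approach. The function names below are hypothetical, not BayBE's actual hooks:

```python
import base64
import pickle

import numpy as np


def unstructure_ndarray(array: np.ndarray) -> str:
    """Hypothetical ndarray hook: pickle the array, then base64-encode it."""
    return base64.b64encode(pickle.dumps(array)).decode("utf-8")


def structure_ndarray(encoded: str) -> np.ndarray:
    """Inverse hook: base64-decode and unpickle."""
    return pickle.loads(base64.b64decode(encoded))


arr = np.arange(6).reshape(2, 3)
roundtripped = structure_ndarray(unstructure_ndarray(arr))  # equals arr
```

Duplicating the small body keeps each hook's signature honest, at the cost of a few repeated lines.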

```python
return expit(np.linalg.norm(array @ subspace, axis=1, keepdims=True))


# ## Subspace Modeling
```


This is misleading: in the previous paragraph, you already define the subspace, and this now reads as if this paragraph is where you want to model it.

Collaborator Author


Yes, it is – this is the part where we build the model of reality. Everything before is the "ground truth", i.e., not our model: all things we have to take as given and can't affect in any way. So what's wrong about it?

Collaborator


For me, the fact that the ground truth subspace is called subspace makes it seem like you want to model the object subspace after having declared it. I think renaming the subspace object to groundtruth_subspace or true_subspace or similar would be clearer.

```python
# - **Vanilla:**
# The vanilla Gaussian process model operating on the **full** parameter space.
# - **Learned Projection:**
# A Gaussian process model operating on **learned** low-dimensional subspaces.
```


Suggested change:

```diff
-# A Gaussian process model operating on **learned** low-dimensional subspaces.
+# A Gaussian process model operating on a **learned** low-dimensional subspace.
```

```python
# A Gaussian process model operating on **learned** low-dimensional subspaces.
# - **Ideal Projection:**
# A Gaussian process model operating on the **ground truth** low-dimensional subspace.
```



Why not also show a user-provided subspace? Even if not used here explicitly, I would at least add a paragraph explaining that one can use the mechanism of explicitly providing the `projection_matrix` to do exactly that for the subspace.

```python
n_projections=N_PROJECTIONS,
n_matrices=N_MATRICES,
initialization="PLS",
kernel_or_factory=DefaultKernelFactory(),
```


Why don't you also give an explicit `MaternKernel()` here so that it is identical to the Ideal Projection?

Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to do it the other way round: use the default kernel for the ideal projection. But for that, I first need to make it work with factories or find a different way

```python
}

# We can now evaluate the regression performance of these models for different training
# data set sizes. The entire process is repeated for several Monte Carlo iterations:
```


Suggested change:

```diff
-# data set sizes. The entire process is repeated for several Monte Carlo iterations:
+# data set sizes.
```

@AdrianSosic AdrianSosic force-pushed the feature/projection_kernel branch from c63f923 to 06276d8 Compare January 14, 2026 12:58
```python
n_matrices=N_MATRICES,
initialization="PLS",
kernel_or_factory=DefaultKernelFactory(),
learn_projection=True,
```


Could you also plot the option without learning as a baseline?

```python
# ground truth is unknown), the projection approach identifies patterns in the data
# significantly earlier compared to the vanilla approach. This effect is more prominent
# for the underlying ordering of the target data (as reflected by the rank correlation)
# than for the actual prediction values. Nevertheless, when applied in a Bayesian
```


Can you comment more explicitly on the R² plot? Is this because the learned kernel is kind of confused and can create extreme outliers early on? Maybe a log y-scale would be meaningful for R², to better show the slope of the vanilla line as well, which now seems flat due to the extreme R² values of the projection kernel.
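Since R² can be strongly negative early on, a plain log scale would not work; a symlog scale (linear near zero, logarithmic in the tails) could. A small matplotlib sketch with made-up numbers (all values and labels here are illustrative, not the actual benchmark results):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; the plot is only for illustration
import matplotlib.pyplot as plt
import numpy as np

# Made-up R² curves: the projection model shows extreme negative outliers
# at small training sizes, which would be invisible on a plain log scale.
sizes = np.array([10, 20, 40, 80])
r2_projection = np.array([-30.0, -2.0, 0.5, 0.8])
r2_vanilla = np.array([-0.5, 0.0, 0.2, 0.4])

fig, ax = plt.subplots()
ax.plot(sizes, r2_projection, marker="o", label="projection (made-up)")
ax.plot(sizes, r2_vanilla, marker="s", label="vanilla (made-up)")
ax.set_yscale("symlog")  # handles negative values, unlike "log"
ax.set_xlabel("training set size")
ax.set_ylabel("R²")
ax.legend()
```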
