Emanuel Huber 2025-11-02
R functions for Gaussian process (GP) modelling. The core functions are coded in C++ and based on the EIGEN library (through RcppEigen)
Currently implemented/to do:
- Posterior Gaussian Process with Gaussian likelihood (Gaussian process conditioned to noise-free and noisy observations)
- Space-time Gaussian process
- Gaussian Process with monomial mean functions with vague Gaussian prior on the coefficient parameters.
- Gaussian Process conditioned to derivative observations
- Anisotropic covariance functions (scale and rotation)
- Log marginal likelihood of the Gaussian process
- Cross-matrix distance (distance between every rows of each
matrix):
crossdist(x,y,M)(withMa positive semidefinite matrix for anisotropic distances) - Covariance function: Matern, Gaussian, linear
- maximum likelihood hyper-parameter estimation
- McMC hyper-parameter sampling
- spatially varying covariance function
- Gaussian Process approximations (to deal with larger data set)
- add other covariance models
This is an ongoing project. If you have any questions, don’t hesitate to contact me:
Thank you!
if(!require("devtools")) install.packages("devtools")
devtools::install_github("emanuelhuber/GauProMod")library(GauProMod)
library(plot3D)
library(RColorBrewer)The observations are defined by a list with x the positions of the
observations and y the observed values. The targets are the positions
x where to simulate the Gaussian Process.
#observations
obs <- list(x=c(-4, -3, -1, 0, 4),
y=c(-2, 0, 1, 2, 0))
# targets
targ <- list("x"=seq(-10,10,len=200))To build the covariance functions, the following kernels are available
# linear kernel
covModel <- list(kernel="linear",
b = 1, # slope
h = 1.5, # std. deviation
c = 0) # constant
# Matern kernel
covModel <- list(kernel="matern",
l = 1, # correlation length
v = 2.5, # smoothness
h = 2.45) # std. deviation
# squared exponential kernel (Gaussian)
covModel <- list(kernel="gaussian",
l = 0.5, # correlation length
h = 0.25) # std. deviationcovModel <- list(kernel="matern",
l = 5, # correlation length
v = 1, # smoothness
h = 2.45 # std. deviation
)
r <- seq(0, 20, by = 0.1)
myCov <- covfx(r = r, covModel = covModel)
plot(r, myCov, type = "l", ylim = c(0, max(myCov)),
ylab = "covariance", xlab = "distance", xaxs = "i", yaxs = "i")The following mean function (or basis functions) are available (see Rasmussen and Williams (2006), Section 2.7):
# quadratic mean function
op <- 2
# linear mean function
op <- 1
# zero-mean function (no trend)
op <- 0 Because nothing is perfectly observed, it makes sense to account for
uncertainty in the observation. Gaussian likelihood, defined by the
standard deviation sigma (erreur uncertainty) is the only form of
likelihood currently implemented in GauProMod. Sigma must be either a
length-one vector or has exactly the same length as the observations
values
# standard deviation measurement error
# Gaussian likelihood
sigma <- 0.2
# or
sigma <- abs(rnorm(length(obs$y)))GP <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModel),
sigma = sigma, op = op)
names(GP)
# GP$mean = mean value at location xstar
# GP$cov = covariance matrix of the conditioned GP
# GP$logLik = log-likelihood of the conditioned GP
# GP$xstar = x-coordinates at which the GP is simulatedPlot the mean function plus/minus the standard deviation
#--- plot mean +/- sd
xp <-(GP$mean + sqrt(diag(GP$cov))) # mean + sd
xm <-(GP$mean - sqrt(diag(GP$cov))) # mean - sd
# initialise the plot
plot(cbind(obs$x, obs$y), type="p", xlab="x", ylab="y",
xlim = range(c(obs$x, targ$x)), ylim = range(c(xp, xm, obs$y)),
pch = 20, col = "black")
lines(GP$xstar, GP$mean,col="red") # mean
lines(GP$xstar, xm,lty=3) # + sd
lines(GP$xstar, xp,lty=3) # - sd
legend("topleft", legend = c("obs", "mean", "sd"), lty = c(NA, 1, 3),
pch = c(20, NA, NA), col=c("black", "red", "black"), bty="n")Random conditional simulation
# cholesky factorisation
L <- cholfac(GP$cov)
# random simulation
ystar <- gpSim(GP , L = L)You can also directly use ystar <- gpSim(GP) without the argument L
(the Cholesky factor) but each time you will call gpSim(GP), gpSim
will compute again internally the Cholesky factor. So, if you plan to
run many unconditional simulations, it is faster to first compute the
Cholesky factor and then run several time gpSim with the argument L.
Plot the random simulation:
plot(rbind(cbind(obs$x, obs$y), ystar), type="n", xlab="x", ylab="y")
lines(ystar, col = "blue")
points(cbind(obs$x, obs$y), col = "black", pch = 20)
legend("topleft", legend = c("obs", "GP sim"), lty = c(NA, 1),
pch = c(20, NA), col=c("black", "blue"), bty="n")We define a new object bc (a list) defining the derivatives, with
elements:
xthe location of the derivativeythe value of the derivativesigmathe uncertainty (standard deviation) on the derivative value (y)
covModel <- list(kernel = "matern",
l = 0.25,
v = 3.5,
h = 0.55)
bc <- list(x = c(-4.5, -2, 0, 3, 4.5),
y = c( 0, 1, 0, -1, 0),
sigma = 0)
sigma <- 0.1
GP <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModel),
sigma = sigma, op = 0)
GP2 <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModel),
sigma = sigma, op = 0, bc = bc)
#--- plot mean +/- sd
xp <-(GP$mean + sqrt(diag(GP$cov))) # mean + sd
xm <-(GP$mean - sqrt(diag(GP$cov))) # mean - sd
xp2 <-(GP2$mean + sqrt(diag(GP2$cov))) # mean + sd
xm2 <-(GP2$mean - sqrt(diag(GP2$cov))) # mean - sd
plot(cbind(obs$x, obs$y), type="p", xlab="x", ylab="y",
xlim = range(c(obs$x, targ$x)), ylim = range(c(xp, xm, obs$y)),
pch = 20, col = "black", main = "without derivatives")
lines(GP$xstar, GP$mean,col="red") # mean
lines(GP$xstar, xm,lty=3) # + sd
lines(GP$xstar, xp,lty=3) # - sd
legend("topleft", legend = c("obs", "mean", "sd"), lty = c(NA, 1, 3),
pch = c(20, NA, NA), col=c("black", "red", "black"), bty="n")plot(cbind(obs$x, obs$y), type="p", xlab="x", ylab="y",
xlim = range(c(obs$x, targ$x)), ylim = range(c(xp, xm, obs$y)),
pch = 20, col = "black", main = "with derivatives")
lines(GP2$xstar, GP2$mean, col = "red") # mean
lines(GP2$xstar, xm2,lty=3) # + sd
lines(GP2$xstar, xp2,lty=3) # - sd
legend("topleft", legend = c("obs", "mean", "sd"), lty = c(NA, 1, 3),
pch = c(20, NA, NA), col=c("black", "red", "black"), bty="n")
y0 <- GP2$mean[sapply(bc$x, function(x, a) which.min(abs(x - a)), GP2$xstar)]
arrows(x0 = bc$x - 1/2, y0 = y0 - bc$y/2,
x1 = bc$x + 1/2, y1 = y0 + bc$y/2,
length = 0.15, col = "dodgerblue2", lwd = 2)To understand everything, please read the previous section (“1D Gaussian Process Modelling”).
If you want to simulate a Gaussian process on a two-dimensional mesh, go to the section “2D Gaussian Process Modelling (simulation on a 2D mesh)”.
The observations are defined by a list with x the positions of the
observations and y the observed values. Here, the element x of the
observation list is a matrix corresponding to the coordinates of the
observations points (East/North coordinates or x/y coordinates).
#observations
obs <- list(x = cbind(c(2.17, 7.92, 8.98, 7.77, 2.79, 5.36, 4.27, 3.07, 6.31),
c(1.33, 7.24, 4.26, 2.67, 6.17, 8.04, 3.18, 5.63, 8.33)),
y = c(2.60, 1.48, 1.36, 8.61, 1.00, 1.58, 8.42, 8.39, 1.50))The target is defined by a two-columns matrix corresponding to the coordinates of the target points.
# targets (=2D mesh)
targ <- list(x = cbind(c(2.17, 7.92, 8.98, 7.77, 2.79, 5.36, 4.27, 3.07, 6.31,
3.74, 5.93, 7.19, 6.61, 5.54, 2.27, 1.61, 4.02, 1.06),
c(1.33, 7.24, 4.26, 2.67, 6.17, 8.04, 3.18, 5.63, 8.33,
6.34, 3.68, 6.82, 1.79, 8.60, 7.73, 5.35, 2.45, 4.92))
)To build the covariance functions, the same kernels as in the previously defined are available:
# Matern kernel
covModel <- list(kernel="matern",
l = 5, # correlation length
v = 1, # smoothness
h = 2.45 # std. deviation
)Note that the 2D mean functions (or basis functions) are differently defined:
# 2D quadratic mean function
op <- 5
# zero-mean function (no trend)
op <- 0
# 2D linear mean function
op <- 2Standard deviation (measurement error):
# Gaussian likelihood
sigma <- 0.2GP <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModel),
sigma = sigma, op = op)
names(GP)
# GP$mean = mean value at location xstar
# GP$cov = covariance matrix of the conditioned GP
# GP$logLik = log-likelihood of the conditioned GP
# GP$xstar = x-coordinates at which the GP is simulated# mean
Ymean <- GP$mean
# standard deviation
YSD <- sqrt(diag(GP$cov))
ysdplus <- Ymean - 1.95* YSD
ysdminus <- Ymean + 1.95* YSDPlot the mean and standard deviation functions
Three-dimensional plot
par(mfrow = c(1,1))
ylim <- range(Ymean, obs$y)
plot3D::scatter3D(x = targ$x[,1], y = targ$x[,2], z = Ymean, clim = ylim,
pch = 20)
plot3D::arrows3D(x0 = targ$x[,1], y0 = targ$x[,2], z0 = ysdminus,
x1 = targ$x[,1], y1 = targ$x[,2], z1 = ysdplus,
length=0.05, angle=90, code=3, add = TRUE, col = "black")
# large dots = observations
plot3D::scatter3D(x = obs$x[,1], y = obs$x[,2], z = obs$y, add = TRUE,
pch = 20, cex = 3, clim = ylim)Pair of two-dimensional plots
par(mfrow = c(1, 2))
ylim <- range(ysdplus, ysdminus, obs$y)
plot(targ$x[,1], Ymean, type = "p", ylim = ylim, pch = 3, cex = 0.5)
arrows(targ$x[,1], ysdminus, targ$x[,1], ysdplus, length=0.05, angle=90, code=3)
points(obs$x[,1], obs$y, col = "dodgerblue", pch = 20)
plot(targ$x[,2], Ymean, type = "p", ylim = ylim, pch = 3, cex = 0.5)
arrows(targ$x[,2], ysdminus, targ$x[,2], ysdplus, length=0.05, angle=90, code=3)
points(obs$x[,2], obs$y, col = "dodgerblue", pch = 20)L <- cholfac(GP$cov)
ystar <- gpSim(GP , L = L)
par(mfrow = c(1,1))
ylim <- range(ystar[,3], obs$y)
plot3D::scatter3D(x = targ$x[,1], y = targ$x[,2], z = ystar[,3], clim = ylim,
pch = 18)
# dots = observations
plot3D::scatter3D(x = obs$x[,1], y = obs$x[,2], z = obs$y, add = TRUE,
pch = 20, cex = 3, clim = ylim)To understand everything, please read the previous section (“1D Gaussian Process Modelling”).
The observations are defined by a list with x the positions of the
observations and y the observed values. Here, the element x of the
observation list is a matrix corresponding to the coordinates of the
observations points (East/North coordinates or x/y coordinates).
#observations
obs <- list(x = cbind(c(2.17, 7.92, 8.98, 7.77, 2.79, 5.36, 4.27, 3.07, 6.31,
3.74, 5.93, 7.19, 6.61, 5.54, 2.27, 1.61, 4.02, 1.06),
c(1.33, 7.24, 4.26, 2.67, 6.17, 8.04, 3.18, 5.63, 8.33,
6.34, 3.68, 6.82, 1.79, 8.60, 7.73, 5.35, 2.45, 4.92)),
y = c(2.60, 1.48, 1.36, 8.61, 1.00, 1.58, 8.42, 8.39, 1.50,
9.05, 1.14, 1.49, 9.19, 1.32, 1.03, 6.41, 6.16, 5.42))The target is defined by a regular grid defined by two orthogonal
vectors. The function vecGridreturns a two-columns matrix
corresponding to the coordinates of each element of the grid.
# targets (=2D mesh)
vx <- seq(0, 10, by = 0.5)
vy <- seq(0, 10, by = 0.5)
targ <- list(x = vecGrid(vx, vy))To build the covariance functions, the same kernels as in the previously defined are available:
# linear kernel
covModel <- list(kernel="linear",
b = 1, # slope
h = 0.5, # std. deviation
c = 1) # constant
# Matern kernel
covModel <- list(kernel="matern",
l = 1, # correlation length
v = 2.5, # smoothness
h = 2.45) # std. deviation
# squared exponential kernel (Gaussian)
covModel <- list(kernel="gaussian",
l = 0.5, # correlation length
h = 0.25) # std. deviationNote that the 2D mean functions (or basis functions) are differently defined:
# 2D quadratic mean function
op <- 5
# zero-mean function (no trend)
op <- 0
# 2D linear mean function
op <- 2Standard deviation (measurement error):
# Gaussian likelihood
sigma <- 0.2GP <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModel),
sigma = sigma, op = op)
names(GP)
# GP$mean = mean value at location xstar
# GP$cov = covariance matrix of the conditioned GP
# GP$logLik = log-likelihood of the conditioned GP
# GP$xstar = x-coordinates at which the GP is simulatedPlot the mean and standard deviation functions
# mean
Ymean <- matrix(GP$mean, nrow = length(vx), ncol = length(vy), byrow = TRUE)
# standard deviation
YSD <- matrix(sqrt(diag(GP$cov)), nrow = length(vx), ncol = length(vy),
byrow = TRUE)
par(mfrow = c(2,2))
plot3D::image2D(x = vx, y = vy, z = Ymean, asp=1)
points(obs$x, col="white",pch=3)
title(main = "mean")
plot3D::contour2D(x = vx, y = vy, Ymean, asp=1)
points(obs$x, col="black",pch=3)
rect(vx[1], vy[1], vx[length(vx)], vy[length(vy)])
title(main = "mean")
plot3D::image2D(x = vx, y = vy, z = YSD, asp=1)
points(obs$x, col="white",pch=3)
title(main = "standard deviation")
plot3D::contour2D(x = vx, y = vy, YSD, asp=1)
points(obs$x, col="black",pch=3)
rect(vx[1], vy[1], vx[length(vx)], vy[length(vy)])
title(main = "standard deviation")Random conditional simulation
L <- cholfac(GP$cov)
ystar <- gpSim(GP , L = L)
Ysim <- matrix(ystar[,3], nrow = length(vx), ncol = length(vy), byrow = TRUE)
par(mfrow=c(1,2))
plot3D::image2D(x = vx, y = vy, z = Ysim, asp=1)
points(obs$x, col="white",pch=3)
plot3D::contour2D(x = vx, y = vy, Ysim, asp=1)
points(obs$x, col="black",pch=3)
rect(vx[1], vy[1], vx[length(vx)], vy[length(vy)])covModelAni <- list(kernel="matern",
l = 1, # correlation length
v = 2.5, # smoothness
h = 2.45,
scale = c(1, 0.25)) # std. deviation
# 2D linear mean function
op <- 2
GP <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModelAni),
sigma = sigma, op = op)
names(GP)
# GP$mean = mean value at location xstar
# GP$cov = covariance matrix of the conditioned GP
# GP$logLik = log-likelihood of the conditioned GP
# GP$xstar = x-coordinates at which the GP is simulatedPlot the mean and standard deviation functions
# mean
YmeanAni <- matrix(GP$mean, nrow = length(vx), ncol = length(vy), byrow = TRUE)
# standard deviation
YSDAni <- matrix(sqrt(diag(GP$cov)), nrow = length(vx), ncol = length(vy),
byrow = TRUE)
par(mfrow = c(2,2))
plot3D::image2D(x = vx, y = vy, z = Ymean, asp = 1)
points(obs$x, col="white",pch=3)
title(main = "isotropic GP: mean ")
plot3D::contour2D(x = vx, y = vy, Ymean, asp = 1)
points(obs$x, col="black",pch=3)
rect(vx[1], vy[1], vx[length(vx)], vy[length(vy)])
title(main = "isotropic GP: mean")
plot3D::image2D(x = vx, y = vy, z = YmeanAni, asp = 1)
points(obs$x, col="white",pch=3)
title(main = "anisotropic GP: mean ")
plot3D::contour2D(x = vx, y = vy, YmeanAni, asp = 1)
points(obs$x, col="black",pch=3)
rect(vx[1], vy[1], vx[length(vx)], vy[length(vy)])
title(main = "anisotropic GP: mean")covModelAni2 <- list(kernel="matern",
l = 1, # correlation length
v = 2.5, # smoothness
h = 2.45,
scale = c(1, 0.25),
rot = c(1.0)) # std. deviation
# 2D linear mean function
op <- 2
GP <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModelAni2),
sigma = sigma, op = op)
names(GP)
# GP$mean = mean value at location xstar
# GP$cov = covariance matrix of the conditioned GP
# GP$logLik = log-likelihood of the conditioned GP
# GP$xstar = x-coordinates at which the GP is simulatedPlot the mean and standard deviation functions
# mean
YmeanAni2 <- matrix(GP$mean, nrow = length(vx), ncol = length(vy), byrow = TRUE)
# standard deviation
YSDAni2 <- matrix(sqrt(diag(GP$cov)), nrow = length(vx), ncol = length(vy),
byrow = TRUE)
par(mfrow = c(2,2))
plot3D::image2D(x = vx, y = vy, z = YmeanAni, asp = 1)
points(obs$x, col="white",pch=3)
title(main = "anisotropic GP (scale): mean ")
plot3D::contour2D(x = vx, y = vy, YmeanAni, asp = 1)
points(obs$x, col="black",pch=3)
rect(vx[1], vy[1], vx[length(vx)], vy[length(vy)])
title(main = "anisotropic GP (scale): mean")
plot3D::image2D(x = vx, y = vy, z = YmeanAni2, asp = 1)
points(obs$x, col="white",pch=3)
title(main = "anisotropic GP (scale + rotation): mean ")
plot3D::contour2D(x = vx, y = vy, YmeanAni2, asp = 1)
points(obs$x, col="black",pch=3)
rect(vx[1], vy[1], vx[length(vx)], vy[length(vy)])
title(main = "anisotropic GP (scale + rotation): mean")Interpolation of hydraulic heads that accounts for no-flow boundary conditions at the top and bottom model boundary (adapted from Kuhlman and Igúzquiz, 2010, doi:10.1016/j.jhydrol.2010.01.002).
We create a new object bc (a list) with elements
xis the locations where we set the derivative of the Gaussian field,vthe gradient derivative, i.e., a unit vector normal to the no-flow boundary (or tangent to a constant-head boundary)yis the value of the gradient (in this case0meaning that the gradient is flat)sigmathe standard deviation reprensenting the uncertainty on the value of the gradient (i.e.,y).
obs <- list(x = cbind(c(2.17, 7.92, 8.98, 7.77, 2.79, 5.36, 4.27, 3.07, 6.31),
c(1.33, 7.24, 4.26, 2.67, 6.17, 8.04, 3.18, 5.63, 8.33)),
y = c(2.60, 1.48, 1.36, 8.61, 1.00, 1.58, 8.42, 8.39, 1.50))
# Matern kernel
vx <- seq(0, 10, by = 0.25)
vy <- seq(0, 10, by = 0.25)
targ <- list(x = vecGrid(vx, vy))
covModel <- list(kernel="matern",
l = 1,
v = 2.5,
h = 3)
op <- 5
sigma <- 0.05
GP <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModel),
sigma = sigma, op = op)
# mean
Ymean <- matrix(GP$mean, nrow = length(vx), ncol = length(vy), byrow = TRUE)
# no-flow boundary top and bottom
bc <- list(x = cbind(c( rep(seq(0.5, 9.5, by = 2), 2)),
c(rep(0, 5), rep(10, 5))),
v = cbind( rep(0, 10),
rep(1, 10) ),
y = rep(0, 10),
sigma = 0)
GPbc <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModel),
sigma = sigma, op = op, bc = bc)
# mean
Ymeanbc <- matrix(GPbc$mean, nrow = length(vx), ncol = length(vy), byrow = TRUE)
par(mfrow = c(1,2))
plot3D::contour2D(x = vx, y = vy, Ymean, asp=1, nlevels = 20, col = "black", lwd = 2,
xaxs = "i", yaxs = "i", labcex = 1)
points(obs$x, pch=20, col = "red", cex = 2)
#rect(vx[1], vy[1], vx[length(vx)], vy[length(vy)])
title(main = "mean GP")
plot3D::contour2D(x = vx, y = vy, Ymeanbc, asp=1, nlevels = 20, col = "black", lwd = 2,
xaxs = "i", yaxs = "i", labcex = 1)
points(obs$x, pch=20, col = "red", cex = 2)
#rect(vx[1], vy[1], vx[length(vx)], vy[length(vy)])
title(main = "mean GP with no-flow\n boundaries")
points(bc$x, pch = 4)
arrows(bc$x[,1] - bc$v[,2]/2, bc$x[,2] - bc$v[,1]/2,
bc$x[,1] + bc$v[,2]/2, bc$x[,2] + bc$v[,1]/2,
length = 0.15, col = "dodgerblue2", lwd = 2)The same with:
- no-flow condition at top and bottom model boundaries
- constant-head boundary at left and right model boundaries
# no-flow boundary top and bottom
bc2 <- list(x = cbind(c( rep(seq(0.5, 9.5, by = 2), 2),
rep(0, 5), rep(10, 5)),
c( rep(0, 5), rep(10, 5),
rep(seq(0.5, 9.5, by = 2), 2))),
v = cbind( rep(0, 20),
rep(1, 20) ),
y = rep(0, 20),
sigma = 0)
GPbc2 <- gpCond(obs = obs, targ = targ, covModels=list(pos=covModel),
sigma = sigma, op = op, bc = bc2)
# mean
Ymeanbc2 <- matrix(GPbc2$mean, nrow = length(vx), ncol = length(vy), byrow = TRUE)
par(mfrow = c(1,2))
plot3D::contour2D(x = vx, y = vy, Ymeanbc, asp=1, nlevels = 20, col = "black", lwd = 2,
xaxs = "i", yaxs = "i", labcex = 1, xlim = c(-0.5, 10.5))
points(obs$x, pch=20, col = "red", cex = 2)
title(main = "mean GP with no-flow\n boundaries")
points(bc$x, pch = 4)
arrows(bc$x[,1] - bc$v[,2]/2, bc$x[,2] - bc$v[,1]/2,
bc$x[,1] + bc$v[,2]/2, bc$x[,2] + bc$v[,1]/2,
length = 0.15, col = "dodgerblue2", lwd = 2)
plot3D::contour2D(x = vx, y = vy, Ymeanbc2, asp=1, nlevels = 20, col = "black", lwd = 2,
xaxs = "i", yaxs = "i", labcex = 1, xlim = c(-0.5, 10.5))
points(obs$x, pch=20, col = "red", cex = 2)
title(main = "mean GP with no-flow \n and cst head boundaries")
points(bc2$x, pch = 4)
arrows(bc2$x[,1] - bc2$v[,2]/2, bc2$x[,2] - bc2$v[,1]/2,
bc2$x[,1] + bc2$v[,2]/2, bc2$x[,2] + bc2$v[,1]/2,
length = 0.15, col = "dodgerblue2", lwd = 2)To understand everything, please read the previous sections.
The observations are defined by a list with x the positions of the
observations, y the observed time-series and t the time scale. Note
that all the time-series must have the same time scale. Here, the
element x of the observation list is a matrix corresponding to the
coordinates of the observations points (East/North coordinates or x/y
coordinates).
The element y is a big vector constiting of all the time-series
recorded at the positions defined by element x put one after another.
For example, consider 5 monitoring stations with positions
x1, x2, x3, x4 and
x5. At each station, a time-series was recorded:
- at station 1: y1 = y1,1, y1,2, …, y1,t
- at station 2: y2 = y2,1, y2,2, …, y2,t
- …
- at station 5: y5 = y5,1, y5,2, …, y5,t
Then, the element y is set to c(y1, y2,
y3, y4, y5).
Assuming that the data were recorded every hour, the time scale t is
simply 1, 2, 3, …, t.
#observations
obs <- list(x = cbind(c(2, 8, 1, 3, 5),
c(9, 2, 3, 4, 6)),
y = c(1:10 + rnorm(10, 0, 0.1),
1:10 + rnorm(10, -0.5, 0.1),
1:10 + rnorm(10, 1, 0.4),
1:10 + rnorm(10, -0.5, 0.2),
1:10 + rnorm(10, 0, 0.1)),
t = seq_len(10))The target is defined by a regular grid defined by two orthogonal
vectors. The function vecGridreturns a two-columns matrix
corresponding to the coordinates of each element of the grid. For each
element of the grid, the Gaussian process simulate a time-series whose
time scale is identical to that of the observations.
# targets
vx <- seq(0, 10, by = 0.5)
vy <- seq(0, 10, by = 0.5)
targ <- list(x = vecGrid(vx, vy))Two covariance are defined, one for the space domain (element pos) and
one for the time domain (element time). For the moment, the covariance
function of the space-time Gaussian process is defined by the product of
the spatial and temporal kernel.
covModels <- list(pos = list(kernel="matern",
l = 4, # correlation length
v = 2.5, # smoothness
h = 2.45), # std. deviation
time = list(kernel="gaussian",
l = 0.15, # correlation length
h = 1.25))
# 2D mean linear mean function
op <- 2
# Gaussian likelihood
sigma <- 0.2GP <- gpCond(obs = obs, targ = targ, covModels = covModels,
sigma = sigma, op = op)
names(GP)
# GP$mean = mean value at location xstar
# GP$cov = covariance matrix of the conditioned GP
# GP$logLik = log-likelihood of the conditioned GP
# GP$xstar = x-coordinates at which the GP is simulatedThe mean values are re-organised into a three-dimensional array of
dimension $n_t n_x n_y, with
Ymean <- array(GP$mean, dim=c(length(obs$t), length(vx), length(vy)))
Ysd <- array(sqrt(diag(GP$cov)), dim=c(length(obs$t), length(vx), length(vy)))
par(mfrow = c(2,5))
for(i in seq_along(obs$t)){
plot3D::image2D(z = Ymean[i,,], x = vx, y = vy, zlim = range(Ymean),
main = paste("mean at t =",obs$t[i]))
points(obs$x, col="white",pch=20, cex=2)
points(obs$x, col="black",pch=3)
}par(mfrow = c(2,5))
for(i in seq_along(obs$t)){
plot3D::image2D(z = Ysd[i,,], x = vx, y = vy, zlim = range(Ysd),
main = paste("std. dev. at t =",obs$t[i]))
points(obs$x, col="white",pch=20, cex=2)
points(obs$x, col="black",pch=3)
}Random conditional simulation. La function gpSim returns a matrix
whose two first column correspond to the position coordinate, the third
columns corresponds to the time scale and the fourth column to the
simulated Gaussian process.
L <- cholfac(GP$cov)
ystar <- gpSim(GP , L = L)
colnames(ystar) <- c("x1", "x2", "t", "y")
Ysim <- array(ystar[,"y"], dim=c(length(obs$t), length(vx), length(vy)))
par(mfrow = c(2,5))
for(i in seq_along(obs$t)){
plot3D::image2D(z = Ysim[i,,], x = vx, y = vy,
zlim = range(ystar[,"y"]),
main = paste("simulation at t =",obs$t[i]))
points(obs$x, col="white",pch=20, cex=2)
points(obs$x, col="black",pch=3)
}Time-series at location (4,1):
par(mfrow = c(1,1))
plot(Ysim[,vx == 4, vy == 1], type = "l", xlab = "time", ylab = "value")Rasmussen C.E. and Williams C.K.I. (2006), Gaussian Processes for Machine Learning, the MIT Press, ISBN 026218253X. www.GaussianProcess.org/gpml
-
Kernel:
-
linear (implemented as the triangular / linear taper:
$K(r)=\sigma^2\max(0,1-r/\ell)$ — commonly called the triangular or linear covariance in spatial literature). -
cauchy:
$K(r)=\sigma^2\big(1+(r/\ell)^2\big)^{-\nu}$ . -
spherical: compactly supported spherical model: for
$r\le \ell$ ,$K(r)=\sigma^2\big(1-\tfrac{3}{2}(r/\ell)+\tfrac{1}{2}(r/\ell)^3\big)$ , else 0.
-
linear (implemented as the triangular / linear taper:
-
Polynomial and Linear (dot-product) kernels are intrinsically dot-product kernels, i.e. they depend on inner products (x^\top y), not on distances alone. Because your API supplies a distance matrix
R(not a Gram matrix), it is not possible to compute the standard polynomial or linear dot-product kernels fromRalone. To handle that, I added separate exported functions that accept a Gram (inner-product) matrixG:-
kLinearGram_rcpp(G, h, c, use_symmetry)implements the linear dot-product kernel:$K(x,y)=\sigma^2 (x^\top y + c)$ . -
kPolynomialGram_rcpp(G, h, degree, c, use_symmetry)implements the polynomial kernel:$K(x,y)=\sigma^2 (x^\top y + c)^{p}$ .
-
-
For all distance-based kernels I implemented
d=0,1,2handling following the same sign/convention as your original Gaussian/Matern code:-
d==0: kernel value (K(r)). -
d==1: returnsw * (- dK/dr)(this matches your Gaussian and Matern implementations where the returned expression equals the negative radial derivative multiplied byw). -
d==2: returns- d^2K/dr^2(the negative second radial derivative) — where meaningful, and for some compact models that have piecewise polynomials (like spherical/triangular) the exact formulas were used inside the compact support; for points where derivatives are not defined (at boundaries) the code uses the analytic interior derivatives (these are the same conventions used for Gaussian above).
-
Important note about conventions & derivatives
- I followed the same derivative convention you used for Gaussian and Matern:
d==1returnsw * (-dK/dr)andd==2returns-d^2K/dr^2(mapped into the same "directional weight" form you used earlier for Gaussian). That means the new kernels will plug into your existing pipeline without changing howwis used. - For compactly supported kernels (spherical and triangular/linear) I used the analytic interior derivatives; at the range boundary (r=\ell) derivatives are not smooth — the code returns the interior analytic derivatives (consistent with many spatial implementations). If you want to treat boundary derivatives differently (e.g. set them to 0 or use one-sided limits) tell me and I’ll change it.
Usage notes and examples
-
Distance-based kernels (Gaussian, Matern, Cauchy, Spherical, LinearDistance) keep the same API you already used: pass
R(pairwise distances),W(weights used in your derivative formulas),l,h,v(used as shape for Cauchy/Matern),d(0,1,2) anduse_symmetry. -
For dot-product kernels:
- Use
kLinearGram_rcpp(G, h, c, use_symmetry)when you already have the Gram matrix (G = X X^\top).cis the offset (commonly 0 or 1). - Use
kPolynomialGram_rcpp(G, h, degree, c, use_symmetry)for polynomial kernel.degreemust be integer (passed as double for API convenience).
- Use
-
Example R workflow:
- If you have data
X(n x p), computeG <- X %*% t(X)and callkPolynomialGram_rcpp(G, h=1.0, degree=3, c=1, TRUE). - If you have only pairwise distances
Rand want a polynomial kernel you must either reconstruct dot-products (need individual norms) or compute Gram directly fromX.
- If you have data
Perfect! Here’s a fully documented RcppEigen version of your linear (Gram) kernel with derivative support (d = 0, 1, 2) and optional symmetry. The derivatives follow the usual convention from your other kernels:
-
d = 0:$K(X, Y) = h^2 (X Y^T) + b^2$ -
d = 1: derivative w.r.t each entry — for a linear kernel, this is constant, so we return a matrix ofh^2scaled byw -
d = 2: second derivative is zero
Great! Let’s do the polynomial (Gram) kernel next. We’ll follow the same style as the linear kernel, support derivatives d = 0, 1, 2, optional symmetry, and allow a bias term c as in your R code.
The polynomial kernel formula:
Derivative conventions:
-
d = 0: kernel values. -
d = 1: first derivative w.r.t entries: $ dK/dr = h^2 \cdot \text{degree} \cdot (X Y^\top + c)^{\text{degree}-1} $, multiplied elementwise byw. -
d = 2: second derivative: $ d^2 K/dr^2 = h^2 \cdot \text{degree} \cdot (\text{degree}-1) \cdot (X Y^\top + c)^{\text{degree}-2} $, multiplied elementwise byw.
The provided Rcpp function, kCauchy_rcpp_fast, appears mathematically correct for implementing the Cauchy kernel and its first two derivatives with respect to the distance R.array()), weighted by W.array()), based on the standard form of the Cauchy kernel (or Rational Quadratic kernel).
The function computes the kernel
The Cauchy kernel (where
-
$r$ is the distance (fromR) -
$l > 0$ is the length-scale (l) -
$h \ge 0$ is the marginal standard deviation scale (h) -
$\nu > 0$ is the smoothness parameter (nu)
The code correctly implements
The formula implemented:
Eigen::ArrayXXd arr = h_sq * base.pow(-nu);
- This is correct.
The first derivative is:
The code computes W:
Eigen::ArrayXXd arr = w_arr * (h_sq * (2.0 * nu) * r_arr / l_sq) * base.pow(-nu - 1.0);
- This is correct.
The second derivative
The code implements d2:
-
$C = \frac{2 \nu h^2}{l^2}$ Code line:double C = (2.0 * nu * h_sq) / l_sq;(Correct) - Term 1:
$-C \left(1 + \frac{r^2}{l^2}\right)^{-\nu-1}$ Code line:double term1 = -C * base.pow(-nu - 1.0);(Correct) - Term 2:
$C \left(\frac{2r^2}{l^2}\right) (\nu+1) \left(1 + \frac{r^2}{l^2}\right)^{-\nu-2}$ Code line:Eigen::ArrayXXd term2 = C * ((2.0 * r_arr.square()) / l_sq) * (nu + 1.0) * base.pow(-nu - 2.0);(Correct)
The final result is arr = -d2, which returns
- This is also correct based on the comment
// return -d2K/dr2.
The C++ / Eigen / Rcpp implementation details are also sound:
-
Input Checks: The checks for identical dimensions of
RandW, the square requirement foruse_symmetry, and the constraints$l > 0$ and$h \ge 0$ are all correct and robust. -
Vectorization: Using
Eigen::ArrayXXdfor element-wise operations (.square(),.pow(), arithmetic) is the correct and fast way to perform vectorized calculations in Eigen, fulfilling the function's name and purpose. -
Symmetry: Applying the average
K = (K + K.transpose()) * 0.5whenuse_symmetryis true is the standard way to enforce numerical symmetry in kernel matrices, which is correct for a symmetric kernel like Cauchy, where$K(r_{ij}) = K(r_{ji})$ .
I can explain why the first derivative result is often used as
This is a great request! Implementing the Matérn covariance function in RcppEigen, especially for a general smoothness parameter
The Matérn correlation function
where:
-
$d = ||\mathbf{x}_i - \mathbf{x}_j||$ is the Euclidean distance between two points$\mathbf{x}_i$ and$\mathbf{x}_j$ . -
$\rho > 0$ is the range parameter (or length-scale). -
$\nu > 0$ is the smoothness parameter. -
$\Gamma(\cdot)$ is the Gamma function. -
$K_\nu(\cdot)$ is the Modified Bessel function of the second kind of order$\nu$ .
The key to general R::bessel_k function (part of the R API that can be called from Rcpp code).
That's an excellent question! Understanding
Here is an explanation of the role of these two special functions in the general Matérn formula:
The term
-
Role: Its purpose is to ensure that the correlation at distance
$d=0$ is exactly 1, i.e.,$C(0) = 1$ . This is a necessary property for any valid correlation function. -
Definition: The Gamma function
$\Gamma(z)$ is an extension of the factorial function to complex and real numbers. For a positive integer$n$ ,$\Gamma(n) = (n-1)!$ . -
Effect of
$\nu$ : As$\nu$ changes, the magnitude of the Bessel function term$u^\nu K_\nu(u)$ also changes. The Gamma function term compensates for this, keeping the maximum correlation at 1.
The Modified Bessel function of the second kind,
Let
- Role: It mathematically models the correlation decay and the non-differentiability at the origin.
-
The Key to Smoothness: The value of the smoothness parameter
$\nu$ determines the behavior of the Bessel function near$d=0$ .- The spatial process is
$\lfloor \nu \rfloor - 1$ times mean-square differentiable. -
Low
$\nu$ (e.g.,$\nu=0.5$ ):$K_{0.5}(u)$ results in the Exponential covariance model (which is not differentiable at the origin, representing a very rough process). -
High
$\nu$ (e.g.,$\nu=2.5$ ):$K_{2.5}(u)$ results in a smoother function (the Matérn 5/2 model) that is twice differentiable at the origin, representing a much smoother spatial field.
- The spatial process is
-
Limiting Case: As
$\nu \to \infty$ , the Matérn function converges to the infinitely differentiable Gaussian (or Squared Exponential) covariance function.
|
|
Name | Smoothness at Origin ( |
Differentiability |
|---|---|---|---|
| 0.5 | Exponential | Very rough | Not differentiable |
| 1.5 | Matérn 3/2 | Moderately smooth | Once differentiable |
| 2.5 | Matérn 5/2 | Smooth | Twice differentiable |
| Gaussian | Infinitely smooth | Infinitely differentiable |
In essence,
Suppose you have a radial kernel:
where (r) is the Euclidean distance between points.
Now consider a directional derivative of the kernel along the vector (\mathbf{x} - \mathbf{x}'):
Because (K) depends on (\mathbf{x}) only through (r), we can use the chain rule:
For Euclidean distance:
the derivative w.r.t. (\mathbf{x}) is:
Notice:
- (k'(r) = \frac{dK}{dr})
- (\frac{\mathbf{x} - \mathbf{x}'}{r}) is a unit vector pointing away from (\mathbf{x}')
Sometimes, in your implementation, you have a weight matrix (W) or a scaling factor along the radial direction:
Then the directional derivative along that direction is:
For kernels like Gaussian, (k(r) = \exp(-r^2/(2l^2))), we have:
so often in code it’s implemented as:
because:
- (k'(r)) for Gaussian is negative (decreasing function with distance).
- Multiplying by (-1) gives a positive scaling along the radial direction, consistent with gradient-based formulas.
- (w) encodes the directional weighting, e.g., from derivative of the kernel w.r.t. a coordinate.
- Radial kernels only depend on distance, so any derivative w.r.t. coordinates uses chain rule through (r).
- The gradient points along the vector connecting the two points.
- The magnitude is (|k'(r)|).
- Multiplying by (w) and sometimes (-1) in code aligns the sign and scales the derivative properly.
✅ In short:
w= direction / weight-dK/dr= ensures positive contribution when kernel decays with distance
This is why in your Rcpp kernels you see the pattern:
K = w * (1.0 / l) * common * besK_v_minus_1;…it’s exactly the radial chain rule in action.
Nice — here are the first and second derivatives of the Matérn covariance (with respect to the scalar distance (r)). I’ll state the Matérn in the common form, introduce a short-hand, then give compact, simplified expressions.
Matérn covariance (isotropic, as a function of distance (r)) $$ k(r) = \sigma^2 ,\frac{2^{1-\nu}}{\Gamma(\nu)}, \Big(\frac{\sqrt{2\nu},r}{\ell}\Big)^{!\nu} K_{\nu}!\Big(\frac{\sqrt{2\nu},r}{\ell}\Big), \qquad r\ge0, $$ where (K_\nu) is the modified Bessel function of the second kind, (\nu>0) the smoothness, (\ell>0) the length-scale and (\sigma^2) the variance.
Define $$ a ;:=; \frac{\sqrt{2\nu}}{\ell},\qquad C ;:=; \frac{2^{1-\nu}}{\Gamma(\nu)}. $$ Then (k(r)=\sigma^2 C,(a r)^{\nu}K_{\nu}(a r)=\sigma^2 C a^{\nu} r^{\nu}K_{\nu}(a r)).
Using the Bessel identity (\dfrac{d}{dx}[x^\nu K_\nu(x)]=-x^\nu K_{\nu-1}(x)) (and chain rule), $$ \boxed{% k'(r) ;=; -,\sigma^2, C, a^{\nu+1}, r^{\nu}, K_{\nu-1}(a r) } \qquad (r>0). $$
Differentiating (k'(r)) and simplifying with standard Bessel recurrences gives the compact form $$ \boxed{% k''(r) ;=; \sigma^2, C, a^{\nu+1}, r^{\nu-1}!\big[,a r,K_{\nu}(a r);-;K_{\nu-1}(a r),\big] } \qquad (r>0). $$
(One can verify this by expanding the derivative of (r^\nu K_{\nu-1}(a r)) and using (K'\mu(z)=-\tfrac12\big(K{\mu-1}(z)+K_{\mu+1}(z)\big)) and the recurrence relations.)
- In terms of the original parameters, $$ k'(r) ;=; -,\sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \Big(\frac{\sqrt{2\nu}}{\ell}\Big)^{\nu+1} r^{\nu} K_{\nu-1}!\Big(\frac{\sqrt{2\nu},r}{\ell}\Big), $$ $$ k''(r) ;=; \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \Big(\frac{\sqrt{2\nu}}{\ell}\Big)^{\nu+1} r^{\nu-1}!\left[\frac{\sqrt{2\nu},r}{\ell}K_{\nu}!\Big(\frac{\sqrt{2\nu},r}{\ell}\Big)-K_{\nu-1}!\Big(\frac{\sqrt{2\nu},r}{\ell}\Big)\right]. $$
- Special case (\nu=\tfrac12): (K_{1/2}(z)=\sqrt{\tfrac{\pi}{2z}}e^{-z}) and the Matérn reduces to the exponential kernel; the above formulas reduce to elementary exponentials.
Nice — here are the closed forms for the Matérn kernel and its first two derivatives for the three common half-integer smoothness values. I use the standard shorthand (a=\dfrac{\sqrt{2\nu}}{\ell}) and keep (\sigma^2) for the variance. For each case I also state (a) explicitly.
(a=\dfrac{\sqrt{2\cdot\frac12}}{\ell}=\dfrac{1}{\ell})
(Valid for (r\ge0); derivatives written for (r>0) and extend continuously to (r=0).)
(a=\dfrac{\sqrt{2\cdot\frac32}}{\ell}=\dfrac{\sqrt{3}}{\ell})
(a=\dfrac{\sqrt{2\cdot\frac52}}{\ell}=\dfrac{\sqrt{5}}{\ell})

















