GPflow models
GPSat models based on the GPflow Python package for GPU-accelerated GP modelling.
- class GPSat.models.gpflow_models.GPflowGPRModel(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood: Gaussian = None, **kwargs)
Bases: BaseGPRModel
Model based on the GPflow implementation of exact Gaussian process regression (GPR).
See BaseGPRModel for a complete list of attributes and methods.
- __init__(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood: Gaussian = None, **kwargs)
- Parameters:
- data
- coords_col
- obs_col
- coords
- obs
- coords_scale
- obs_scale
- obs_mean
- verbose
- kernel: str | gpflow.kernels, default “Matern32”
The kernel used for GPR. We can use the following GPflow kernels, which can be passed as a string: “Cosine”, “Exponential”, “Matern12”, “Matern32”, “Matern52”, “RationalQuadratic” or “RBF” (equivalently “SquaredExponential”).
- kernel_kwargs: dict, optional
Keyword arguments to be passed to the GPflow kernel specified in kernel.
- mean_function: str | gpflow.mean_functions, optional
GPflow mean function to model the prior mean.
- mean_func_kwargs: dict, optional
Keyword arguments to be passed to the GPflow mean function specified in mean_function.
- noise_variance: float, optional
Variance of Gaussian likelihood. Unnecessary if likelihood is specified explicitly.
- likelihood: gpflow.likelihoods.Gaussian, optional
GPflow model for Gaussian likelihood used to model data uncertainty. A custom GPflow Gaussian likelihood class can be used here. Unnecessary if using a vanilla Gaussian likelihood and noise_variance is specified.
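To make the default kernel choice concrete, the following is a minimal, library-free sketch of the standard Matern-3/2 covariance with one lengthscale per input dimension. It is not the GPflow or GPSat implementation; the function name matern32 and the example values are illustrative assumptions.

```python
import math


def matern32(x1, x2, lengthscales, variance=1.0):
    """Matern-3/2 covariance between two points (illustrative helper)."""
    # Scaled Euclidean distance: each dimension is divided by its lengthscale.
    r = math.sqrt(sum(((a - b) / ell) ** 2 for a, b, ell in zip(x1, x2, lengthscales)))
    s = math.sqrt(3.0) * r
    # k(r) = variance * (1 + sqrt(3) r) * exp(-sqrt(3) r)
    return variance * (1.0 + s) * math.exp(-s)


# Covariance of a point with itself equals the kernel variance.
k_same = matern32([0.0, 0.0], [0.0, 0.0], lengthscales=[1.0, 2.0], variance=2.5)
# Points many lengthscales apart are almost uncorrelated.
k_far = matern32([0.0, 0.0], [10.0, 0.0], lengthscales=[1.0, 2.0], variance=2.5)
```

The two hyperparameters appearing here correspond to what get_lengthscales() and get_kernel_variance() return for the fitted model.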
- get_kernel_variance() float
Returns the kernel variance hyperparameter.
- get_lengthscales() ndarray
Returns the lengthscale kernel hyperparameters.
- get_likelihood_variance() float
Returns the likelihood variance hyperparameter.
- get_objective_function_value()
Get the negative marginal log-likelihood loss.
- optimise_parameters(max_iter=10000, fixed_params=None, **opt_kwargs)
Method to optimise the kernel hyperparameters using a scipy optimizer (method = L-BFGS-B by default).
- Parameters:
- max_iter: int, default 10000
The maximum number of iterations permitted for optimisation. The optimiser runs either until convergence or until the number of iterations reaches max_iter.
- fixed_params: list of str, default []
Parameters to fix during optimisation. Each entry should be one of “lengthscales”, “kernel_variance” or “likelihood_variance”.
- opt_kwargs: dict, optional
Keyword arguments passed to gpflow.optimizers.Scipy.minimize().
- Returns:
- bool
Indication of whether optimisation was successful or not, i.e. whether it converged within the maximum number of iterations set.
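As a rough illustration of how fixed_params partitions the hyperparameters, the sketch below separates the names left free for optimisation from those held fixed. The helper trainable_param_names is hypothetical and not part of GPSat; treating None as an empty list is also an assumption.

```python
def trainable_param_names(param_names, fixed_params=None):
    """Return the names left free for optimisation, preserving their order."""
    # Assumption: None is treated the same as an empty list.
    fixed = [] if fixed_params is None else list(fixed_params)
    unknown = [name for name in fixed if name not in param_names]
    if unknown:
        raise ValueError(f"unknown parameter name(s): {unknown}")
    return [name for name in param_names if name not in fixed]


param_names = ["lengthscales", "kernel_variance", "likelihood_variance"]
free = trainable_param_names(param_names, fixed_params=["likelihood_variance"])
```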
- property param_names: list
Returns the model hyperparameter names: “lengthscales”, “kernel_variance” and “likelihood_variance”.
- predict(coords, full_cov=False, apply_scale=True) Dict[str, ndarray]
Method to generate predictions at the given coords.
- Parameters:
- coords: pandas series | pandas dataframe | list | numpy array
Coordinate locations where we want to make predictions.
- full_cov: bool, default False
Flag to determine whether to return a full covariance matrix at the prediction coords or just the marginal variances.
- apply_scale: bool, default True
If True, coords should be the raw, untransformed values. If False, coords must be rescaled by self.coords_scale (see BaseGPRModel attributes).
- Returns:
- dict of numpy arrays
If full_cov = False, returns a dictionary containing the posterior mean “f*”, posterior variance “f*_var” and predictive variance “y_var” (i.e. the posterior variance + likelihood variance).
If full_cov = True, returns a dictionary containing the posterior mean “f*”, posterior marginal variance “f*_var”, predictive marginal variance “y_var”, full posterior covariance “f*_cov” and full predictive covariance “y_cov”.
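The relationship between the posterior and predictive quantities can be sketched as follows. The marginal case (y_var = f*_var + likelihood variance) is as stated above; adding the likelihood variance to the diagonal of the full covariance is an assumption about how “y_cov” is formed, and predictive_from_posterior is a hypothetical helper, not a GPSat function.

```python
def predictive_from_posterior(f_var, likelihood_variance, f_cov=None):
    """Build the variance entries of a predict()-style dictionary."""
    out = {
        "f*_var": list(f_var),
        # Predictive variance = posterior variance + likelihood variance.
        "y_var": [v + likelihood_variance for v in f_var],
    }
    if f_cov is not None:
        out["f*_cov"] = [list(row) for row in f_cov]
        # Assumption: observation noise is independent across points,
        # so it only contributes to the diagonal of the covariance.
        out["y_cov"] = [
            [c + (likelihood_variance if i == j else 0.0) for j, c in enumerate(row)]
            for i, row in enumerate(f_cov)
        ]
    return out


f_cov = [[0.25, 0.125], [0.125, 0.5]]
f_var = [f_cov[0][0], f_cov[1][1]]
result = predictive_from_posterior(f_var, likelihood_variance=0.25, f_cov=f_cov)
```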
- set_kernel_variance(kernel_variance)
Setter method for kernel variance.
- Parameters:
- kernel_variance: int | float | numpy array | tensorflow tensor | list of int or float
int, float or Tensor-like data of size 1 specifying the kernel variance.
- set_kernel_variance_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)
Sets constraints on the kernel variance.
- Parameters:
- low: int | float
Minimal value for kernel variance.
- high: list | int | float
Maximal value for kernel variance.
- move_within_tol: bool, default True
If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
- tol: float, default 1e-8
The tol value for when move_within_tol = True.
- scale: bool, default False
If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
- scale_magnitude: int or float, optional
The value with which one rescales the coord values if scale = True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).
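The effect of move_within_tol can be pictured as a simple clamp of the current value into [low+tol, high-tol]. The sketch below is only an illustration of that interval arithmetic; the helper name move_within_bounds is an assumption and the real method may differ in detail.

```python
def move_within_bounds(value, low, high, tol=1e-8):
    """Clamp a hyperparameter value into the interval [low + tol, high - tol]."""
    if low + tol > high - tol:
        raise ValueError("interval [low + tol, high - tol] is empty")
    return min(max(value, low + tol), high - tol)


too_small = move_within_bounds(0.0, low=1e-3, high=10.0)
unchanged = move_within_bounds(5.0, low=1e-3, high=10.0)
too_large = move_within_bounds(100.0, low=1e-3, high=10.0)
```

Keeping the value strictly inside the bounds avoids starting the optimiser exactly on a constraint boundary.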
- set_lengthscales(lengthscales)
Setter method for kernel lengthscales.
- Parameters:
- lengthscales: numpy array | tensorflow tensor | list of int or float | int | float
Tensor-like data of size D (input dimensions) specifying the lengthscales in each dimension. If specified as an int or a float, it will assign the same lengthscale in each dimension.
- set_lengthscales_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)
Sets constraints on the lengthscale hyperparameters.
- Parameters:
- low: list | int | float
Minimal value for lengthscales. If specified as a list type, it should have length D (coordinate dimension) where the entries correspond to minimal values of the lengthscale in each dimension in the order given by self.coords_col (see BaseGPRModel attributes). If int or float, the same minimal value is assigned to each dimension.
- high: list | int | float
Same as above, except specifying the maximal values.
- move_within_tol: bool, default True
If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
- tol: float, default 1e-8
The tol value for when move_within_tol = True.
- scale: bool, default False
If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
- scale_magnitude: int or float, optional
The value with which one rescales the coord values if scale = True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).
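A sketch of how per-dimension bounds might be assembled: scalars are broadcast to all D dimensions, and with scale = True the bounds given in raw coordinate units are converted to the transformed units. The conversion by division assumes that transformed coordinates are the raw coordinates divided by coords_scale; this convention and the helper name lengthscale_bounds are assumptions for illustration.

```python
def lengthscale_bounds(low, high, coords_scale, scale=False):
    """Return per-dimension (low, high) bounds as two lists of floats."""
    num_dims = len(coords_scale)

    def per_dim(value):
        # A scalar applies the same bound to every dimension.
        if isinstance(value, (int, float)):
            return [float(value)] * num_dims
        values = [float(v) for v in value]
        if len(values) != num_dims:
            raise ValueError("bounds must have one entry per coordinate dimension")
        return values

    low, high = per_dim(low), per_dim(high)
    if scale:
        # ASSUMPTION: transformed coordinate = raw coordinate / coords_scale.
        low = [lo / s for lo, s in zip(low, coords_scale)]
        high = [hi / s for hi, s in zip(high, coords_scale)]
    return low, high


# Hypothetical setup: two spatial dimensions in metres and one time dimension in days.
coords_scale = [50000.0, 50000.0, 1.0]
low, high = lengthscale_bounds(
    [1000.0, 1000.0, 0.5], [100000.0, 100000.0, 10.0], coords_scale, scale=True
)
same_low, same_high = lengthscale_bounds(0.1, 5.0, coords_scale)
```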
- set_likelihood_variance(likelihood_variance)
Setter method for likelihood variance.
- Parameters:
- likelihood_variance: int | float | numpy array | tensorflow tensor | list of int or float
int, float or Tensor-like data of size 1 specifying the likelihood variance.
- set_likelihood_variance_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)
Sets constraints on the likelihood variance.
- Parameters:
- low: int | float
Minimal value for likelihood variance.
- high: list | int | float
Maximal value for likelihood variance.
- move_within_tol: bool, default True
If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
- tol: float, default 1e-8
The tol value for when move_within_tol=True.
- scale: bool, default False
If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
- scale_magnitude: int or float, optional
The value with which one rescales the coord values if scale=True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).
- update_obs_data(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None)
- class GPSat.models.gpflow_models.GPflowSGPRModel(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', num_inducing_points=500, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood: Gaussian = None, **kwargs)
Bases: GPflowGPRModel
Model using the sparse GPR method to handle data sizes beyond the capacity of exact GPR. This introduces a set of M pseudo data points, referred to as the inducing points, which summarise the information contained in the original dataset (see [T’09] for more details).
By choosing a small number of inducing points, one is able to handle large data sizes, up to order ~O(1e5). However, the prediction quality may also deteriorate with fewer inducing points, so it is necessary to tune the number of inducing points to strike a good balance between efficiency and accuracy.
See BaseGPRModel for a complete list of attributes and methods.
Notes
This is sub-classed from GPflowGPRModel and uses the same predict() method.
Has O(NM^2) computational complexity and O(NM) memory scaling, where N is the number of training data points.
Several techniques for inducing point selection exist (e.g. see this GPflow tutorial); however, we have only implemented the random selection method, where inducing points are initialised as M random sub-samples of the training data.
References
[T’09] Titsias, Michalis. “Variational learning of inducing variables in sparse Gaussian processes.” Artificial intelligence and statistics. PMLR, 2009.
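The random selection method mentioned in the Notes can be sketched with the standard library alone. Sampling without replacement and capping M at the number of data points are assumptions made for this illustration; random_inducing_points is a hypothetical helper rather than the GPSat routine.

```python
import random


def random_inducing_points(coords, num_inducing_points, seed=None):
    """Pick M training locations at random to initialise the inducing points."""
    rng = random.Random(seed)
    # Assumption: never request more inducing points than there are data points.
    m = min(num_inducing_points, len(coords))
    # Sample without replacement so that no location is selected twice.
    return [list(point) for point in rng.sample(coords, m)]


# 100 distinct 2D training locations (made-up values).
coords = [[float(i), float(i % 7)] for i in range(100)]
Z = random_inducing_points(coords, num_inducing_points=5, seed=0)
Z_all = random_inducing_points(coords, num_inducing_points=500)
```

The resulting list has the [M, D] layout expected by set_inducing_points() once converted to a numpy array.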
- __init__(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', num_inducing_points=500, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood: Gaussian = None, **kwargs)
- Parameters:
- data
- coords_col
- obs_col
- coords
- obs
- coords_scale
- obs_scale
- obs_mean
- verbose
- kernel
- kernel_kwargs
- mean_function
- mean_func_kwargs
- noise_variance
- likelihood
- num_inducing_points: int, default 500
The number of inducing points.
- get_inducing_points() ndarray
Get the inducing point locations.
- get_objective_function_value()
Get the ELBO value for current state.
- optimise_parameters(train_inducing_points=False, max_iter=10000, fixed_params=[], **opt_kwargs)
Method to optimise the model parameters (kernel hyperparameters + inducing point locations) using a scipy optimizer (method = L-BFGS-B by default).
- Parameters:
- train_inducing_points: bool, default False
Flag to specify whether to optimise the inducing point locations or not. Setting this to True may improve results, but may also lead to slower convergence.
- max_iter: int, default 10000
The maximum number of iterations permitted for optimisation. The optimiser runs either until convergence or until the number of iterations reaches max_iter.
- fixed_params: list of str, default []
Parameters to fix during optimisation. Each entry should be one of “lengthscales”, “kernel_variance” or “likelihood_variance”.
- opt_kwargs: dict, optional
Keyword arguments passed to gpflow.optimizers.Scipy.minimize().
- Returns:
- bool
Indication of whether optimisation was successful or not, i.e. whether it converged within the maximum number of iterations set.
- property param_names: list
Returns a list of model hyperparameter names (“lengthscales”, “kernel_variance” and “likelihood_variance”), in addition to “inducing points”.
- set_inducing_points(inducing_points)
Setter method for inducing point locations.
- Parameters:
- inducing_points: np.ndarray
Inducing point locations specified as a numpy array of size [M, D].
- class GPSat.models.gpflow_models.GPflowSVGPModel(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', num_inducing_points=None, minibatch_size=None, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood=None, likelihood_kwargs=None, **kwargs)
Bases: GPflowGPRModel
Model using SVGP (Sparse Variational GP [H’13]) to deal with even larger data sizes than SGPR, in addition to handling non-Gaussian likelihoods.
The key differences with SGPR are (1) stochastic optimisation of parameters via mini-batching of the training data, and (2) gradient-based optimisation of the variational distribution, parameterised by a mean and the Cholesky factor of the covariance, as opposed to exact computation. The former allows handling of larger data and inducing point sizes, and the latter allows handling of non-Gaussian likelihoods (see [H’13] for more details).
See BaseGPRModel for a complete list of attributes and methods.
Notes
This is sub-classed from GPflowGPRModel and uses the same predict() method.
Introduces an extra hyperparameter minibatch_size to be tuned.
Has O(BM^2 + M^3) computational complexity and O(BM + M^2) memory scaling, where B is the minibatch size.
Saving the variational parameters to the results file may be memory intensive due to the M^2 memory scaling of the Cholesky factor. Consider leaving them out of the results file when running experiments (TODO: cross reference ModelConfig).
References
[H’13] Hensman, James, Nicolo Fusi, and Neil D. Lawrence. “Gaussian processes for big data.” arXiv preprint arXiv:1309.6835 (2013).
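To give a feel for the scaling statements, the following worked arithmetic compares the leading-order operation counts quoted for SGPR, O(NM^2), and SVGP, O(BM^2 + M^3). Constants are ignored, and the sizes N, M and B are made-up values chosen only for illustration.

```python
def sgpr_cost(n, m):
    """Leading-order cost of SGPR: N * M^2."""
    return n * m**2


def svgp_cost(b, m):
    """Leading-order per-iteration cost of SVGP: B * M^2 + M^3."""
    return b * m**2 + m**3


N, M, B = 10**6, 500, 1000  # hypothetical data, inducing point and minibatch sizes
sgpr = sgpr_cost(N, M)      # 250_000_000_000
svgp = svgp_cost(B, M)      # 375_000_000
ratio = sgpr / svgp         # roughly 667 times cheaper per iteration
```

Note that the SVGP figure is a per-iteration cost, and stochastic optimisation typically needs many iterations, so the ratio does not translate directly into wall-clock savings.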
- __init__(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', num_inducing_points=None, minibatch_size=None, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood=None, likelihood_kwargs=None, **kwargs)
- Parameters:
- data
- coords_col
- obs_col
- coords
- obs
- coords_scale
- obs_scale
- obs_mean
- kernel
- kernel_kwargs
- mean_function
- mean_func_kwargs
- noise_variance
- likelihood: str or gpflow.likelihoods, optional
A GPflow likelihood object for modelling the likelihood. This is not necessarily Gaussian. For available GPflow likelihoods, pass a string (e.g. likelihood = "StudentT"). If not specified, it defaults to a Gaussian likelihood with variance given by noise_variance.
- likelihood_kwargs: dict, optional
Keyword arguments passed to likelihood.
- num_inducing_points: int, optional
The number of inducing points. If not specified, the inducing points are set to be the data points, in which case the algorithm becomes equivalent to VGP.
- minibatch_size: int, optional
The size of the minibatch used for stochastic estimation of the loss function. Smaller batch sizes make each iteration cheaper, but the optimisation becomes noisier. If not specified, minibatching is not applied.
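The idea behind minibatching is that a sum over all N data points can be estimated without bias from a random subset of size B by rescaling with N / B. The sketch below demonstrates this with plain numbers standing in for per-point loss terms; the helper minibatch_estimate and the data are illustrative assumptions, not GPSat code.

```python
import random


def minibatch_estimate(per_point_terms, minibatch_size, rng):
    """Estimate sum(per_point_terms) from a random minibatch."""
    n = len(per_point_terms)
    batch = rng.sample(per_point_terms, minibatch_size)
    # Rescale the minibatch sum so that its expectation equals the full sum.
    return (n / minibatch_size) * sum(batch)


rng = random.Random(0)
terms = [float(i) for i in range(1, 101)]  # stand-ins for per-point loss terms
full_sum = sum(terms)

# With B = N the estimate is exact.
full_batch = minibatch_estimate(terms, len(terms), rng)

# With B < N each estimate is noisy, but the average approaches the full sum.
estimates = [minibatch_estimate(terms, 10, rng) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
```

Smaller minibatches reduce the work per estimate but increase its variance, which is the trade-off described for minibatch_size.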
- get_inducing_chol() ndarray
Get the Cholesky factor of the covariance of the variational distribution.
- get_inducing_mean() ndarray
Get the mean of the variational distribution.
- get_inducing_points() ndarray
Get the inducing point locations.
- get_objective_function_value()
Get the ELBO averaged over minibatches.
- optimise_parameters(train_inducing_points=False, natural_gradients=False, fixed_params=[], gamma=0.1, learning_rate=0.01, max_iter=10000, persistence=100, check_every=10, early_stop=True, verbose=False)
Method to optimise the model parameters (kernel hyperparameters + variational parameters). We use the Adam optimiser for stochastic optimisation of the model parameters.
- Parameters:
- train_inducing_points: bool, default False
Flag to determine whether or not to optimise inducing point locations.
- natural_gradients: bool, default False
Option to use natural gradients to optimise the variational parameters (inducing mean and Cholesky factor). Previous investigations indicate benefits of using them over using Adam to optimise all parameters (see more details here).
- gamma: float, default 0.1
Step length for natural gradient. When not using minibatches, it is best to set gamma = 1.0. However, smaller values of gamma (e.g. 0.1) have been shown empirically to work better when minibatching.
- fixed_params: list of str, default []
Parameters to fix during optimisation. Each entry should be one of “lengthscales”, “kernel_variance”, “likelihood_variance”, “inducing points”, “inducing_mean” or “inducing_chol”.
- learning_rate: float, default 1e-2
Learning rate for Adam optimizer.
- max_iter: int, default 10000
The maximum number of iterations permitted for optimisation. The optimiser runs either until convergence (see the discussion of the convergence criterion in Notes below) or until the number of iterations reaches max_iter.
- early_stop: bool, default True
Flag to enable the early stopping criterion (see Notes below). If False, it will run until the number of iterations reaches max_iter, which can be quite slow.
- persistence: int, default 100
See Notes below.
- check_every: int, default 10
See Notes below.
- verbose: bool, default False
Set verbosity of model optimisation. If True, displays the loss every check_every steps.
- Returns:
- bool
Indication of whether optimisation was successful or not.
Notes
Since we use stochastic optimisation, traditional convergence criteria for stopping early do not apply here. We instead devise a stopping criterion as follows:
- Check the ELBO every check_every iterations.
- If the ELBO does not improve after persistence iterations, stop optimisation.
This stopping criterion is enabled if early_stop is set to True.
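The stopping rule in the Notes can be sketched as a small loop. This is one possible reading of the criterion, in which persistence is counted in iterations since the last improvement; the function run_with_early_stopping and the callable elbo_at (a stand-in for evaluating the ELBO after an optimiser step) are assumptions made for illustration.

```python
def run_with_early_stopping(elbo_at, max_iter=10000, persistence=100, check_every=10):
    """Return (iterations run, whether the loop stopped early)."""
    best_elbo = float("-inf")
    best_iter = 0
    for it in range(1, max_iter + 1):
        # A real optimiser would take one Adam step here.
        if it % check_every != 0:
            continue
        elbo = elbo_at(it)
        if elbo > best_elbo:
            # Improvement: remember it and reset the patience window.
            best_elbo, best_iter = elbo, it
        elif it - best_iter >= persistence:
            # No improvement for `persistence` iterations: stop.
            return it, True
    return max_iter, False


# Toy ELBO that improves until iteration 200 and is flat afterwards.
stop_iter, stopped_early = run_with_early_stopping(lambda it: -1.0 / min(it, 200))

# Toy ELBO that keeps improving: the loop runs until max_iter.
last_iter, stopped = run_with_early_stopping(lambda it: float(it), max_iter=50)
```

Larger values of persistence make the rule more tolerant of noisy ELBO estimates at the expense of extra iterations.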
- property param_names: list
Returns a list of model hyperparameter names (“lengthscales”, “kernel_variance” and “likelihood_variance”), in addition to the variational hyperparameters (“inducing points”, “inducing_mean” and “inducing_chol”).
The “inducing_mean” and “inducing_chol” are, respectively, the mean and the Cholesky factor of the covariance of the Gaussian variational distribution used to approximate the true posterior distribution.
- set_inducing_chol(q_sqrt)
Setter method for the inducing cholesky factor.
- Parameters:
- q_sqrt: np.ndarray
Inducing cholesky values specified as a numpy array of size [1, M, M].
- set_inducing_mean(q_mu)
Setter method for the inducing mean.
- Parameters:
- q_mu: np.ndarray
Inducing mean values specified as a numpy array of size [M, 1].
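To illustrate the shapes quoted for the two setters, the sketch below builds a mean of size [M, 1] and a Cholesky factor of size [1, M, M] as nested lists, and recovers the covariance of the variational distribution as S = L L^T. The values and the helper covariance_from_chol are made up for illustration; in practice these would be numpy arrays.

```python
def covariance_from_chol(q_sqrt):
    """Recover S = L L^T from a Cholesky factor stored with shape [1, M, M]."""
    chol = q_sqrt[0]  # drop the leading axis of size 1
    m = len(chol)
    return [
        [sum(chol[i][k] * chol[j][k] for k in range(m)) for j in range(m)]
        for i in range(m)
    ]


M = 2
q_mu = [[0.0] for _ in range(M)]     # shape [M, 1]
q_sqrt = [[[2.0, 0.0], [1.0, 3.0]]]  # shape [1, M, M], lower triangular
S = covariance_from_chol(q_sqrt)
```

The M^2 entries of the Cholesky factor are the reason why storing the variational parameters becomes memory intensive for large M.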
- set_inducing_points(inducing_points)
Setter method for inducing point locations.
- Parameters:
- inducing_points: np.ndarray
Inducing point locations specified as a numpy array of size [M, D], where D is the input dimension size.
- class GPSat.models.vff_model.GPflowVFFModel(data=None, coords_col=None, obs_col=None, coords=None, coords_scale=None, obs=None, obs_scale=None, obs_mean=None, *, kernel='Matern32', num_inducing_features: int | list | None = None, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, domain_size: float | List[float] | None = None, expert_loc=None, **kwargs)
Bases: GPflowGPRModel
GPSat model using VFF (variational Fourier features) to handle large data size in low dimensions.
This is a prime example of the interdomain approach where pseudo data points (called inducing features) are placed on a transformed domain instead of the physical domain. For VFF, these inducing features are placed in the frequency domain, which can achieve better scaling in the number of data points compared to SGPR, owing to the orthogonality of the sinusoidal basis functions (see [H’17] for more details).
However, VFF requires using a separable kernel in each dimension, resulting in poor scaling in the input dimensions. Thus, benefits are usually seen for lower dimensional problems such as 1D, 2D and possibly 3D in some cases.
See BaseGPRModel for a complete list of attributes and methods.
Notes
This is sub-classed from GPflowGPRModel and uses the same predict() method.
Likewise, it uses the same get_likelihood_variance(), set_likelihood_variance() and set_likelihood_variance_constraints() methods.
We place inducing features in each input dimension and the effective number M of inducing features is the product of the per-dimension number of inducing features.
Has O(NM^2) pre-computation cost, O(M^3) per-iteration complexity and O(NM) memory scaling.
Crucially, VFF is restricted to work in a finite domain. This introduces an extra variable domain_size to be tuned, which can affect performance. As a rule of thumb, the domain_size should be large enough to subsume the training and inference regions, but making it too large can lead to predictions that are overly smooth.
References
[H’17] Hensman, James, Nicolas Durrande, and Arno Solin. “Variational Fourier Features for Gaussian Processes.” J. Mach. Learn. Res. (2017).
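The poor scaling in the input dimension follows directly from the product rule for the effective number of inducing features. The sketch below computes M for a few configurations; the helper effective_num_features is a hypothetical name and the feature counts are made-up examples.

```python
import math


def effective_num_features(num_inducing_features, num_dims):
    """Effective M: product of the per-dimension numbers of inducing features."""
    if isinstance(num_inducing_features, int):
        # A single int applies the same count to every dimension.
        per_dim = [num_inducing_features] * num_dims
    else:
        per_dim = list(num_inducing_features)
        if len(per_dim) != num_dims:
            raise ValueError("need one entry per input dimension")
    return math.prod(per_dim)


m_1d = effective_num_features(20, 1)           # 20
m_2d = effective_num_features(20, 2)           # 400
m_3d = effective_num_features(20, 3)           # 8000
m_mixed = effective_num_features([20, 20, 5], 3)  # 2000
```

Since the per-iteration complexity is O(M^3), moving from 2D to 3D with 20 features per dimension multiplies that cost by 20^3 = 8000.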
- __init__(data=None, coords_col=None, obs_col=None, coords=None, coords_scale=None, obs=None, obs_scale=None, obs_mean=None, *, kernel='Matern32', num_inducing_features: int | list | None = None, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, domain_size: float | List[float] | None = None, expert_loc=None, **kwargs)
- Parameters:
- data
- coords_col
- obs_col
- coords
- obs
- coords_scale
- obs_scale
- obs_mean
- kernel: str
We have only implemented the case where the same kernel is used per dimension. This is to be extended in the future.
- kernel_kwargs: dict | list of dict, optional
If given as a single dict, it passes the same keyword arguments to the kernel in each dimension. If given as a list, the i’th entry corresponds to the keyword arguments passed to the kernel in dimension i.
- num_inducing_features: int | list of int
The number of Fourier features in each dimension. If given as a list, the length must be equal to the number of input dimensions, i.e. the length of self.coords_col (see BaseGPRModel), and the entries correspond to the number of inducing features in each dimension. If given as an int, the same number of inducing features is set per input dimension.
- domain_size: float | list of float, optional
The (unscaled) size of the finite domain where VFF is defined. If given as a list, this defines a cuboidal domain centered at expert_loc with size 2 * domain_size[i] in each dimension i. If given as a float, this defines a cubic domain with size 2 * domain_size in each dimension.
- expert_loc: np.array, optional
The center of the cuboidal domain where the Fourier basis is defined.
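The domain described by expert_loc and domain_size can be written out explicitly as per-dimension intervals, which is convenient for checking that training and prediction locations fall inside it. The helpers domain_bounds and is_inside are hypothetical and only illustrate the geometry stated above.

```python
def domain_bounds(expert_loc, domain_size):
    """Per-dimension (lower, upper) bounds of the domain centred at expert_loc."""
    if isinstance(domain_size, (int, float)):
        # A single value gives a cube with side 2 * domain_size.
        sizes = [float(domain_size)] * len(expert_loc)
    else:
        sizes = [float(s) for s in domain_size]
    return [(c - s, c + s) for c, s in zip(expert_loc, sizes)]


def is_inside(point, bounds):
    """True if the point lies within the bounds in every dimension."""
    return all(lo <= p <= hi for p, (lo, hi) in zip(point, bounds))


bounds = domain_bounds(expert_loc=[0.0, 100.0], domain_size=50.0)
inside = is_inside([10.0, 120.0], bounds)
outside = is_inside([60.0, 120.0], bounds)
box = domain_bounds(expert_loc=[0.0, 0.0], domain_size=[10.0, 2.0])
```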
- get_kernel_variance()
Returns the kernel variance hyperparameter.
- get_lengthscales()
Returns the lengthscale kernel hyperparameters.
- get_objective_function_value()
Get the ELBO value for current state.
- set_kernel_variance(kernel_variance)
Setter method for kernel variance.
- Parameters:
- kernel_variance: float
We assign equal variance to each 1D kernel such that they multiply to kernel_variance.
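Assigning equal variances whose product is kernel_variance amounts to taking the D-th root, where D is the number of input dimensions. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
import math


def per_dimension_variance(kernel_variance, num_dims):
    """Split a total variance equally (in the product sense) over 1D kernels."""
    if kernel_variance <= 0:
        raise ValueError("kernel_variance must be positive")
    # Each 1D kernel gets the D-th root, so the product recovers the total.
    return [kernel_variance ** (1.0 / num_dims)] * num_dims


variances = per_dimension_variance(4.0, 2)
product = math.prod(variances)
variances_3d = per_dimension_variance(8.0, 3)
product_3d = math.prod(variances_3d)
```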
- set_kernel_variance_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)
Sets constraints on the kernel variance.
- Parameters:
- low: int | float
Minimal value for kernel variance.
- high: list | int | float
Maximal value for kernel variance.
- move_within_tol: bool, default True
If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
- tol: float, default 1e-8
The tol value for when move_within_tol = True.
- scale: bool, default False
If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
- scale_magnitude: int or float, optional
The value with which one rescales the coord values if scale = True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).
- set_lengthscales(lengthscales)
Setter method for kernel lengthscales.
- Parameters:
- lengthscales: numpy array | tensorflow tensor | list of int or float | int | float
Tensor-like data of size D (input dimensions) specifying the lengthscales in each dimension. If specified as an int or a float, it will assign the same lengthscale in each dimension.
- set_lengthscales_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)
Sets constraints on the lengthscale hyperparameters.
- Parameters:
- low: list | int | float
Minimal value for lengthscales. If specified as a list type, it should have length D (coordinate dimension) where the entries correspond to minimal values of the lengthscale in each dimension in the order given by self.coords_col (see BaseGPRModel attributes). If int or float, the same minimal value is assigned to each dimension.
- high: list | int | float
Same as above, except specifying the maximal values.
- move_within_tol: bool, default True
If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
- tol: float, default 1e-8
The tol value for when move_within_tol = True.
- scale: bool, default False
If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
- scale_magnitude: int or float, optional
The value with which one rescales the coord values if scale = True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).