GPflow models

GPSat models based on the GPflow Python package for GPU-accelerated GP modelling.

class GPSat.models.gpflow_models.GPflowGPRModel(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood: Gaussian = None, **kwargs)

Bases: BaseGPRModel

Model based on the GPflow implementation of exact Gaussian process regression (GPR).

See BaseGPRModel for a complete list of attributes and methods.

__init__(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood: Gaussian = None, **kwargs)

Parameters:

data: See BaseGPRModel.__init__()
coords_col: See BaseGPRModel.__init__()
obs_col: See BaseGPRModel.__init__()
coords: See BaseGPRModel.__init__()
obs: See BaseGPRModel.__init__()
coords_scale: See BaseGPRModel.__init__()
obs_scale: See BaseGPRModel.__init__()
obs_mean: See BaseGPRModel.__init__()
verbose: See BaseGPRModel.__init__()
kernel: str | gpflow.kernels, default “Matern32”: The kernel used for GPR. We can use the following GPflow kernels, which can be passed as a string: “Cosine”, “Exponential”, “Matern12”, “Matern32”, “Matern52”, “RationalQuadratic” or “RBF” (equivalently “SquaredExponential”).
kernel_kwargs: dict, optional: Keyword arguments to be passed to the GPflow kernel specified in kernel.
mean_function: str | gpflow.mean_functions, optional: GPflow mean function to model the prior mean.
mean_func_kwargs: dict, optional: Keyword arguments to be passed to the GPflow mean function specified in mean_function.
noise_variance: float, optional: Variance of Gaussian likelihood. Unnecessary if likelihood is specified explicitly.
likelihood: gpflow.likelihoods.Gaussian, optional: GPflow model for Gaussian likelihood used to model data uncertainty. Can use custom GPflow Gaussian likelihood class here. Unnecessary if using a vanilla Gaussian likelihood and noise_variance is specified.

get_kernel_variance() → float: Returns the kernel variance hyperparameter.

get_lengthscales() → ndarray: Returns the lengthscale kernel hyperparameters.

get_likelihood_variance() → float: Returns the likelihood variance hyperparameter.

get_objective_function_value(): Get the negative marginal log-likelihood loss.

optimise_parameters(max_iter=10000, fixed_params=None, **opt_kwargs)

Method to optimise the kernel hyperparameters using a scipy optimizer (method = L-BFGS-B by default).

Parameters:

max_iter: int, default 10000: The maximum number of iterations permitted for optimisation. The optimiser runs either until convergence or until the number of iterations reaches max_iter.
fixed_params: list of str, default []: Parameters to fix during optimisation. Each entry should be one of “lengthscales”, “kernel_variance” or “likelihood_variance”.
opt_kwargs: dict, optional: Keyword arguments passed to gpflow.optimizers.Scipy.minimize().

Returns:

bool: Indication of whether optimisation was successful, i.e. whether it converged within the maximum number of iterations set.

property param_names: list: Returns the model hyperparameter names: “lengthscales”, “kernel_variance” and “likelihood_variance”.

predict(coords, full_cov=False, apply_scale=True) → Dict[str, ndarray]

Method to generate prediction at given coords.

Parameters:

coords: pandas series | pandas dataframe | list | numpy array: Coordinate locations where we want to make predictions.
full_cov: bool, default False: Flag to determine whether to return a full covariance matrix at the prediction coords or just the marginal variances.
apply_scale: bool, default True: If True, coords should be the raw, untransformed values. If False, coords must already be rescaled by self.coords_scale (see BaseGPRModel attributes).

Returns:

dict of numpy arrays

If full_cov = False, returns a dictionary containing the posterior mean “f*”, posterior variance “f*_var” and predictive variance “y_var” (i.e. the posterior variance + likelihood variance).
If full_cov = True, returns a dictionary containing the posterior mean “f*”, posterior marginal variance “f*_var”, predictive marginal variance “y_var”, full posterior covariance “f*_cov” and full predictive covariance “y_cov”.
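The relationship between the returned entries can be sketched in plain Python. This only illustrates the dictionary layout described above and is not the GPSat implementation: the helper name assemble_prediction and the toy numbers are made up, and we assume that the full predictive covariance is the posterior covariance with the likelihood variance added on the diagonal.

```python
def assemble_prediction(f_mean, f_cov, likelihood_variance, full_cov=False):
    """Build a predict()-style dictionary from a posterior mean and covariance.

    f_mean is a list of length n, f_cov is an n x n list of lists.
    """
    n = len(f_mean)
    # Marginal posterior variances are the diagonal of the posterior covariance.
    f_var = [f_cov[i][i] for i in range(n)]
    # Predictive variance = posterior variance + likelihood variance.
    y_var = [v + likelihood_variance for v in f_var]
    out = {"f*": list(f_mean), "f*_var": f_var, "y_var": y_var}
    if full_cov:
        out["f*_cov"] = [list(row) for row in f_cov]
        # Assumption: observation noise is independent, so it only affects the diagonal.
        out["y_cov"] = [
            [f_cov[i][j] + (likelihood_variance if i == j else 0.0) for j in range(n)]
            for i in range(n)
        ]
    return out


# Toy posterior at two prediction locations.
pred = assemble_prediction(
    f_mean=[0.5, -0.2],
    f_cov=[[0.30, 0.10], [0.10, 0.40]],
    likelihood_variance=0.05,
    full_cov=True,
)
```

With full_cov=False the same helper returns only the three marginal entries “f*”, “f*_var” and “y_var”.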

set_kernel_variance(kernel_variance)

Setter method for kernel variance.

Parameters:

kernel_variance: int | float | numpy array | tensorflow tensor | list of int or float: int, float or Tensor-like data of size 1 specifying the kernel variance.

set_kernel_variance_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)

Sets constraints on the kernel variance.

Parameters:

low: int | float: Minimal value for kernel variance.
high: list | int | float: Maximal value for kernel variance.
move_within_tol: bool, default True: If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
tol: float, default 1e-8: The tol value for when move_within_tol = True.
scale: bool, default False: If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
scale_magnitude: int or float, optional: The value with which one rescales the coord values if scale = True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).
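As a rough illustration of what move_within_tol and tol do, the sketch below clamps a current hyperparameter value into [low + tol, high - tol]. The function name and the error handling are assumptions made for this example; the actual GPSat code may handle edge cases differently.

```python
def move_within_tol(value, low, high, tol=1e-8):
    """Clamp value into the closed interval [low + tol, high - tol]."""
    lower, upper = low + tol, high - tol
    if lower > upper:
        raise ValueError("interval [low, high] is too narrow for the given tol")
    return min(max(value, lower), upper)


# A current kernel variance of 0.05 lies below low = 0.1, so it is moved just inside.
adjusted = move_within_tol(0.05, low=0.1, high=10.0)
```

Moving the value strictly inside the bounds matters because bounded parameters are typically represented through a transform that is undefined exactly on the boundary.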

set_lengthscales(lengthscales)

Setter method for kernel lengthscales.

Parameters:

lengthscales: numpy array | tensorflow tensor | list of int or float | int | float: Tensor-like data of size D (input dimensions) specifying the lengthscales in each dimension. If specified as an int or a float, it will assign the same lengthscale in each dimension.

set_lengthscales_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)

Sets constraints on the lengthscale hyperparameters.

Parameters:

low: list | int | float: Minimal value for lengthscales. If specified as a list type, it should have length D (coordinate dimension) where the entries correspond to minimal values of the lengthscale in each dimension in the order given by self.coords_col (see BaseGPRModel attributes). If int or float, the same minimal values are assigned to each dimension.
high: list | int | float: Same as above, except specifying the maximal values.
move_within_tol: bool, default True: If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
tol: float, default 1e-8: The tol value for when move_within_tol = True.
scale: bool, default False: If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
scale_magnitude: int or float, optional: The value with which one rescales the coord values if scale = True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).

set_likelihood_variance(likelihood_variance)

Setter method for likelihood variance.

Parameters:

likelihood_variance: int | float | numpy array | tensorflow tensor | list of int or float: int, float or Tensor-like data of size 1 specifying the likelihood variance.

set_likelihood_variance_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)

Sets constraints on the likelihood variance.

Parameters:

low: int | float: Minimal value for likelihood variance.
high: list | int | float: Maximal value for likelihood variance.
move_within_tol: bool, default True: If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
tol: float, default 1e-8: The tol value for when move_within_tol=True.
scale: bool, default False: If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
scale_magnitude: int or float, optional: The value with which one rescales the coord values if scale=True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).

update_obs_data(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None)

class GPSat.models.gpflow_models.GPflowSGPRModel(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', num_inducing_points=500, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood: Gaussian = None, **kwargs)

Bases: GPflowGPRModel

Model using the sparse GPR method to handle data sizes beyond the capacity of exact GPR. This introduces a set of M pseudo data points, referred to as the inducing points, which summarise the information contained in the original dataset (see [T’09] for more details).

By choosing a small number of inducing points, one is able to handle large datasets of up to order ~O(1e5) points. However, the prediction quality may deteriorate with fewer inducing points, so it is necessary to tune the number of inducing points to strike a good balance between efficiency and accuracy.

See BaseGPRModel for a complete list of attributes and methods.

Notes

This is sub-classed from GPflowGPRModel and uses the same predict() method.
Has O(NM^2) computational complexity and O(NM) memory scaling.
Several techniques for inducing point selection exist (e.g. see this GPflow tutorial), however we have only implemented the random selection method, where inducing points are initialised as M random sub-samples of the training data.
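The random selection method mentioned in the notes amounts to drawing M training locations without replacement. A minimal standard-library sketch follows; the function name, the seed argument and the capping of M at the number of training points are assumptions for illustration, not GPSat's actual code.

```python
import random


def random_inducing_points(train_coords, num_inducing_points, seed=0):
    """Pick M training coordinates, without replacement, as initial inducing points."""
    rng = random.Random(seed)
    # Assumption: never ask for more inducing points than there are data points.
    m = min(num_inducing_points, len(train_coords))
    return rng.sample(train_coords, m)


# 2000 toy 2D training locations, reduced to M = 500 inducing points.
train_coords = [(float(i), float(i % 7)) for i in range(2000)]
inducing = random_inducing_points(train_coords, num_inducing_points=500)
```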

References

[T’09] Titsias, Michalis. “Variational learning of inducing variables in sparse Gaussian processes.” Artificial intelligence and statistics. PMLR, 2009.

__init__(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', num_inducing_points=500, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood: Gaussian = None, **kwargs)

Parameters:

data: See BaseGPRModel.__init__()
coords_col: See BaseGPRModel.__init__()
obs_col: See BaseGPRModel.__init__()
coords: See BaseGPRModel.__init__()
obs: See BaseGPRModel.__init__()
coords_scale: See BaseGPRModel.__init__()
obs_scale: See BaseGPRModel.__init__()
obs_mean: See BaseGPRModel.__init__()
verbose: See BaseGPRModel.__init__()
kernel: See GPflowGPRModel.__init__()
kernel_kwargs: See GPflowGPRModel.__init__()
mean_function: See GPflowGPRModel.__init__()
mean_func_kwargs: See GPflowGPRModel.__init__()
noise_variance: See GPflowGPRModel.__init__()
likelihood: See GPflowGPRModel.__init__()
num_inducing_points: int, default 500: The number of inducing points.

get_inducing_points() → ndarray: Get the inducing point locations.

get_objective_function_value(): Get the ELBO value for current state.

optimise_parameters(train_inducing_points=False, max_iter=10000, fixed_params=[], **opt_kwargs)

Method to optimise the model parameters (kernel hyperparameters + inducing point locations) using a scipy optimizer (method = L-BFGS-B by default).

Parameters:

train_inducing_points: bool, default False: Flag to specify whether to optimise the inducing point locations or not. Setting this to True may improve results, but may also lead to slower convergence.
max_iter: int, default 10000: The maximum number of iterations permitted for optimisation. The optimiser runs either until convergence or until the number of iterations reaches max_iter.
fixed_params: list of str, default []: Parameters to fix during optimisation. Each entry should be one of “lengthscales”, “kernel_variance” or “likelihood_variance”.
opt_kwargs: dict, optional: Keyword arguments passed to gpflow.optimizers.Scipy.minimize().

Returns:

bool: Indication of whether optimisation was successful, i.e. whether it converged within the maximum number of iterations set.

property param_names: list: Returns a list of model hyperparameter names (“lengthscales”, “kernel_variance” and “likelihood_variance”), in addition to “inducing points”.

set_inducing_points(inducing_points)

Setter method for inducing point locations.

Parameters:

inducing_points: np.ndarray: Inducing point locations specified as a numpy array of size [M, D].

class GPSat.models.gpflow_models.GPflowSVGPModel(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', num_inducing_points=None, minibatch_size=None, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood=None, likelihood_kwargs=None, **kwargs)

Bases: GPflowGPRModel

Model using SVGP (Sparse Variational GP [H’13]) to handle data sizes larger than SGPR can accommodate, in addition to handling non-Gaussian likelihoods.

Key differences with SGPR are (1) stochastic optimisation of parameters via mini-batching of training data, and (2) gradient-based optimisation of the variational distribution, parameterised by a mean and cholesky factor of the covariance, as opposed to exact computation. The former allows handling of larger data + inducing point sizes and the latter allows handling of non-Gaussian likelihoods (see [H’13] for more details).

See BaseGPRModel for a complete list of attributes and methods.

Notes

This is sub-classed from GPflowGPRModel and uses the same predict() method.
Introduces an extra hyperparameter minibatch_size to be tuned.
Has O(BM^2 + M^3) computational complexity and O(BM + M^2) memory scaling, where B is the minibatch size.
Saving the variational parameters to the results file may be memory intensive due to the M^2 memory scaling of the cholesky factor. Consider leaving them out of the results file when running experiments (TODO: cross reference ModelConfig).
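To get a feel for the M^2 memory scaling of the Cholesky factor, a back-of-the-envelope estimate can help. The helper below is hypothetical and assumes the factor is stored densely as 64-bit floats (8 bytes per entry) with the [1, M, M] shape used by set_inducing_chol().

```python
def chol_factor_bytes(num_inducing_points, bytes_per_value=8):
    """Bytes needed for one dense [1, M, M] Cholesky factor."""
    return num_inducing_points ** 2 * bytes_per_value


# Storage for a single saved factor at a few values of M, in megabytes.
sizes_mb = {m: chol_factor_bytes(m) / 1e6 for m in (100, 500, 5000)}
```

Under these assumptions a single factor with M = 500 takes about 2 MB, and one with M = 5000 about 200 MB, so saving it repeatedly across many model fits adds up quickly.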

References

[H’13] Hensman, James, Nicolo Fusi, and Neil D. Lawrence. “Gaussian processes for big data.” arXiv preprint arXiv:1309.6835 (2013).

__init__(data=None, coords_col=None, obs_col=None, coords=None, obs=None, coords_scale=None, obs_scale=None, obs_mean=None, verbose=True, *, kernel='Matern32', num_inducing_points=None, minibatch_size=None, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, noise_variance=None, likelihood=None, likelihood_kwargs=None, **kwargs)

Parameters:

data: See BaseGPRModel.__init__()
coords_col: See BaseGPRModel.__init__()
obs_col: See BaseGPRModel.__init__()
coords: See BaseGPRModel.__init__()
obs: See BaseGPRModel.__init__()
coords_scale: See BaseGPRModel.__init__()
obs_scale: See BaseGPRModel.__init__()
obs_mean: See BaseGPRModel.__init__()
kernel: See GPflowGPRModel.__init__()
kernel_kwargs: See GPflowGPRModel.__init__()
mean_function: See GPflowGPRModel.__init__()
mean_func_kwargs: See GPflowGPRModel.__init__()
noise_variance: See GPflowGPRModel.__init__()
likelihood: str or gpflow.likelihoods, optional: A GPflow likelihoods object for modelling the likelihood. This is not necessarily a Gaussian. For available GPflow likelihoods, pass a string (e.g. likelihood = "StudentT"). However if not specified, it will default to a Gaussian likelihood with variance given by noise_variance.
likelihood_kwargs: dict, optional: Keyword arguments passed to likelihood.
num_inducing_points: int, optional: The number of inducing points. If not specified, it will set the inducing points to be the data points, in which case the algorithm becomes equivalent to VGP.
minibatch_size: int, optional: The size of minibatch used for stochastic estimation of the loss function. Using smaller batch sizes will result in increased per-iteration efficiency, however optimisation becomes more noisy. If not specified, it will not apply minibatching.

get_inducing_chol() → ndarray: Get the Cholesky factor of the covariance of the variational distribution.

get_inducing_mean() → ndarray: Get the mean of the variational distribution.

get_inducing_points() → ndarray: Get the inducing point locations.

get_objective_function_value(): Get the ELBO averaged over minibatches.

optimise_parameters(train_inducing_points=False, natural_gradients=False, fixed_params=[], gamma=0.1, learning_rate=0.01, max_iter=10000, persistence=100, check_every=10, early_stop=True, verbose=False)

Method to optimise the model parameters (kernel hyperparameters + variational parameters). We use the Adam optimiser for stochastic optimisation of the model parameters.

Parameters:

train_inducing_points: bool, default False: Flag to determine whether or not to optimise inducing point locations.
natural_gradients: bool, default False: Option to use natural gradients to optimise the variational parameters (inducing mean and cholesky). Previous investigations indicate benefits of using them over using Adam to optimise all parameters. (see more details here)
gamma: float, default 0.1: Step length for natural gradient. When not using minibatches, it is best to set gamma = 1.0. However, a smaller gamma (e.g. 0.1) has been shown empirically to work better when minibatching.
fixed_params: list of str, default []: Parameters to fix during optimisation. Each entry should be one of “lengthscales”, “kernel_variance”, “likelihood_variance”, “inducing points”, “inducing_mean” or “inducing_chol”.
learning_rate: float, default 1e-2: Learning rate for Adam optimizer.
max_iter: int, default 10000: The maximum number of iterations permitted for optimisation. The optimiser runs either until convergence (see discussion on the convergence criterion in Notes below) or until the number of iterations reaches max_iter.
early_stop: bool, default True: Flag to set early stopping criterion (see Notes below). If False, it will run until number of iterations reach max_iter, which can be quite slow.
persistence: int, default 100: See Notes below.
check_every: int, default 10: See Notes below.
verbose: bool, default False: Set verbosity of model optimisation. If True, displays the loss every check_every steps.

Returns:

bool: Indication of whether optimisation was successful or not.

Notes

Since we use stochastic optimisation, traditional convergence criteria for stopping early do not apply here. We instead devise a stopping criterion as follows:

Check the ELBO every check_every iterations.
If the ELBO does not improve after persistence iterations, stop optimisation.

This stopping criterion will be enabled if early_stop is set to True.
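The stopping rule can be sketched as a generic training loop. This is one possible reading of the criterion, not the GPSat source: step_fn and elbo_fn are placeholder callables, “does not improve” is taken to mean that no new best ELBO has been seen in the last persistence iterations, and the (iterations, stopped_early) return value is invented for the example.

```python
def run_with_early_stopping(step_fn, elbo_fn, max_iter=10000,
                            persistence=100, check_every=10, early_stop=True):
    """Run step_fn up to max_iter times, stopping once the ELBO stalls."""
    best_elbo = float("-inf")
    best_iter = 0
    for it in range(1, max_iter + 1):
        step_fn()  # one stochastic optimisation step
        if early_stop and it % check_every == 0:
            elbo = elbo_fn()
            if elbo > best_elbo:
                # New best value: remember when it happened.
                best_elbo, best_iter = elbo, it
            elif it - best_iter >= persistence:
                # No improvement for `persistence` iterations: stop.
                return it, True
    return max_iter, False


# Toy problem: the "ELBO" improves until iteration 200, then stays flat.
state = {"t": 0}


def toy_step():
    state["t"] += 1


def toy_elbo():
    return -1.0 / min(state["t"], 200)


iters, stopped = run_with_early_stopping(toy_step, toy_elbo)
```

In this toy run the last improvement is recorded at iteration 200, so with persistence=100 the loop stops at iteration 300 rather than running all 10000 iterations.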

property param_names: list

Returns a list of model hyperparameter names (“lengthscales”, “kernel_variance” and “likelihood_variance”), in addition to the variational hyperparameters (“inducing points”, “inducing_mean” and “inducing_chol”).

The “inducing_mean” and “inducing_chol” are, respectively, the mean and Cholesky factor of the covariance of the Gaussian variational distribution used to approximate the true posterior distribution.

set_inducing_chol(q_sqrt)

Setter method for the inducing cholesky factor.

Parameters:

q_sqrt: np.ndarray: Inducing cholesky values specified as a numpy array of size [1, M, M].

set_inducing_mean(q_mu)

Setter method for the inducing mean.

Parameters:

q_mu: np.ndarray: Inducing mean values specified as a numpy array of size [M, 1].

set_inducing_points(inducing_points)

Setter method for inducing point locations.

Parameters:

inducing_points: np.ndarray: Inducing point locations specified as a numpy array of size [M, D], where D is the input dimension size.

class GPSat.models.vff_model.GPflowVFFModel(data=None, coords_col=None, obs_col=None, coords=None, coords_scale=None, obs=None, obs_scale=None, obs_mean=None, *, kernel='Matern32', num_inducing_features: int | list | None = None, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, domain_size: float | List[float] | None = None, expert_loc=None, **kwargs)

Bases: GPflowGPRModel

GPSat model using VFF (variational Fourier features) to handle large data size in low dimensions.

This is a prime example of the interdomain approach where pseudo data points (called inducing features) are placed on a transformed domain instead of the physical domain. For VFF, these inducing features are placed in the frequency domain, which can achieve better scaling in the number of data points compared to SGPR, owing to the orthogonality of the sinusoidal basis functions (see [H’17] for more details).

However, VFF requires using a separable kernel in each dimension, resulting in poor scaling in the input dimensions. Thus, benefits are usually seen for lower dimensional problems such as 1D, 2D and possibly 3D in some cases.

See BaseGPRModel for a complete list of attributes and methods.

Notes

This is sub-classed from GPflowGPRModel and uses the same predict() method.
Likewise, it uses the same get_likelihood_variance(), set_likelihood_variance() and set_likelihood_variance_constraints() methods.
We place inducing features in each input dimension and the effective number M of inducing features is the product of the per-dimension number of inducing features.
Has O(NM^2) pre-computation cost, O(M^3) per-iteration complexity and O(NM) memory scaling.
Crucially, VFF is restricted to work in a finite domain. This introduces an extra variable domain_size to be tuned, which can affect performance. As a rule of thumb, the domain_size should be large enough to subsume the training and inference regions, but making it too large can lead to predictions that are overly smooth.
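The poor scaling in the input dimension follows directly from the product rule for the effective number of inducing features. A small sketch, using a hypothetical helper that mirrors the int-or-list convention of num_inducing_features:

```python
import math


def effective_num_features(num_inducing_features, num_dims):
    """Effective M: product of the per-dimension numbers of inducing features."""
    if isinstance(num_inducing_features, int):
        # Same number of features in every input dimension.
        per_dim = [num_inducing_features] * num_dims
    else:
        per_dim = list(num_inducing_features)
        if len(per_dim) != num_dims:
            raise ValueError("need one entry per input dimension")
    return math.prod(per_dim)


# 20 features per dimension: M grows geometrically with the input dimension.
m_by_dim = {d: effective_num_features(20, d) for d in (1, 2, 3)}
```

With 20 features per dimension, M is 20 in 1D, 400 in 2D and 8000 in 3D, which is why the benefits are mostly seen for low-dimensional problems.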

References

[H’17] Hensman, James, Nicolas Durrande, and Arno Solin. “Variational Fourier Features for Gaussian Processes.” J. Mach. Learn. Res. (2017).

__init__(data=None, coords_col=None, obs_col=None, coords=None, coords_scale=None, obs=None, obs_scale=None, obs_mean=None, *, kernel='Matern32', num_inducing_features: int | list | None = None, kernel_kwargs=None, mean_function=None, mean_func_kwargs=None, domain_size: float | List[float] | None = None, expert_loc=None, **kwargs)

Parameters:

data: See BaseGPRModel.__init__()
coords_col: See BaseGPRModel.__init__()
obs_col: See BaseGPRModel.__init__()
coords: See BaseGPRModel.__init__()
obs: See BaseGPRModel.__init__()
coords_scale: See BaseGPRModel.__init__()
obs_scale: See BaseGPRModel.__init__()
obs_mean: See BaseGPRModel.__init__()
kernel: str: See GPflowGPRModel.__init__(). We have only implemented the case where the same kernel is used per dimension. This is to be extended in the future.
kernel_kwargs: dict | list of dict, optional: If given as a single dict, it passes the same keyword arguments to the kernel in each dimension. If given as a list, the i’th entry corresponds to the keyword arguments passed to the kernel in dimension i.
num_inducing_features: int | list of int: The number of Fourier features in each dimension. If given as a list, the length must be equal to the number of input dimensions, i.e. the length of self.coords_col (see BaseGPRModel), and the entries correspond to the number of inducing features in each dimension. If given as an int, the same number of inducing features is set per input dimension.
domain_size: float | list of float, optional: The (unscaled) size of the finite domain where VFF is defined. If given as a list, this defines a cuboidal domain centered at expert_loc with size 2 * domain_size[i] in each dimension i. If given as a float, this defines a cubic domain with size 2 * domain_size in each dimension.
expert_loc: np.array, optional: The center of the cuboidal domain where the Fourier basis is defined.
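The domain described by domain_size and expert_loc can be written out explicitly as per-dimension (lower, upper) bounds. The helper below is a hypothetical illustration of that geometry in unscaled coordinates and is not part of GPSat.

```python
def domain_bounds(expert_loc, domain_size):
    """Per-dimension (lower, upper) bounds of a cuboid centered at expert_loc."""
    num_dims = len(expert_loc)
    if isinstance(domain_size, (int, float)):
        # A single number gives a cube with the same half-width in every dimension.
        half_widths = [float(domain_size)] * num_dims
    else:
        half_widths = [float(s) for s in domain_size]
        if len(half_widths) != num_dims:
            raise ValueError("need one domain size per input dimension")
    return [(c - h, c + h) for c, h in zip(expert_loc, half_widths)]


# Cubic domain of total width 2 * 5.0 = 10.0 around the point (0, 10).
bounds = domain_bounds([0.0, 10.0], 5.0)
```

A quick check that all training and prediction coordinates fall inside these bounds is a cheap way to apply the rule of thumb that the domain should subsume both regions.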

get_kernel_variance(): Returns the kernel variance hyperparameter.

get_lengthscales(): Returns the lengthscale kernel hyperparameters.

get_objective_function_value(): Get the ELBO value for current state.

set_kernel_variance(kernel_variance)

Setter method for kernel variance.

Parameters:

kernel_variance: float: We assign equal variance to each 1D kernel such that they multiply to kernel_variance.
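With D input dimensions, equal per-dimension variances that multiply to kernel_variance must each be the D-th root of kernel_variance. A minimal sketch of that arithmetic (the helper name is made up for this example):

```python
import math


def per_dimension_variance(kernel_variance, num_dims):
    """Variance given to each 1D kernel so that their product is kernel_variance."""
    if kernel_variance <= 0:
        raise ValueError("kernel_variance must be positive")
    return kernel_variance ** (1.0 / num_dims)


# A total variance of 8.0 in 3D gives each 1D kernel a variance of about 2.0.
var_1d = per_dimension_variance(8.0, num_dims=3)
total = math.prod([var_1d] * 3)
```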

set_kernel_variance_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)

Sets constraints on the kernel variance.

Parameters:

low: int | float: Minimal value for kernel variance.
high: list | int | float: Maximal value for kernel variance.
move_within_tol: bool, default True: If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
tol: float, default 1e-8: The tol value for when move_within_tol = True.
scale: bool, default False: If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
scale_magnitude: int or float, optional: The value with which one rescales the coord values if scale = True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).

set_lengthscales(lengthscales)

Setter method for kernel lengthscales.

Parameters:

lengthscales: numpy array | tensorflow tensor | list of int or float | int | float: Tensor-like data of size D (input dimensions) specifying the lengthscales in each dimension. If specified as an int or a float, it will assign the same lengthscale in each dimension.

set_lengthscales_constraints(low, high, move_within_tol=True, tol=1e-08, scale=False, scale_magnitude=None)

Sets constraints on the lengthscale hyperparameters.

Parameters:

low: list | int | float: Minimal value for lengthscales. If specified as a list type, it should have length D (coordinate dimension) where the entries correspond to minimal values of the lengthscale in each dimension in the order given by self.coords_col (see BaseGPRModel attributes). If int or float, the same minimal values are assigned to each dimension.
high: list | int | float: Same as above, except specifying the maximal values.
move_within_tol: bool, default True: If True, ensures that current hyperparam values are within the interval [low+tol, high-tol] for tol given below.
tol: float, default 1e-8: The tol value for when move_within_tol = True.
scale: bool, default False: If True, the low and high values are set with respect to the untransformed coord values. If False, they are set with respect to the transformed values.
scale_magnitude: int or float, optional: The value with which one rescales the coord values if scale = True. If None, it will transform by self.coords_scale (see BaseGPRModel attributes).