{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Modelling with local GP experts (Part II): Using the ``LocalExpertOI`` API\n", "In the previous part of the tutorial, we implemented a local GP expert model to fit on non-stationary data. Here, we will do the same except using ``GPSat``'s ``LocalExpertOI`` class, which automates some of the procedures involved making experiments less cumbersome." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2023-08-03 22:09:45.761071: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: SSE4.1 SSE4.2, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.\n" ] } ], "source": [ "import scipy\n", "import os\n", "import GPSat\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from GPSat import get_parent_path\n", "from GPSat.postprocessing import glue_local_predictions_1d\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We generate the same data as before:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Set random seed\n", "np.random.seed(0)\n", "\n", "# Generate data\n", "N = 100\n", "noise_std = 0.05\n", "\n", "X_grid = np.linspace(0.1, 0.6, 100)\n", "X = np.random.uniform(0.1, 0.6, (N,))\n", "f = lambda x: np.sin(1/x)\n", "epsilon = noise_std * np.random.randn(N)\n", "\n", "y = f(X) + epsilon\n", "f_truth = f(X_grid) # Ground truth\n", "\n", "# Plot\n", "plt.plot(X_grid, f_truth, 'k', zorder=1, label='Ground truth')\n", "plt.scatter(X, y, color='C3', alpha=0.6, zorder=2, label='Noisy observations')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configuration dataclasses\n", "\n", "We will now conduct the same experiments as in the previous tutorial using ``GPSat.local_experts.LocalExpertOI``.\n", "\n", "First, we break down a single experiment into the following four key components:\n", "\n", "1. The local expert locations\n", "2. The GP model assigned to each local expert\n", "3. The training data\n", "4. The points where we want to make predictions\n", "\n", "In ``GPSat``, we configure each of these four components with a so-called *configuration dataclass*. The goal is to allow sufficient modelling flexibility to accomodate various problems and datasets.\n", "\n", "### 1. Local expert config\n", "We start by setting the configuration for the local expert locations. 
This can be done by assigning a dataframe containing the locations of the local experts.\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from GPSat.config_dataclasses import DataConfig, ModelConfig, PredictionLocsConfig, ExpertLocsConfig\n", "\n", "# Construct a data frame containing the two local expert locations\n", "xpert_loc_1 = 0.25\n", "xpert_loc_2 = 0.45\n", "xpert_locs_df = pd.DataFrame({'x': [xpert_loc_1, xpert_loc_2]})\n", "\n", "# Set up an expert location configuration dataclass\n", "expert_loc_config = ExpertLocsConfig(source=xpert_locs_df)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ``source`` argument is where we point to the expert locations. In this case, we simply used a dataframe to represent the expert locations and pointed to that. However in more advanced applications, we also have the functionality to instead point to a file where the expert locations are saved, which can be more convenient.\n", "\n", "### 2. Model config\n", "Next, we set up the configuration for the model assigned to each local expert. Here, we will use the ``sklearnGPRModel``, which we specify as follows:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Set up configuration for the model\n", "model_config = ModelConfig(oi_model=\"sklearnGPRModel\",\n", " init_params={\"likelihood_variance\": noise_std**2,\n", " \"kernel\": 'RBF',\n", " \"verbose\": False}\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We specified the model we are using in ``oi_model`` (pre-implemented ``GPSat`` models can be referred to by strings), and in ``init_params``, we pass any arguments used to initialise the model (expressed as a dictionary). 
Note that we *do not need* to specify arguments to set the data here (namely ``data``, ``coords`` and ``obs``) as this will be done automatically in the main loop.\n", "\n", "There are also functionalities to specify constraints on parameters, re-scale the data, etc... however, we will ignore these for the sake of keeping the presentation simple.\n", "\n", "### 3. Data config\n", "Next we set up the configuration for data. Here, we configure information such as the source of data and instructions on how to assign a subset of the data to each local expert.\n", "\n", "First, we put our training data into a pandas dataframe." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead><tr><th></th><th>x</th><th>y</th></tr></thead>\n", "<tbody>\n", "<tr><th>0</th><td>0.374407</td><td>0.395253</td></tr>\n", "<tr><th>1</th><td>0.457595</td><td>0.862078</td></tr>\n", "<tr><th>2</th><td>0.401382</td><td>0.628628</td></tr>\n", "<tr><th>3</th><td>0.372442</td><td>0.364094</td></tr>\n", "<tr><th>4</th><td>0.311827</td><td>0.009149</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " x y\n", "0 0.374407 0.395253\n", "1 0.457595 0.862078\n", "2 0.401382 0.628628\n", "3 0.372442 0.364094\n", "4 0.311827 0.009149" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Write data as dataframe\n", "data_df = pd.DataFrame({'x': X, 'y': y})\n", "\n", "data_df.head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the local data selection, we want to select data points within ± the training radius from the expert locations.\n", "\n", "In ``GPSat``, we have a unique API to select data from simple instructions. These instructions are expressed in a dictionary with the keys ``\"col\"``, ``\"comp\"`` and ``\"val\"``. For example, see below for the instructions to select data within ± the inference radius of some reference point." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Set inference radius\n", "training_radius = 0.15\n", "\n", "local_select_instructions = [{\"col\": \"x\", \"comp\": \"<=\", \"val\": training_radius},\n", " {\"col\": \"x\", \"comp\": \">=\", \"val\": -training_radius}]\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first argument ``\"col\"`` indicates which column in the dataframe we want to impose conditions on (in this case ``\"x\"``), the ``\"comp\"`` arguments specifies a relation such as \"greater than\", \"less than\", etc... 
and the ``\"val\"`` argument specifies the value against which we want to compare our column (in our case, the training radius).\n", "\n", "Thus programmatically, the above list of commands will select data as follows:\n", "\n", "```\n", ">>> data_1 = data_df[ (data_df[\"x\"] - ref_point) <= training_radius ]\n", ">>> data_2 = data_df[ (data_df[\"x\"] - ref_point) >= -training_radius ]\n", ">>> local_data = intersection(data_1, data_2)\n", "```\n", "\n", "Here, ``ref_point`` is some reference point, which, in the main loop, will correspond to each expert location in turn. The command ``intersection`` is a pseudo-function that keeps only the rows present in both ``data_1`` and ``data_2``, so that every condition in the list must hold.\n", "\n", "With this data selection instruction specified, we can now set the configuration for data as follows." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Set data config\n", "data_config = DataConfig(data_source = data_df,\n", " obs_col = [\"y\"],\n", " coords_col = [\"x\"],\n", " local_select = local_select_instructions\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The argument ``data_source`` points to the dataframe where our data is stored, ``obs_col`` specifies the column in our dataframe corresponding to the measurements, ``coords_col`` specifies the column corresponding to the input coordinates, and ``local_select`` is where we put our instructions for local data selection.\n", "\n", "### 4. Prediction location config\n", "Finally, we configure the prediction locations. This should include information about the test locations and the local inference region, where the local experts make predictions. The inference region is simply set to be a circular region around the expert location, with radius given by the inference radius.\n", "\n", "First, we write the prediction locations into a dataframe." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead><tr><th></th><th>x</th></tr></thead>\n", "<tbody>\n", "<tr><th>0</th><td>0.100000</td></tr>\n", "<tr><th>1</th><td>0.105051</td></tr>\n", "<tr><th>2</th><td>0.110101</td></tr>\n", "<tr><th>3</th><td>0.115152</td></tr>\n", "<tr><th>4</th><td>0.120202</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " x\n", "0 0.100000\n", "1 0.105051\n", "2 0.110101\n", "3 0.115152\n", "4 0.120202" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set up prediction locations as a dataframe\n", "prediction_locs = X_grid\n", "prediction_locs_df = pd.DataFrame({'x': X_grid})\n", "\n", "prediction_locs_df.head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now set the configuration for prediction locations. We take the inference radius to be slightly larger than the training radius to include predictions on the boundaries." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "inference_radius = training_radius + 1e-8\n", "\n", "pred_loc_config = PredictionLocsConfig(method = \"from_dataframe\",\n", " df = prediction_locs_df,\n", " max_dist = inference_radius\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, the ``method`` argument specifies how the prediction locations are selected. In our case this is ``from_dataframe`` and we specify the dataframe in the argument ``df``. The ``max_dist`` argument specifies the inference radius around the expert location.\n", "\n", "## Run experiment\n", "We are now in shape to run our experiment. To do this, we initialise a ``LocalExpertOI`` object from the four config classes we created." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'data_select': 0.001 seconds\n", "'load': 0.002 seconds\n" ] } ], "source": [ "from GPSat.local_experts import LocalExpertOI\n", "\n", "# Set up local expert experiment\n", "locexp = LocalExpertOI(data_config = data_config,\n", " model_config = model_config,\n", " expert_loc_config = expert_loc_config,\n", " pred_loc_config = pred_loc_config)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we just need to specify a path where we want to store our results and run an experiment with the ``run()`` method. The stored path should be a HDF5 file, which uses the extension `\".h5\"`." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "---------\n", "storing expert locations in 'expert_locs' table\n", "exception occurred: 'No object named expert_locs in the file'\n", "will now close object\n", "\n", "\u001b[96m---------\n", "dropping expert locations that already exists in 'run_details' table\u001b[0m\n", "exception occurred: 'No object named run_details in the file'\n", "will now close object\n", "\n", "------------------------------\n", "1 / 2\n", " x\n", "0 0.25\n", "'_max_dist_bool': 0.000 seconds\n", "'_from_dataframe': 0.001 seconds\n", "'data_select': 0.000 seconds\n", "'load': 0.000 seconds\n", "'_update_global_data': 0.000 seconds\n", "'local_data_select': 0.001 seconds\n", "number obs: 62\n", "'__init__': 0.037 seconds\n", "'get_parameters': 0.000 seconds\n", "'optimise_parameters': 0.066 seconds\n", "'get_objective_function_value': 0.000 seconds\n", "'get_parameters': 0.000 seconds\n", "parameters:\n", "lengthscales: 0.0321035488284147\n", "kernel_variance: 0.6388633229513591\n", "likelihood_variance: 0.0025000000000000005\n", "'predict': 0.001 seconds\n", "total run time : 0.39 seconds\n", 
"------------------------------\n", "2 / 2\n", " x\n", "1 0.45\n", "'_max_dist_bool': 0.000 seconds\n", "'_from_dataframe': 0.000 seconds\n", "'_update_global_data': 0.000 seconds\n", "'local_data_select': 0.001 seconds\n", "number obs: 59\n", "'__init__': 0.020 seconds\n", "'optimise_parameters': 0.041 seconds\n", "'get_objective_function_value': 0.000 seconds\n", "'get_parameters': 0.000 seconds\n", "parameters:\n", "lengthscales: 0.16317534178011256\n", "kernel_variance: 0.3296724457270036\n", "likelihood_variance: 0.0025000000000000005\n", "'predict': 0.000 seconds\n", "total run time : 0.22 seconds\n", "storing any remaining tables\n", "SAVING RESULTS\n", "run_details\n", "preds\n", "lengthscales\n", "kernel_variance\n", "likelihood_variance\n", "'run': 1.058 seconds\n" ] } ], "source": [ "# path to store results\n", "store_path = get_parent_path(\"results\", \"1d_tutorial_example.h5\")\n", "\n", "# for the purposes of a simple example, if store_path exists: delete it\n", "if os.path.exists(store_path):\n", " os.remove(store_path)\n", " \n", "# run local expert optimal interpolation\n", "locexp.run(store_path=store_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can extract the results from the HDF5 with the ``local_experts.get_results_from_h5file()`` method." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reading in results\n", "getting all tables\n", "'data_select': 0.001 seconds\n", "'load': 0.001 seconds\n", "merging on expert location data\n", "table: 'oi_config' does not have all coords_col: ['x'] in columns, not merging on expert_locations\n" ] } ], "source": [ "# extract, store in dict\n", "dfs, _ = GPSat.local_experts.get_results_from_h5file(store_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the results that are stored by accessing the keys of the dictionary ``dfs``:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['kernel_variance', 'lengthscales', 'likelihood_variance', 'oi_config', 'preds', 'run_details'])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see tables storing the model parameters (``'kernel_variance'``, ``'lengthscales'``, ``'likelihood_variance'``), the full configuration used to run the experiment stored in json format (``'oi_config'``), model predictions (``'preds'``) and details of the experiment run such as run time, device name, etc... (``'run_details'``).\n", "\n", "Let's check the ``'preds'`` table storing the model predictions." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead><tr><th></th><th>x</th><th>_dim_0</th><th>f*</th><th>f*_var</th><th>pred_loc_x</th></tr></thead>\n", "<tbody>\n", "<tr><th>0</th><td>0.25</td><td>0</td><td>-0.501423</td><td>0.003268</td><td>0.100000</td></tr>\n", "<tr><th>1</th><td>0.25</td><td>1</td><td>-0.080115</td><td>0.000920</td><td>0.105051</td></tr>\n", "<tr><th>2</th><td>0.25</td><td>2</td><td>0.310646</td><td>0.000572</td><td>0.110101</td></tr>\n", "<tr><th>3</th><td>0.25</td><td>3</td><td>0.635114</td><td>0.000698</td><td>0.115152</td></tr>\n", "<tr><th>4</th><td>0.25</td><td>4</td><td>0.866515</td><td>0.000760</td><td>0.120202</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " x _dim_0 f* f*_var pred_loc_x\n", "0 0.25 0 -0.501423 0.003268 0.100000\n", "1 0.25 1 -0.080115 0.000920 0.105051\n", "2 0.25 2 0.310646 0.000572 0.110101\n", "3 0.25 3 0.635114 0.000698 0.115152\n", "4 0.25 4 0.866515 0.000760 0.120202" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs['preds'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in the previous tutorial, we can glue overlapping predictions from different experts by running the ``glue_local_predictions_1d()`` method." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead><tr><th></th><th>pred_loc_x</th><th>f*</th><th>f*_var</th></tr></thead>\n", "<tbody>\n", "<tr><th>0</th><td>0.100000</td><td>-0.501423</td><td>0.003268</td></tr>\n", "<tr><th>1</th><td>0.105051</td><td>-0.080115</td><td>0.000920</td></tr>\n", "<tr><th>2</th><td>0.110101</td><td>0.310646</td><td>0.000572</td></tr>\n", "<tr><th>3</th><td>0.115152</td><td>0.635114</td><td>0.000698</td></tr>\n", "<tr><th>4</th><td>0.120202</td><td>0.866515</td><td>0.000760</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " pred_loc_x f* f*_var\n", "0 0.100000 -0.501423 0.003268\n", "1 0.105051 -0.080115 0.000920\n", "2 0.110101 0.310646 0.000572\n", "3 0.115152 0.635114 0.000698\n", "4 0.120202 0.866515 0.000760" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "glued_preds = glue_local_predictions_1d(preds_df = dfs['preds'],\n", " pred_loc_col = 'pred_loc_x',\n", " xprt_loc_col = 'x',\n", " vars_to_glue = ['f*', 'f*_var'],\n", " inference_radius = inference_radius)\n", "glued_preds.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We plot the results below" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Extract glued mean and variance predictions\n", "f_mean = glued_preds['f*']\n", "f_var = glued_preds['f*_var']\n", "f_std = np.sqrt(f_var)\n", "X_test = glued_preds['pred_loc_x']\n", "\n", "# Plot results\n", "plt.plot(X_grid, f_truth, 'k', zorder=0, label='Ground truth')\n", "plt.plot(X_test, f_mean, color='C3', zorder=1, label='Glued predictions (2 experts)')\n", "plt.fill_between(X_test, f_mean-1.96*f_std, f_mean+1.96*f_std, color='C3', alpha=0.3)\n", "\n", "xvals = [0.25, 0.45]\n", "yvals = [-1.2, -1.]\n", "plt.errorbar(xvals, yvals, xerr=0.15, fmt='o', elinewidth=2, barsabove=True, capsize=5, alpha=0.5, label='Local expert locations')\n", "for (x, y) in zip(xvals, yvals):\n", " plt.vlines(x, -1.4, y, linestyles='dashed')\n", "ax = plt.gca()\n", "ax.set_ylim([-1.4, 1.1])\n", "\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above, we have also illustrated the local expert locations (blue circle) and the inference regions (blue horizontal bars).\n", "\n", "Below, we assess the performance of the model." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean squared error: 0.0005\n", "Mean log likelihood: 2.5734\n" ] } ], "source": [ "print(f\"Mean squared error: {np.mean((f_truth - f_mean)**2):.4f}\")\n", "print(f\"Mean log likelihood: {scipy.stats.norm.logpdf(f_truth, f_mean, f_std).mean():.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using more local experts\n", "Finally, let's see what happens when we double the number of local experts. Below, we set up the configurations for an experiment using the expert locations at x = [0.2, 0.3, 0.4, 0.5] and training radius = 0.1." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Set new expert locations\n", "xprt_locs = [0.2, 0.3, 0.4, 0.5]\n", "\n", "# Set training and inference radii\n", "training_radius = 0.1\n", "inference_radius = training_radius + 1e-8\n", "\n", "# Set up configs\n", "expert_loc_config = ExpertLocsConfig(source=pd.DataFrame({'x': xprt_locs}))\n", "\n", "model_config = ModelConfig(oi_model=\"sklearnGPRModel\",\n", " init_params={\"likelihood_variance\": noise_std**2,\n", " \"kernel\": 'RBF',\n", " \"verbose\": False}\n", " )\n", "\n", "data_config = DataConfig(data_source=data_df,\n", " obs_col=[\"y\"],\n", " coords_col=[\"x\"],\n", " local_select=[{\"col\": \"x\", \"comp\": \"<=\", \"val\": training_radius},\n", " {\"col\": \"x\", \"comp\": \">=\", \"val\": -training_radius}]\n", " )\n", "\n", "pred_loc_config = PredictionLocsConfig(method=\"from_dataframe\",\n", " df=prediction_locs_df,\n", " max_dist=inference_radius\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We run this experiment below using ``LocalExpertOI``." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'data_select': 0.001 seconds\n", "'load': 0.002 seconds\n", "---------\n", "storing expert locations in 'expert_locs' table\n", "exception occurred: 'No object named expert_locs in the file'\n", "will now close object\n", "\n", "\u001b[96m---------\n", "dropping expert locations that already exists in 'run_details' table\u001b[0m\n", "exception occurred: 'No object named run_details in the file'\n", "will now close object\n", "\n", "------------------------------\n", "1 / 4\n", " x\n", "0 0.2\n", "'_max_dist_bool': 0.000 seconds\n", "'_from_dataframe': 0.000 seconds\n", "'data_select': 0.000 seconds\n", "'load': 0.000 seconds\n", "'_update_global_data': 0.000 seconds\n", "'local_data_select': 0.001 seconds\n", "number obs: 41\n", "'__init__': 0.023 seconds\n", "'get_parameters': 0.000 seconds\n", "'optimise_parameters': 0.172 seconds\n", "'get_objective_function_value': 0.000 seconds\n", "'get_parameters': 0.000 seconds\n", "parameters:\n", "lengthscales: 0.03354575999266631\n", "kernel_variance: 1.536162773874243\n", "likelihood_variance: 0.0025000000000000005\n", "'predict': 0.023 seconds\n", "SAVING RESULTS\n", "run_details\n", "preds\n", "lengthscales\n", "kernel_variance\n", "likelihood_variance\n", "total run time : 1.07 seconds\n", "------------------------------\n", "2 / 4\n", " x\n", "1 0.3\n", "'_max_dist_bool': 0.000 seconds\n", "'_from_dataframe': 0.000 seconds\n", "'_update_global_data': 0.000 seconds\n", "'local_data_select': 0.001 seconds\n", "number obs: 37\n", "'__init__': 0.039 seconds\n", "'optimise_parameters': 0.059 seconds\n", "'get_objective_function_value': 0.000 seconds\n", "'get_parameters': 0.000 seconds\n", "parameters:\n", "lengthscales: 0.0887015158885798\n", "kernel_variance: 0.19701393313984264\n", "likelihood_variance: 0.0025000000000000005\n", "'predict': 0.001 seconds\n", "SAVING RESULTS\n", 
"run_details\n", "preds\n", "lengthscales\n", "kernel_variance\n", "likelihood_variance\n", "total run time : 0.43 seconds\n", "------------------------------\n", "3 / 4\n", " x\n", "2 0.4\n", "'_max_dist_bool': 0.000 seconds\n", "'_from_dataframe': 0.000 seconds\n", "'_update_global_data': 0.000 seconds\n", "'local_data_select': 0.001 seconds\n", "number obs: 44\n", "'__init__': 0.037 seconds\n", "'optimise_parameters': 0.054 seconds\n", "'get_objective_function_value': 0.000 seconds\n", "'get_parameters': 0.000 seconds\n", "parameters:\n", "lengthscales: 0.1793349088554155\n", "kernel_variance: 0.29543246711988835\n", "likelihood_variance: 0.0025000000000000005\n", "'predict': 0.001 seconds\n", "SAVING RESULTS\n", "run_details\n", "preds\n", "lengthscales\n", "kernel_variance\n", "likelihood_variance\n", "total run time : 0.41 seconds\n", "------------------------------\n", "4 / 4\n", " x\n", "3 0.5\n", "'_max_dist_bool': 0.000 seconds\n", "'_from_dataframe': 0.000 seconds\n", "'_update_global_data': 0.000 seconds\n", "'local_data_select': 0.001 seconds\n", "number obs: 38\n", "'__init__': 0.021 seconds\n", "'optimise_parameters': 0.031 seconds\n", "'get_objective_function_value': 0.000 seconds\n", "'get_parameters': 0.000 seconds\n", "parameters:\n", "lengthscales: 0.2911974656858733\n", "kernel_variance: 0.23445334006310356\n", "likelihood_variance: 0.0025000000000000005\n", "'predict': 0.000 seconds\n", "SAVING RESULTS\n", "run_details\n", "preds\n", "lengthscales\n", "kernel_variance\n", "likelihood_variance\n", "total run time : 0.28 seconds\n", "'run': 2.582 seconds\n" ] } ], "source": [ "# Set up local expert experiment\n", "locexp = LocalExpertOI(data_config = data_config,\n", " model_config = model_config,\n", " expert_loc_config = expert_loc_config,\n", " pred_loc_config = pred_loc_config)\n", "\n", "# path to store results\n", "store_path = get_parent_path(\"results\", \"1d_tutorial_example.h5\")\n", "\n", "# for the purposes of a simple example, if 
store_path exists: delete it\n", "if os.path.exists(store_path):\n", " os.remove(store_path)\n", " \n", "# run local expert optimal interpolation\n", "locexp.run(store_path=store_path, store_every=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot results:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reading in results\n", "getting all tables\n", "'data_select': 0.001 seconds\n", "'load': 0.001 seconds\n", "merging on expert location data\n", "table: 'oi_config' does not have all coords_col: ['x'] in columns, not merging on expert_locations\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# extract, store in dict\n", "dfs, _ = GPSat.local_experts.get_results_from_h5file(store_path)\n", "\n", "glued_preds = glue_local_predictions_1d(preds_df = dfs['preds'],\n", " pred_loc_col = 'pred_loc_x',\n", " xprt_loc_col = 'x',\n", " vars_to_glue = ['f*', 'f*_var'],\n", " inference_radius = inference_radius)\n", "\n", "# Extract glued mean and variance predictions\n", "f_mean = glued_preds['f*']\n", "f_var = glued_preds['f*_var']\n", "f_std = np.sqrt(f_var)\n", "X_test = glued_preds['pred_loc_x']\n", "\n", "# Plot results\n", "plt.plot(X_grid, f_truth, 'k', zorder=0, label='Ground truth')\n", "plt.plot(X_test, f_mean, color='C4', zorder=1, label='Glued predictions (4 experts)')\n", "plt.fill_between(X_test, f_mean-1.96*f_std, f_mean+1.96*f_std, color='C4', alpha=0.3)\n", "\n", "xvals = [0.2, 0.3, 0.4, 0.5]\n", "yvals = [-1.3, -1.2, -1.1, -1.]\n", "plt.errorbar(xvals, yvals, xerr=0.1, fmt='o', elinewidth=2, barsabove=True, capsize=5, alpha=0.5, label='Local expert locations')\n", "for (x, y) in zip(xvals, yvals):\n", " plt.vlines(x, -1.4, y, linestyles='dashed')\n", "ax = plt.gca()\n", "ax.set_ylim([-1.4, 1.1])\n", "\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the results look much better and this is also reflected in the metrics:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean squared error: 0.0003\n", "Mean log likelihood: 2.7179\n" ] } ], "source": [ "print(f\"Mean squared error: {np.mean((f_truth - f_mean)**2):.4f}\")\n", "print(f\"Mean log likelihood: {scipy.stats.norm.logpdf(f_truth, f_mean, f_std).mean():.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** To achieve the best performance using local experts model, each local experts should have sufficiently many data points to prevent overfitting on a particular region. 
If overfitting does occur, it can be mitigated by *hyperparameter smoothing*.\n", "\n", "**Note:** In ``GPSat``, we have not yet considered learning the optimal distribution of expert locations and the corresponding inference/training radii that best fit the data. We typically assume the expert locations to be distributed on an even grid and use the same inference/training radii at every expert location. However, it might be interesting in the future to consider learning such hyperparameters to further improve performance.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.17 ('gpsat2')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.17" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "42c89ee418f45ab16d4cd7d85b9f5fd46783f67990f590db7ef8d9e48f3f848d" } } }, "nbformat": 4, "nbformat_minor": 2 }