Utils

TODO: Divide utils into specific categories.

GPSat.utils.EASE2toWGS84(x, y, return_vals='both', lon_0=0, lat_0=90)

Converts EASE2 grid coordinates to WGS84 longitude and latitude coordinates.

Parameters:

x: float: EASE2 grid x-coordinate in meters.
y: float: EASE2 grid y-coordinate in meters.
return_vals: str, optional: Determines what values to return. Valid options are "both" (default), "lon", or "lat".
lon_0: float, optional: Longitude of the center of the EASE2 grid in degrees. Default is 0.
lat_0: float, optional: Latitude of the center of the EASE2 grid in degrees. Default is 90.

Returns:

tuple or float: Depending on the value of return_vals, either a tuple of WGS84 longitude and latitude coordinates (both floats), or a single float representing either the longitude or latitude.

Raises:

AssertionError: If return_vals is not one of the valid options.

Examples

>>> EASE2toWGS84(1000000, 2000000)
(153.434948822922, 69.86894542225777)

GPSat.utils.EASE2toWGS84_New(*args, **kwargs)

GPSat.utils.WGS84toEASE2(lon, lat, return_vals='both', lon_0=0, lat_0=90)

Converts WGS84 longitude and latitude coordinates to EASE2 grid coordinates.

Parameters:

lon: float: Longitude coordinate in decimal degrees.
lat: float: Latitude coordinate in decimal degrees.
return_vals: str, optional: Determines what values to return. Valid options are "both" (default), "x", or "y".
lon_0: float, optional: Longitude of the center of the EASE2 grid in decimal degrees. Default is 0.
lat_0: float, optional: Latitude of the center of the EASE2 grid in decimal degrees. Default is 90.

Returns:

float: If return_vals is "x", returns the x EASE2 grid coordinate in meters.
float: If return_vals is "y", returns the y EASE2 grid coordinate in meters.
tuple of float: If return_vals is "both", returns a tuple of (x, y) EASE2 grid coordinates in meters.

Raises:

AssertionError: If return_vals is not one of the valid options.

Examples

>>> WGS84toEASE2(-105.01621, 39.57422)
(-5254767.014984061, 1409604.1043472202)
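EASE-Grid 2.0 North is a Lambert azimuthal equal-area projection centred on the North Pole. The sketch below illustrates the geometry of both conversions under a simplifying assumption: it treats the Earth as a sphere with radius R = 6371007.2 m (approximately the authalic radius of the WGS84 ellipsoid) instead of using the ellipsoid, so projected distances can differ from the values above by several kilometres, while longitudes agree. The helper names are hypothetical; this is not GPSat's implementation.

```python
import numpy as np

# Assumption: spherical Earth with (approximately) the WGS84 authalic radius, in meters.
R = 6371007.2


def wgs84_to_ease2_north_sketch(lon, lat, lon_0=0.0):
    """Spherical polar Lambert azimuthal equal-area forward projection (hypothetical helper)."""
    dlon = np.radians(lon) - np.radians(lon_0)
    # Distance from the pole in the projected plane.
    rho = 2 * R * np.sin(np.pi / 4 - np.radians(lat) / 2)
    x = rho * np.sin(dlon)
    y = -rho * np.cos(dlon)
    return x, y


def ease2_north_to_wgs84_sketch(x, y, lon_0=0.0):
    """Inverse of wgs84_to_ease2_north_sketch (hypothetical helper)."""
    rho = np.hypot(x, y)
    # Invert rho = 2 R sin(pi/4 - lat/2) for the latitude.
    lat = np.pi / 2 - 2 * np.arcsin(rho / (2 * R))
    lon = np.radians(lon_0) + np.arctan2(x, -y)
    return np.degrees(lon), np.degrees(lat)
```

For example, ease2_north_to_wgs84_sketch(1000000, 2000000) gives the same longitude as the EASE2toWGS84 example (about 153.43) and a latitude within about 0.1 degrees of it, the gap coming from the spherical approximation.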

GPSat.utils.WGS84toEASE2_New(*args, **kwargs)

GPSat.utils.array_to_dataframe(x, name, dim_prefix='_dim_', reset_index=False)

Converts a numpy array to a pandas DataFrame with a multi-index based on the array’s dimensions.

(Also see dataframe_to_array)

Parameters:

x: np.ndarray: The numpy array to be converted to a DataFrame.
name: str: The name of the column in the resulting DataFrame.
dim_prefix: str, optional: The prefix to be used for the dimension names in the multi-index. Default is "_dim_". Integers will be appended to dim_prefix for each dimension of x, i.e. if x is 2d, it will have dimension names "_dim_0", "_dim_1", assuming the default dim_prefix is used.
reset_index: bool, optional: Whether to reset the index of the resulting DataFrame. Default is False.

Returns:

out: pd.DataFrame: The resulting DataFrame with a multi-index based on the dimensions of the input array.

Raises:

AssertionError: If the input is not a numpy array.

Examples

>>> # express a 2d numpy array in DataFrame
>>> x = np.array([[1, 2], [3, 4]])
>>> array_to_dataframe(x, "data")
                data
_dim_0 _dim_1
0      0        1
       1        2
1      0        3
       1        4
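The multi-index described above can be built with pandas.MultiIndex.from_product: numpy's default C order makes the last dimension vary fastest when the array is flattened, which is the same order as the product of the dimension ranges. The function below is a hypothetical re-implementation for illustration, not GPSat's code.

```python
import numpy as np
import pandas as pd


def array_to_dataframe_sketch(x, name, dim_prefix="_dim_", reset_index=False):
    assert isinstance(x, np.ndarray), "x must be a numpy array"
    # One index level per array dimension, named _dim_0, _dim_1, ...
    idx = pd.MultiIndex.from_product(
        [np.arange(n) for n in x.shape],
        names=[f"{dim_prefix}{i}" for i in range(x.ndim)],
    )
    # ravel() uses C order, matching the order of the index product.
    out = pd.DataFrame({name: x.ravel()}, index=idx)
    if reset_index:
        out = out.reset_index()
    return out
```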

GPSat.utils.assign_category_col(val, df, categories=None)

Generate a categorical pd.Series equal in length to a reference DataFrame (df).

Parameters:

val: str: The value to assign to the categorical Series.
df: pandas DataFrame: Reference DataFrame, used to determine the length of the output.
categories: list, optional: A list of categories to be used for the categorical column.

Returns:

pandas Categorical Series: A categorical column with the assigned value and specified categories (if provided).

Notes

This function creates a new categorical column, the same length as df, containing the specified value and categories. If categories are not provided, they will be inferred from the data. The function returns a pandas categorical Series representing the new column.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
>>> x_series = assign_category_col('x', df)
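A minimal sketch of the behaviour described above, assuming the output reuses df's index (an assumption, as is the helper name). Note that with pandas, a value not listed in categories becomes NaN.

```python
import pandas as pd


def assign_category_col_sketch(val, df, categories=None):
    # One copy of val per row of the reference DataFrame; categories are inferred if None.
    return pd.Series(pd.Categorical([val] * len(df), categories=categories), index=df.index)
```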

GPSat.utils.bin_obs_by_date(df, val_col, date_col='date', all_dates_in_range=True, x_col='x', y_col='y', grid_res=None, date_col_format='%Y%m%d', x_min=-4500000.0, x_max=4500000.0, y_min=-4500000.0, y_max=4500000.0, n_x=None, n_y=None, bin_statistic='mean', verbose=False)

This function takes a pandas DataFrame and bins the values in a specified column (val_col) on a regular 2D grid, using the x and y coordinates in two other columns (x_col, y_col). The grid is defined either by a resolution (grid_res) or by a number of bins (n_x, n_y). Binning is done separately for each unique date in date_col, and the function returns a dictionary of binned values for each date, along with the bin edges.

Parameters:

df: pandas DataFrame: A DataFrame containing the data to be binned.
val_col: string: Name of the column containing the values to be binned.
date_col: string, default “date”: Name of the column containing the dates for which to bin the data.
all_dates_in_range: boolean, default True: Whether to include all dates in the range of the DataFrame.
x_col: string, default “x”: Name of the column containing the x coordinates.
y_col: string, default “y”: Name of the column containing the y coordinates.
grid_res: float or int, default None: Resolution of the grid in kilometers. If None, then n_x and n_y must be specified.
date_col_format: string, default “%Y%m%d”: Format of the date column.
x_min: float, default -4500000.0: Minimum x value for the grid.
x_max: float, default 4500000.0: Maximum x value for the grid.
y_min: float, default -4500000.0: Minimum y value for the grid.
y_max: float, default 4500000.0: Maximum y value for the grid.
n_x: int, default None: Number of bins in the x direction.
n_y: int, default None: Number of bins in the y direction.
bin_statistic: string or callable, default “mean”: Statistic to compute in each bin.
verbose: boolean, default False: Whether to print additional information during execution.

Returns:

bvals: dictionary: The binned values for each unique date in the DataFrame.
x_edge: numpy array: x values for the edges of the bins.
y_edge: numpy array: y values for the edges of the bins.

Notes

Because of a transpose operation in the function, the axes of each returned binned array are swapped relative to (x, y): the first axis corresponds to y and the second to x.
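The per-date binning can be sketched with numpy.histogram2d, computing the mean in each bin as a weighted sum divided by a count. This is a hypothetical simplification: it supports only the "mean" statistic and an explicit number of bins (n_x, n_y), and ignores grid_res, all_dates_in_range and date_col_format. The final transpose mirrors the note above: histogram2d returns arrays indexed [x, y], so transposing gives rows indexed by y.

```python
import numpy as np
import pandas as pd


def bin_obs_by_date_sketch(df, val_col, date_col="date", x_col="x", y_col="y",
                           x_min=-4500000.0, x_max=4500000.0,
                           y_min=-4500000.0, y_max=4500000.0, n_x=10, n_y=10):
    x_edge = np.linspace(x_min, x_max, n_x + 1)
    y_edge = np.linspace(y_min, y_max, n_y + 1)
    bvals = {}
    for date, g in df.groupby(date_col):
        x, y, v = g[x_col].to_numpy(), g[y_col].to_numpy(), g[val_col].to_numpy()
        sums, _, _ = np.histogram2d(x, y, bins=[x_edge, y_edge], weights=v)
        counts, _, _ = np.histogram2d(x, y, bins=[x_edge, y_edge])
        with np.errstate(invalid="ignore", divide="ignore"):
            mean = sums / counts  # NaN where a bin is empty
        # Transpose so rows index y bins and columns index x bins.
        bvals[date] = mean.T
    return bvals, x_edge, y_edge
```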

GPSat.utils.check_prev_oi_config(prev_oi_config, oi_config, skip_valid_checks_on=None)

This function checks if the previous configuration matches the current one. It takes in two dictionaries, prev_oi_config and oi_config, which represent the previous and current configurations respectively.

The function also takes an optional list skip_valid_checks_on, which contains keys that should be skipped during the comparison.

Parameters:

prev_oi_config: dict: Previous configuration to be compared against.
oi_config: dict: Current configuration to compare against prev_oi_config.
skip_valid_checks_on: list or None, default None: If not None, should be a list of keys to not check.

Returns:

None

Notes

If skip_valid_checks_on is not provided, it defaults to an empty list. The function then compares the two configurations and raises an AssertionError if any key-value pairs (other than those skipped) do not match.
This function assumes that the configurations are represented as dictionaries and that the keys in both dictionaries are the same.
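The comparison described above can be sketched as a loop over the previous configuration's keys. This is a hypothetical illustration (the helper name is not GPSat's), assuming, as the Notes state, that both dictionaries share the same keys.

```python
def check_prev_oi_config_sketch(prev_oi_config, oi_config, skip_valid_checks_on=None):
    if skip_valid_checks_on is None:
        skip_valid_checks_on = []
    for key, prev_val in prev_oi_config.items():
        if key in skip_valid_checks_on:
            continue
        # Assumes oi_config has the same keys as prev_oi_config.
        assert oi_config[key] == prev_val, (
            f"config mismatch on '{key}': {prev_val!r} != {oi_config[key]!r}"
        )
```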

GPSat.utils.compare_dataframes(df1, df2, merge_on, columns_to_compare, drop_other_cols=False, how='outer', suffixes=['_1', '_2'])

GPSat.utils.config_func(func, source=None, args=None, kwargs=None, col_args=None, col_kwargs=None, df=None, filename_as_arg=False, filename=None, col_numpy=True)

Apply a function based on configuration input.

The aim is to allow one to apply a function, possibly on data from a DataFrame, using a specification that can be stored in a JSON configuration file.

Note

This function uses eval(), so it could allow arbitrary code execution.
If a DataFrame df is provided, inputs (col_args and/or col_kwargs) can be taken from columns of df.

Parameters:

func: str or callable.

If str, it will use eval(func) to convert it to a function.
If it contains any of "|", "&", "=", "+", "-", "*", "/", "%", "<" or ">", it will create a lambda function:

lambda arg1, arg2: eval(f"arg1 {func} arg2")

If eval(func) raises NameError and source is not None, it will run

f"from {source} import {func}"

and try again. This allows func to be imported from a source package.

source: str or None, default None

Package name where func can be found, if applicable. Used to import func from a package. e.g.

>>> GPSat.utils.config_func(func="cumprod", source="numpy", ...)

calls the function cumprod from the package numpy.

args: list or None, default None

If None, an empty list will be used, i.e. no args will be used. The values will be unpacked and provided to func: i.e. func(*args, **kwargs)

kwargs: dict or None, default None

If dict, it will be unpacked (**kwargs) to provide keyword arguments to func.

col_args: None or list of str, default None

If DataFrame df is provided, it can use col_args to specify which columns of df will be passed into func as arguments.

col_kwargs: None or dict, default is None

Keyword arguments to be passed to func specified as dict whose keys are parameters of func and values are column names of a DataFrame df. Only applicable if df is provided.

df: DataFrame or None, default None

Provide this to use columns of a DataFrame as arguments to func.

filename_as_arg: bool, default False

Set True if filename is used as an argument to func.

filename: str or None, default None

If filename_as_arg is True, filename will be provided as the first argument to func.

col_numpy: bool, default True

If True, when extracting columns from DataFrame, .values is used to convert to numpy array.

Returns:

any: Values returned by applying func on data. The type depends on func.

Raises:

AssertionError: If kwargs is not a dict, col_kwargs is not a dict, or func is not a string or callable.
AssertionError: If df is not provided but col_args or col_kwargs are.
AssertionError: If func is a string that cannot be imported on its own and source is None.

Examples

>>> import pandas as pd
>>> from GPSat.utils import config_func
>>> config_func(func="lambda x, y: x + y", args=[1, 1]) # Computes 1 + 1
2
>>> config_func(func="==", args=[1, 1]) # Computes 1 == 1
True

Using columns of a DataFrame as inputs:

>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> config_func(func="lambda x, y: x + y", df=df, col_args=["A", "B"]) # Computes df["A"] + df["B"]
array([5, 7, 9])
>>> config_func(func="<=", col_args=["A", "B"], df=df) # Computes df["A"] <= df["B"]
array([ True,  True,  True])

We can also use functions from an external package by specifying source. For example, the code below reproduces the last example in the numpy.cumprod documentation:

>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> config_func(func="cumprod", source="numpy", df=df, kwargs={"axis": 0}, col_args=[["A", "B"]])
array([[  1,   4],
       [  2,  20],
       [  6, 120]])
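Because config_func relies on eval(), here is a hypothetical eval-free variant of the same dispatch idea, for comparison: operator strings are looked up in a table built from Python's operator module, and named functions are imported from source with importlib. It is not GPSat's implementation; it does not accept lambda strings (pass a callable instead), and it assumes column arguments are placed before args.

```python
import importlib
import operator

# Hypothetical lookup table: operator strings mapped to functions, instead of eval().
_OPERATORS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv,
    "%": operator.mod, "==": operator.eq, "!=": operator.ne, "<": operator.lt,
    "<=": operator.le, ">": operator.gt, ">=": operator.ge,
    "&": operator.and_, "|": operator.or_,
}


def config_func_sketch(func, source=None, args=None, kwargs=None,
                       col_args=None, col_kwargs=None, df=None):
    args = [] if args is None else list(args)
    kwargs = {} if kwargs is None else dict(kwargs)
    if isinstance(func, str):
        if func in _OPERATORS:
            func = _OPERATORS[func]
        else:
            # Resolve a function by name from a package, e.g. "cumprod" from "numpy".
            assert source is not None, f"cannot resolve '{func}' without a source package"
            func = getattr(importlib.import_module(source), func)
    assert callable(func), "func must be a str or callable"
    if col_args is not None or col_kwargs is not None:
        assert df is not None, "df must be provided when using col_args or col_kwargs"
    # Each col_args entry is a column name or a list of names; .values gives numpy arrays.
    col_vals = [df[c].values for c in (col_args or [])]
    kwargs.update({k: df[c].values for k, c in (col_kwargs or {}).items()})
    return func(*col_vals, *args, **kwargs)
```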

GPSat.utils.convert_lon_lat_str(x)

Converts a string representation of longitude or latitude to a float value.

Parameters:

x: str: A string representation of longitude or latitude in the format of "[degrees] [minutes] [direction]", where [direction] is one of "N", "S", "E", or "W".

Returns:

float: The converted value of the input string as a float.

Raises:

AssertionError: If the input is not a string.

Examples

>>> convert_lon_lat_str('74 0.1878 N')
74.00313
>>> convert_lon_lat_str('140 0.1198 W')
-140.001997
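The examples above are consistent with decimal degrees computed as degrees + minutes / 60, negated for "S" and "W". A minimal sketch of that conversion (the helper name is hypothetical):

```python
def convert_lon_lat_str_sketch(x):
    assert isinstance(x, str), "input must be a string"
    # Expected format: "[degrees] [minutes] [direction]"
    degrees, minutes, direction = x.split()
    value = float(degrees) + float(minutes) / 60.0
    # South and West are negative.
    return -value if direction.upper() in ("S", "W") else value
```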

GPSat.utils.cprint(x, c='ENDC', bcolors=None, sep=' ', end='\n')

Add color to print statements.

Based on https://stackoverflow.com/questions/287871/how-do-i-print-colored-text-to-the-terminal.

Parameters:

x: str: String to be printed.
c: str, default “ENDC”: Valid key in bcolors. If bcolors is not provided, then default will be used, containing keys: 'HEADER', 'OKBLUE', 'OKCYAN', 'OKGREEN', 'WARNING', 'FAIL', 'ENDC', 'BOLD', 'UNDERLINE'.
bcolors: dict or None, default None: Dict with values being colors / how to format the font. These can be chained together. See the codes in: https://en.wikipedia.org/wiki/ANSI_escape_code#3-bit_and_4-bit.
sep: str, default “ “: sep argument passed along to print().
end: str, default “\n”: end argument passed along to print().

Returns:

None
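A minimal sketch of the idea, assuming default escape codes like those in the Stack Overflow answer cited above (standard ANSI SGR sequences). The names cprint_sketch and DEFAULT_BCOLORS are hypothetical.

```python
# Hypothetical defaults: ANSI escape sequences for each key.
DEFAULT_BCOLORS = {
    "HEADER": "\033[95m", "OKBLUE": "\033[94m", "OKCYAN": "\033[96m",
    "OKGREEN": "\033[92m", "WARNING": "\033[93m", "FAIL": "\033[91m",
    "ENDC": "\033[0m", "BOLD": "\033[1m", "UNDERLINE": "\033[4m",
}


def cprint_sketch(x, c="ENDC", bcolors=None, sep=" ", end="\n"):
    if bcolors is None:
        bcolors = DEFAULT_BCOLORS
    # Prefix with the chosen code, then reset formatting so later output is unaffected.
    reset = bcolors.get("ENDC", "\033[0m")
    print(f"{bcolors[c]}{x}{reset}", sep=sep, end=end)
```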

GPSat.utils.dataframe_to_2d_array(df, x_col, y_col, val_col, tol=1e-09, fill_val=nan, dtype=None, decimals=1)

Extract values from a DataFrame to create a 2-d array of values (val_col), assuming the values came from a 2-d array. Requires dimension columns x_col and y_col (these do not have to be ordered in the DataFrame).

Parameters:

df: pandas.DataFrame: The dataframe to convert to a 2D array.
x_col: str: The name of the column in the dataframe that contains the x coordinates.
y_col: str: The name of the column in the dataframe that contains the y coordinates.
val_col: str: The name of the column in the dataframe that contains the values to be placed in the 2D array.
tol: float, default 1e-9: The tolerance for matching the x and y coordinates to the grid.
fill_val: float, default np.nan: The value to fill the 2D array with if a coordinate is missing.
dtype: str or numpy.dtype or None, default None: The data type of the values in the 2D array.
decimals: int, default 1: The number of decimal places to round x and y values to before taking unique. If decimals is negative, it specifies the number of positions to the left of the decimal point.

Returns:

tuple: A tuple containing the 2D numpy array of values, the x coordinates of the grid, and the y coordinates of the grid.

Raises:

AssertionError: If any of the required columns are missing from the dataframe, or if any coordinates have more than one value.

Notes

The spacing of the grid is determined by the smallest step size in the x_col and y_col directions, respectively.
This is meant to reverse the process of putting values from a regularly spaced grid into a DataFrame. Do not expect this to work on arbitrary x,y coordinates.
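The reversal described in the Notes can be sketched as: round the coordinates, take the smallest step between unique values as the grid spacing, then place each value at its (row, column) position. This is a hypothetical illustration; it assumes rows follow y and columns follow x, and it does not use tol.

```python
import numpy as np
import pandas as pd


def dataframe_to_2d_array_sketch(df, x_col, y_col, val_col, fill_val=np.nan, decimals=1):
    # Round coordinates before taking unique values, to absorb floating point noise.
    x = df[x_col].to_numpy(dtype=float).round(decimals)
    y = df[y_col].to_numpy(dtype=float).round(decimals)
    ux, uy = np.unique(x), np.unique(y)
    # Grid spacing: smallest step between consecutive unique values (1.0 if only one value).
    dx = np.diff(ux).min() if len(ux) > 1 else 1.0
    dy = np.diff(uy).min() if len(uy) > 1 else 1.0
    x_grid = np.arange(ux[0], ux[-1] + dx / 2, dx)
    y_grid = np.arange(uy[0], uy[-1] + dy / 2, dy)
    # Map each coordinate to its position: rows follow y, columns follow x (assumption).
    col = np.round((x - x_grid[0]) / dx).astype(int)
    row = np.round((y - y_grid[0]) / dy).astype(int)
    assert len(set(zip(row.tolist(), col.tolist()))) == len(df), "some coordinates have more than one value"
    out = np.full((len(y_grid), len(x_grid)), fill_val, dtype=float)
    out[row, col] = df[val_col].to_numpy()
    return out, x_grid, y_grid
```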

GPSat.utils.dataframe_to_array(df, val_col, idx_col=None, dropna=True, fill_val=nan)

Converts a pandas DataFrame to a numpy array, where the DataFrame has columns that represent dimensions of the array and the DataFrame rows represent values in the array.

Parameters:

df: pandas DataFrame: The DataFrame containing values to convert to a numpy ndarray.
val_col: str: The name of the column in the DataFrame that contains the values to be placed in the array.
idx_col: str or list of str or None, default None: The name(s) of the column(s) in the DataFrame that represent the dimensions of the array. If not provided, the index of the DataFrame will be used as the dimension(s).
dropna: bool, default True: Whether to drop rows with missing values before converting to the array.
fill_val: scalar, default np.nan: The value to fill in the array for missing values.

Returns:

numpy array: The resulting numpy array.

Raises:

AssertionError: If the dimension values are not integers or have gaps, or if the idx_col parameter contains column names that are not in the DataFrame.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from GPSat.utils import dataframe_to_array
>>> df = pd.DataFrame({
...     'dim1': [0, 0, 1, 1],
...     'dim2': [0, 1, 0, 1],
...     'values': [1, 2, 3, 4]
... })
>>> arr = dataframe_to_array(df, 'values', ['dim1', 'dim2'])
>>> print(arr)
[[1 2]
 [3 4]]
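The core indexing step can be sketched with numpy advanced indexing: the integer dimension columns give each value's position in the output array. This hypothetical version assumes idx_col is given (the index-based default and dropna are not handled), fills missing cells with fill_val rather than raising on gaps, and always returns a float array.

```python
import numpy as np
import pandas as pd


def dataframe_to_array_sketch(df, val_col, idx_col, fill_val=np.nan):
    idx_col = [idx_col] if isinstance(idx_col, str) else list(idx_col)
    missing = [c for c in idx_col if c not in df.columns]
    assert not missing, f"columns not in DataFrame: {missing}"
    dims = df[idx_col].to_numpy()
    assert np.issubdtype(dims.dtype, np.integer), "dimension values must be integers"
    # Array shape: largest index in each dimension plus one.
    shape = tuple(dims.max(axis=0) + 1)
    out = np.full(shape, fill_val, dtype=float)
    # Each column of dims.T indexes one array dimension.
    out[tuple(dims.T)] = df[val_col].to_numpy()
    return out
```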

GPSat.utils.dict_of_array_to_dict_of_dataframe(array_dict, concat=False, reset_index=False)

Converts a dictionary of arrays to a dictionary of pandas DataFrames.

Parameters:

array_dict: dict: A dictionary where the keys are strings and the values are numpy arrays.
concat: bool, optional: If True, concatenates DataFrames with the same number of dimensions. Default is False.
reset_index: bool, optional: If True, resets the index of each DataFrame. Default is False.

Returns:

dict: A dictionary where the keys are strings and the values are pandas DataFrames.

Notes

This function uses the array_to_dataframe function to convert each array to a DataFrame. If concat is True, it will concatenate DataFrames with the same number of dimensions. If reset_index is True, it will reset the index of each DataFrame.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> array_dict = {'a': np.array([1, 2, 3]), 'b': np.array([[1, 2], [3, 4]]), 'c': np.array([1.1, 2.2, 3.3])}
>>> dict_of_array_to_dict_of_dataframe(array_dict)
{'a':       a
    _dim_0   
    0       1
    1       2
    2       3,
'b':               b
    _dim_0 _dim_1   
    0      0       1
           1       2
    1      0       3
           1       4,
'c':        c
    _dim_0     
    0       1.1
    1       2.2
    2       3.3}

>>> dict_of_array_to_dict_of_dataframe(array_dict, concat=True)
{1:         a    c
    _dim_0
    0       1  1.1
    1       2  2.2
    2       3  3.3,
2:                 b
    _dim_0 _dim_1
    0      0       1
           1       2
    1      0       3
           1       4}

>>> dict_of_array_to_dict_of_dataframe(array_dict, reset_index=True)
{'a':    _dim_0  a
     0    1
     1    2
     2    3,
 'b':    _dim_0  _dim_1  b
     0       0    1
     0       1    2
     1       0    3
     1       1    4,
 'c':    _dim_0  c
     0    1.1
     1    2.2
     2    3.3}

GPSat.utils.diff_distance(x, p=2, k=1, default_val=nan)

GPSat.utils.expand_dict_by_vals(d, expand_keys)

GPSat.utils.get_col_values(df, col, return_numpy=True)

This function takes in a pandas DataFrame, a column name or index, and a boolean flag indicating whether to return the column values as a numpy array or not. It returns the values of the specified column as either a pandas Series or a numpy array, depending on the value of the return_numpy flag.

If the column is specified by name and it does not exist in the DataFrame, the function will attempt to use the column index instead. If the column is specified by index and it is not a valid integer index, the function will raise an AssertionError.

Parameters:

df: pandas DataFrame: A pandas DataFrame containing data.
col: str or int: The name of the column to extract data from. If specified as an int n, it will extract data from the n-th column.
return_numpy: bool, default True: Whether to return as numpy array.

Returns:

numpy array: If return_numpy is set to True.
pandas Series: If return_numpy is set to False.

Examples

>>> import pandas as pd
>>> from GPSat.utils import get_col_values
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> col_values = get_col_values(df, 'A')
>>> print(col_values)
[1 2 3]

GPSat.utils.get_config_from_sysargv(argv_num=1)

This function takes an optional argument argv_num (default value of 1) and attempts to read a JSON configuration file from the corresponding index in sys.argv.

If the file extension is not .json, it prints a message indicating that the file is not a JSON file.

If an error occurs while reading the file, it prints an error message.

This function could benefit from refactoring to use the argparse package instead of manually parsing sys.argv.

Parameters:

argv_num: int, default 1: The index in sys.argv to read the configuration file from.

Returns:

dict or None: The configuration data loaded from the JSON file, or None if an error occurred while reading the file.

GPSat.utils.get_git_information()

This function retrieves information about the current state of a Git repository.

Returns:

dict

Contains the following keys:

"branch": the name of the current branch.
"remote": a list of strings representing the remote repositories and their URLs.
"commit": the hash of the current commit.
"details": a list of strings representing the details of the last commit (author, date, message).
"modified" (optional): a list of strings representing the files modified since the last commit.

Note

If the current branch cannot be determined, the function will attempt to retrieve it from the list of all branches.
If there are no remote repositories, the "remote" key will be an empty list.
If there are no modified files, the "modified" key will not be present in the output.
This function requires the Git command line tool to be installed and accessible from the command line.

GPSat.utils.get_previous_oi_config(store_path, oi_config, table_name='oi_config', skip_valid_checks_on=None)

This function retrieves the previous configuration from an optimal interpolation (OI) results file (store_path).

If the store_path exists, it is expected to contain a table called “oi_config” with the previous configurations stored as rows.

If store_path does not exist, the function creates the file and adds the current configuration (oi_config) as the first row in the “oi_config” table.

Each row in the “oi_config” table contains the columns ‘idx’ (index), ‘datetime’ and ‘config’. The values in ‘config’ are the provided oi_config (dict) converted to str.

If the table (oi_config) already exists, the function will compare the provided oi_config against the previous config values. If any match exactly, the largest matching config id will be returned. Otherwise (oi_config does not exactly match any previous config), the largest idx value will be incremented and returned.

Parameters:

store_path: str: The file path where the configurations are stored.
oi_config: dict: Representing the current configuration for the OI system.
table_name: str, default “oi_config”: The table where the configurations will be stored.
skip_valid_checks_on: list of str or None, default None: If a list, the names of the configuration keys that should be skipped during validation checks. Note: validation checks are not done in this function.

Returns:

dict: Previous configuration as a dictionary.
list: List of configuration keys to skip during validation checks.
int: Configuration ID.
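The id-selection rule described above can be sketched against an in-memory stand-in for the stored table. Several details are assumptions, not taken from GPSat: configs are serialised with json.dumps(..., sort_keys=True), the table is a plain DataFrame rather than a file, and the first config gets idx 0. The helper name is hypothetical.

```python
import json

import pandas as pd


def next_config_id_sketch(config_table, oi_config):
    # Assumption: stored configs are json.dumps(..., sort_keys=True) strings.
    config_str = json.dumps(oi_config, sort_keys=True)
    if len(config_table) == 0:
        return 0  # assumption: the first stored config gets idx 0
    matches = config_table.loc[config_table["config"] == config_str, "idx"]
    if len(matches) > 0:
        # Exact match: reuse the largest matching id.
        return int(matches.max())
    # No match: increment the largest existing id.
    return int(config_table["idx"].max()) + 1
```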

GPSat.utils.get_weighted_values(df, ref_col, dist_to_col, val_cols, weight_function='gaussian', drop_weight_cols=True, **weight_kwargs)

Calculate the weighted values of specified columns in a DataFrame based on the distance between two other columns, using a specified weighting function. The current implementation supports a Gaussian weight based on the euclidean distance between the values in ref_col and dist_to_col.

Parameters:

df: pandas.DataFrame: The input DataFrame containing the reference column, distance-to column, and value columns.
ref_col: list of str or str: The name of the column(s) to use as reference points for calculating distances.
dist_to_col: list of str or str: The name of the column(s) to calculate distances to, from ref_col. They should align with / correspond to the column(s) set by ref_col.
val_cols: list of str or str: The names of the column(s) for which the weighted values are calculated. Can be a single column name or a list of names.
weight_function: str, optional: The type of weighting function to use. Currently, only “gaussian” is implemented, which applies a Gaussian weighting (see Notes) based on the squared euclidean distance. The default is “gaussian”.
drop_weight_cols: bool, optional, default True: If False, the total weight and total weighted function values are included in the output.
**weight_kwargs: dict: Additional keyword arguments for the weighting function. For the Gaussian weight, this includes lengthscale (float), the length scale to use in the Gaussian function. This parameter scales the distance before applying the Gaussian function and must be provided.

Returns:

pandas.DataFrame: A DataFrame containing the weighted values for each of the specified value columns. The output DataFrame has the reference column as the index and each of the specified value columns with their weighted values.

Raises:

AssertionError: If the shapes of the ref_col and dist_to_col do not match, or if the required lengthscale parameter for the Gaussian weighting function is not provided.
NotImplementedError: If a weight_function other than “gaussian” is specified.

Notes

The function currently only implements Gaussian weighting. The Gaussian weight is calculated as exp(-d^2 / (2 * l^2)), where d^2 is the squared euclidean distance between ref_col and dist_to_col, and l is the lengthscale.
This implementation assumes the input DataFrame does not contain NaN values in the reference or distance-to columns. Handling NaN values may require additional preprocessing or the use of fillna methods.

Examples

>>> import pandas as pd
>>>
>>> data = {
...     'ref_col': [0, 1, 0, 1],
...     'dist_to_col': [1, 2, 3, 4],
...     'value1': [10, 20, 30, 40],
...     'value2': [100, 200, 300, 400]
... }
>>> df = pd.DataFrame(data)
>>> weighted_df = get_weighted_values(df, 'ref_col', 'dist_to_col', ['value1', 'value2'], lengthscale=1.0)
>>> print(weighted_df)
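A minimal sketch of the Gaussian-weighted average described in the Notes: compute w = exp(-d^2 / (2 * l^2)) per row, then, for each reference value, divide the sum of w * value by the sum of w. The helper name is hypothetical, and drop_weight_cols is not handled.

```python
import numpy as np
import pandas as pd


def get_weighted_values_sketch(df, ref_col, dist_to_col, val_cols, lengthscale):
    ref_col = [ref_col] if isinstance(ref_col, str) else list(ref_col)
    dist_to_col = [dist_to_col] if isinstance(dist_to_col, str) else list(dist_to_col)
    val_cols = [val_cols] if isinstance(val_cols, str) else list(val_cols)
    assert len(ref_col) == len(dist_to_col), "ref_col and dist_to_col must align"
    # Squared euclidean distance between the reference and distance-to columns.
    diff = df[ref_col].to_numpy(dtype=float) - df[dist_to_col].to_numpy(dtype=float)
    d2 = (diff ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * lengthscale ** 2))
    tmp = df[ref_col].copy()
    tmp["_w"] = w
    for c in val_cols:
        tmp[c] = df[c].to_numpy() * w
    # Normalise the weighted sums by the total weight for each reference value.
    g = tmp.groupby(ref_col).sum()
    return g[val_cols].div(g["_w"], axis=0)
```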

GPSat.utils.glue_local_predictions(preds_df: DataFrame, expert_locs_df: DataFrame, sigma: int | float | list = 3) → DataFrame

Deprecated. Use glue_local_predictions_1d and glue_local_predictions_2d instead.

Glues overlapping predictions by taking a normalised Gaussian weighted average.

Warning: This method only deals with expert locations on a regular grid.

Parameters:

preds_df: pd.DataFrame

containing predictions generated from local expert OI. It should have the following columns:

pred_loc_x (float): The x-coordinate of the prediction location.
pred_loc_y (float): The y-coordinate of the prediction location.
f* (float): The predictive mean at the location (pred_loc_x, pred_loc_y).
f*_var (float): The predictive variance at the location (pred_loc_x, pred_loc_y).

expert_locs_df: pd.DataFrame

containing local expert locations used to perform optimal interpolation. It should have the following columns:

x (float): The x-coordinate of the expert location.
y (float): The y-coordinate of the expert location.

sigma: int, float, or list, default 3

The standard deviation of the Gaussian weighting in the x and y directions.

If a single value is provided, it is used for both directions.
If a list is provided, the first value is used for the x direction and the second value is used for the y direction. Defaults to 3.

Returns:

pd.DataFrame:

Dataframe consisting of glued predictions (mean and std). It has the following columns:

pred_loc_x (float): The x-coordinate of the prediction location.
pred_loc_y (float): The y-coordinate of the prediction location.
f* (float): The glued predictive mean at the location (pred_loc_x, pred_loc_y).
f*_std (float): The glued predictive standard deviation at the location (pred_loc_x, pred_loc_y).

Notes

The function assumes that the expert locations are equally spaced in both the x and y directions.
The function uses the scipy.stats.norm.pdf function to compute the Gaussian weights.
The function normalizes the weighted sums with the total weights at each location.

GPSat.utils.grid_2d_flatten(x_range, y_range, grid_res=None, step_size=None, num_step=None, center=True)

Create a 2D grid of points defined by x and y ranges, with the option to specify the grid resolution, step size, or number of steps. The resulting grid is flattened and concatenated into a 2D array of (x,y) coordinates.

Parameters:

x_range: tuple or list of floats

Two values representing the minimum and maximum values of the x-axis range.

y_range: tuple or list of floats

Two values representing the minimum and maximum values of the y-axis range.

grid_res: float or None, default None

The grid resolution, i.e. the distance between adjacent grid points. If specified, this parameter takes precedence over step_size and num_step.

step_size: float or None, default None

The step size between adjacent grid points. If specified, this parameter takes precedence over num_step.

num_step: int or None, default None

The number of steps between the minimum and maximum values of the x and y ranges. If specified, this parameter is used only if grid_res and step_size are not specified (are None). Note: the number of steps includes the starting point, so going from 0 to 1 is two steps.

center: bool, default True

If True, the resulting grid points will be the centers of the grid cells.
If False, the resulting grid points will be the edges of the grid cells.

Returns:

ndarray: A 2D array of (x,y) coordinates, where each row represents a single point in the grid.

Raises:

AssertionError: If grid_res, step_size and num_step are all unspecified. Must specify at least one.

Examples

>>> from GPSat.utils import grid_2d_flatten
>>> grid_2d_flatten(x_range=(0, 2), y_range=(0, 2), grid_res=1)
array([[0.5, 0.5],
       [1.5, 0.5],
       [0.5, 1.5],
       [1.5, 1.5]])
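For the grid_res case, the example output can be reproduced with numpy.arange and numpy.meshgrid. This hypothetical sketch handles only grid_res, and its edge handling (maximum excluded when center=False) is an assumption.

```python
import numpy as np


def grid_2d_flatten_sketch(x_range, y_range, grid_res, center=True):
    x_min, x_max = x_range
    y_min, y_max = y_range
    # Cell edges from the minimum up to (but excluding) the maximum.
    xs = np.arange(x_min, x_max, grid_res, dtype=float)
    ys = np.arange(y_min, y_max, grid_res, dtype=float)
    if center:
        # Shift by half a cell to get cell centres.
        xs = xs + grid_res / 2
        ys = ys + grid_res / 2
    # meshgrid's default "xy" indexing makes x vary fastest after ravel().
    X, Y = np.meshgrid(xs, ys)
    return np.column_stack([X.ravel(), Y.ravel()])
```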

GPSat.utils.guess_track_num(x, thresh, start_track=0)

GPSat.utils.inverse_sigmoid(y, low=0, high=1)

GPSat.utils.inverse_softplus(y, shift=0)

GPSat.utils.json_load(file_path)

This function loads a JSON file from the specified file path and applies a nested dictionary literal evaluation (nested_dict_literal_eval) to convert any string keys in the format of ‘(…,…)’ to tuple keys.

The resulting dictionary is returned.

Parameters:

file_path: str: The path to the JSON file to be loaded.

Returns:

dict or list of dict: The loaded JSON file as a dictionary or list of dictionaries.

Examples

Assuming a JSON file named 'config.json' with the following contents:

{
    "key1": "value1",
    "('key2', 'key3')": "value2",
    "key4": {"('key5', 'key6')": "value3"}
}

the following code will load the file and convert the "('key2', 'key3')" and "('key5', 'key6')" keys to tuple keys:

>>> config = json_load('config.json')
>>> print(config)
{'key1': 'value1', ('key2', 'key3'): 'value2', 'key4': {('key5', 'key6'): 'value3'}}
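The key conversion can be sketched with ast.literal_eval, applied recursively. This is a hypothetical stand-in for nested_dict_literal_eval: it only converts string keys that look like "(...)" and parse as Python literals, leaving all other keys unchanged.

```python
import ast
import json


def nested_dict_literal_eval_sketch(obj):
    if isinstance(obj, list):
        return [nested_dict_literal_eval_sketch(o) for o in obj]
    if not isinstance(obj, dict):
        return obj
    out = {}
    for k, v in obj.items():
        if isinstance(k, str) and k.startswith("(") and k.endswith(")"):
            try:
                k = ast.literal_eval(k)
            except (ValueError, SyntaxError):
                pass  # leave keys that are not valid Python literals unchanged
        out[k] = nested_dict_literal_eval_sketch(v)
    return out


def json_load_sketch(file_path):
    with open(file_path, "r") as f:
        return nested_dict_literal_eval_sketch(json.load(f))
```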

GPSat.utils.json_serializable(d, max_len_df=100)

Converts a dictionary to a format that can be stored as JSON via the json.dumps() method.

Parameters:

d: dict: The dictionary to be converted.
max_len_df: int, default 100: The maximum length of a pandas DataFrame or Series that will be converted to a dictionary. If the length of the DataFrame or Series is greater than this value, it will be stored as a string instead.

Returns:

dict: The converted dictionary.

Raises:

AssertionError: If the input is not a dictionary.

Notes

If a key in the dictionary is a tuple, it will be converted to a string. To recover the original tuple, use nested_dict_literal_eval.
If a value in the dictionary is a dictionary, the function will be called recursively to convert it.
If a value in the dictionary is a NumPy array, it will be converted to a list.
If a value in the dictionary is a Pandas DataFrame or Series, it will be converted to a dictionary and the function will be called recursively to convert it if its length is less than or equal to max_len_df. Otherwise, it will be stored as a string.
If a value in the dictionary is not JSON serializable, it will be cast as a string.
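The conversion rules listed above can be sketched as a recursive function. This is a hypothetical illustration, not GPSat's code; it tests serializability by attempting json.dumps on each leaf value.

```python
import json

import numpy as np
import pandas as pd


def json_serializable_sketch(d, max_len_df=100):
    assert isinstance(d, dict), "input must be a dict"
    out = {}
    for k, v in d.items():
        if isinstance(k, tuple):
            k = str(k)  # recover later with ast.literal_eval
        if isinstance(v, dict):
            v = json_serializable_sketch(v, max_len_df)
        elif isinstance(v, np.ndarray):
            v = v.tolist()
        elif isinstance(v, (pd.DataFrame, pd.Series)):
            if len(v) <= max_len_df:
                v = json_serializable_sketch(v.to_dict(), max_len_df)
            else:
                v = str(v)
        else:
            try:
                json.dumps(v)
            except TypeError:
                v = str(v)  # fall back to a string representation
        out[k] = v
    return out
```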

GPSat.utils.log_lines(*args, level='debug')

This function logs lines to a file with a specified logging level.

This function takes in any number of arguments and a logging level.

The function checks that the logging level is valid and then iterates through the arguments.

If an argument is a string, integer, float, dictionary, tuple, or list, it is printed and logged with the specified logging level.

If an argument is not one of these types, it is not logged and a message is printed indicating the argument’s type.

Parameters:

*args: tuple: arguments to be provided to logging using the method specified by level
level: str, default “debug”: Must be one of [“debug”, “info”, “warning”, “error”, “critical”]. Each argument provided is logged with getattr(logging, level)(arg).

Returns:

None

GPSat.utils.match(x, y, exact=True, tol=1e-09)

This function takes two arrays, x and y, and returns an array of indices indicating where the elements of x match the elements of y. Can match exactly or within a specified tolerance.

Parameters:

x: array-like: The first array to be matched. If not an array, it is converted via to_array.
y: array-like: The second array to be matched against. If not an array, it is converted via to_array.
exact: bool, default True: If True, the function matches exactly. If False, the function matches within a specified tolerance.
tol: float, optional, default 1e-9: The tolerance used for matching when exact=False.

Returns:

indices: array: the indices of the matching elements in y for each element in x.

Raises:

AssertionError: if any element in x is not found in y or if multiple matches are found for any element in x.

Note

This function requires x and y to be arrays, or objects that can be converted by to_array. If exact=False, the function only makes sense with floats; use exact=True for int and str. If both x and y are large, with lengths n and m, this function can use a lot of memory, as an intermediate bool array of size n x m is created. If there are multiple matches of an element of x in y, the index of the first match is returned.
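The n x m intermediate array mentioned in the note can be seen in a minimal broadcasting sketch. The name match_sketch is hypothetical; it follows the note by returning the first match when there are several, rather than raising, and it is not GPSat's implementation.

```python
import numpy as np


def match_sketch(x, y, exact=True, tol=1e-9):
    """Hypothetical sketch: index in y of the first match for each element of x."""
    x = np.atleast_1d(np.asarray(x))
    y = np.atleast_1d(np.asarray(y))
    # intermediate n x m bool array: hits[i, j] is True when x[i] matches y[j]
    if exact:
        hits = x[:, None] == y[None, :]
    else:
        hits = np.abs(x[:, None] - y[None, :]) <= tol
    assert hits.any(axis=1).all(), "some element of x was not found in y"
    # argmax on a bool row returns the position of its first True
    return hits.argmax(axis=1)
```

For example, match_sketch([3, 1], [1, 2, 3]) gives [2, 0], while 0.1 + 0.2 only matches 0.3 with exact=False.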

GPSat.utils.move_to_archive(top_dir, file_names=None, suffix='', archive_sub_dir='Archive', verbose=False)

Moves specified files from a directory to an archive sub-directory within the same directory. Moved files will have a suffix added before the file extension.

Parameters:

top_dir: str: The path to the directory containing the files to be moved.
file_names: list of str, default None: The names of the files to be moved. If not specified, all files in the directory will be moved.
suffix: str, default "": A string to be added to the end of the file name before the extension in the archive directory.
archive_sub_dir: str, default "Archive": The name of the sub-directory within the top directory where the files will be moved.
verbose: bool, default False: If True, prints information about the files being moved.

Returns:

None: The function only moves files and does not return anything.

Note

If the archive sub-directory does not exist, it will be created.

If a file with the same name as the destination file already exists in the archive sub-directory, it will be overwritten.

Raises:

AssertionError: If top_dir does not exist or file_names is not specified.

Examples

Move all files in directory to archive sub-directory:

>>> move_to_archive("path/to/directory")

Move specific files to archive sub-directory with a suffix added to the file name:

>>> move_to_archive("path/to/directory", file_names=["file1.txt", "file2.txt"], suffix="_backup")

Move specific files to a custom archive sub-directory:

>>> move_to_archive("path/to/directory", file_names=["file1.txt", "file2.txt"], archive_sub_dir="Old Files")
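How the suffix lands before the extension, and why an existing archived file is overwritten, can be sketched with the standard library. The name move_to_archive_sketch is hypothetical, it requires file_names explicitly, and it is an illustration rather than GPSat's code; os.replace overwrites an existing destination, which matches the note above.

```python
import os


def move_to_archive_sketch(top_dir, file_names, suffix="", archive_sub_dir="Archive", verbose=False):
    """Hypothetical sketch: move files into top_dir/archive_sub_dir, adding a suffix."""
    assert os.path.exists(top_dir), f"top_dir does not exist: {top_dir}"
    archive_dir = os.path.join(top_dir, archive_sub_dir)
    os.makedirs(archive_dir, exist_ok=True)  # create the archive sub-directory if needed
    for fname in file_names:
        stem, ext = os.path.splitext(fname)  # "file1.txt" -> ("file1", ".txt")
        src = os.path.join(top_dir, fname)
        dst = os.path.join(archive_dir, f"{stem}{suffix}{ext}")
        if verbose:
            print(f"moving {src} -> {dst}")
        os.replace(src, dst)  # overwrites dst if it already exists
```

With suffix="_backup", "file1.txt" ends up as "Archive/file1_backup.txt".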

GPSat.utils.nested_dict_literal_eval(d, verbose=False)

Converts a nested dictionary with string keys that represent tuples to a dictionary with tuple keys.

Parameters:

d: dict: The nested dictionary to be converted.
verbose: bool, default False: If True, prints information about the keys being converted.

Returns:

dict: The converted dictionary with tuple keys.

Raises:

ValueError: If a string key cannot be evaluated as a tuple.

Note

This function modifies the original dictionary in place.
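The string-to-tuple key conversion can be sketched with ast.literal_eval. The name nested_dict_literal_eval_sketch is hypothetical, and unlike the documented function this sketch returns a new dictionary instead of modifying the original in place.

```python
import ast


def nested_dict_literal_eval_sketch(d):
    """Hypothetical sketch: turn keys like "(1, 2)" back into tuples, recursively."""
    out = {}
    for k, v in d.items():
        if isinstance(k, str) and k.startswith("(") and k.endswith(")"):
            # "(1, 'a')" -> (1, 'a'); malformed strings raise ValueError or SyntaxError
            k = ast.literal_eval(k)
        if isinstance(v, dict):
            v = nested_dict_literal_eval_sketch(v)
        out[k] = v
    return out
```

Since str((1, 2)) is "(1, 2)", this reverses the tuple-to-string key conversion applied when a dictionary is made JSON serializable.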

GPSat.utils.nll(y, mu, sig, return_tot=True)

GPSat.utils.not_nan(x)

GPSat.utils.pandas_to_dict(x)

Converts a pandas Series or DataFrame (row) to a dictionary.

Parameters:

x: pd.Series, pd.DataFrame or dict: The input object to be converted to a dictionary.

Returns:

dict: A dictionary representation of the input object.

Raises:

AssertionError: If the input object is a DataFrame with more than one row.

Warning

If the input object is not a pandas Series, DataFrame, or dictionary, a warning is issued and the input object is returned as is.

Examples

>>> import pandas as pd
>>> data = {'name': ['John', 'Jane'], 'age': [30, 25]}
>>> df = pd.DataFrame(data)
>>> pandas_to_dict(df)
AssertionError: in pandas_to_dict input provided as DataFrame, expected to only have 1 row, shape is: (2, 2)

>>> series = pd.Series(data['name'])
>>> pandas_to_dict(series)
{0: 'John', 1: 'Jane'}

>>> dictionary = {'name': ['John', 'Jane'], 'age': [30, 25]}
>>> pandas_to_dict(dictionary)
{'name': ['John', 'Jane'], 'age': [30, 25]}

Select a single row of the DataFrame:

>>> pandas_to_dict(df.iloc[[0]])
{'name': 'John', 'age': 30}

GPSat.utils.pip_freeze_to_dataframe()

GPSat.utils.pretty_print_class(x)

This function takes a class object as input and returns a string representation of the class name without the leading "<class '" and trailing "'>".

Alternatively, for a class instance, it removes a leading "<__main__." and removes " object at ", including anything that follows.

The function achieves this by invoking the __str__ method of the input and then using regular expressions to remove the unwanted characters.

Parameters:

x: an arbitrary class instance

Returns:

str

Examples

>>> class MyClass:
...     pass
>>> print(pretty_print_class(MyClass))
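The regular-expression clean-up described above can be sketched as follows. The name pretty_print_class_sketch and the exact patterns are assumptions that mirror the description, not GPSat's source.

```python
import re


def pretty_print_class_sketch(x):
    """Hypothetical sketch: strip class/instance decoration from str(x)."""
    s = str(x)
    # "<class 'int'>" -> "int"
    s = re.sub(r"^<class '|'>$", "", s)
    # "<__main__.Foo object at 0x...>" -> "Foo object at 0x...>"
    s = re.sub(r"^<__main__\.", "", s)
    # drop " object at " and everything after it
    s = re.sub(r" object at .*$", "", s)
    return s
```

For a built-in class, pretty_print_class_sketch(int) returns "int"; for an instance of a class defined in __main__, only the class name remains.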

GPSat.utils.rmse(y, mu)

GPSat.utils.sigmoid(x, low=0, high=1)

GPSat.utils.softplus(x, shift=0)

GPSat.utils.sparse_true_array(shape, grid_space=1, grid_space_offset=0)

Create a boolean numpy array with True values regularly spaced throughout, and False elsewhere.

Parameters:

shape: iterable (e.g. list or tuple): representing the shape of the output array.
grid_space: int, default 1: representing the spacing between True values.
grid_space_offset: int, default 0: representing the offset of the first True value in each dimension.

Returns:

np.array: A boolean array with dimension equal to shape, with False everywhere except for Trues regularly spaced every ‘grid_space’. The fraction of True will be roughly equal to (1/n)^d where n = grid_space, d = len(shape).

Note

The first dimension is treated as the y dimension. The function allows grid_space_offset to be specific to each dimension.
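A strided-slice sketch shows where the (1/n)^d fraction comes from: each of the d dimensions keeps every n-th index. The name sparse_true_array_sketch is hypothetical, and accepting either one offset or one offset per dimension is an assumption based on the note above.

```python
import numpy as np


def sparse_true_array_sketch(shape, grid_space=1, grid_space_offset=0):
    """Hypothetical sketch: True every `grid_space` steps in each dimension."""
    shape = tuple(shape)
    # allow a single offset for all dimensions, or one offset per dimension
    if np.isscalar(grid_space_offset):
        grid_space_offset = [grid_space_offset] * len(shape)
    out = np.zeros(shape, dtype=bool)
    # one slice per dimension: start at the offset, step by grid_space
    idx = tuple(slice(off, None, grid_space) for off in grid_space_offset)
    out[idx] = True
    return out
```

For shape (4, 6) and grid_space=2, rows 0 and 2 and columns 0, 2 and 4 are True, so 6 of 24 entries (exactly (1/2)^2) are True.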

GPSat.utils.stats_on_vals(vals, measure=None, name=None, qs=None)

This function calculates various statistics on a given array of values.

Parameters:

vals: array-like: The input array of values.
measure: str or None, default None: The name of the measure being calculated.
name: str or None, default None: The name of the column in the output DataFrame.
qs: list or None, default None: A list of quantiles to calculate. If None, [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99] is used.

Returns:

pd.DataFrame: containing the following statistics:
- measure: The name of the measure being calculated.
- size: The number of elements in the input array.
- num_not_nan: The number of non-NaN elements in the input array.
- num_inf: The number of infinite elements in the input array.
- min: The minimum value in the input array.
- mean: The mean value of the input array.
- max: The maximum value in the input array.
- std: The standard deviation of the input array.
- skew: The skewness of the input array.
- kurtosis: The kurtosis of the input array.
- qX: The Xth quantile of the input array, where X is a value in the qs parameter.

Note

The function is wrapped in a timer decorator that measures the time taken to execute it.
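The statistics listed above can be sketched with numpy and pandas. Several details here are assumptions rather than GPSat's behaviour: the name stats_on_vals_sketch, computing moments and quantiles on finite values only, pandas' defaults (std with ddof=1, excess kurtosis), and row labels such as "q0.5".

```python
import numpy as np
import pandas as pd

DEFAULT_QS = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]


def stats_on_vals_sketch(vals, measure=None, name=None, qs=None):
    """Hypothetical sketch: one row per statistic, one column labelled `name`."""
    vals = np.asarray(vals, dtype=float).ravel()
    qs = DEFAULT_QS if qs is None else qs
    # assumption: moments and quantiles are computed on finite values only
    s = pd.Series(vals[np.isfinite(vals)])
    stats = {
        "measure": measure,
        "size": vals.size,
        "num_not_nan": int((~np.isnan(vals)).sum()),
        "num_inf": int(np.isinf(vals).sum()),
        "min": s.min(),
        "mean": s.mean(),
        "max": s.max(),
        "std": s.std(),        # pandas default: ddof=1
        "skew": s.skew(),
        "kurtosis": s.kurt(),  # pandas: excess kurtosis, NaN for fewer than 4 values
    }
    for q in qs:
        stats[f"q{q}"] = s.quantile(q)  # hypothetical row label, e.g. "q0.5"
    return pd.Series(stats, name=name, dtype=object).to_frame()
```

Note that an infinite value counts toward num_not_nan, since inf is not NaN.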

GPSat.utils.to_array(*args, date_format='%Y-%m-%d')

Converts input arguments to numpy arrays.

Parameters:

*args: tuple: Input arguments to be converted to numpy arrays.
date_format: str, optional, default "%Y-%m-%d": Date format to be used when converting datetime.date objects to numpy arrays.

Returns:

generator: A generator that yields numpy arrays.

Note

This function converts each input argument to a numpy array as follows:
- A numpy array is yielded as is.
- A list or tuple is converted to a numpy array and yielded.
- An integer, float, string, boolean, or numpy boolean is converted to a numpy array and yielded.
- A numpy integer or float is converted to a numpy array and yielded.
- A datetime.date object is converted to a numpy array using the specified date format and yielded.
- A numpy datetime64 object is yielded as is.
- None yields an empty numpy array.
- Any other data type triggers a warning and is converted to a numpy array of type object and yielded.

Examples

>>> import datetime
>>> import numpy as np
>>> x = [1, 2, 3]

Since the function returns a generator, retrieve values with next:

>>> print(next(to_array(x)))
[1 2 3]

Or, for a single array-like object, unpack the result directly:

>>> c, = to_array(x)

>>> y = np.array([4, 5, 6])
>>> z = datetime.date(2021, 1, 1)
>>> for arr in to_array(x, y, z):
...     print(f"arr type: {type(arr)}, values: {arr}")
arr type: <class 'numpy.ndarray'>, values: [1 2 3]
arr type: <class 'numpy.ndarray'>, values: [4 5 6]
arr type: <class 'numpy.ndarray'>, values: ['2021-01-01']

GPSat.utils.track_num_for_date(x)