Data Fetching#

📘 Interactive Version: For a hands-on experience with this chapter’s content, access the interactive notebook in Google Colab. Week 3 materials can be found here.

In this section, we look at how to acquire the dataset that has been integral to our analyses: the OLCI data from the Sentinel-3 satellite, available through the Copernicus Dataspace. The section walks through accessing this dataset, understanding its structure, and efficiently retrieving the data you need for your work.

Copernicus Data Space#

Overview
Copernicus Data Space is a cornerstone of the European Union’s Earth observation program, providing a wealth of data from the Sentinel satellites. Aimed at monitoring the Earth’s environment, it supports applications in areas like climate change, disaster response, and urban planning.

Key Features

  • Diverse Datasets: Offers imagery, atmospheric measurements, and climate indicators.

  • Accessibility: Data is freely accessible, fostering open science and research.

Resources
For more information and data access, visit the Copernicus Dataspace.


Set up Accounts#

Before delving into the specifics of data retrieval, it’s crucial to ensure you have access to the necessary platforms.

Copernicus Dataspace: Accessing data from the Copernicus Dataspace requires a separate registration. If you haven’t done so, please take a moment to create an account. Simply visit the Copernicus Dataspace registration page and follow the instructions to sign up.
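Once the account exists, the username and password have to reach the notebook somehow. A minimal sketch of one way to avoid hardcoding them is shown here: it reads them from environment variables and falls back to an interactive prompt. The variable names `CDSE_USERNAME` and `CDSE_PASSWORD` and the helper `load_credentials` are hypothetical, not something the Copernicus Dataspace defines.

```python
import os
from getpass import getpass


def load_credentials(user_var="CDSE_USERNAME", pass_var="CDSE_PASSWORD"):
    """Return (username, password), preferring environment variables.

    The environment variable names are hypothetical; choose your own.
    """
    username = os.environ.get(user_var)
    password = os.environ.get(pass_var)
    if not username:
        # Fall back to a prompt when the variable is unset or empty
        username = input("Copernicus Dataspace username: ")
    if not password:
        # getpass hides the typed password
        password = getpass("Copernicus Dataspace password: ")
    return username, password


# Usage (prompts only if the environment variables are missing):
# username, password = load_credentials()
```

The returned pair can be used wherever the later steps assign `username` and `password` by hand.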

Data Fetching Logic#

The logic underlying the data fetching process involves several key steps:

  1. Area and Time Specification: Initially, we define the geographical scope and the specific time frame of interest. This precise specification allows us to target our data retrieval effectively.

  2. Retrieving Metadata from Copernicus Dataspace: Once the area and time parameters are set, we proceed to fetch a list of relevant file names from Copernicus Dataspace.

  3. Option 1: Fetching Raw Data from Copernicus Dataspace by Date and Time: With the metadata saved, we access the Copernicus Dataspace to retrieve the raw data. Before starting the download, you can preview a file in the Copernicus Dataspace browser by its filename (to check whether it is cloud free, etc.).

  4. Option 2: Browsing First, Then Downloading the Raw Data: Alternatively, go to the Copernicus Dataspace browser first and select your images. With the filenames you are interested in, you can then start the download.
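The first step of this logic, specifying the area and time, can be sketched with the standard library alone. The helpers `bbox_to_wkt` and `check_date_range` are hypothetical names introduced for illustration; the bounding box values are the ones used by the query function later in this section.

```python
from datetime import date


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon (longitude latitude order) for a bounding box."""
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError("longitudes must satisfy -180 <= min_lon < max_lon <= 180")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    corners = [
        (min_lon, min_lat),
        (min_lon, max_lat),
        (max_lon, max_lat),
        (max_lon, min_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON (({ring}))"


def check_date_range(start_date, end_date):
    """Validate two 'YYYY-MM-DD' strings and return them unchanged."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    return start_date, end_date


area = bbox_to_wkt(-81.7, 71.7, -75.1, 73.8)
start_date, end_date = check_date_range("2018-06-01", "2018-06-02")
print(area)
```

Validating the inputs up front means a mistyped date or swapped coordinate fails locally instead of producing an empty or misleading catalogue response.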

Step 0: Set Up#

Before we dive into the data fetching process, it’s essential to lay the groundwork by setting up the necessary packages and ensuring proper authentication. Follow these preparatory steps to create a smooth and efficient workflow:

Install Required Packages: Make sure all the necessary packages are installed in your working environment. This includes libraries for data handling, geospatial analysis, and any other tools relevant to your project. On Google Colab you don’t need to do this, but it is common practice when you execute the code on your local machine.

By completing this initial setup step, you ensure that your environment is ready and equipped with the tools needed for data fetching and analysis.
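On a local machine, a quick way to see which packages still need installing is to ask Python whether each one can be found. This is a minimal sketch; the list `REQUIRED` reflects the third-party imports used in this section, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Third-party packages imported in this section
REQUIRED = ["numpy", "pandas", "requests", "shapely"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install before continuing: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```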

from datetime import datetime, timedelta
from xml.etree import ElementTree as ET
import os

import numpy as np
import pandas as pd
import requests
from shapely.geometry import Point, Polygon

Remember to replace ‘project_id’ with your actual project id.

Step 1: Read in Functions Needed#

To streamline our data fetching and processing, we’ll first load the essential functions. These functions are designed to handle various tasks such as data retrieval, format conversion, and preliminary data processing. Ensure that you’ve imported all the required functions before proceeding to the next steps of the workflow. All functions have docstrings so please read them to get some ideas of what they do.

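def make_api_request(url, method="GET", data=None, headers=None):
    """
    Send a request to the Copernicus Data Space API, refreshing the access
    token and retrying once if the server answers 401 or 403.

    Relies on the module-level `access_token` and `refresh_token` variables.
    """
    global access_token, refresh_token
    if not headers:
        headers = {"Authorization": f"Bearer {access_token}"}

    response = requests.request(method, url, json=data, headers=headers)
    if response.status_code in [401, 403]:
        # The token has probably expired: fetch a new one and retry once
        access_token = refresh_access_token(refresh_token)
        headers["Authorization"] = f"Bearer {access_token}"
        response = requests.request(method, url, json=data, headers=headers)
    return response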


def query_sentinel3_olci_arctic_data(start_date, end_date, token):
    """
    Queries Sentinel-3 OLCI data within a specified time range from the Copernicus Data Space,
    targeting data collected over the Arctic region.

    Parameters:
    start_date (str): Start date in 'YYYY-MM-DD' format.
    end_date (str): End date in 'YYYY-MM-DD' format.
    token (str): Access token for authentication.

    Returns:
    DataFrame: Contains details about the Sentinel-3 OLCI images.
    """

    all_data = []
    # arctic_polygon = "POLYGON((-180 60, 180 60, 180 90, -180 90, -180 60))"
    arctic_polygon = (
        "POLYGON ((-81.7 71.7, -81.7 73.8, -75.1 73.8, -75.1 71.7, -81.7 71.7))"
    )

    filter_string = (
        f"Collection/Name eq 'SENTINEL-3' and "
        f"Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' and att/Value eq 'OL_1_EFR___') and "
        f"ContentDate/Start gt {start_date}T00:00:00.000Z and ContentDate/Start lt {end_date}T23:59:59.999Z"
    )

    next_url = (
        f"https://catalogue.dataspace.copernicus.eu/odata/v1/Products?"
        f"$filter={filter_string} and "
        f"OData.CSC.Intersects(area=geography'SRID=4326;{arctic_polygon}')&"
        f"$top=1000"
    )

    headers = {"Authorization": f"Bearer {token}"}

    while next_url:
        response = make_api_request(next_url, headers=headers)
        if response.status_code == 200:
            data = response.json()["value"]
            all_data.extend(data)
            next_url = response.json().get("@odata.nextLink")
        else:
            print(f"Error fetching data: {response.status_code} - {response.text}")
            break

    return pd.DataFrame(all_data)


def get_access_and_refresh_token(username, password):
    """Retrieve both access and refresh tokens."""
    url = "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token"
    data = {
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": "cdse-public",
    }
    response = requests.post(url, data=data)
    response.raise_for_status()
    tokens = response.json()
    return tokens["access_token"], tokens["refresh_token"]


def refresh_access_token(refresh_token):
    """Attempt to refresh the access token using the refresh token."""
    url = "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token"
    data = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": "cdse-public",
    }
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    try:
        response = requests.post(url, headers=headers, data=data)
        response.raise_for_status()  # This will throw an error for non-2xx responses
        return response.json()["access_token"]
    except requests.exceptions.HTTPError as e:
        print(f"Failed to refresh token: {e.response.status_code} - {e.response.text}")
        if e.response.status_code == 400:
            print("Refresh token invalid, attempting re-authentication...")
            # Attempt to re-authenticate with the module-level `username` and
            # `password`. They are read directly as globals: assigning
            # `username = username` inside a function raises UnboundLocalError.
            # This requires securely managing the credentials, which might not be feasible in all contexts
            access_token, new_refresh_token = get_access_and_refresh_token(
                username, password
            )
            # Update the module-level refresh token; a plain assignment here
            # would only rebind the local parameter of the same name
            globals()["refresh_token"] = new_refresh_token
            return access_token
        else:
            raise

def download_single_product(
    product_id, file_name, access_token, download_dir="downloaded_products"
):
    """
    Download a single product from the Copernicus Data Space.

    :param product_id: The unique identifier for the product.
    :param file_name: The name of the file to be downloaded.
    :param access_token: The access token for authorization.
    :param download_dir: The directory where the product will be saved.
    """
    # Ensure the download directory exists
    os.makedirs(download_dir, exist_ok=True)

    # Construct the download URL
    url = (
        f"https://zipper.dataspace.copernicus.eu/odata/v1/Products({product_id})/$value"
    )

    # Set up the session and headers
    headers = {"Authorization": f"Bearer {access_token}"}
    session = requests.Session()
    session.headers.update(headers)

    # Perform the request
    response = session.get(url, headers=headers, stream=True)

    # Check if the request was successful
    if response.status_code == 200:
        # Define the path for the output file
        output_file_path = os.path.join(download_dir, file_name + ".zip")

        # Stream the content to a file
        with open(output_file_path, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    file.write(chunk)
        print(f"Downloaded: {output_file_path}")
    else:
        print(
            f"Failed to download product {product_id}. Status Code: {response.status_code}"
        )
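The paging loop inside `query_sentinel3_olci_arctic_data` can be studied on its own. The sketch below follows the `@odata.nextLink` field from page to page, as the query function does, but takes the fetching step as a parameter so it can be run without network access. `collect_pages`, `fake_pages`, and the `max_pages` cap are illustrative additions, not part of the original functions.

```python
def collect_pages(first_url, fetch_json, max_pages=100):
    """Follow '@odata.nextLink' from page to page and gather all records.

    fetch_json is any callable that maps a URL to a decoded JSON dict; it is
    injected so the loop can be exercised without contacting the catalogue.
    """
    records = []
    url = first_url
    pages = 0
    while url and pages < max_pages:
        page = fetch_json(url)
        records.extend(page.get("value", []))
        # The last page has no next link, so .get() returns None and the loop ends
        url = page.get("@odata.nextLink")
        pages += 1
    return records


# Stand-in for the catalogue: two fake pages keyed by URL
fake_pages = {
    "page1": {"value": [{"Name": "a"}, {"Name": "b"}], "@odata.nextLink": "page2"},
    "page2": {"value": [{"Name": "c"}]},
}
names = [record["Name"] for record in collect_pages("page1", fake_pages.__getitem__)]
print(names)
```

The `max_pages` cap is a defensive choice for the sketch: it bounds the loop if a server ever returned a next link that points back to an earlier page.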

Step 2: Extract Metadata from Copernicus Dataspace#

Once you have set up your environment and are authenticated with Copernicus Dataspace, the next step is to extract the filenames that meet your specific criteria.

username = "your_username"
password = "your_password"
access_token, refresh_token = get_access_and_refresh_token(username, password)
start_date = "2018-06-01"
end_date = "2018-06-02"

sentinel3_olci_data = query_sentinel3_olci_arctic_data(
    start_date, end_date, access_token
)

# You can also save the metadata
# sentinel3_olci_data.to_csv(
#     "/home/wch/data_colocation/Datasets-Co-location/Metadata/sentinel3_olci_metadata_2018_zara.csv",
#     index=False,
# )

Below you can print the metadata you have just retrieved. It covers several attributes of each S3 OLCI product, including the filename, Id, geographic footprint, and sensing date.

from IPython.display import display

display(sentinel3_olci_data)
@odata.mediaContentType Id Name ContentType ContentLength OriginDate PublicationDate ModificationDate Online EvictionDate S3Path Checksum ContentDate Footprint GeoFootprint
0 application/octet-stream f5d75e25-dcd6-533c-87f6-011d6de97462 S3A_OL_1_EFR____20180601T032045_20180601T032125_20180602T084716_0040_032_004_1080_LN1_O_NT_002.SEN3 application/octet-stream 0 2018-10-28T21:19:59.653000Z 2018-06-02T12:33:14.692000Z 2018-06-02T12:33:14.692000Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3A_OL_1_EFR____20180601T032045_20180601T032125_2018... [] {'Start': '2018-06-01T03:20:44.670886Z', 'End': '2018-06-01T03:21:24.842161Z'} geography'SRID=4326;POLYGON ((-78.5521 74.8857, -80.9457 74.8563, -83.2831 74.8023, -85.6253 74.... {'type': 'Polygon', 'coordinates': [[[-78.5521, 74.8857], [-80.9457, 74.8563], [-83.2831, 74.802...
1 application/octet-stream 9d2e570a-8504-5947-8c73-6feda1a5b80a S3A_OL_1_EFR____20180601T014026_20180601T014326_20180602T052347_0179_032_003_1260_LN1_O_NT_002.SEN3 application/octet-stream 0 2018-11-01T17:54:38.378000Z 2018-06-02T08:57:02.432000Z 2018-06-02T08:57:02.432000Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3A_OL_1_EFR____20180601T014026_20180601T014326_2018... [] {'Start': '2018-06-01T01:40:25.621348Z', 'End': '2018-06-01T01:43:25.621348Z'} geography'SRID=4326;POLYGON ((-53.4973 85.3067, -61.0276 85.2807, -68.3596 85.1745, -75.297 84.9... {'type': 'Polygon', 'coordinates': [[[-53.4973, 85.3067], [-61.0276, 85.2807], [-68.3596, 85.174...
2 application/octet-stream f0e338e0-6e3e-5cd6-b38c-2fd51dab2da7 S3A_OL_1_EFR____20180601T151419_20180601T151719_20180602T202519_0179_032_011_1620_LN1_O_NT_002.SEN3 application/octet-stream 0 2018-11-01T18:21:27.991000Z 2018-06-02T23:53:37.479000Z 2018-06-02T23:53:37.479000Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3A_OL_1_EFR____20180601T151419_20180601T151719_2018... [] {'Start': '2018-06-01T15:14:19.387665Z', 'End': '2018-06-01T15:17:19.387665Z'} geography'SRID=4326;POLYGON ((-81.12 73.846, -78.9112 73.8144, -76.7017 73.7603, -74.5009 73.683... {'type': 'Polygon', 'coordinates': [[[-81.12, 73.846], [-78.9112, 73.8144], [-76.7017, 73.7603],...
3 application/octet-stream 9518f48d-0120-59df-b0e7-a1d87170f076 S3A_OL_1_EFR____20180601T151719_20180601T152019_20180602T202543_0179_032_011_1800_LN1_O_NT_002.SEN3 application/octet-stream 0 2018-11-01T18:21:37.387000Z 2018-06-02T23:55:26.408000Z 2018-06-02T23:55:26.408000Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3A_OL_1_EFR____20180601T151719_20180601T152019_2018... [] {'Start': '2018-06-01T15:17:19.387665Z', 'End': '2018-06-01T15:20:19.387665Z'} geography'SRID=4326;POLYGON ((-82.4652 63.4066, -81.0948 63.3611, -79.7204 63.3025, -78.347 63.2... {'type': 'Polygon', 'coordinates': [[[-82.4652, 63.4066], [-81.0948, 63.3611], [-79.7204, 63.302...
4 application/octet-stream 5ca7a0cc-29eb-50dd-bf25-cb2353052615 S3B_OL_1_EFR____20180601T165425_20180601T165725_20200126T011547_0180_008_012_1620_MR1_R_NT_002.SEN3 application/octet-stream 0 2020-05-04T14:56:38.609000Z 2020-05-04T16:04:34.446099Z 2020-05-04T16:04:34.446099Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3B_OL_1_EFR____20180601T165425_20180601T165725_2020... [] {'Start': '2018-06-01T16:54:24.657000Z', 'End': '2018-06-01T16:57:24.657000Z'} geography'SRID=4326;POLYGON ((-106.245 73.3703, -104.123 73.3434, -101.961 73.2887, -99.8418 73.... {'type': 'Polygon', 'coordinates': [[[-106.245, 73.3703], [-104.123, 73.3434], [-101.961, 73.288...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
20 application/octet-stream 4e3945b4-24b6-5f03-9b2f-b1fb26ccc767 S3B_OL_1_EFR____20180602T180920_20180602T181220_20200126T012653_0180_008_027_1620_MR1_R_NT_002.SEN3 application/octet-stream 0 2020-05-04T14:41:10.247000Z 2020-05-04T16:11:06.378588Z 2020-05-04T16:11:06.378588Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3B_OL_1_EFR____20180602T180920_20180602T181220_2020... [] {'Start': '2018-06-02T18:09:19.936000Z', 'End': '2018-06-02T18:12:19.936000Z'} geography'SRID=4326;POLYGON ((-124.97 73.369, -122.838 73.3418, -120.7 73.2876, -118.56 73.2109,... {'type': 'Polygon', 'coordinates': [[[-124.97, 73.369], [-122.838, 73.3418], [-120.7, 73.2876], ...
21 application/octet-stream cdf0d9f9-0161-5ddb-90ca-93f47b7517f5 S3B_OL_1_EFR____20180602T163120_20180602T163420_20200126T012628_0179_008_026_1800_MR1_R_NT_002.SEN3 application/octet-stream 0 2020-05-04T15:01:10.174000Z 2020-05-04T16:12:38.769235Z 2020-05-04T16:12:38.769235Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3B_OL_1_EFR____20180602T163120_20180602T163420_2020... [] {'Start': '2018-06-02T16:31:20.167000Z', 'End': '2018-06-02T16:34:20.167000Z'} geography'SRID=4326;POLYGON ((-101.102 62.9284, -99.763 62.8877, -98.4059 62.8282, -97.0685 62.7... {'type': 'Polygon', 'coordinates': [[[-101.102, 62.9284], [-99.763, 62.8877], [-98.4059, 62.8282...
22 application/octet-stream 0300620e-0b0b-5bca-9c74-f761b91ce33d S3B_OL_1_EFR____20180602T162820_20180602T163120_20200126T012622_0179_008_026_1620_MR1_R_NT_002.SEN3 application/octet-stream 0 2020-05-04T15:04:38.089000Z 2020-05-04T16:12:39.413863Z 2020-05-04T16:12:39.413863Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3B_OL_1_EFR____20180602T162820_20180602T163120_2020... [] {'Start': '2018-06-02T16:28:20.167000Z', 'End': '2018-06-02T16:31:20.167000Z'} geography'SRID=4326;POLYGON ((-99.7267 73.3692, -97.6048 73.3423, -95.4369 73.2875, -93.3143 73.... {'type': 'Polygon', 'coordinates': [[[-99.7267, 73.3692], [-97.6048, 73.3423], [-95.4369, 73.287...
23 application/octet-stream 757400a9-0a28-523a-aa21-62b3d5da6c43 S3A_OL_1_EFR____20180602T011332_20180602T011415_20180603T062858_0043_032_017_1080_LN1_O_NT_002.SEN3 application/octet-stream 0 2018-11-01T18:26:26.980000Z 2018-06-03T08:19:11.370000Z 2018-06-03T08:19:11.370000Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3A_OL_1_EFR____20180602T011332_20180602T011415_2018... [] {'Start': '2018-06-02T01:13:31.651927Z', 'End': '2018-06-02T01:14:14.712487Z'} geography'SRID=4326;POLYGON ((-46.7631 74.8796, -49.2562 74.8483, -51.5857 74.7934, -53.8789 74.... {'type': 'Polygon', 'coordinates': [[[-46.7631, 74.8796], [-49.2562, 74.8483], [-51.5857, 74.793...
24 application/octet-stream a1c9e44e-dcac-5c66-b62e-527b383fff45 S3A_OL_1_EFR____20180602T181007_20180602T181307_20180603T230737_0179_032_027_1620_LN1_O_NT_002.SEN3 application/octet-stream 0 2018-10-28T21:46:42.233000Z 2018-06-04T00:31:59.555000Z 2018-06-04T00:31:59.555000Z True 9999-12-31T23:59:59.999999Z /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3A_OL_1_EFR____20180602T181007_20180602T181307_2018... [] {'Start': '2018-06-02T18:10:06.920439Z', 'End': '2018-06-02T18:13:06.920439Z'} geography'SRID=4326;POLYGON ((-125.067 73.8593, -122.83 73.8272, -120.625 73.7729, -118.435 73.6... {'type': 'Polygon', 'coordinates': [[[-125.067, 73.8593], [-122.83, 73.8272], [-120.625, 73.7729...

25 rows × 15 columns
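The `Name` column already encodes useful information. As a rough sketch, and assuming the three timestamps in a product name are the sensing start, the sensing stop, and the product creation time (which agrees with the `ContentDate` values shown above), the name can be parsed without downloading anything. `parse_product_name` is a hypothetical helper.

```python
import re
from datetime import datetime

# Timestamps in product names look like 20180601T032045
TIMESTAMP = re.compile(r"\d{8}T\d{6}")


def parse_product_name(name):
    """Extract the mission and the three timestamps from a product name."""
    stamps = TIMESTAMP.findall(name)
    if len(stamps) < 3:
        raise ValueError(f"expected three timestamps in product name: {name}")
    start, stop, created = (
        datetime.strptime(stamp, "%Y%m%dT%H%M%S") for stamp in stamps[:3]
    )
    return {
        "mission": name[:3],  # S3A or S3B
        "sensing_start": start,
        "sensing_stop": stop,
        "created": created,  # assumed to be the product creation time
    }


info = parse_product_name(
    "S3A_OL_1_EFR____20180601T032045_20180601T032125_20180602T084716"
    "_0040_032_004_1080_LN1_O_NT_002.SEN3"
)
print(info["mission"], info["sensing_start"], info["sensing_stop"] - info["sensing_start"])
```

Parsed timestamps make it easy to sort products by acquisition time or to pick the one closest to an event of interest before downloading.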

Step 3: Download#

Once you have the correct filename in the Copernicus format, the final step is to download the data. This involves authenticating with your Copernicus Dataspace credentials and sending a request to download the specified file. The example below reuses the access token obtained in Step 2. Ensure that your username and password are accurate and up to date to avoid authentication issues.

username = "your_username"
password = "your_password"
# Replace with your desired download directory. It must not be an empty
# string, because os.makedirs("") raises FileNotFoundError.
download_dir = "downloaded_products"
product_id = sentinel3_olci_data["Id"][0]  # Replace with your desired file id
file_name = sentinel3_olci_data["Name"][0]  # Replace with your desired filename
# Download the single product
download_single_product(product_id, file_name, access_token, download_dir)

At this point, you should have the dataset downloaded in the directory you specified.
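Since `download_single_product` saves each product as a `.zip` file, a natural next step is to unpack it. The following is a minimal sketch using the standard library; `extract_product` is a hypothetical helper, and the layout of the unpacked archive is not assumed here.

```python
import os
import zipfile


def extract_product(zip_path, target_dir=None):
    """Unpack a downloaded product archive and return the names it contained."""
    if not zipfile.is_zipfile(zip_path):
        # Catches missing files as well as truncated or failed downloads
        raise ValueError(f"{zip_path} is not a valid zip archive")
    # Default to extracting next to the archive itself
    target_dir = target_dir or os.path.dirname(os.path.abspath(zip_path))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Usage, with a path of your own:
# extract_product("downloaded_products/<product name>.zip")
```

Checking `zipfile.is_zipfile` first is a cheap way to detect an interrupted download before any analysis code tries to open the files.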

Another download option: Download a single file directly by its known filename#

def query_product_by_name(product_name, token):
    """
    Query a specific Sentinel-3 product by its name.

    Parameters:
    product_name (str): The exact name of the product to search for.
    token (str): Access token for authentication.

    Returns:
    dict: Metadata for the matching product.
    """
    url = (
        f"https://catalogue.dataspace.copernicus.eu/odata/v1/Products?"
        f"$filter=Name eq '{product_name}'"
    )
    headers = {"Authorization": f"Bearer {token}"}
    
    response = make_api_request(url, headers=headers)
    if response.status_code == 200:
        data = response.json().get("value", [])
        if data:
            return data[0]  # Return the first matching product (if any)
        else:
            print(f"No product found with name: {product_name}")
            return None
    else:
        print(f"Error fetching product: {response.status_code} - {response.text}")
        return None



username = "your_username"
password = "your_password"

# Step 1: Authenticate and retrieve tokens
access_token, refresh_token = get_access_and_refresh_token(username, password)

# Step 2: Provide the product name
product_name = "S3A_OL_1_EFR____20180602T181007_20180602T181307_20180603T230737_0179_032_027_1620_LN1_O_NT_002.SEN3"  # Replace with the specific product name you have

# Step 3: Query the product by name
product_metadata = query_product_by_name(product_name, access_token)

if product_metadata:
    product_id = product_metadata["Id"]  # Extract product ID from metadata
    file_name = product_metadata["Name"]  # Extract product name from metadata

    # Step 4: Download the product
    download_dir = "downloaded_products"  # Replace with your desired directory (must not be an empty string)
    download_single_product(product_id, file_name, access_token, download_dir)
Downloaded: /Users/weibinchen/Downloads/S3A_OL_1_EFR____20180602T181007_20180602T181307_20180603T230737_0179_032_027_1620_LN1_O_NT_002.SEN3.zip