Data Fetching
📘 Interactive Version: For a hands-on experience with this chapter’s content, access the interactive notebook in Google Colab. Week 3 materials can be found here.
In this section, we walk through how to acquire the dataset at the heart of our analyses: OLCI (Ocean and Land Colour Instrument) data from the Sentinel-3 satellites, distributed through the Copernicus Data Space. We cover how to access this dataset, how its metadata is structured, and how to retrieve only the data you need for your work.
Copernicus Data Space
Overview
Copernicus Data Space is a cornerstone of the European Union’s Earth observation program, providing a wealth of data from the Sentinel satellites. Aimed at monitoring the Earth’s environment, it supports applications in areas like climate change, disaster response, and urban planning.
Key Features
Diverse Datasets: Offers imagery, atmospheric measurements, and climate indicators.
Accessibility: Data is freely accessible, fostering open science and research.
Resources
For more information and data access, visit the Copernicus Dataspace.
Set Up Accounts
Before getting into the specifics of data retrieval, make sure you can log in to the platform that hosts the data.
Copernicus Data Space: Accessing data from the Copernicus Data Space requires its own registration. If you haven’t registered yet, visit the Copernicus Data Space registration page and follow the instructions to sign up. The username and password you register with are the credentials the code in this chapter uses to request access tokens.
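The examples in this chapter hardcode placeholder credentials. One way to keep your real password out of the notebook is to read it from environment variables and fall back to an interactive prompt. This is a minimal sketch. The variable names CDSE_USERNAME and CDSE_PASSWORD and the helper load_cdse_credentials are our own hypothetical choices, not something the Copernicus Data Space defines.

```python
import os
from getpass import getpass


def load_cdse_credentials(user_var="CDSE_USERNAME", pass_var="CDSE_PASSWORD"):
    """Return (username, password) for the Copernicus Data Space.

    Reads two environment variables (hypothetical names, chosen for this sketch);
    if either is missing, asks for it interactively. getpass hides the password
    as it is typed, so it never ends up in a notebook cell.
    """
    username = os.environ.get(user_var)
    password = os.environ.get(pass_var)
    if not username:
        username = input("Copernicus Data Space username: ")
    if not password:
        password = getpass("Copernicus Data Space password: ")
    return username, password
```

You could then write `username, password = load_cdse_credentials()` in place of the two placeholder lines before calling get_access_and_refresh_token.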
Data Fetching Logic
The logic underlying the data fetching process involves several key steps:
Area and Time Specification: Initially, we define the geographical scope and the specific time frame of interest. This precise specification allows us to target our data retrieval effectively.
Retrieving Metadata from Copernicus Dataspace: Once the area and time parameters are set, we proceed to fetch a list of relevant file names from Copernicus Dataspace.
Optional 1: Fetching Raw Data for a Given Date and Time: With the metadata saved, we then request the raw data from the Copernicus Data Space. Before starting the download, you can look up a filename you are interested in on the Copernicus Data Space Browser and preview the scene (for example, to check whether it is cloud-free).
Optional 2: Browsing First, Then Downloading the Raw Data: Alternatively, you can start in the Copernicus Data Space Browser, select the images you want there, and then download them using their filenames.
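To make the area and time specification step concrete, here is a minimal sketch of two hypothetical helpers (bbox_to_wkt and content_date_filter are not part of any library). The first turns a longitude/latitude bounding box into the WKT polygon string the catalogue query expects. The second turns a pair of dates into a ContentDate clause with an inclusive end date, mirroring the one in query_sentinel3_olci_arctic_data. The sketch assumes coordinates in EPSG:4326 and a box that does not cross the antimeridian.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT POLYGON (lon lat order) for a bounding box in EPSG:4326."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("expected min_lon < max_lon and min_lat < max_lat")
    corners = [
        (min_lon, min_lat),
        (min_lon, max_lat),
        (max_lon, max_lat),
        (max_lon, min_lat),
        (min_lon, min_lat),  # repeat the first vertex to close the ring
    ]
    return "POLYGON ((" + ", ".join(f"{lon} {lat}" for lon, lat in corners) + "))"


def content_date_filter(start_date, end_date):
    """OData clause for sensing start times from start_date 00:00 to end_date 23:59:59.999 UTC."""
    return (
        f"ContentDate/Start gt {start_date}T00:00:00.000Z and "
        f"ContentDate/Start lt {end_date}T23:59:59.999Z"
    )
```

For example, bbox_to_wkt(-81.7, 71.7, -75.1, 73.8) reproduces the polygon hardcoded in query_sentinel3_olci_arctic_data, so changing the area of interest becomes a matter of changing four numbers.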
Step 0: Set Up
Before we dive into the data fetching process, we need to lay the groundwork by setting up the necessary packages and making sure authentication will work. Follow these preparatory steps for a smooth workflow:
Install Required Packages: Make sure all the necessary packages are installed in your working environment. This includes libraries for data handling, HTTP requests, geospatial analysis, and any other tools relevant to your project. On Google Colab you don’t need to do this, but it is common practice when you execute the code on your local machine.
Completing this setup step ensures that your environment has the tools needed for data fetching and analysis.
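Here is a quick way to confirm that the third-party packages imported below are available before running anything else. This is a sketch using the standard importlib.util.find_spec, which returns None when a module cannot be found. The package list is our reading of the imports in this chapter. For these four, the pip package name matches the module name.

```python
import importlib.util

# Third-party modules imported in this chapter (pip names match module names here)
REQUIRED_PACKAGES = ["numpy", "pandas", "requests", "shapely"]


def missing_packages(names):
    """Return the subset of module names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```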
```python
import os
from datetime import datetime, timedelta
from xml.etree import ElementTree as ET

import numpy as np
import pandas as pd
import requests
from shapely.geometry import Point, Polygon
```
Remember to replace ‘your_username’ and ‘your_password’ in the steps below with your own Copernicus Data Space credentials.
Step 1: Define the Functions Needed
To streamline data fetching, we first define the helper functions used throughout this workflow. They handle authentication (obtaining and refreshing access tokens), querying the Copernicus Data Space catalogue, and downloading products. Run this cell before proceeding to the next steps. Each function has a docstring describing what it does, so read them to see how the pieces fit together.
```python
def make_api_request(url, method="GET", data=None, headers=None):
    """
    Send a request to the Copernicus Data Space, refreshing the access token once
    if the server rejects it (HTTP 401 or 403).

    Uses and updates the module-level `access_token` and `refresh_token`
    created in Step 2.
    """
    global access_token, refresh_token
    if not headers:
        headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.request(method, url, json=data, headers=headers)
    if response.status_code in [401, 403]:
        # Token expired or invalid: refresh it and retry the request once
        access_token = refresh_access_token(refresh_token)
        headers["Authorization"] = f"Bearer {access_token}"
        response = requests.request(method, url, json=data, headers=headers)
    return response


def query_sentinel3_olci_arctic_data(start_date, end_date, token):
    """
    Queries Sentinel-3 OLCI Level-1 EFR products within a specified time range from
    the Copernicus Data Space, keeping only products that intersect the area of interest.

    Parameters:
    start_date (str): Start date in 'YYYY-MM-DD' format.
    end_date (str): End date in 'YYYY-MM-DD' format (inclusive).
    token (str): Access token for authentication.

    Returns:
    DataFrame: Contains details about the Sentinel-3 OLCI images.
    """
    all_data = []
    # Whole Arctic north of 60°N:
    # arctic_polygon = "POLYGON((-180 60, 180 60, 180 90, -180 90, -180 60))"
    # Area of interest used here: lon -81.7 to -75.1, lat 71.7 to 73.8
    arctic_polygon = (
        "POLYGON ((-81.7 71.7, -81.7 73.8, -75.1 73.8, -75.1 71.7, -81.7 71.7))"
    )
    filter_string = (
        f"Collection/Name eq 'SENTINEL-3' and "
        f"Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' and att/Value eq 'OL_1_EFR___') and "
        f"ContentDate/Start gt {start_date}T00:00:00.000Z and ContentDate/Start lt {end_date}T23:59:59.999Z"
    )
    next_url = (
        f"https://catalogue.dataspace.copernicus.eu/odata/v1/Products?"
        f"$filter={filter_string} and "
        f"OData.CSC.Intersects(area=geography'SRID=4326;{arctic_polygon}')&"
        f"$top=1000"
    )
    headers = {"Authorization": f"Bearer {token}"}
    # Results are paginated: keep following @odata.nextLink until it is absent
    while next_url:
        response = make_api_request(next_url, headers=headers)
        if response.status_code == 200:
            payload = response.json()
            all_data.extend(payload["value"])
            next_url = payload.get("@odata.nextLink")
        else:
            print(f"Error fetching data: {response.status_code} - {response.text}")
            break
    return pd.DataFrame(all_data)


def get_access_and_refresh_token(username, password):
    """Retrieve both access and refresh tokens."""
    url = "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token"
    data = {
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": "cdse-public",
    }
    response = requests.post(url, data=data)
    response.raise_for_status()
    tokens = response.json()
    return tokens["access_token"], tokens["refresh_token"]


def refresh_access_token(current_refresh_token):
    """
    Get a new access token using the refresh token. If the refresh token itself
    is no longer valid (HTTP 400), log in again with the module-level `username`
    and `password` defined in Step 2 and update the global `refresh_token`.
    """
    global refresh_token
    url = "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token"
    data = {
        "grant_type": "refresh_token",
        "refresh_token": current_refresh_token,
        "client_id": "cdse-public",
    }
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    try:
        response = requests.post(url, headers=headers, data=data)
        response.raise_for_status()  # Raise an error for non-2xx responses
        return response.json()["access_token"]
    except requests.exceptions.HTTPError as e:
        print(f"Failed to refresh token: {e.response.status_code} - {e.response.text}")
        if e.response.status_code == 400:
            print("Refresh token invalid, attempting re-authentication...")
            # Re-authenticate with the credentials set in Step 2; keeping credentials
            # in a notebook may not be acceptable in every context
            access_token, refresh_token = get_access_and_refresh_token(username, password)
            return access_token
        raise


def download_single_product(
    product_id, file_name, access_token, download_dir="downloaded_products"
):
    """
    Download a single product from the Copernicus Data Space as a .zip archive.

    :param product_id: The unique identifier (Id) for the product.
    :param file_name: The product name; ".zip" is appended to form the output file name.
    :param access_token: The access token for authorization.
    :param download_dir: The directory where the product will be saved
        ("" saves to the current working directory).
    """
    # Ensure the download directory exists (os.makedirs("") would raise an error)
    if download_dir:
        os.makedirs(download_dir, exist_ok=True)
    # Construct the download URL
    url = f"https://zipper.dataspace.copernicus.eu/odata/v1/Products({product_id})/$value"
    # Set up a session that sends the bearer token with every request
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {access_token}"})
    # Stream the response so a large product is not held in memory at once
    response = session.get(url, stream=True)
    if response.status_code == 200:
        output_file_path = os.path.join(download_dir, file_name + ".zip")
        # Write the content to disk chunk by chunk
        with open(output_file_path, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    file.write(chunk)
        print(f"Downloaded: {output_file_path}")
    else:
        print(
            f"Failed to download product {product_id}. Status Code: {response.status_code}"
        )
```
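The paging loop inside query_sentinel3_olci_arctic_data follows the catalogue's "@odata.nextLink" until the server stops returning one. As a sketch, that loop can be separated from the HTTP call so it can be checked without network access. The helper name collect_all_pages is hypothetical, and it omits the error handling of the original function.

```python
def collect_all_pages(fetch_json, first_url):
    """Follow OData '@odata.nextLink' links and return all records from every page.

    fetch_json(url) must return the decoded JSON body of one page as a dict
    with a "value" list and, except on the last page, an "@odata.nextLink".
    """
    records = []
    next_url = first_url
    while next_url:
        page = fetch_json(next_url)
        records.extend(page.get("value", []))
        next_url = page.get("@odata.nextLink")  # None on the last page
    return records
```

With the real service, something like `collect_all_pages(lambda u: make_api_request(u).json(), url)` would play the same role as the while loop above, assuming every page request succeeds.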
Step 2: Extract Metadata from Copernicus Data Space
Once you have set up your environment and are authenticated with Copernicus Dataspace, the next step is to extract the filenames that meet your specific criteria.
```python
username = "your_username"
password = "your_password"
access_token, refresh_token = get_access_and_refresh_token(username, password)

start_date = "2018-06-01"
end_date = "2018-06-02"
sentinel3_olci_data = query_sentinel3_olci_arctic_data(
    start_date, end_date, access_token
)

# You can also save the metadata to a CSV file
# sentinel3_olci_data.to_csv("sentinel3_olci_metadata_2018.csv", index=False)
```
Below we display the metadata just retrieved. Each row describes one Sentinel-3 OLCI product, including its filename (Name), unique identifier (Id), geographic footprint (Footprint, GeoFootprint), and sensing start and end times (ContentDate).
```python
from IPython.display import display

display(sentinel3_olci_data)
```
|  | @odata.mediaContentType | Id | Name | ContentType | ContentLength | OriginDate | PublicationDate | ModificationDate | Online | EvictionDate | S3Path | Checksum | ContentDate | Footprint | GeoFootprint |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | application/octet-stream | f5d75e25-dcd6-533c-87f6-011d6de97462 | S3A_OL_1_EFR____20180601T032045_20180601T032125_20180602T084716_0040_032_004_1080_LN1_O_NT_002.SEN3 | application/octet-stream | 0 | 2018-10-28T21:19:59.653000Z | 2018-06-02T12:33:14.692000Z | 2018-06-02T12:33:14.692000Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3A_OL_1_EFR____20180601T032045_20180601T032125_2018... | [] | {'Start': '2018-06-01T03:20:44.670886Z', 'End': '2018-06-01T03:21:24.842161Z'} | geography'SRID=4326;POLYGON ((-78.5521 74.8857, -80.9457 74.8563, -83.2831 74.8023, -85.6253 74.... | {'type': 'Polygon', 'coordinates': [[[-78.5521, 74.8857], [-80.9457, 74.8563], [-83.2831, 74.802... |
1 | application/octet-stream | 9d2e570a-8504-5947-8c73-6feda1a5b80a | S3A_OL_1_EFR____20180601T014026_20180601T014326_20180602T052347_0179_032_003_1260_LN1_O_NT_002.SEN3 | application/octet-stream | 0 | 2018-11-01T17:54:38.378000Z | 2018-06-02T08:57:02.432000Z | 2018-06-02T08:57:02.432000Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3A_OL_1_EFR____20180601T014026_20180601T014326_2018... | [] | {'Start': '2018-06-01T01:40:25.621348Z', 'End': '2018-06-01T01:43:25.621348Z'} | geography'SRID=4326;POLYGON ((-53.4973 85.3067, -61.0276 85.2807, -68.3596 85.1745, -75.297 84.9... | {'type': 'Polygon', 'coordinates': [[[-53.4973, 85.3067], [-61.0276, 85.2807], [-68.3596, 85.174... |
2 | application/octet-stream | f0e338e0-6e3e-5cd6-b38c-2fd51dab2da7 | S3A_OL_1_EFR____20180601T151419_20180601T151719_20180602T202519_0179_032_011_1620_LN1_O_NT_002.SEN3 | application/octet-stream | 0 | 2018-11-01T18:21:27.991000Z | 2018-06-02T23:53:37.479000Z | 2018-06-02T23:53:37.479000Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3A_OL_1_EFR____20180601T151419_20180601T151719_2018... | [] | {'Start': '2018-06-01T15:14:19.387665Z', 'End': '2018-06-01T15:17:19.387665Z'} | geography'SRID=4326;POLYGON ((-81.12 73.846, -78.9112 73.8144, -76.7017 73.7603, -74.5009 73.683... | {'type': 'Polygon', 'coordinates': [[[-81.12, 73.846], [-78.9112, 73.8144], [-76.7017, 73.7603],... |
3 | application/octet-stream | 9518f48d-0120-59df-b0e7-a1d87170f076 | S3A_OL_1_EFR____20180601T151719_20180601T152019_20180602T202543_0179_032_011_1800_LN1_O_NT_002.SEN3 | application/octet-stream | 0 | 2018-11-01T18:21:37.387000Z | 2018-06-02T23:55:26.408000Z | 2018-06-02T23:55:26.408000Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3A_OL_1_EFR____20180601T151719_20180601T152019_2018... | [] | {'Start': '2018-06-01T15:17:19.387665Z', 'End': '2018-06-01T15:20:19.387665Z'} | geography'SRID=4326;POLYGON ((-82.4652 63.4066, -81.0948 63.3611, -79.7204 63.3025, -78.347 63.2... | {'type': 'Polygon', 'coordinates': [[[-82.4652, 63.4066], [-81.0948, 63.3611], [-79.7204, 63.302... |
4 | application/octet-stream | 5ca7a0cc-29eb-50dd-bf25-cb2353052615 | S3B_OL_1_EFR____20180601T165425_20180601T165725_20200126T011547_0180_008_012_1620_MR1_R_NT_002.SEN3 | application/octet-stream | 0 | 2020-05-04T14:56:38.609000Z | 2020-05-04T16:04:34.446099Z | 2020-05-04T16:04:34.446099Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/01/S3B_OL_1_EFR____20180601T165425_20180601T165725_2020... | [] | {'Start': '2018-06-01T16:54:24.657000Z', 'End': '2018-06-01T16:57:24.657000Z'} | geography'SRID=4326;POLYGON ((-106.245 73.3703, -104.123 73.3434, -101.961 73.2887, -99.8418 73.... | {'type': 'Polygon', 'coordinates': [[[-106.245, 73.3703], [-104.123, 73.3434], [-101.961, 73.288... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20 | application/octet-stream | 4e3945b4-24b6-5f03-9b2f-b1fb26ccc767 | S3B_OL_1_EFR____20180602T180920_20180602T181220_20200126T012653_0180_008_027_1620_MR1_R_NT_002.SEN3 | application/octet-stream | 0 | 2020-05-04T14:41:10.247000Z | 2020-05-04T16:11:06.378588Z | 2020-05-04T16:11:06.378588Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3B_OL_1_EFR____20180602T180920_20180602T181220_2020... | [] | {'Start': '2018-06-02T18:09:19.936000Z', 'End': '2018-06-02T18:12:19.936000Z'} | geography'SRID=4326;POLYGON ((-124.97 73.369, -122.838 73.3418, -120.7 73.2876, -118.56 73.2109,... | {'type': 'Polygon', 'coordinates': [[[-124.97, 73.369], [-122.838, 73.3418], [-120.7, 73.2876], ... |
21 | application/octet-stream | cdf0d9f9-0161-5ddb-90ca-93f47b7517f5 | S3B_OL_1_EFR____20180602T163120_20180602T163420_20200126T012628_0179_008_026_1800_MR1_R_NT_002.SEN3 | application/octet-stream | 0 | 2020-05-04T15:01:10.174000Z | 2020-05-04T16:12:38.769235Z | 2020-05-04T16:12:38.769235Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3B_OL_1_EFR____20180602T163120_20180602T163420_2020... | [] | {'Start': '2018-06-02T16:31:20.167000Z', 'End': '2018-06-02T16:34:20.167000Z'} | geography'SRID=4326;POLYGON ((-101.102 62.9284, -99.763 62.8877, -98.4059 62.8282, -97.0685 62.7... | {'type': 'Polygon', 'coordinates': [[[-101.102, 62.9284], [-99.763, 62.8877], [-98.4059, 62.8282... |
22 | application/octet-stream | 0300620e-0b0b-5bca-9c74-f761b91ce33d | S3B_OL_1_EFR____20180602T162820_20180602T163120_20200126T012622_0179_008_026_1620_MR1_R_NT_002.SEN3 | application/octet-stream | 0 | 2020-05-04T15:04:38.089000Z | 2020-05-04T16:12:39.413863Z | 2020-05-04T16:12:39.413863Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3B_OL_1_EFR____20180602T162820_20180602T163120_2020... | [] | {'Start': '2018-06-02T16:28:20.167000Z', 'End': '2018-06-02T16:31:20.167000Z'} | geography'SRID=4326;POLYGON ((-99.7267 73.3692, -97.6048 73.3423, -95.4369 73.2875, -93.3143 73.... | {'type': 'Polygon', 'coordinates': [[[-99.7267, 73.3692], [-97.6048, 73.3423], [-95.4369, 73.287... |
23 | application/octet-stream | 757400a9-0a28-523a-aa21-62b3d5da6c43 | S3A_OL_1_EFR____20180602T011332_20180602T011415_20180603T062858_0043_032_017_1080_LN1_O_NT_002.SEN3 | application/octet-stream | 0 | 2018-11-01T18:26:26.980000Z | 2018-06-03T08:19:11.370000Z | 2018-06-03T08:19:11.370000Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3A_OL_1_EFR____20180602T011332_20180602T011415_2018... | [] | {'Start': '2018-06-02T01:13:31.651927Z', 'End': '2018-06-02T01:14:14.712487Z'} | geography'SRID=4326;POLYGON ((-46.7631 74.8796, -49.2562 74.8483, -51.5857 74.7934, -53.8789 74.... | {'type': 'Polygon', 'coordinates': [[[-46.7631, 74.8796], [-49.2562, 74.8483], [-51.5857, 74.793... |
24 | application/octet-stream | a1c9e44e-dcac-5c66-b62e-527b383fff45 | S3A_OL_1_EFR____20180602T181007_20180602T181307_20180603T230737_0179_032_027_1620_LN1_O_NT_002.SEN3 | application/octet-stream | 0 | 2018-10-28T21:46:42.233000Z | 2018-06-04T00:31:59.555000Z | 2018-06-04T00:31:59.555000Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-3/OLCI/OL_1_EFR/2018/06/02/S3A_OL_1_EFR____20180602T181007_20180602T181307_2018... | [] | {'Start': '2018-06-02T18:10:06.920439Z', 'End': '2018-06-02T18:13:06.920439Z'} | geography'SRID=4326;POLYGON ((-125.067 73.8593, -122.83 73.8272, -120.625 73.7729, -118.435 73.6... | {'type': 'Polygon', 'coordinates': [[[-125.067, 73.8593], [-122.83, 73.8272], [-120.625, 73.7729... |
25 rows × 15 columns
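The Name column is informative on its own. Under the Sentinel-3 naming convention, it starts with the mission (S3A or S3B) and an 11-character product type (here OL_1_EFR___). These are followed by the sensing start, sensing stop, and product creation times. The hypothetical parse_s3_name below sketches how to extract these leading fields with a regular expression. It ignores the remaining identifiers and assumes the layout shown in the table above.

```python
import re
from datetime import datetime

# Leading fields of a Sentinel-3 product name: mission, 11-character product type,
# sensing start, sensing stop, and product creation time (yyyymmddThhmmss)
S3_NAME = re.compile(
    r"^(?P<mission>S3[AB])_(?P<product_type>\w{11})_"
    r"(?P<start>\d{8}T\d{6})_(?P<stop>\d{8}T\d{6})_(?P<created>\d{8}T\d{6})_"
)


def parse_s3_name(name):
    """Return mission, product type and the three timestamps encoded in a product name."""
    match = S3_NAME.match(name)
    if match is None:
        raise ValueError(f"Not a recognised Sentinel-3 product name: {name}")
    fields = match.groupdict()
    for key in ("start", "stop", "created"):
        fields[key] = datetime.strptime(fields[key], "%Y%m%dT%H%M%S")
    return fields
```

For instance, `sentinel3_olci_data["Name"].map(parse_s3_name)` yields one dictionary per product. This makes it easy to keep only S3A scenes or only overpasses within a given hour. For the first row, the parsed start time (03:20:45) agrees with the ContentDate start (03:20:44.67) to within a second.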
Step 3: Download
Once you have identified the product you want (its Id and Name in the metadata table), the final step is to download it. The download request is authenticated with an access token obtained from your Copernicus Data Space credentials, and the product is saved as a .zip archive. Access tokens expire after a short time, so if the download fails with an authentication error, request a new token with get_access_and_refresh_token before retrying. Make sure your username and password are correct to avoid authentication issues.
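```python
username = "your_username"
password = "your_password"
# Access tokens are short-lived, so request a fresh pair before downloading
access_token, refresh_token = get_access_and_refresh_token(username, password)

download_dir = "downloaded_products"  # Replace with your desired download directory
product_id = sentinel3_olci_data["Id"][0]  # Replace with your desired product Id
file_name = sentinel3_olci_data["Name"][0]  # Replace with your desired product name

# Download the single product
download_single_product(product_id, file_name, access_token, download_dir)
```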
At this point, the product should be saved as a .zip archive in the directory you specified.
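Because each product arrives as a .zip archive, here is a short sketch for checking and unpacking it with Python's standard zipfile module. The helper name unpack_product is hypothetical. The sketch assumes the archive holds the product's .SEN3 folder at its top level, which is typical for Sentinel-3 downloads, but check your own files. A truncated or interrupted download will usually fail the zipfile.is_zipfile check.

```python
import os
import zipfile


def unpack_product(zip_path, extract_dir=None):
    """Check that zip_path is a valid ZIP archive, extract it, and return its top-level entries.

    By default the archive is extracted next to the .zip file.
    """
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid ZIP archive")
    extract_dir = extract_dir or os.path.dirname(zip_path) or "."
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
        # e.g. the "<product name>.SEN3" folder
        top_level = sorted({entry.split("/")[0] for entry in archive.namelist()})
    return top_level
```

Only extract archives you downloaded yourself from a trusted source, since extractall writes whatever paths the archive contains.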
Another Download Option: Download a Single Product by Its Known Filename
```python
def query_product_by_name(product_name, token):
    """
    Query a specific Sentinel-3 product by its name.

    Parameters:
    product_name (str): The exact name of the product to search for.
    token (str): Access token for authentication.

    Returns:
    dict: Metadata for the matching product.
    """
    url = (
        f"https://catalogue.dataspace.copernicus.eu/odata/v1/Products?"
        f"$filter=Name eq '{product_name}'"
    )
    headers = {"Authorization": f"Bearer {token}"}
    response = make_api_request(url, headers=headers)
    if response.status_code == 200:
        data = response.json().get("value", [])
        if data:
            return data[0]  # Return the first matching product (if any)
        else:
            print(f"No product found with name: {product_name}")
            return None
    else:
        print(f"Error fetching product: {response.status_code} - {response.text}")
        return None
```
```python
username = "your_username"
password = "your_password"

# Step 1: Authenticate and retrieve tokens
access_token, refresh_token = get_access_and_refresh_token(username, password)

# Step 2: Provide the product name
product_name = "S3A_OL_1_EFR____20180602T181007_20180602T181307_20180603T230737_0179_032_027_1620_LN1_O_NT_002.SEN3"  # Replace with the specific product name you need

# Step 3: Query the product by name
product_metadata = query_product_by_name(product_name, access_token)

if product_metadata:
    product_id = product_metadata["Id"]  # Extract product ID from metadata
    file_name = product_metadata["Name"]  # Extract product name from metadata
    # Step 4: Download the product
    download_dir = "downloaded_products"  # Replace with your desired directory
    download_single_product(product_id, file_name, access_token, download_dir)
```
Downloaded: /Users/weibinchen/Downloads/S3A_OL_1_EFR____20180602T181007_20180602T181307_20180603T230737_0179_032_027_1620_LN1_O_NT_002.SEN3.zip
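If you have several known filenames, the same two steps can be repeated in a loop. The sketch below is a hypothetical helper, not part of the Copernicus API. It takes the query and download functions as arguments so it can be exercised without network access. In practice you would pass query_product_by_name and download_single_product from above. It returns the names that could not be found. For long batches, keep in mind that the access token may expire partway through.

```python
def download_products_by_name(
    names, token, query_fn, download_fn, download_dir="downloaded_products"
):
    """Look up each product name and download it; return the names that were not found.

    query_fn(name, token) should return a metadata dict with "Id" and "Name" (or None),
    and download_fn(product_id, file_name, token, download_dir) should save the product,
    like query_product_by_name and download_single_product above.
    """
    not_found = []
    for name in names:
        metadata = query_fn(name, token)
        if metadata is None:
            not_found.append(name)
            continue
        download_fn(metadata["Id"], metadata["Name"], token, download_dir)
    return not_found
```

Usage: `download_products_by_name(my_product_names, access_token, query_product_by_name, download_single_product)`, where my_product_names is your own list of filenames.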