awswrangler.catalog.get_csv_partitions

awswrangler.catalog.get_csv_partitions(database: str, table: str, expression: str | None = None, catalog_id: str | None = None, boto3_session: Session | None = None) → dict[str, list[str]]

Get all partitions from a Table in the AWS Glue Catalog.

Expression argument instructions: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_partitions

Parameters:

database (str) – Database name.
table (str) – Table name.
expression (str | None) – An expression that filters the partitions to be returned.
catalog_id (str | None) – The ID of the Data Catalog from which to retrieve Databases. If None is provided, the AWS account ID is used by default.
boto3_session (Session | None) – The boto3 session to use. If None is provided, the default boto3 session is used.

Return type:

dict[str, list[str]]

Returns:

partitions_values: Dictionary whose keys are S3 path locations and whose values are lists of partition values as strings (e.g. {'s3://bucket/prefix/y=2020/m=10/': ['2020', '10']}).

Examples

Fetch all partitions

>>> import awswrangler as wr
>>> wr.catalog.get_csv_partitions(
...     database='default',
...     table='my_table',
... )
{
    's3://bucket/prefix/y=2020/m=10/': ['2020', '10'],
    's3://bucket/prefix/y=2020/m=11/': ['2020', '11'],
    's3://bucket/prefix/y=2020/m=12/': ['2020', '12']
}
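
Pairing partition values with key names

The returned lists hold partition values only, not the partition key names. When the S3 layout is Hive-style (key=value path segments, as in the output above), the names can be recovered from the path itself. The following is a minimal sketch under that assumption; partition_keys_from_path is a hypothetical helper, not part of awswrangler, and the dictionary is sample data in the documented return shape rather than the result of a live call.

```python
from urllib.parse import urlparse


def partition_keys_from_path(path: str) -> dict[str, str]:
    """Parse Hive-style key=value segments from an S3 path.

    Hypothetical helper for illustration; assumes a Hive-style layout.
    """
    # Drop the scheme and bucket, then split the object prefix into segments.
    segments = urlparse(path).path.strip("/").split("/")
    # Keep only the segments that look like key=value.
    pairs = (segment.split("=", 1) for segment in segments if "=" in segment)
    return {key: value for key, value in pairs}


# Sample data in the shape documented under "Returns"; no AWS call is made.
partitions = {
    "s3://bucket/prefix/y=2020/m=10/": ["2020", "10"],
    "s3://bucket/prefix/y=2020/m=11/": ["2020", "11"],
}

# Map each S3 location to a {key name: value} dictionary.
named = {path: partition_keys_from_path(path) for path in partitions}
```

If the table does not use a Hive-style layout, the key names would have to come from the table definition in the Glue Catalog instead.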

Filtering partitions

>>> import awswrangler as wr
>>> wr.catalog.get_csv_partitions(
...     database='default',
...     table='my_table',
...     expression='m=10'
... )
{
    's3://bucket/prefix/y=2020/m=10/': ['2020', '10']
}
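
Combining several filters

The expression argument follows the SQL-like syntax described in the Glue get_partitions reference linked above, which includes logical operators such as AND. The sketch below composes equality filters into one expression string. build_partition_expression is a hypothetical helper, not part of awswrangler; quoting string values with single quotes while leaving numbers bare is an assumption that should be checked against the partition key types of the actual table.

```python
def build_partition_expression(filters: dict[str, object]) -> str:
    """Join equality filters with AND (hypothetical helper, not an awswrangler API).

    Assumes string values contain no single quotes; no escaping is performed.
    """
    clauses = []
    for key, value in filters.items():
        if isinstance(value, str):
            # Assumption: string-typed partition keys take single-quoted literals.
            clauses.append(f"{key}='{value}'")
        else:
            # Numeric values are written without quotes.
            clauses.append(f"{key}={value}")
    return " AND ".join(clauses)


expression = build_partition_expression({"y": 2020, "m": 10})
```

The resulting string can then be passed as expression=expression to wr.catalog.get_csv_partitions, in the same way as the literal 'm=10' above.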