awswrangler.catalog.get_csv_partitions
- awswrangler.catalog.get_csv_partitions(database: str, table: str, expression: str | None = None, catalog_id: str | None = None, boto3_session: Session | None = None) → dict[str, list[str]]
Get all partitions of a table in the AWS Glue Catalog.
For the syntax of the expression argument, see: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.get_partitions
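The expression is a SQL WHERE-style filter that Glue evaluates against partition key values. The sketch below uses a hypothetical helper, build_partition_expression (not part of awswrangler), that joins equality conditions with AND. It assumes string-typed partition keys, so every value is emitted as a single-quoted SQL literal with embedded quotes doubled.

```python
def build_partition_expression(filters: dict[str, str]) -> str:
    """Join key=value equality filters into a Glue partition expression.

    Hypothetical helper: assumes string-typed partition keys, so each value
    is wrapped in single quotes, with embedded single quotes doubled.
    """
    clauses = []
    for key, value in filters.items():
        escaped = str(value).replace("'", "''")
        clauses.append(f"{key}='{escaped}'")
    # An empty dict yields an empty string (no filtering conditions).
    return " AND ".join(clauses)


expr = build_partition_expression({"y": "2020", "m": "10"})
print(expr)  # y='2020' AND m='10'
```

The resulting string could be passed as expression=expr. Whether quoting is needed depends on the partition key types defined for the table, so check the Glue get_partitions documentation linked above for the literal rules that apply.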
Note
This function has arguments that can be configured globally through wr.config or environment variables:
- catalog_id
- database
Check out the Global Configurations Tutorial for details.
- Parameters:
  - database (str) – Database name.
  - table (str) – Table name.
  - expression (str | None) – An expression that filters the partitions to be returned.
  - catalog_id (str | None) – The ID of the Data Catalog from which to retrieve the partitions. If None is provided, the AWS account ID is used by default.
  - boto3_session (Session | None) – The boto3 session to use. If None is provided, the default boto3 session is used.
- Return type:
  dict[str, list[str]]
- Returns:
  partitions_values: Dictionary whose keys are S3 path locations and whose values are lists of partition values as str (e.g. {'s3://bucket/prefix/y=2020/m=10/': ['2020', '10']}).
Examples
Fetch all partitions
>>> import awswrangler as wr
>>> wr.catalog.get_csv_partitions(
...     database='default',
...     table='my_table',
... )
{
    's3://bucket/prefix/y=2020/m=10/': ['2020', '10'],
    's3://bucket/prefix/y=2020/m=11/': ['2020', '11'],
    's3://bucket/prefix/y=2020/m=12/': ['2020', '12']
}
Filtering partitions
>>> import awswrangler as wr
>>> wr.catalog.get_csv_partitions(
...     database='default',
...     table='my_table',
...     expression='m=10'
... )
{
    's3://bucket/prefix/y=2020/m=10/': ['2020', '10']
}
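When a condition is awkward to write in Glue's expression syntax, the returned dictionary can also be filtered client-side. This is a sketch with a hypothetical helper, filter_partitions, assuming the values arrive in table key order (y, then m in these examples). Unlike expression, which Glue applies on the service side, this approach still fetches every partition first.

```python
from typing import Callable


def filter_partitions(
    partitions: dict[str, list[str]], predicate: Callable[[list[str]], bool]
) -> dict[str, list[str]]:
    """Keep only the locations whose value list satisfies `predicate`."""
    return {loc: vals for loc, vals in partitions.items() if predicate(vals)}


# Output of the "Fetch all partitions" example above.
all_partitions = {
    "s3://bucket/prefix/y=2020/m=10/": ["2020", "10"],
    "s3://bucket/prefix/y=2020/m=11/": ["2020", "11"],
    "s3://bucket/prefix/y=2020/m=12/": ["2020", "12"],
}

# Values are returned as str, so convert before any numeric comparison.
late_2020 = filter_partitions(
    all_partitions, lambda v: v[0] == "2020" and int(v[1]) >= 11
)
print(sorted(late_2020))
```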