awswrangler.catalog.add_csv_partitions

awswrangler.catalog.add_csv_partitions(database: str, table: str, partitions_values: dict[str, list[str]], bucketing_info: Tuple[List[str], int] | None = None, catalog_id: str | None = None, compression: str | None = None, sep: str = ',', serde_library: str | None = None, serde_parameters: dict[str, str] | None = None, boto3_session: Session | None = None, columns_types: dict[str, str] | None = None, partitions_parameters: dict[str, str] | None = None) → None

Add partitions (metadata) to a CSV Table in the AWS Glue Catalog.

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • catalog_id

  • database

Check out the Global Configurations Tutorial for details.

Parameters:
  • database (str) – Database name.

  • table (str) – Table name.

  • partitions_values (Dict[str, List[str]]) – Dictionary with S3 path locations as keys and lists of partition values (as str) as values (e.g. {'s3://bucket/prefix/y=2020/m=10/': ['2020', '10']}).

  • bucketing_info (Tuple[List[str], int], optional) – Tuple consisting of the column names used for bucketing as the first element and the number of buckets as the second element. Only str, int and bool are supported as column data types for bucketing.

  • catalog_id (str, optional) – The ID of the Data Catalog from which to retrieve Databases. If none is provided, the AWS account ID is used by default.

  • compression (str, optional) – Compression style (None, gzip, etc).

  • sep (str) – String of length 1. Field delimiter for the output file.

  • serde_library (str, optional) – SerDe serialization library to use, given as the class name in a string. If no library is provided, the default is org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.

  • serde_parameters (Dict[str, str], optional) – Dictionary of initialization parameters for the SerDe. The default is {"field.delim": sep, "escape.delim": "\\"}.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.

  • columns_types (Optional[Dict[str, str]]) – Only required for Hive compatibility. Dictionary with column names as keys and data types as values (e.g. {'col0': 'bigint', 'col1': 'double'}). Include only materialized columns, not partition columns.

  • partitions_parameters (Optional[Dict[str, str]]) – Dictionary with key-value pairs defining partition parameters.
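
To make the relationship between sep and the default serde_parameters concrete, here is a minimal sketch of the dictionary described above. The helper name default_serde_parameters is hypothetical and not part of awswrangler; it only mirrors the documented default so that a custom dictionary can be derived from it.

```python
def default_serde_parameters(sep: str = ",") -> dict[str, str]:
    """Hypothetical helper: mirrors the documented serde_parameters default."""
    # The documentation describes sep as a string of length 1.
    if len(sep) != 1:
        raise ValueError("sep must be a string of length 1")
    # "field.delim" follows sep; "escape.delim" is a single backslash.
    return {"field.delim": sep, "escape.delim": "\\"}


# Start from the documented default for a tab-separated table.
tab_params = default_serde_parameters("\t")
```

A dictionary built this way could be passed as serde_parameters alongside the matching sep value.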

Returns:

None.

Return type:

None

Examples

>>> import awswrangler as wr
>>> wr.catalog.add_csv_partitions(
...     database='default',
...     table='my_table',
...     partitions_values={
...         's3://bucket/prefix/y=2020/m=10/': ['2020', '10'],
...         's3://bucket/prefix/y=2020/m=11/': ['2020', '11'],
...         's3://bucket/prefix/y=2020/m=12/': ['2020', '12']
...     }
... )
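
When there are many partitions, the partitions_values dictionary can be generated rather than typed by hand. The sketch below assumes Hive-style key=value path segments, as in the example above; build_partitions_values is a hypothetical helper, not an awswrangler function.

```python
def build_partitions_values(
    base_path: str, partitions: list[dict[str, object]]
) -> dict[str, list[str]]:
    """Hypothetical helper: map Hive-style S3 prefixes to partition values."""
    base = base_path.rstrip("/")
    result: dict[str, list[str]] = {}
    for partition in partitions:
        # Dict insertion order defines the partition column order.
        suffix = "/".join(f"{key}={value}" for key, value in partition.items())
        # Partition values are passed to the catalog as strings.
        result[f"{base}/{suffix}/"] = [str(value) for value in partition.values()]
    return result


partitions_values = build_partitions_values(
    "s3://bucket/prefix/",
    [{"y": 2020, "m": month} for month in (10, 11, 12)],
)
```

The resulting dictionary has the same shape as the literal in the example and could be passed as the partitions_values argument.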