awswrangler.catalog.create_parquet_table

awswrangler.catalog.create_parquet_table(database: str, table: str, path: str, columns_types: dict[str, str], table_type: str | None = None, partitions_types: dict[str, str] | None = None, bucketing_info: Tuple[List[str], int] | None = None, catalog_id: str | None = None, compression: str | None = None, description: str | None = None, parameters: dict[str, str] | None = None, columns_comments: dict[str, str] | None = None, columns_parameters: dict[str, dict[str, str]] | None = None, mode: Literal['overwrite', 'append'] = 'overwrite', catalog_versioning: bool = False, athena_partition_projection_settings: AthenaPartitionProjectionSettings | None = None, boto3_session: Session | None = None) → None

Create a Parquet Table (Metadata Only) in the AWS Glue Catalog.

See the Athena data types reference for supported type names: https://docs.aws.amazon.com/athena/latest/ug/data-types.html

Parameters:

database (str) – Database name.
table (str) – Table name.
path (str) – Amazon S3 path (e.g. s3://bucket/prefix/).
columns_types (dict[str, str]) – Dictionary with keys as column names and values as data types (e.g. {‘col0’: ‘bigint’, ‘col1’: ‘double’}).
table_type (str | None) – The type of the Glue Table. Set to EXTERNAL_TABLE if None.
partitions_types (dict[str, str] | None) – Dictionary with keys as partition names and values as data types (e.g. {‘col2’: ‘date’}).
bucketing_info (Tuple[List[str], int] | None) – Tuple consisting of the column names used for bucketing as the first element and the number of buckets as the second element. Only str, int and bool are supported as column data types for bucketing.
catalog_id (str | None) – The ID of the Data Catalog from which to retrieve Databases. If none is provided, the AWS account ID is used by default.
compression (str | None) – Compression style (None, snappy, gzip, etc.).
description (str | None) – Table description.
parameters (dict[str, str] | None) – Key/value pairs to tag the table.
columns_comments (dict[str, str] | None) – Column names and the related comments (e.g. {‘col0’: ‘Column 0.’, ‘col1’: ‘Column 1.’, ‘col2’: ‘Partition.’}).
columns_parameters (dict[str, dict[str, str]] | None) – Column names and the related parameters (e.g. {‘col0’: {‘par0’: ‘Param 0’, ‘par1’: ‘Param 1’}}).
mode (Literal['overwrite', 'append']) – ‘overwrite’ to recreate the table if it already exists, or ‘append’ to keep an existing table.
catalog_versioning (bool) – If True and mode=‘overwrite’, creates an archived version of the table catalog before updating it.

Note

This function has arguments which can be configured globally through wr.config or environment variables:

catalog_id

database

Check out the Global Configurations Tutorial for details.

athena_partition_projection_settings (AthenaPartitionProjectionSettings | None) – Parameters of the Athena Partition Projection (https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html). AthenaPartitionProjectionSettings is a TypedDict, meaning the passed parameter can be instantiated either as an instance of AthenaPartitionProjectionSettings or as a regular Python dict.

The following projection parameters are supported:

Projection Parameters
Name	Type	Description
projection_types	Optional[Dict[str, str]]	Dictionary of partition names and Athena projection types. Valid types: “enum”, “integer”, “date”, “injected” https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘enum’, ‘col2_name’: ‘integer’})
projection_ranges	Optional[Dict[str, str]]	Dictionary of partition names and Athena projection ranges. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘0,10’, ‘col2_name’: ‘-1,8675309’})
projection_values	Optional[Dict[str, str]]	Dictionary of partition names and Athena projection values. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘A,B,Unknown’, ‘col2_name’: ‘foo,boo,bar’})
projection_intervals	Optional[Dict[str, str]]	Dictionary of partition names and Athena projection intervals. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘1’, ‘col2_name’: ‘5’})
projection_digits	Optional[Dict[str, str]]	Dictionary of partition names and Athena projection digits. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_name’: ‘1’, ‘col2_name’: ‘2’})
projection_formats	Optional[Dict[str, str]]	Dictionary of partition names and Athena projection formats. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html (e.g. {‘col_date’: ‘yyyy-MM-dd’, ‘col2_timestamp’: ‘yyyy-MM-dd HH:mm:ss’})
projection_storage_location_template	Optional[str]	Value which allows Athena to properly map partition values if the S3 file locations do not follow a typical …/column=value/… pattern. https://docs.aws.amazon.com/athena/latest/ug/partition-projection-setting-up.html (e.g. s3://bucket/table_root/a=${a}/${b}/some_static_subdirectory/${c}/)

Return type:

None

boto3_session (Session | None) – The default boto3 session is used if boto3_session is None.
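The bucketing_info and mode parameters described above can be easier to follow with a concrete shape in front of you. The following is a minimal sketch, not part of the library: make_bucketing_info and table_kwargs are hypothetical helpers, and the table name, column names, and bucket count are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple


def make_bucketing_info(columns: List[str], num_buckets: int) -> Tuple[List[str], int]:
    """Build the (column names, bucket count) tuple expected by bucketing_info.

    Hypothetical helper: it only checks the shape of the tuple on the client side.
    """
    if not columns:
        raise ValueError("at least one bucketing column is required")
    if num_buckets < 1:
        raise ValueError("num_buckets must be a positive integer")
    return (list(columns), num_buckets)


def table_kwargs() -> Dict[str, Any]:
    """Keyword arguments for wr.catalog.create_parquet_table (illustrative values)."""
    return {
        "database": "default",
        "table": "my_bucketed_table",  # assumed table name
        "path": "s3://bucket/prefix/",
        "columns_types": {"col0": "bigint", "col1": "string"},
        # First element: bucketing columns. Second element: number of buckets.
        "bucketing_info": make_bucketing_info(["col1"], 8),
        # 'append' keeps the table if it already exists.
        "mode": "append",
    }
```

With AWS credentials configured, the resulting dictionary could be passed as wr.catalog.create_parquet_table(**table_kwargs()).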

Examples

>>> import awswrangler as wr
>>> wr.catalog.create_parquet_table(
...     database='default',
...     table='my_table',
...     path='s3://bucket/prefix/',
...     columns_types={'col0': 'bigint', 'col1': 'double'},
...     partitions_types={'col2': 'date'},
...     compression='snappy',
...     description='My own table!',
...     parameters={'source': 'postgresql'},
...     columns_comments={'col0': 'Column 0.', 'col1': 'Column 1.', 'col2': 'Partition.'}
... )
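
The example above does not use partition projection. As a hedged sketch of how athena_partition_projection_settings might be passed as a regular Python dict, the code below builds the settings separately from the call. The partition names (year, region), their ranges and values, and the helper names build_projection_settings and create_projected_table are assumptions made for illustration.

```python
from typing import Any, Dict


def build_projection_settings() -> Dict[str, Any]:
    """Return a plain dict usable as athena_partition_projection_settings.

    The partition names and values are hypothetical.
    """
    return {
        # One projection type per partition column.
        "projection_types": {"year": "integer", "region": "enum"},
        # 'integer' projections take a "min,max" range.
        "projection_ranges": {"year": "2020,2030"},
        # 'enum' projections take a comma-separated list of values.
        "projection_values": {"region": "us,eu,apac"},
    }


def create_projected_table(database: str, table: str, path: str) -> None:
    """Register a Parquet table (metadata only) with partition projection."""
    # Imported lazily so build_projection_settings has no AWS dependency.
    import awswrangler as wr

    wr.catalog.create_parquet_table(
        database=database,
        table=table,
        path=path,
        columns_types={"col0": "bigint", "col1": "double"},
        partitions_types={"year": "int", "region": "string"},
        athena_partition_projection_settings=build_projection_settings(),
    )
```

Calling create_projected_table('default', 'my_table', 's3://bucket/prefix/') requires valid AWS credentials and an existing Glue database.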