awswrangler.athena.create_ctas_table

awswrangler.athena.create_ctas_table(sql: str, database: str | None = None, ctas_table: str | None = None, ctas_database: str | None = None, s3_output: str | None = None, storage_format: str | None = None, write_compression: str | None = None, partitioning_info: List[str] | None = None, bucketing_info: Tuple[List[str], int] | None = None, field_delimiter: str | None = None, schema_only: bool = False, workgroup: str | None = None, data_source: str | None = None, encryption: str | None = None, kms_key: str | None = None, categories: List[str] | None = None, wait: bool = False, athena_query_wait_polling_delay: float = 0.25, boto3_session: Session | None = None) Any

Create a new table populated with the results of a SELECT query.

https://docs.aws.amazon.com/athena/latest/ug/create-table-as.html

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • database

  • athena_query_wait_polling_delay

  • workgroup

Check out the Global Configurations Tutorial for details.

Parameters:
  • sql (str) – SELECT SQL query.

  • database (Optional[str], optional) – The name of the database where the original table is stored.

  • ctas_table (Optional[str], optional) – The name of the CTAS table. If None, a name with a random string is used.

  • ctas_database (Optional[str], optional) – The name of the alternative database where the CTAS table should be stored. If None, database is used, that is the CTAS table is stored in the same database as the original table.

  • s3_output (Optional[str], optional) – The output Amazon S3 path. If None, either the Athena workgroup or client-side location setting is used. If a workgroup enforces a query results location, then it overrides this argument.

  • storage_format (Optional[str], optional) – The storage format for the CTAS query results, such as ORC, PARQUET, AVRO, JSON, or TEXTFILE. PARQUET by default.

  • write_compression (Optional[str], optional) – The compression type to use for any storage format that allows compression to be specified.

  • partitioning_info (Optional[List[str]], optional) – A list of columns by which the CTAS table will be partitioned.

  • bucketing_info (Optional[Tuple[List[str], int]], optional) – Tuple consisting of the column names used for bucketing as the first element and the number of buckets as the second element. Only str, int and bool are supported as column data types for bucketing.

  • field_delimiter (Optional[str], optional) – The single-character field delimiter for files in CSV, TSV, and text files.

  • schema_only (bool, optional) – _description_, by default False

  • workgroup (Optional[str], optional) – Athena workgroup.

  • data_source (Optional[str], optional) – Data Source / Catalog name. If None, ‘AwsDataCatalog’ is used.

  • encryption (str, optional) – Valid values: [None, ‘SSE_S3’, ‘SSE_KMS’]. Note: ‘CSE_KMS’ is not supported.

  • kms_key (str, optional) – For SSE-KMS, this is the KMS key ARN or ID.

  • categories (List[str], optional) – List of columns names that should be returned as pandas.Categorical. Recommended for memory restricted environments.

  • wait (bool, default False) – Whether to wait for the query to finish and return a dictionary with the Query metadata.

  • athena_query_wait_polling_delay (float, default: 0.25 seconds) – Interval in seconds for how often the function will check if the Athena query has completed.

  • boto3_session (Optional[boto3.Session], optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.

Returns:

A dictionary with the the CTAS database and table names. If wait is False, the query ID is included, otherwise a Query metadata object is added instead.

Return type:

Dict[str, Union[str, _QueryMetadata]]

Examples

Select all into a new table and encrypt the results

>>> import awswrangler as wr
>>> wr.athena.create_ctas_table(
...     sql="select * from table",
...     database="default",
...     encryption="SSE_KMS",
...     kms_key="1234abcd-12ab-34cd-56ef-1234567890ab",
... )
{'ctas_database': 'default', 'ctas_table': 'temp_table_5669340090094....', 'ctas_query_id': 'cc7dfa81-831d-...'}

Create a table with schema only

>>> wr.athena.create_ctas_table(
...     sql="select col1, col2 from table",
...     database="default",
...     ctas_table="my_ctas_table",
...     schema_only=True,
...     wait=True,
... )

Partition data and save to alternative CTAS database

>>> wr.athena.create_ctas_table(
...     sql="select * from table",
...     database="default",
...     ctas_database="my_ctas_db",
...     storage_format="avro",
...     write_compression="snappy",
...     partitioning_info=["par0", "par1"],
...     wait=True,
... )