awswrangler.athena.create_ctas_table
- awswrangler.athena.create_ctas_table(sql: str, database: str | None = None, ctas_table: str | None = None, ctas_database: str | None = None, s3_output: str | None = None, storage_format: str | None = None, write_compression: str | None = None, partitioning_info: list[str] | None = None, bucketing_info: Tuple[List[str], int] | None = None, field_delimiter: str | None = None, schema_only: bool = False, workgroup: str = 'primary', data_source: str | None = None, encryption: str | None = None, kms_key: str | None = None, categories: list[str] | None = None, wait: bool = False, athena_query_wait_polling_delay: float = 1.0, execution_params: list[str] | None = None, params: dict[str, Any] | list[str] | None = None, paramstyle: Literal['qmark', 'named'] = 'named', boto3_session: Session | None = None) -> dict[str, str | awswrangler.athena._utils._QueryMetadata]
Create a new table populated with the results of a SELECT query.
https://docs.aws.amazon.com/athena/latest/ug/create-table-as.html
Note
This function has arguments which can be configured globally through wr.config or environment variables:
database
athena_query_wait_polling_delay
workgroup
Check out the Global Configurations Tutorial for details.
- Parameters:
sql (str) – SELECT SQL query.
database (str, optional) – The name of the database where the original table is stored.
ctas_table (str, optional) – The name of the CTAS table. If None, a name with a random string is used.
ctas_database (str, optional) – The name of the alternative database where the CTAS table should be stored. If None, database is used; that is, the CTAS table is stored in the same database as the original table.
s3_output (str, optional) – The output Amazon S3 path. If None, either the Athena workgroup or client-side location setting is used. If a workgroup enforces a query results location, then it overrides this argument.
storage_format (str, optional) – The storage format for the CTAS query results, such as ORC, PARQUET, AVRO, JSON, or TEXTFILE. PARQUET by default.
write_compression (str, optional) – The compression type to use for any storage format that allows compression to be specified.
partitioning_info (list[str], optional) – A list of columns by which the CTAS table will be partitioned.
bucketing_info (tuple[list[str], int], optional) – Tuple consisting of the column names used for bucketing as the first element and the number of buckets as the second element. Only str, int and bool are supported as column data types for bucketing.
field_delimiter (str, optional) – The single-character field delimiter for CSV, TSV, and text files.
schema_only (bool, optional) – Whether to create only the table schema, without populating the table with data. False by default.
workgroup (str) – Athena workgroup. Primary by default.
data_source (str, optional) – Data Source / Catalog name. If None, ‘AwsDataCatalog’ is used.
encryption (str, optional) – Valid values: [None, ‘SSE_S3’, ‘SSE_KMS’]. Note: ‘CSE_KMS’ is not supported.
kms_key (str, optional) – For SSE-KMS, this is the KMS key ARN or ID.
categories (List[str], optional) – List of column names that should be returned as pandas.Categorical. Recommended for memory-restricted environments.
wait (bool, default False) – Whether to wait for the query to finish and return a dictionary with the Query metadata.
athena_query_wait_polling_delay (float, default: 1.0 seconds) – Interval in seconds for how often the function will check if the Athena query has completed.
execution_params (List[str], optional [DEPRECATED]) – A list of values for the parameters that are used in the SQL query. This parameter is on a deprecation path. Use params and paramstyle instead.
params (Dict[str, Any] | List[str], optional) – Dictionary or list of parameters to pass to the execute method. The syntax used to pass parameters depends on the configuration of paramstyle.
paramstyle (str, optional) – The syntax style to use for the parameters. Supported values are named and qmark. The default is named.
boto3_session (boto3.Session, optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.
- Returns:
A dictionary with the CTAS database and table names. If wait is False, the query ID is included; otherwise, a Query metadata object is added instead.
- Return type:
Dict[str, Union[str, _QueryMetadata]]
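The shape of the returned dictionary depends on wait, so calling code usually has to branch on it. The following is a minimal sketch of such a branch. The helper name summarize_ctas_result is hypothetical and not part of the library. It relies only on the ctas_database, ctas_table, and ctas_query_id keys that appear in the example output on this page; the name of the metadata key is not shown here, so the sketch does not assume it.

```python
from typing import Any


def summarize_ctas_result(result: dict[str, Any]) -> str:
    """Describe a create_ctas_table result (hypothetical helper, not library API)."""
    table = f"{result['ctas_database']}.{result['ctas_table']}"
    # With wait=False the dictionary carries the query ID, so the query
    # may still be running when this function is called.
    if "ctas_query_id" in result:
        return f"{table}: query {result['ctas_query_id']} submitted, not awaited"
    # Otherwise the call waited for completion and attached query metadata.
    return f"{table}: query finished, metadata attached"


# Stand-in dictionary mimicking the wait=False return shape (illustrative values).
example = {
    "ctas_database": "default",
    "ctas_table": "my_ctas_table",
    "ctas_query_id": "query-id-placeholder",
}
print(summarize_ctas_result(example))
```

Checking for the query ID key, rather than threading the original wait flag through, keeps the helper usable on results produced elsewhere in a pipeline.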
Examples
Select all into a new table and encrypt the results
>>> import awswrangler as wr
>>> wr.athena.create_ctas_table(
...     sql="select * from table",
...     database="default",
...     encryption="SSE_KMS",
...     kms_key="1234abcd-12ab-34cd-56ef-1234567890ab",
... )
{'ctas_database': 'default', 'ctas_table': 'temp_table_5669340090094....', 'ctas_query_id': 'cc7dfa81-831d-...'}
Create a table with schema only
>>> wr.athena.create_ctas_table(
...     sql="select col1, col2 from table",
...     database="default",
...     ctas_table="my_ctas_table",
...     schema_only=True,
...     wait=True,
... )
Partition data and save to alternative CTAS database
>>> wr.athena.create_ctas_table(
...     sql="select * from table",
...     database="default",
...     ctas_database="my_ctas_db",
...     storage_format="avro",
...     write_compression="snappy",
...     partitioning_info=["par0", "par1"],
...     wait=True,
... )
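Pass query parameters with params and paramstyle

None of the examples above exercises params and paramstyle, so here is a rough sketch of how a parameterized call might be assembled. The helper names build_ctas_kwargs and create_filtered_table, the table and column names, and the :min_value placeholder syntax for the named style are all assumptions made for illustration; check the library's parameterized query documentation for the exact placeholder format. The keyword arguments are built separately from the call so they can be inspected without AWS access.

```python
from typing import Any


def build_ctas_kwargs(min_value: int) -> dict[str, Any]:
    """Assemble keyword arguments for create_ctas_table (hypothetical helper)."""
    return {
        # ":min_value" is the assumed placeholder syntax for paramstyle="named".
        "sql": "SELECT col1, col2 FROM my_table WHERE col1 > :min_value",
        "database": "default",
        "ctas_table": "my_filtered_table",
        "params": {"min_value": min_value},
        "paramstyle": "named",
        "wait": True,
    }


def create_filtered_table(min_value: int) -> dict[str, Any]:
    """Run the CTAS query; requires awswrangler and valid AWS credentials."""
    # Imported lazily so build_ctas_kwargs stays usable without the library.
    import awswrangler as wr

    return wr.athena.create_ctas_table(**build_ctas_kwargs(min_value))
```

With paramstyle set to qmark, params would instead be a list of values matched to positional placeholders in the query.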