awswrangler.athena.unload

awswrangler.athena.unload(sql: str, path: str, database: str, file_format: str = 'PARQUET', compression: str | None = None, field_delimiter: str | None = None, partitioned_by: List[str] | None = None, workgroup: str | None = None, encryption: str | None = None, kms_key: str | None = None, boto3_session: Session | None = None, data_source: str | None = None, params: Dict[str, Any] | None = None, athena_query_wait_polling_delay: float = 0.25) Any

Write query results from a SELECT statement to the specified data format using UNLOAD.

https://docs.aws.amazon.com/athena/latest/ug/unload.html

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • database

  • athena_query_wait_polling_delay

  • workgroup

Check out the Global Configurations Tutorial for details.

Parameters:
  • sql (str) – SQL query.

  • path (str, optional) – Amazon S3 path.

  • database (str) – AWS Glue/Athena database name - It is only the origin database from where the query will be launched. You can still using and mixing several databases writing the full table name within the sql (e.g. database.table).

  • file_format (str) – File format of the output. Possible values are ORC, PARQUET, AVRO, JSON, or TEXTFILE

  • compression (Optional[str]) – This option is specific to the ORC and Parquet formats. For ORC, possible values are lz4, snappy, zlib, or zstd. For Parquet, possible values are gzip or snappy. For ORC, the default is zlib, and for Parquet, the default is gzip.

  • field_delimiter (str) – A single-character field delimiter for files in CSV, TSV, and other text formats.

  • partitioned_by (Optional[List[str]]) – An array list of columns by which the output is partitioned.

  • workgroup (str, optional) – Athena workgroup.

  • encryption (str, optional) – Valid values: [None, ‘SSE_S3’, ‘SSE_KMS’]. Notice: ‘CSE_KMS’ is not supported.

  • kms_key (str, optional) – For SSE-KMS, this is the KMS key ARN or ID.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session receive None.

  • data_source (str, optional) – Data Source / Catalog name. If None, ‘AwsDataCatalog’ will be used by default.

  • params (Dict[str, any], optional) – Dict of parameters that will be used for constructing the SQL query. Only named parameters are supported. The dict needs to contain the information in the form {‘name’: ‘value’} and the SQL query needs to contain :name;. Note that for varchar columns and similar, you must surround the value in single quotes.

  • athena_query_wait_polling_delay (float, default: 0.25 seconds) – Interval in seconds for how often the function will check if the Athena query has completed.

Returns:

Query metadata including query execution id, dtypes, manifest & output location.

Return type:

_QueryMetadata

Examples

>>> import awswrangler as wr
>>> res = wr.athena.unload(
...     sql="SELECT * FROM my_table WHERE name=:name; AND city=:city;",
...     params={"name": "'filtered_name'", "city": "'filtered_city'"}
... )