awswrangler.athena.start_query_execution

awswrangler.athena.start_query_execution(sql: str, database: str | None = None, s3_output: str | None = None, workgroup: str | None = None, encryption: str | None = None, kms_key: str | None = None, params: Dict[str, Any] | None = None, boto3_session: Session | None = None, athena_cache_settings: AthenaCacheSettings | None = None, athena_query_wait_polling_delay: float = 1.0, data_source: str | None = None, wait: bool = False) → str | Dict[str, Any]

Start a SQL Query against AWS Athena.

Note

If s3_output is None, the default Athena bucket is created if it doesn’t already exist (e.g. s3://aws-athena-query-results-ACCOUNT-REGION/).

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • database

  • athena_cache_settings

  • athena_query_wait_polling_delay

  • workgroup

Check out the Global Configurations Tutorial for details.
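
As a rough sketch of the global configuration described in this note: the environment variable names below assume the WR_ prefix convention used in the Global Configurations Tutorial, and the database and workgroup values are hypothetical. Environment variables are typically read when awswrangler loads its configuration, so they are set before the import here.

```python
import os

# Hypothetical values; replace them with names from your own account.
# Assumption: awswrangler reads WR_-prefixed variables when it is imported.
os.environ["WR_DATABASE"] = "my_database"
os.environ["WR_WORKGROUP"] = "primary"


def set_defaults_in_code() -> None:
    """Apply the same defaults through wr.config (requires awswrangler)."""
    import awswrangler as wr

    wr.config.database = "my_database"
    wr.config.workgroup = "primary"
```

With either approach in place, database and workgroup can be omitted from the call and the configured values are used instead.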

Parameters:
  • sql (str) – SQL query.

  • database (str, optional) – AWS Glue/Athena database name.

  • s3_output (str, optional) – AWS S3 path.

  • workgroup (str, optional) – Athena workgroup.

  • encryption (str, optional) – None, ‘SSE_S3’, ‘SSE_KMS’, ‘CSE_KMS’.

  • kms_key (str, optional) – For SSE-KMS and CSE-KMS, this is the KMS key ARN or ID.

  • params (Dict[str, Any], optional) – Dict of parameters used to construct the SQL query. Only named parameters are supported. The dict must have the form {'name': 'value'}, and the SQL query must contain the matching :name placeholder. Note that for varchar columns and similar, you must surround the value in single quotes.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session is None.

  • athena_cache_settings (AthenaCacheSettings, optional) – Parameters of the Athena cache settings such as max_cache_seconds, max_cache_query_inspections, max_remote_cache_entries, and max_local_cache_entries. AthenaCacheSettings is a TypedDict, meaning the passed parameter can be instantiated either as an instance of AthenaCacheSettings or as a regular Python dict. If cached results are valid, awswrangler ignores the ctas_approach, s3_output, encryption, kms_key, keep_files and ctas_temp_table_name params. If reading cached data fails for any reason, execution falls back to the usual query run path.

  • athena_query_wait_polling_delay (float, default: 1.0 seconds) – Interval in seconds between checks of whether the Athena query has completed.

  • data_source (str, optional) – Data Source / Catalog name. If None, ‘AwsDataCatalog’ will be used by default.

  • wait (bool, default False) – Indicates whether to wait for the query to finish and return a dictionary with the query execution response.
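
The params, athena_cache_settings and wait arguments described above can be combined as in the following minimal sketch. The table, column and database names are hypothetical, and the helper functions build_query_kwargs and run_query are illustrative only, not part of awswrangler. The quoting shown here simply wraps the string in single quotes, as the params description requires; it does not escape quotes inside the value, so it is unsuitable for untrusted input.

```python
from typing import Any, Dict


def build_query_kwargs(city: str, min_year: int) -> Dict[str, Any]:
    """Assemble keyword arguments for wr.athena.start_query_execution."""
    return {
        # :city and :min_year are filled in from the params dict.
        "sql": "SELECT * FROM my_table WHERE city = :city AND year >= :min_year",
        "database": "my_database",
        # String values carry their own single quotes; numbers do not.
        "params": {"city": f"'{city}'", "min_year": min_year},
        # A plain dict is accepted because AthenaCacheSettings is a TypedDict.
        "athena_cache_settings": {"max_cache_seconds": 900},
        # Block until the query finishes and return the response dict.
        "wait": True,
    }


def run_query(kwargs: Dict[str, Any]) -> Any:
    """Submit the query (requires awswrangler and AWS credentials)."""
    import awswrangler as wr

    return wr.athena.start_query_execution(**kwargs)
```

Keeping the argument construction separate from the call makes it possible to inspect or log the final arguments before anything is sent to Athena.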

Returns:

Query execution ID if wait is set to False, dictionary with the get_query_execution response otherwise.

Return type:

Union[str, Dict[str, Any]]
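
Because the return type depends on wait, calling code may need to handle both shapes. The sketch below assumes the dictionary follows the layout of Athena's QueryExecution structure, with QueryExecutionId and Status.State keys; describe_result is a hypothetical helper, not part of awswrangler.

```python
from typing import Any, Dict, Union


def describe_result(result: Union[str, Dict[str, Any]]) -> str:
    """Summarise the value returned by start_query_execution."""
    if isinstance(result, str):
        # wait=False: only the query execution ID is returned.
        return f"started query {result}"
    # wait=True: assumed to be a QueryExecution-style dictionary.
    query_id = result.get("QueryExecutionId", "?")
    state = result.get("Status", {}).get("State", "UNKNOWN")
    return f"query {query_id} finished with state {state}"
```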

Examples

Querying into the default data source (Amazon S3 - ‘AwsDataCatalog’)

>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(sql='...', database='...')

Querying into another data source (PostgreSQL, Redshift, etc.)

>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(sql='...', database='...', data_source='...')
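
Starting a query and waiting for it later

With the default wait=False, the call returns immediately, so other work can happen before the result is collected. The sketch below uses wr.athena.wait_query for the second step. The start_then_wait helper and its athena argument are hypothetical and exist only so the flow can be exercised with a stand-in object instead of a live AWS account.

```python
from typing import Any, Dict, Optional


def start_then_wait(sql: str, database: str, athena: Optional[Any] = None) -> Dict[str, Any]:
    """Start a query without blocking, then wait for it to finish."""
    if athena is None:
        # Imported lazily so the helper can be defined without awswrangler.
        import awswrangler as wr

        athena = wr.athena

    # Returns the query execution ID straight away (wait defaults to False).
    query_exec_id = athena.start_query_execution(sql=sql, database=database)

    # ... other work can run here while Athena executes the query ...

    # Blocks until the query reaches a final state.
    return athena.wait_query(query_execution_id=query_exec_id)
```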