awswrangler.athena.start_query_execution

awswrangler.athena.start_query_execution(sql: str, database: str | None = None, s3_output: str | None = None, workgroup: str = 'primary', encryption: str | None = None, kms_key: str | None = None, params: dict[str, Any] | list[str] | None = None, paramstyle: Literal['qmark', 'named'] = 'named', result_reuse_configuration: dict[str, Any] | None = None, boto3_session: Session | None = None, client_request_token: str | None = None, athena_cache_settings: AthenaCacheSettings | None = None, athena_query_wait_polling_delay: float = 1.0, data_source: str | None = None, wait: bool = False) → str | dict[str, Any]

Start a SQL Query against AWS Athena.

Note

Creates the default Athena bucket (e.g. s3://aws-athena-query-results-ACCOUNT-REGION/) if it doesn’t exist and s3_output is None. The bucket is not required when the workgroup uses managed query results.

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • database

  • athena_cache_settings

  • athena_query_wait_polling_delay

  • workgroup

Check out the Global Configurations Tutorial for details.

Parameters:
  • sql (str) – SQL query.

  • database (str | None) – AWS Glue/Athena database name.

  • s3_output (str | None) – AWS S3 path. Not required when the workgroup uses managed query results.

  • workgroup (str) – Athena workgroup. Defaults to ‘primary’.

  • encryption (str | None) – None, ‘SSE_S3’, ‘SSE_KMS’, ‘CSE_KMS’.

  • kms_key (str | None) – For SSE-KMS and CSE-KMS, this is the KMS key ARN or ID.

  • params (dict[str, Any] | list[str] | None) –

    Parameters that will be used for constructing the SQL query. Only named or question mark parameters are supported. The parameter style needs to be specified in the paramstyle parameter.

    For paramstyle="named", this value must be a dictionary of the form {'name': 'value'}, and the SQL query must contain the matching :name placeholders. Formatting is applied client-side in this case.

    For paramstyle="qmark", this value must be a list of strings. Formatting is applied server-side: the values are bound to the ? placeholders in the order in which they occur in the query.

paramstyle

Determines the style of params. Possible values are:

  • named

  • qmark
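
As a minimal sketch of the two styles, the snippet below prepares the keyword arguments for each. The table, column, and database names are placeholders, and submit is a hypothetical helper (not part of awswrangler) that only forwards the arguments; calling it requires AWS credentials and an existing table.

```python
from typing import Any

# Named style: the query uses :name placeholders and params is a dict.
# The values are substituted client-side before the query is sent.
named_call: dict[str, Any] = {
    "sql": "SELECT * FROM my_table WHERE name = :name AND city = :city",
    "database": "my_database",
    "params": {"name": "filtered_name", "city": "filtered_city"},
    "paramstyle": "named",
}

# Question-mark style: the query uses ? placeholders and params is a list
# of strings, applied server-side in the order the placeholders appear.
qmark_call: dict[str, Any] = {
    "sql": "SELECT * FROM my_table WHERE name = ? AND city = ?",
    "database": "my_database",
    "params": ["filtered_name", "filtered_city"],
    "paramstyle": "qmark",
}


def submit(call: dict[str, Any]) -> Any:
    """Hypothetical helper: forward the prepared arguments to awswrangler."""
    import awswrangler as wr  # imported lazily; needs AWS credentials to run

    return wr.athena.start_query_execution(**call)
```

In both cases the number of placeholders in the query matches the number of entries in params.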

result_reuse_configuration

A structure that contains the configuration settings for reusing query results. See also: https://docs.aws.amazon.com/athena/latest/ug/reusing-query-results.html
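
A minimal sketch of such a structure, assuming it follows the shape of the ResultReuseConfiguration object of the Athena StartQueryExecution API. The 60-minute maximum age is an arbitrary value chosen for illustration.

```python
from typing import Any

# Assumed shape: mirrors the ResultReuseConfiguration structure of the
# Athena StartQueryExecution API. Results no older than 60 minutes
# (an illustrative value) may be reused instead of re-running the query.
result_reuse_configuration: dict[str, Any] = {
    "ResultReuseByAgeConfiguration": {
        "Enabled": True,
        "MaxAgeInMinutes": 60,
    }
}

# It would then be passed as:
#   wr.athena.start_query_execution(
#       sql="...", result_reuse_configuration=result_reuse_configuration
#   )
```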

boto3_session

The default boto3 session is used if boto3_session is None.

client_request_token

A unique case-sensitive string used to ensure the request to create the query is idempotent (executes only once). If another StartQueryExecution request is received with the same token, the same response is returned and another query is not created. If you pass the same client_request_token value with different parameters (for example, a changed QueryString), the query fails with the error message “Idempotent parameters do not match”. Use this only with ctas_approach=False, unload_approach=False, and the cache disabled.
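
One way to obtain a token that stays the same for identical requests and changes whenever the request changes is to hash the request contents. The sketch below is only an illustration: make_request_token is a hypothetical helper, not part of awswrangler, and it assumes the 32 to 128 character length that the Athena API documents for ClientRequestToken.

```python
import hashlib


def make_request_token(sql: str, database: str, workgroup: str = "primary") -> str:
    """Hypothetical helper: derive a stable token from the request contents."""
    # Join the fields with a separator that is unlikely to occur in them.
    payload = "\x1f".join([sql, database, workgroup])
    # A SHA-256 hex digest is 64 characters long, inside the assumed
    # 32-128 character range for the token.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


token = make_request_token("SELECT 1", "my_database")

# It would then be passed as:
#   wr.athena.start_query_execution(
#       sql="SELECT 1", database="my_database", client_request_token=token
#   )
```

Because the token is derived from the query text, changing the query also changes the token, which avoids the mismatch error described above.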

athena_cache_settings

Parameters of the Athena cache settings such as max_cache_seconds, max_cache_query_inspections, max_remote_cache_entries, and max_local_cache_entries. AthenaCacheSettings is a TypedDict, meaning the passed parameter can be instantiated either as an instance of AthenaCacheSettings or as a regular Python dict. If cached results are valid, awswrangler ignores the ctas_approach, s3_output, encryption, kms_key, keep_files and ctas_temp_table_name params. If reading cached data fails for any reason, execution falls back to the usual query run path.
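
Since a regular Python dict is accepted, the settings can be sketched as below. The keys are the four named above; the values are arbitrary and chosen only for illustration.

```python
# A plain dict using the four cache-setting keys; values are illustrative.
athena_cache_settings = {
    "max_cache_seconds": 900,           # accept cached results up to 15 minutes old
    "max_cache_query_inspections": 50,
    "max_remote_cache_entries": 50,
    "max_local_cache_entries": 100,
}

# It would then be passed as:
#   wr.athena.start_query_execution(
#       sql="...", athena_cache_settings=athena_cache_settings
#   )
```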

athena_query_wait_polling_delay

Interval in seconds for how often the function will check if the Athena query has completed.

data_source

Data Source / Catalog name. If None, ‘AwsDataCatalog’ will be used by default.

wait

Indicates whether to wait for the query to finish and return a dictionary with the query execution response.

Return type:

str | dict[str, Any]

Returns:

Query execution ID if wait is set to False, dictionary with the get_query_execution response otherwise.
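
Because the return type depends on wait, calling code may want to normalize it. The following is a minimal sketch with two hypothetical helpers (not part of awswrangler). It assumes the dictionary returned with wait=True has the shape of the QueryExecution structure from Athena's GetQueryExecution API, with the "QueryExecutionId" and "Status" keys.

```python
from typing import Any, Optional, Union


def get_execution_id(result: Union[str, dict[str, Any]]) -> str:
    """Hypothetical helper: return the execution ID for either return shape."""
    if isinstance(result, str):
        # wait=False: the function returned the query execution ID directly.
        return result
    # wait=True: assumed QueryExecution structure carrying the ID.
    return result["QueryExecutionId"]


def get_state(result: Union[str, dict[str, Any]]) -> Optional[str]:
    """Hypothetical helper: return the final state, if it is known."""
    if isinstance(result, str):
        # Only an ID is available; the query may still be running.
        return None
    return result.get("Status", {}).get("State")
```

With wait=False, the state is not known at return time, so get_state returns None.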

Examples

Querying into the default data source (Amazon S3 - ‘AwsDataCatalog’)

>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(sql='...', database='...')

Querying into another data source (PostgreSQL, Redshift, etc.)

>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(sql='...', database='...', data_source='...')