awswrangler.athena.start_query_execution
- awswrangler.athena.start_query_execution(sql: str, database: str | None = None, s3_output: str | None = None, workgroup: str = 'primary', encryption: str | None = None, kms_key: str | None = None, params: dict[str, Any] | list[str] | None = None, paramstyle: Literal['qmark', 'named'] = 'named', boto3_session: Session | None = None, client_request_token: str | None = None, athena_cache_settings: AthenaCacheSettings | None = None, athena_query_wait_polling_delay: float = 1.0, data_source: str | None = None, wait: bool = False) -> str | dict[str, Any]
Start a SQL Query against AWS Athena.
Note
Creates the default Athena bucket if it doesn’t exist and s3_output is None (e.g. s3://aws-athena-query-results-ACCOUNT-REGION/).
Note
This function has arguments which can be configured globally through wr.config or environment variables:
database
athena_cache_settings
athena_query_wait_polling_delay
workgroup
Check out the Global Configurations Tutorial for details.
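As a rough sketch of the global configuration mentioned above, the helper below sets three of the listed arguments as attributes on a configuration object. It assumes that wr.config exposes attributes named exactly like the arguments in the list; the helper name apply_athena_defaults and the example values are hypothetical.

```python
from typing import Any


def apply_athena_defaults(
    config: Any, database: str, workgroup: str, polling_delay: float
) -> None:
    """Set Athena-related defaults on a config object (hypothetical helper).

    Assumption: `config` accepts attributes named after the function
    arguments, as `wr.config` is expected to.
    """
    config.database = database
    config.workgroup = workgroup
    config.athena_query_wait_polling_delay = polling_delay


# Intended usage (requires awswrangler and AWS credentials):
#   import awswrangler as wr
#   apply_athena_defaults(wr.config, "my_database", "primary", 0.5)
#   wr.athena.start_query_execution(sql="SELECT 1")  # database comes from config
```

Arguments passed explicitly in a call should still be preferred when a single query needs different settings from the global defaults.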
- Parameters:
sql (str) – SQL query.
database (str | None) – AWS Glue/Athena database name.
s3_output (str | None) – AWS S3 path.
workgroup (str) – Athena workgroup. Primary by default.
encryption (str | None) – None, ‘SSE_S3’, ‘SSE_KMS’, ‘CSE_KMS’.
kms_key (str | None) – For SSE-KMS and CSE-KMS, this is the KMS key ARN or ID.
params (dict[str, Any] | list[str] | None) – Parameters that will be used for constructing the SQL query. Only named or question mark parameters are supported. The parameter style needs to be specified in the paramstyle parameter.
For paramstyle="named", this value needs to be a dictionary. The dict needs to contain the information in the form {'name': 'value'} and the SQL query needs to contain :name. The formatter will be applied client-side in this scenario.
For paramstyle="qmark", this value needs to be a list of strings. The formatter will be applied server-side. The values are applied sequentially to the parameters in the query in the order in which the parameters occur.
paramstyle (Literal['qmark', 'named']) – Determines the style of params. Possible values are: named, qmark.
boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.
client_request_token (str | None) – A unique case-sensitive string used to ensure the request to create the query is idempotent (executes only once). If another StartQueryExecution request is received, the same response is returned and another query is not created. If a parameter has changed, for example, the QueryString, an error is returned. If you pass the same client_request_token value with different parameters the query fails with error message “Idempotent parameters do not match”. Use this only with ctas_approach=False and unload_approach=False and disabled cache.
athena_cache_settings (AthenaCacheSettings | None) – Parameters of the Athena cache settings such as max_cache_seconds, max_cache_query_inspections, max_remote_cache_entries, and max_local_cache_entries. AthenaCacheSettings is a TypedDict, meaning the passed parameter can be instantiated either as an instance of AthenaCacheSettings or as a regular Python dict. If cached results are valid, awswrangler ignores the ctas_approach, s3_output, encryption, kms_key, keep_files and ctas_temp_table_name params. If reading cached data fails for any reason, execution falls back to the usual query run path.
athena_query_wait_polling_delay (float) – Interval in seconds for how often the function will check if the Athena query has completed.
data_source (str | None) – Data Source / Catalog name. If None, ‘AwsDataCatalog’ will be used by default.
wait (bool) – Indicates whether to wait for the query to finish and return a dictionary with the query execution response.
- Return type:
str | dict[str, Any]
- Returns:
Query execution ID if wait is set to False, dictionary with the get_query_execution response otherwise.
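Because the return type depends on wait, calling code may want to normalize the result. The sketch below shows one way to do that. It assumes the dictionary returned with wait=True follows the shape of an Athena QueryExecution object, with the ID under "QueryExecutionId" and the state under "Status" and "State"; the helper names are hypothetical.

```python
from typing import Any, Dict, Optional, Union

QueryResult = Union[str, Dict[str, Any]]


def execution_id_of(result: QueryResult) -> str:
    """Return the query execution ID from either return shape."""
    if isinstance(result, str):
        # wait=False: the function returned the ID directly.
        return result
    # wait=True: assumed to be a dict carrying the ID under "QueryExecutionId".
    return result["QueryExecutionId"]


def final_state_of(result: QueryResult) -> Optional[str]:
    """Return the query state if the result carries one, else None."""
    if isinstance(result, str):
        # Only an ID is known; the query may still be running.
        return None
    return result.get("Status", {}).get("State")


# Intended usage (requires awswrangler and AWS credentials):
#   import awswrangler as wr
#   result = wr.athena.start_query_execution(sql="SELECT 1", wait=True)
#   print(execution_id_of(result), final_state_of(result))
```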
Examples
Querying into the default data source (Amazon S3 - ‘AwsDataCatalog’)
>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(sql='...', database='...')
Querying into another data source (PostgreSQL, Redshift, etc.)
>>> import awswrangler as wr
>>> query_exec_id = wr.athena.start_query_execution(sql='...', database='...', data_source='...')
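Passing query parameters in both styles

The two parameter styles described under params can be sketched as follows. The start function is passed in as an argument so the sketch can be exercised without AWS access; in real use it would be wr.athena.start_query_execution. The table my_table, the column id, and both helper names are hypothetical.

```python
from typing import Any, Callable

StartFn = Callable[..., Any]


def start_with_named_params(start: StartFn, database: str, row_id: int) -> Any:
    """Named style: params is a dict and the SQL refers to :row_id."""
    return start(
        sql="SELECT * FROM my_table WHERE id = :row_id",
        database=database,
        params={"row_id": row_id},
        paramstyle="named",
    )


def start_with_qmark_params(start: StartFn, database: str, row_id: int) -> Any:
    """Question mark style: params is a list of strings, applied in order."""
    return start(
        sql="SELECT * FROM my_table WHERE id = ?",
        database=database,
        params=[str(row_id)],  # values must be strings in this style
        paramstyle="qmark",
    )


# Intended usage (requires awswrangler and AWS credentials):
#   import awswrangler as wr
#   start_with_named_params(wr.athena.start_query_execution, "my_database", 42)
#   start_with_qmark_params(wr.athena.start_query_execution, "my_database", 42)
```

With the named style the substitution happens client-side before the query is sent, while the question mark style leaves substitution to Athena.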