awswrangler.athena.get_query_results

awswrangler.athena.get_query_results(query_execution_id: str, use_threads: bool | int = True, boto3_session: Session | None = None, categories: list[str] | None = None, dtype_backend: Literal['numpy_nullable', 'pyarrow'] = 'numpy_nullable', chunksize: int | bool | None = None, s3_additional_kwargs: dict[str, Any] | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None, athena_query_wait_polling_delay: float = 1.0) → DataFrame | Iterator[DataFrame]

Get AWS Athena SQL query results as a Pandas DataFrame.

Parameters:

query_execution_id (str) – SQL query’s execution_id on AWS Athena.
use_threads (bool | int) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.
boto3_session (Session | None) – The default boto3 session is used if boto3_session is None.
categories (list[str] | None) – List of column names that should be returned as pandas.Categorical. Recommended for memory-restricted environments.
dtype_backend (Literal['numpy_nullable', 'pyarrow']) –
Which dtype backend to use for the resulting DataFrame. With “numpy_nullable”, nullable dtypes are used for every dtype that has a nullable implementation; with “pyarrow”, pyarrow is used for all dtypes.

The dtype_backends are still experimental. The “pyarrow” backend is only supported with Pandas 2.0 or above.
chunksize (int | bool | None) – If passed, the data is split into an iterable of DataFrames (memory friendly). If True, awswrangler iterates over the data file by file in the most efficient way, with no guarantee of chunk size. If an integer is passed, awswrangler iterates over the data in chunks of that many rows.
s3_additional_kwargs (dict[str, Any] | None) – Forwarded to botocore requests, e.g. s3_additional_kwargs={'RequestPayer': 'requester'}.
pyarrow_additional_kwargs (dict[str, Any] | None) – Forwarded to the to_pandas method when converting PyArrow tables to Pandas DataFrames. Valid values include “split_blocks”, “self_destruct”, “ignore_metadata”, e.g. pyarrow_additional_kwargs={'split_blocks': True}.
athena_query_wait_polling_delay (float) – Interval in seconds for how often the function will check if the Athena query has completed.
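
The chunksize behaviour described above can be sketched as follows. This is a minimal illustration, not part of the library: total_rows and count_query_rows are hypothetical helper names, and the default of 50,000 rows per chunk is an arbitrary assumption.

```python
from typing import Iterable, Sized


def total_rows(chunks: Iterable[Sized]) -> int:
    """Sum row counts across chunks, consuming one chunk at a time."""
    # len() of a pandas DataFrame is its number of rows.
    return sum(len(chunk) for chunk in chunks)


def count_query_rows(query_execution_id: str, rows_per_chunk: int = 50_000) -> int:
    """Count the rows returned by an Athena query without loading them all at once."""
    # Imported here so total_rows() stays usable without awswrangler installed.
    import awswrangler as wr

    chunks = wr.athena.get_query_results(
        query_execution_id=query_execution_id,
        chunksize=rows_per_chunk,  # integer: iterate by this many rows
    )
    return total_rows(chunks)
```

Because an integer chunksize makes the function return an iterator, each DataFrame can be discarded once it has been processed. Passing chunksize=True instead would iterate file by file, with chunk sizes that are not guaranteed.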

Return type:

DataFrame | Iterator[DataFrame]

Returns:

A Pandas DataFrame, or an iterator of Pandas DataFrames if chunksize is passed.

Examples

>>> import awswrangler as wr
>>> res = wr.athena.get_query_results(
...     query_execution_id="cbae5b41-8103-4709-95bb-887f88edd4f2"
... )
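
The example above uses a literal execution id. As a hedged sketch of where such an id might come from, the helper below starts a query with wr.athena.start_query_execution and passes the returned id to get_query_results. The name run_and_fetch, the choice of the “pyarrow” backend, and the 0.5 second polling delay are illustrative assumptions, not recommendations from the library.

```python
def run_and_fetch(sql: str, database: str):
    """Start an Athena query and load its results as a Pandas DataFrame."""
    # Imported here so that defining the helper needs no AWS dependencies.
    import awswrangler as wr

    # With its default wait=False, start_query_execution returns the
    # query execution id as a string.
    query_execution_id = wr.athena.start_query_execution(sql=sql, database=database)

    # get_query_results checks for query completion at the given interval.
    return wr.athena.get_query_results(
        query_execution_id=query_execution_id,
        dtype_backend="pyarrow",  # requires Pandas 2.0 or above
        athena_query_wait_polling_delay=0.5,  # seconds between status checks
    )
```

Calling run_and_fetch requires valid AWS credentials and an existing Athena database; the SQL text and database name are supplied by the caller.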