awswrangler.athena.get_query_results
- awswrangler.athena.get_query_results(query_execution_id: str, use_threads: bool | int = True, boto3_session: Session | None = None, categories: list[str] | None = None, dtype_backend: Literal['numpy_nullable', 'pyarrow'] = 'numpy_nullable', chunksize: int | bool | None = None, s3_additional_kwargs: dict[str, Any] | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None, athena_query_wait_polling_delay: float = 1.0) → DataFrame | Iterator[DataFrame]
Get AWS Athena SQL query results as a Pandas DataFrame.
Note
The following arguments are not supported in distributed mode with engine EngineEnum.RAY:
boto3_session
Note
This function has arguments which can be configured globally through wr.config or environment variables:
athena_query_wait_polling_delay
chunksize
dtype_backend
Check out the Global Configurations Tutorial for details.
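A minimal sketch of both configuration routes. It assumes the environment variable names follow the WR_ prefix convention (WR_ plus the upper-cased parameter name); check the Global Configurations Tutorial to confirm. The environment variables are set before awswrangler is imported, and configure_in_code is a hypothetical helper that imports awswrangler lazily so the snippet itself runs without AWS access.

```python
import os

# Route 1: environment variables (assumed names: "WR_" + upper-cased parameter name).
# Set them before awswrangler is imported so its global config picks them up.
os.environ["WR_ATHENA_QUERY_WAIT_POLLING_DELAY"] = "0.5"
os.environ["WR_DTYPE_BACKEND"] = "pyarrow"  # requires Pandas 2.0 or above


def configure_in_code() -> None:
    """Hypothetical helper for route 2: set the same defaults on wr.config at runtime."""
    import awswrangler as wr  # lazy import: the snippet can be loaded without awswrangler

    wr.config.athena_query_wait_polling_delay = 0.5
    wr.config.dtype_backend = "pyarrow"
    # Calls that do not pass chunksize explicitly will now return iterators of DataFrames.
    wr.config.chunksize = 100_000
```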
- Parameters:
query_execution_id (str) – SQL query's execution ID on AWS Athena.
use_threads (bool | int) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.
boto3_session (Session | None) – The default boto3 session is used if boto3_session receives None.
categories (list[str] | None) – List of column names that should be returned as pandas.Categorical. Recommended for memory-restricted environments.
dtype_backend (Literal['numpy_nullable', 'pyarrow']) – Which dtype backend to use for the returned DataFrame. With "numpy_nullable", nullable dtypes are used for all dtypes that have a nullable implementation; with "pyarrow", PyArrow-backed dtypes are used for all dtypes.
The dtype backends are still experimental. The "pyarrow" backend is only supported with Pandas 2.0 or above.
chunksize (int | bool | None) – If passed, the data is split into an iterable of DataFrames (memory-friendly). If True, awswrangler iterates over the data file by file in the most efficient way, with no guarantee on chunk size. If an integer is passed, awswrangler iterates over the data in chunks of that many rows.
s3_additional_kwargs (dict[str, Any] | None) – Forwarded to botocore requests, e.g. s3_additional_kwargs={'RequestPayer': 'requester'}.
pyarrow_additional_kwargs (dict[str, Any] | None) – Forwarded to the to_pandas method that converts PyArrow tables to Pandas DataFrames. Valid values include "split_blocks", "self_destruct" and "ignore_metadata", e.g. pyarrow_additional_kwargs={'split_blocks': True}.
athena_query_wait_polling_delay (float) – Interval, in seconds, at which the function checks whether the Athena query has completed.
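The athena_query_wait_polling_delay parameter trades responsiveness against API traffic. The sketch below shows a generic poll-until-done loop to make that trade-off concrete. It is not awswrangler's internal implementation: wait_for_query and athena_state_getter are hypothetical names, and the state strings are the ones Athena reports through boto3's get_query_execution call.

```python
import time
from typing import Callable

# Terminal states that Athena reports for a query execution.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(
    get_state: Callable[[], str],
    polling_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_state() every polling_delay seconds until a terminal state is returned."""
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        # A shorter delay notices completion sooner but issues more API calls.
        sleep(polling_delay)


def athena_state_getter(query_execution_id: str) -> Callable[[], str]:
    """Hypothetical helper: read the current state with boto3's get_query_execution."""
    import boto3  # lazy import: the polling loop above works without AWS access

    client = boto3.client("athena")
    return lambda: client.get_query_execution(QueryExecutionId=query_execution_id)[
        "QueryExecution"
    ]["Status"]["State"]
```

Injecting get_state and sleep keeps the loop independent of AWS, so it can be exercised with a fake sequence of states.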
- Return type:
DataFrame | Iterator[DataFrame]
- Returns:
Pandas DataFrame or Generator of Pandas DataFrames if chunksize is passed.
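Because the return type is DataFrame | Iterator[DataFrame], code that should work in both modes has to branch on it. A minimal sketch using only pandas; iter_frames and count_rows are hypothetical helper names, not part of awswrangler.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator

import pandas as pd


def iter_frames(result: pd.DataFrame | Iterable[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Yield DataFrames whether result is a single DataFrame or an iterator of them."""
    # Check for DataFrame first: a DataFrame is itself iterable (over its column labels).
    if isinstance(result, pd.DataFrame):
        yield result
    else:
        yield from result


def count_rows(result: pd.DataFrame | Iterable[pd.DataFrame]) -> int:
    """Count rows while holding at most one chunk in memory at a time."""
    return sum(len(frame) for frame in iter_frames(result))
```

For example, passing the result of a call made with chunksize=100_000 to count_rows counts the rows chunk by chunk, and the same call without chunksize works unchanged.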
Examples
>>> import awswrangler as wr
>>> res = wr.athena.get_query_results(
...     query_execution_id="cbae5b41-8103-4709-95bb-887f88edd4f2"
... )
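A slightly fuller call that exercises more of the parameters above. This is a sketch under stated assumptions: fetch_results is a hypothetical wrapper, the "status" column passed to categories is hypothetical, the requester-pays setting only applies if the results bucket requires it, and the imports are kept inside the function so the snippet loads without AWS credentials.

```python
def fetch_results(query_execution_id: str):
    """Hypothetical wrapper around wr.athena.get_query_results with explicit options."""
    import boto3
    import awswrangler as wr

    session = boto3.Session()  # explicit session; not supported when running on Ray
    return wr.athena.get_query_results(
        query_execution_id=query_execution_id,
        boto3_session=session,
        categories=["status"],  # hypothetical low-cardinality column
        dtype_backend="pyarrow",  # requires Pandas 2.0 or above
        s3_additional_kwargs={"RequestPayer": "requester"},
        pyarrow_additional_kwargs={"split_blocks": True},
    )
```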