awswrangler.athena.get_query_results

awswrangler.athena.get_query_results(query_execution_id: str, use_threads: bool | int = True, boto3_session: Session | None = None, categories: list[str] | None = None, dtype_backend: Literal['numpy_nullable', 'pyarrow'] = 'numpy_nullable', chunksize: int | bool | None = None, s3_additional_kwargs: dict[str, Any] | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None, athena_query_wait_polling_delay: float = 1.0) → DataFrame | Iterator[DataFrame]

Get AWS Athena SQL query results as a Pandas DataFrame.

Note

The following arguments are not supported in distributed mode with engine EngineEnum.RAY:

  • boto3_session

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • athena_query_wait_polling_delay

  • chunksize

  • dtype_backend

Check out the Global Configurations Tutorial for details.
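
As a minimal sketch of the global configuration route, the helper below sets these arguments on a config object. It assumes that wr.config exposes attributes named after the arguments listed above. apply_athena_defaults is a hypothetical helper, not part of awswrangler, and the values shown are illustrative only.

```python
def apply_athena_defaults(config, polling_delay=None, chunksize=None, dtype_backend=None):
    """Set globally configurable Athena read arguments on a wr.config-like object.

    Hypothetical helper: attribute names are assumed to match the argument
    names. Only values that are not None are applied.
    """
    settings = {
        "athena_query_wait_polling_delay": polling_delay,
        "chunksize": chunksize,
        "dtype_backend": dtype_backend,
    }
    applied = {}
    for name, value in settings.items():
        if value is not None:
            setattr(config, name, value)
            applied[name] = value
    return applied


# Usage (requires awswrangler to be installed):
#   import awswrangler as wr
#   apply_athena_defaults(wr.config, polling_delay=0.5, dtype_backend="pyarrow")
```

Values set this way act as defaults only. An argument passed explicitly in the call to get_query_results is expected to take precedence.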

Parameters:
  • query_execution_id (str) – SQL query’s execution_id on AWS Athena.

  • use_threads (bool, int) – True to enable concurrent requests, False to disable multiple threads. If True, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.

  • categories (List[str], optional) – List of column names that should be returned as pandas.Categorical. Recommended for memory-restricted environments.

  • dtype_backend (str, optional) –

    Which dtype_backend to use. When “numpy_nullable” is set, nullable dtypes are used for all dtypes that have a nullable implementation. When “pyarrow” is set, pyarrow is used for all dtypes.

    The dtype_backends are still experimental. The “pyarrow” backend is only supported with Pandas 2.0 or above.

  • chunksize (Union[int, bool], optional) – If passed, the data is split into an iterable of DataFrames (memory friendly). If True, awswrangler iterates over the data file by file in the most efficient way, without any guarantee of chunk size. If an integer is passed, awswrangler iterates over the data in chunks with that number of rows.

  • s3_additional_kwargs (dict[str, Any], optional) – Forwarded to botocore requests, e.g. s3_additional_kwargs={'RequestPayer': 'requester'}

  • pyarrow_additional_kwargs (dict[str, Any], optional) – Forwarded to the to_pandas method when converting PyArrow tables to Pandas DataFrames. Valid keys include “split_blocks”, “self_destruct”, and “ignore_metadata”, e.g. pyarrow_additional_kwargs={'split_blocks': True}.

  • athena_query_wait_polling_delay (float, default: 1.0 seconds) – Interval in seconds for how often the function will check if the Athena query has completed.
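
The sketch below shows how several of these parameters might be combined in one call. fetch_results and its fetch argument are hypothetical, introduced only so the example can be exercised without AWS access. The column name "status" and the chosen values are assumptions for illustration.

```python
def fetch_results(query_execution_id, requester_pays=False, fetch=None):
    """Fetch Athena results with several optional arguments set explicitly.

    `fetch` defaults to wr.athena.get_query_results. It is injectable so the
    sketch can be tried with a stand-in function.
    """
    if fetch is None:
        import awswrangler as wr  # lazy import; a real call needs AWS credentials
        fetch = wr.athena.get_query_results

    kwargs = {
        "query_execution_id": query_execution_id,
        "use_threads": 4,  # cap the thread pool instead of using os.cpu_count()
        "categories": ["status"],  # hypothetical column returned as Categorical
        "dtype_backend": "numpy_nullable",
        "pyarrow_additional_kwargs": {"split_blocks": True},
        "athena_query_wait_polling_delay": 0.5,
    }
    if requester_pays:
        # Forwarded to botocore, as described for s3_additional_kwargs.
        kwargs["s3_additional_kwargs"] = {"RequestPayer": "requester"}
    return fetch(**kwargs)
```

Because chunksize is not passed here, the call is expected to return a single DataFrame rather than an iterator.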

Returns:

Pandas DataFrame or Generator of Pandas DataFrames if chunksize is passed.

Return type:

Union[pd.DataFrame, Iterator[pd.DataFrame]]

Examples

>>> import awswrangler as wr
>>> res = wr.athena.get_query_results(
...     query_execution_id="cbae5b41-8103-4709-95bb-887f88edd4f2"
... )
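
When chunksize is passed, the result is an iterator of DataFrames rather than a single DataFrame. The following is a minimal sketch of consuming it chunk by chunk. count_rows and its fetch argument are hypothetical names used for illustration, and the default of 10,000 rows per chunk is an arbitrary choice.

```python
def count_rows(query_execution_id, chunksize=10_000, fetch=None):
    """Count result rows without holding the full result set in memory.

    `fetch` defaults to wr.athena.get_query_results. It is injectable so the
    sketch can be tried with a stand-in function.
    """
    if fetch is None:
        import awswrangler as wr  # lazy import; a real call needs AWS credentials
        fetch = wr.athena.get_query_results

    total = 0
    # With chunksize set, each iteration yields one DataFrame (one chunk).
    for chunk in fetch(query_execution_id=query_execution_id, chunksize=chunksize):
        total += len(chunk)  # len() of a DataFrame is its row count
    return total
```

Passing chunksize=True instead of an integer would iterate file by file, so the chunk sizes are then not guaranteed.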