awswrangler.athena.get_query_results

awswrangler.athena.get_query_results(query_execution_id: str, use_threads: bool | int = True, boto3_session: Session | None = None, categories: List[str] | None = None, chunksize: int | bool | None = None, s3_additional_kwargs: Dict[str, Any] | None = None, pyarrow_additional_kwargs: Dict[str, Any] | None = None, athena_query_wait_polling_delay: float = 0.25) → Any

Get AWS Athena SQL query results as a Pandas DataFrame.

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • athena_query_wait_polling_delay

  • chunksize

Check out the Global Configurations Tutorial for details.

Parameters:
  • query_execution_id (str) – SQL query’s execution_id on AWS Athena.

  • use_threads (bool, int) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.

  • categories (List[str], optional) – List of column names that should be returned as pandas.Categorical. Recommended for memory-restricted environments.

  • chunksize (Union[int, bool], optional) – If passed, splits the data into an Iterable of DataFrames (memory friendly). If True, awswrangler iterates on the data by files in the most efficient way, without any guarantee on the size of each chunk. If an integer is passed, awswrangler iterates on the data in chunks whose number of rows equals that integer.

  • s3_additional_kwargs (Optional[Dict[str, Any]]) – Forwarded to botocore requests, e.g. s3_additional_kwargs={'RequestPayer': 'requester'}.

  • pyarrow_additional_kwargs (Optional[Dict[str, Any]]) – Forwarded to the ParquetFile class or used when converting an Arrow table to Pandas; currently only the “coerce_int96_timestamp_unit” and “timestamp_as_object” arguments are considered. If you are reading Parquet files whose timestamps cannot be converted to pandas Timestamp[ns], consider setting timestamp_as_object=True to allow timestamp units larger than “ns”. If you are reading Parquet data that still uses INT96 (as Athena output does), you can use coerce_int96_timestamp_unit to specify the timestamp unit that INT96 values are cast to. The default is “ns”; if you know the Parquet output came from a system that encodes timestamps in a particular unit, set this to that same unit, e.g. coerce_int96_timestamp_unit="ms".

  • athena_query_wait_polling_delay (float, default: 0.25 seconds) – Interval in seconds for how often the function will check if the Athena query has completed.
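The three accepted forms of use_threads can be made concrete with a small sketch that resolves a bool-or-int value to a worker count, following the rules stated above. The helper name resolve_thread_count is hypothetical and this is not awswrangler's internal code; mapping False to exactly one thread and falling back to 1 when os.cpu_count() returns None are assumptions.

```python
import os
from typing import Union


def resolve_thread_count(use_threads: Union[bool, int]) -> int:
    """Hypothetical helper: map a use_threads value to a maximum thread count."""
    # bool is a subclass of int in Python, so it must be checked first;
    # otherwise True would silently be treated as the integer 1.
    if isinstance(use_threads, bool):
        if use_threads:
            # Assumption: fall back to 1 if os.cpu_count() cannot determine the count (None).
            return os.cpu_count() or 1
        return 1  # assumption: "disable multiple threads" means a single thread
    return int(use_threads)  # an explicit integer is used as-is


print(resolve_thread_count(True))   # depends on the machine
print(resolve_thread_count(False))  # 1
print(resolve_thread_count(8))      # 8
```

The bool check comes first because isinstance(True, int) is also true in Python.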
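The memory argument behind the categories parameter can be checked locally with plain pandas: a column with many rows but few distinct values is much smaller as pandas.Categorical than as strings. This sketch only demonstrates the pandas dtype itself, with made-up data; it does not call awswrangler.

```python
import pandas as pd

# Made-up column with 300,000 rows but only three distinct values,
# similar to a region or status column in query results.
df = pd.DataFrame({"region": ["us-east-1", "eu-west-1", "ap-south-1"] * 100_000})

# deep=True counts the memory held by the string objects themselves.
as_strings = df["region"].memory_usage(deep=True)
as_category = df["region"].astype("category").memory_usage(deep=True)

print(f"strings: {as_strings:,} bytes, category: {as_category:,} bytes")
```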
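The difference between chunksize=True and an integer chunksize can be illustrated locally. Results spread across several files naturally arrive in file-sized pieces, while an integer chunksize means regrouping them into chunks of a fixed row count. The rechunk helper below is hypothetical; it only illustrates that documented contract with in-memory DataFrames and is not how awswrangler implements it.

```python
from typing import Iterable, Iterator, List

import pandas as pd


def rechunk(frames: Iterable[pd.DataFrame], rows: int) -> Iterator[pd.DataFrame]:
    """Hypothetical helper: regroup file-sized DataFrames into chunks of exactly `rows` rows.

    Only the last chunk may be shorter than `rows`.
    """
    if rows < 1:
        raise ValueError("rows must be a positive integer")
    pending: List[pd.DataFrame] = []
    pending_rows = 0
    for frame in frames:
        pending.append(frame)
        pending_rows += len(frame)
        if pending_rows < rows:
            continue  # not enough buffered rows for a full chunk yet
        buffer = pd.concat(pending, ignore_index=True)
        start = 0
        while len(buffer) - start >= rows:
            yield buffer.iloc[start : start + rows].reset_index(drop=True)
            start += rows
        # Keep any remainder for the next round.
        leftover = buffer.iloc[start:]
        pending = [leftover] if len(leftover) else []
        pending_rows = len(leftover)
    if pending:
        yield pd.concat(pending, ignore_index=True)  # final, possibly short chunk


# Simulate three result files with 3, 5 and 2 rows.
files = [
    pd.DataFrame({"n": range(0, 3)}),
    pd.DataFrame({"n": range(3, 8)}),
    pd.DataFrame({"n": range(8, 10)}),
]
chunks = list(rechunk(files, rows=4))
print([len(chunk) for chunk in chunks])  # [4, 4, 2]
```

Iterating by files (chunksize=True) avoids this regrouping work, which is why it is described as the most efficient option.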
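The Timestamp[ns] limitation mentioned under pyarrow_additional_kwargs comes from pandas storing nanoseconds since the Unix epoch in a signed 64-bit integer. A short standard-library calculation shows the resulting date window, and how a coarser unit such as "ms" stretches the same 64 bits over a far longer span. This is arithmetic only; it does not call pyarrow or awswrangler.

```python
from datetime import datetime, timedelta

# Timestamp[ns] is a signed 64-bit count of nanoseconds since the Unix epoch,
# so it can reach at most 2**63 - 1 nanoseconds in either direction.
EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1

# datetime only has microsecond precision, so truncate to microseconds.
ns_upper = EPOCH + timedelta(microseconds=MAX_NS // 1_000)
ns_lower = EPOCH - timedelta(microseconds=MAX_NS // 1_000)

# The same 64-bit range expressed in years for "ns" and "ms" units.
SECONDS_PER_YEAR = 365.2425 * 86_400
years_ns = MAX_NS / 1e9 / SECONDS_PER_YEAR
years_ms = MAX_NS / 1e3 / SECONDS_PER_YEAR

print(ns_upper)  # 2262-04-11 23:47:16.854775
print(ns_lower)  # 1677-09-21 00:12:43.145225
print(f"ns: ±{years_ns:,.0f} years, ms: ±{years_ms:,.0f} years")
```

Values outside roughly 1677 to 2262 therefore cannot be held as Timestamp[ns], which is the situation timestamp_as_object=True is meant for.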
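The effect of athena_query_wait_polling_delay can be pictured as a simple polling loop: a shorter delay notices completion sooner but issues more status requests. The sketch below is hypothetical and not awswrangler's implementation; get_state is an assumed callable. With boto3, a real caller might pass something like lambda: client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"], where client is a boto3 Athena client; here a simulated sequence of states is used instead.

```python
import time
from typing import Callable

# Terminal states of an Athena query execution; QUEUED and RUNNING are not terminal.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(get_state: Callable[[], str], polling_delay: float = 0.25) -> str:
    """Hypothetical sketch: call get_state() every polling_delay seconds until the query finishes."""
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(polling_delay)  # wait before the next status check


# Simulated status sequence instead of real Athena calls.
simulated = iter(["QUEUED", "RUNNING", "RUNNING", "SUCCEEDED"])
final_state = wait_for_query(lambda: next(simulated), polling_delay=0.01)
print(final_state)  # SUCCEEDED
```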

Returns:

Pandas DataFrame or Generator of Pandas DataFrames if chunksize is passed.

Return type:

Union[pd.DataFrame, Iterator[pd.DataFrame]]
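Because the return type depends on whether chunksize was passed, calling code often normalizes both cases into one loop. The helper iter_frames below is a hypothetical convenience, not part of awswrangler; it is exercised here with local DataFrames standing in for query results.

```python
from typing import Iterator, Union

import pandas as pd


def iter_frames(result: Union[pd.DataFrame, Iterator[pd.DataFrame]]) -> Iterator[pd.DataFrame]:
    """Hypothetical helper: yield DataFrames whether or not chunksize was used."""
    if isinstance(result, pd.DataFrame):
        yield result  # non-chunked call: a single DataFrame
    else:
        yield from result  # chunked call: an iterator of DataFrames


# Local stand-ins for the two possible return shapes.
whole = pd.DataFrame({"x": [1, 2, 3]})
chunked = iter([pd.DataFrame({"x": [1]}), pd.DataFrame({"x": [2, 3]})])

total_whole = sum(len(df) for df in iter_frames(whole))
total_chunked = sum(len(df) for df in iter_frames(chunked))
print(total_whole, total_chunked)  # 3 3
```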

Examples

>>> import awswrangler as wr
>>> res = wr.athena.get_query_results(
...     query_execution_id="cbae5b41-8103-4709-95bb-887f88edd4f2"
... )