awswrangler.lakeformation.read_sql_query

awswrangler.lakeformation.read_sql_query(sql: str, database: str, transaction_id: str | None = None, query_as_of_time: str | None = None, catalog_id: str | None = None, use_threads: bool = True, boto3_session: Session | None = None, params: Dict[str, Any] | None = None, pyarrow_additional_kwargs: Dict[str, Any] | None = None) → DataFrame

Execute a PartiQL query against an AWS Glue table (as of a transaction ID or a time-travel timestamp) and return the result as a Pandas DataFrame.

Note

ORDER BY operations are not honored, i.e. sql="SELECT * FROM my_table ORDER BY my_column" is NOT valid.

Note

The database must NOT be explicitly defined in the PartiQL statement, i.e. sql="SELECT * FROM my_table" is valid, but sql="SELECT * FROM my_db.my_table" is NOT valid.

Note

Pass one of transaction_id or query_as_of_time, not both.

Note

The following arguments are not supported in distributed mode with engine EngineEnum.RAY:

  • boto3_session

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • catalog_id

  • database

Check out the Global Configurations Tutorial for details.
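
For example, assuming wr.config exposes the catalog_id and database keys as described in that tutorial, the defaults could be set once per session (a sketch, not an additional parameter of this function):

>>> import awswrangler as wr
>>> wr.config.database = "my_db"            # assumed global default for APIs that accept a database
>>> wr.config.catalog_id = "111111111111"   # assumed global default Data Catalog ID
>>> df = wr.lakeformation.read_sql_query(sql="SELECT * FROM my_table;")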

Parameters:
  • sql (str) – PartiQL query.

  • database (str) – AWS Glue database name.

  • transaction_id (str, optional) – The ID of the transaction at which to read the table contents. Cannot be specified alongside query_as_of_time.

  • query_as_of_time (str, optional) – The time as of when to read the table contents. Must be a valid Unix epoch timestamp. Cannot be specified alongside transaction_id.

  • catalog_id (str, optional) – The ID of the Data Catalog from which to retrieve Databases. If none is provided, the AWS account ID is used by default.

  • use_threads (bool) – True to enable concurrent requests, False to disable multiple threads. When enabled, os.cpu_count() is used as the max number of threads.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session receives None.

  • params (Dict[str, Any], optional) – Dict of parameters used to format the PartiQL query. Only named parameters are supported. The dict must contain the information in the form {"name": "value"}, and the SQL query must contain :name.

  • pyarrow_additional_kwargs (Dict[str, Any], optional) – Forwarded to the to_pandas method when converting from PyArrow tables to a Pandas DataFrame. Valid values include "split_blocks", "self_destruct", and "ignore_metadata", e.g. pyarrow_additional_kwargs={'split_blocks': True} (see the last example below).

Returns:

Pandas DataFrame.

Return type:

pd.DataFrame

Examples

>>> import awswrangler as wr
>>> df = wr.lakeformation.read_sql_query(
...     sql="SELECT * FROM my_table;",
...     database="my_db",
...     catalog_id="111111111111"
... )
>>> import awswrangler as wr
>>> df = wr.lakeformation.read_sql_query(
...     sql="SELECT * FROM my_table LIMIT 10;",
...     database="my_db",
...     transaction_id="1b62811fa3e02c4e5fdbaa642b752030379c4a8a70da1f8732ce6ccca47afdc9"
... )
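
The transaction ID above would typically come from an open Lake Formation transaction; a minimal sketch, assuming wr.lakeformation.start_transaction is available to open a read-only transaction:

>>> import awswrangler as wr
>>> transaction_id = wr.lakeformation.start_transaction(read_only=True)  # assumed helper; returns the transaction ID as a string
>>> df = wr.lakeformation.read_sql_query(
...     sql="SELECT * FROM my_table LIMIT 10;",
...     database="my_db",
...     transaction_id=transaction_id
... )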
>>> import awswrangler as wr
>>> df = wr.lakeformation.read_sql_query(
...     sql="SELECT * FROM my_table WHERE name=:name AND city=:city",
...     database="my_db",
...     query_as_of_time="1611142914",
...     params={"name": "filtered_name", "city": "filtered_city"}
... )
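
A sketch combining query_as_of_time with pyarrow_additional_kwargs; the keyword values shown are illustrative and are forwarded as-is to PyArrow's to_pandas:

>>> import awswrangler as wr
>>> df = wr.lakeformation.read_sql_query(
...     sql="SELECT * FROM my_table;",
...     database="my_db",
...     query_as_of_time="1611142914",
...     pyarrow_additional_kwargs={"split_blocks": True, "self_destruct": True}
... )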