awswrangler.lakeformation.read_sql_query¶
- awswrangler.lakeformation.read_sql_query(sql: str, database: str, transaction_id: str | None = None, query_as_of_time: str | None = None, catalog_id: str | None = None, categories: List[str] | None = None, safe: bool = True, map_types: bool = True, use_threads: bool = True, boto3_session: Session | None = None, params: Dict[str, Any] | None = None) Any ¶
Execute PartiQL query on AWS Glue Table (Transaction ID or time travel timestamp). Return Pandas DataFrame.
Note
ORDER BY operations are not honoured. i.e. sql=”SELECT * FROM my_table ORDER BY my_column” is NOT valid
Note
The database must NOT be explicitely defined in the PartiQL statement. i.e. sql=”SELECT * FROM my_table” is valid but sql=”SELECT * FROM my_db.my_table” is NOT valid
Note
Pass one of transaction_id or query_as_of_time, not both.
Note
This function has arguments which can be configured globally through wr.config or environment variables:
catalog_id
database
Check out the Global Configurations Tutorial for details.
- Parameters:
sql (str) – partiQL query.
database (str) – AWS Glue database name
transaction_id (str, optional) – The ID of the transaction at which to read the table contents. Cannot be specified alongside query_as_of_time
query_as_of_time (str, optional) – The time as of when to read the table contents. Must be a valid Unix epoch timestamp. Cannot be specified alongside transaction_id
catalog_id (str, optional) – The ID of the Data Catalog from which to retrieve Databases. If none is provided, the AWS account ID is used by default.
categories (Optional[List[str]], optional) – List of columns names that should be returned as pandas.Categorical. Recommended for memory restricted environments.
safe (bool, default True) – For certain data types, a cast is needed in order to store the data in a pandas DataFrame or Series (e.g. timestamps are always stored as nanoseconds in pandas). This option controls whether it is a safe cast or not.
map_types (bool, default True) – True to convert pyarrow DataTypes to pandas ExtensionDtypes. It is used to override the default pandas type for conversion of built-in pyarrow types or in absence of pandas_metadata in the Table schema.
use_threads (bool) – True to enable concurrent requests, False to disable multiple threads. When enabled, os.cpu_count() is used as the max number of threads.
boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session receives None.
params (Dict[str, any], optional) – Dict of parameters used to format the partiQL query. Only named parameters are supported. The dict must contain the information in the form {“name”: “value”} and the SQL query must contain :name.
- Returns:
Pandas DataFrame.
- Return type:
pd.DataFrame
Examples
>>> import awswrangler as wr >>> df = wr.lakeformation.read_sql_query( ... sql="SELECT * FROM my_table;", ... database="my_db", ... catalog_id="111111111111" ... )
>>> import awswrangler as wr >>> df = wr.lakeformation.read_sql_query( ... sql="SELECT * FROM my_table LIMIT 10;", ... database="my_db", ... transaction_id="1b62811fa3e02c4e5fdbaa642b752030379c4a8a70da1f8732ce6ccca47afdc9" ... )
>>> import awswrangler as wr >>> df = wr.lakeformation.read_sql_query( ... sql="SELECT * FROM my_table WHERE name=:name; AND city=:city;", ... database="my_db", ... query_as_of_time="1611142914", ... params={"name": "'filtered_name'", "city": "'filtered_city'"} ... )