awswrangler.cleanrooms.read_sql_query¶

awswrangler.cleanrooms.read_sql_query(sql: str, membership_id: str, output_bucket: str, output_prefix: str, keep_files: bool = True, params: Dict[str, Any] | None = None, chunksize: int | bool | None = None, use_threads: bool | int = True, boto3_session: Session | None = None, pyarrow_additional_kwargs: Dict[str, Any] | None = None) → Iterator[DataFrame] | DataFrame¶

Execute Clean Rooms Protected SQL query and return the results as a Pandas DataFrame.

Parameters:

sql (str) – SQL query
membership_id (str) – Membership ID
output_bucket (str) – S3 output bucket name
output_prefix (str) – S3 output prefix
keep_files (bool, optional) – Whether files in S3 output bucket/prefix are retained. ‘True’ by default
params (Dict[str, any], optional) – Dict of parameters used for constructing the SQL query. Only named parameters are supported. The dict must be in the form {‘name’: ‘value’} and the SQL query must contain :name. Note that for varchar columns and similar, you must surround the value in single quotes
chunksize (Union[int, bool], optional) – If passed, the data is split into an iterable of DataFrames (Memory friendly). If True an iterable of DataFrames is returned without guarantee of chunksize. If an INTEGER is passed, an iterable of DataFrames is returned with maximum rows equal to the received INTEGER
use_threads (Union[bool, int], optional) – True to enable concurrent requests, False to disable multiple threads. If enabled os.cpu_count() is used as the maximum number of threads. If integer is provided, specified number is used
boto3_session (boto3.Session, optional) – Boto3 Session. If None, the default boto3 session is used
pyarrow_additional_kwargs (Optional[Dict[str, Any]]) – Forwarded to to_pandas method converting from PyArrow tables to Pandas DataFrame. Valid values include “split_blocks”, “self_destruct”, “ignore_metadata”. e.g. pyarrow_additional_kwargs={‘split_blocks’: True}

Returns:

Pandas DataFrame or Generator of Pandas DataFrames if chunksize is provided.

Return type:

Union[Iterator[pd.DataFrame], pd.DataFrame]

Examples

>>> import awswrangler as wr
>>> df = wr.cleanrooms.read_sql_query(
>>>     sql='SELECT DISTINCT...',
>>>     membership_id='membership-id',
>>>     output_bucket='output-bucket',
>>>     output_prefix='output-prefix',
>>> )