awswrangler.cleanrooms.read_sql_query

awswrangler.cleanrooms.read_sql_query(sql: str, membership_id: str, output_bucket: str, output_prefix: str, keep_files: bool = True, params: Dict[str, Any] | None = None, chunksize: int | bool | None = None, use_threads: bool | int = True, boto3_session: Session | None = None, pyarrow_additional_kwargs: Dict[str, Any] | None = None) Iterator[DataFrame] | DataFrame

Execute Clean Rooms Protected SQL query and return the results as a Pandas DataFrame.

Parameters:
  • sql (str) – SQL query

  • membership_id (str) – Membership ID

  • output_bucket (str) – S3 output bucket name

  • output_prefix (str) – S3 output prefix

  • keep_files (bool, optional) – Whether files in S3 output bucket/prefix are retained. ‘True’ by default

  • params (Dict[str, any], optional) – Dict of parameters used for constructing the SQL query. Only named parameters are supported. The dict must be in the form {‘name’: ‘value’} and the SQL query must contain :name. Note that for varchar columns and similar, you must surround the value in single quotes

  • chunksize (Union[int, bool], optional) – If passed, the data is split into an iterable of DataFrames (Memory friendly). If True an iterable of DataFrames is returned without guarantee of chunksize. If an INTEGER is passed, an iterable of DataFrames is returned with maximum rows equal to the received INTEGER

  • use_threads (Union[bool, int], optional) – True to enable concurrent requests, False to disable multiple threads. If enabled os.cpu_count() is used as the maximum number of threads. If integer is provided, specified number is used

  • boto3_session (boto3.Session, optional) – Boto3 Session. If None, the default boto3 session is used

  • pyarrow_additional_kwargs (Optional[Dict[str, Any]]) – Forwarded to to_pandas method converting from PyArrow tables to Pandas DataFrame. Valid values include “split_blocks”, “self_destruct”, “ignore_metadata”. e.g. pyarrow_additional_kwargs={‘split_blocks’: True}

Returns:

Pandas DataFrame or Generator of Pandas DataFrames if chunksize is provided.

Return type:

Union[Iterator[pd.DataFrame], pd.DataFrame]

Examples

>>> import awswrangler as wr
>>> df = wr.cleanrooms.read_sql_query(
>>>     sql='SELECT DISTINCT...',
>>>     membership_id='membership-id',
>>>     output_bucket='output-bucket',
>>>     output_prefix='output-prefix',
>>> )