awswrangler.cleanrooms.read_sql_query

awswrangler.cleanrooms.read_sql_query(sql: str | None = None, analysis_template_arn: str | None = None, membership_id: str = '', output_bucket: str = '', output_prefix: str = '', keep_files: bool = True, params: dict[str, Any] | None = None, chunksize: int | bool | None = None, use_threads: bool | int = True, boto3_session: Session | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None) → Iterator[DataFrame] | DataFrame

Execute a Clean Rooms Protected SQL query and return the results as a Pandas DataFrame.

Note

Exactly one of sql or analysis_template_arn must be supplied, not both.

Parameters:
  • sql (str, optional) – SQL query

  • analysis_template_arn (str, optional) – ARN of the analysis template

  • membership_id (str) – Membership ID

  • output_bucket (str) – S3 output bucket name

  • output_prefix (str) – S3 output prefix

  • keep_files (bool, optional) – Whether to retain the query result files in the S3 output bucket/prefix. True by default

  • params (Dict[str, Any], optional) –

    (Client-side) If used in combination with the sql parameter, it's the dict of parameters used to construct the SQL query. Only named parameters are supported. The dict must be in the form {'name': 'value'} and the SQL query must contain :name. Note that for varchar columns and similar, the value must be surrounded in single quotes.

    (Server-side) If used in combination with the analysis_template_arn parameter, it's the dict of parameters supplied with the analysis template. It must be a string-to-string dict in the form {'name': 'value'}.

  • chunksize (Union[int, bool], optional) – If passed, the data is split into an iterable of DataFrames (memory friendly). If True, an iterable of DataFrames is returned without a guaranteed chunk size. If an integer is passed, an iterable of DataFrames is returned, each with at most that many rows

  • use_threads (Union[bool, int], optional) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used

  • boto3_session (boto3.Session, optional) – Boto3 Session. If None, the default boto3 session is used

  • pyarrow_additional_kwargs (Optional[Dict[str, Any]]) – Forwarded to the to_pandas method when converting from PyArrow tables to a Pandas DataFrame. Valid values include "split_blocks", "self_destruct", "ignore_metadata". e.g. pyarrow_additional_kwargs={'split_blocks': True}
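The client-side :name convention for params can be illustrated with a small sketch. Note that render_sql below is a hypothetical helper for illustration only; awswrangler performs this substitution internally when sql is used:

```python
# Hypothetical helper illustrating the client-side ``params`` convention;
# awswrangler performs this substitution internally when ``sql`` is used.
def render_sql(sql: str, params: dict) -> str:
    # Each :name placeholder is replaced by its value from ``params``.
    for name, value in params.items():
        sql = sql.replace(f":{name}", str(value))
    return sql

# Varchar-like values must include their own single quotes:
query = "SELECT city FROM t WHERE state = :state"
print(render_sql(query, {"state": "'WA'"}))  # SELECT city FROM t WHERE state = 'WA'
```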

Returns:

Pandas DataFrame or Generator of Pandas DataFrames if chunksize is provided.

Return type:

Union[Iterator[pd.DataFrame], pd.DataFrame]

Examples

Query by an SQL statement:

>>> import awswrangler as wr
>>> df = wr.cleanrooms.read_sql_query(
...     sql='SELECT DISTINCT...',
...     membership_id='membership-id',
...     output_bucket='output-bucket',
...     output_prefix='output-prefix',
... )
Query by an analysis template, passing server-side parameters:

>>> import awswrangler as wr
>>> df = wr.cleanrooms.read_sql_query(
...     analysis_template_arn='arn:aws:cleanrooms:...',
...     params={'param1': 'value1'},
...     membership_id='membership-id',
...     output_bucket='output-bucket',
...     output_prefix='output-prefix',
... )