awswrangler.redshift.read_sql_query

awswrangler.redshift.read_sql_query(sql: str, con: redshift_connector.Connection, index_col: str | list[str] | None = None, params: list[Any] | tuple[Any, ...] | dict[Any, Any] | None = None, dtype_backend: Literal['numpy_nullable', 'pyarrow'] = 'numpy_nullable', chunksize: int | None = None, dtype: dict[str, pa.DataType] | None = None, safe: bool = True, timestamp_as_object: bool = False) pd.DataFrame | Iterator[pd.DataFrame]

Return a DataFrame corresponding to the result set of the query string.

Note

For large extractions (1K+ rows) consider the function wr.redshift.unload().

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • chunksize

  • dtype_backend

Check out the Global Configurations Tutorial for details.

Parameters:
  • sql (str) – SQL query.

  • con (redshift_connector.Connection) – Use redshift_connector.connect() to use ” “credentials directly or wr.redshift.connect() to fetch it from the Glue Catalog.

  • index_col (Union[str, List[str]], optional) – Column(s) to set as index(MultiIndex).

  • params (Union[List, Tuple, Dict], optional) – List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent. Check your database driver documentation for which of the five syntax styles, described in PEP 249’s paramstyle, is supported.

  • dtype_backend (str, optional) –

    Which dtype_backend to use, e.g. whether a DataFrame should have NumPy arrays, nullable dtypes are used for all dtypes that have a nullable implementation when “numpy_nullable” is set, pyarrow is used for all dtypes if “pyarrow” is set.

    The dtype_backends are still experimential. The “pyarrow” backend is only supported with Pandas 2.0 or above.

  • chunksize (int, optional) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

  • dtype (Dict[str, pyarrow.DataType], optional) – Specifying the datatype for columns. The keys should be the column names and the values should be the PyArrow types.

  • safe (bool) – Check for overflows or other unsafe data type conversions.

  • timestamp_as_object (bool) – Cast non-nanosecond timestamps (np.datetime64) to objects.

Returns:

Result as Pandas DataFrame(s).

Return type:

Union[pandas.DataFrame, Iterator[pandas.DataFrame]]

Examples

Reading from Redshift using a Glue Catalog Connections

>>> import awswrangler as wr
>>> con = wr.redshift.connect("MY_GLUE_CONNECTION")
>>> df = wr.redshift.read_sql_query(
...     sql="SELECT * FROM public.my_table",
...     con=con
... )
>>> con.close()