awswrangler.redshift.to_sql

awswrangler.redshift.to_sql(df: DataFrame, con: redshift_connector.Connection, table: str, schema: str, mode: Literal['append', 'overwrite', 'upsert'] = 'append', overwrite_method: Literal['drop', 'cascade', 'truncate', 'delete'] = 'drop', index: bool = False, dtype: dict[str, str] | None = None, diststyle: Literal['AUTO', 'EVEN', 'ALL', 'KEY'] = 'AUTO', distkey: str | None = None, sortstyle: Literal['COMPOUND', 'INTERLEAVED'] = 'COMPOUND', sortkey: list[str] | None = None, primary_keys: list[str] | None = None, varchar_lengths_default: int = 256, varchar_lengths: dict[str, int] | None = None, use_column_names: bool = False, lock: bool = False, chunksize: int = 200, commit_transaction: bool = True, precombine_key: str | None = None, add_new_columns: bool = False) None

Write records stored in a DataFrame into Redshift.

Note

For large DataFrames (1K+ rows) consider the function wr.redshift.copy().
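
A minimal sketch of such a load staged through S3 with wr.redshift.copy(); the S3 path and IAM role below are placeholders:

>>> import awswrangler as wr
>>> with wr.redshift.connect("MY_GLUE_CONNECTION") as con:
...     wr.redshift.copy(
...         df=df,
...         path="s3://my-bucket/stage/",  # temporary S3 staging prefix (placeholder)
...         con=con,
...         table="my_table",
...         schema="public",
...         iam_role="arn:aws:iam::123456789012:role/my-redshift-role",  # role that lets Redshift read the staged files (placeholder)
...     )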

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • chunksize

Check out the Global Configurations Tutorial for details.
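
For example, chunksize can be set globally before calling this function; a small sketch (the value 1000 is arbitrary):

>>> import awswrangler as wr
>>> wr.config.chunksize = 1000  # picked up by every call that accepts chunksize (arbitrary value)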

Parameters:
  • df (DataFrame) – Pandas DataFrame https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

  • con (Connection) – Use redshift_connector.connect() to use credentials directly or wr.redshift.connect() to fetch it from the Glue Catalog.

  • table (str) – Table name

  • schema (str) – Schema name

  • mode (Literal['append', 'overwrite', 'upsert']) – Append, overwrite or upsert.

  • overwrite_method (Literal['drop', 'cascade', 'truncate', 'delete']) –

    Drop, cascade, truncate, or delete. Only applicable in overwrite mode.

    • "drop" - DROP ... RESTRICT - drops the table. Fails if there are any views that depend on it.

    • "cascade" - DROP ... CASCADE - drops the table, and all views that depend on it.

    • "truncate" - TRUNCATE ... - truncates the table, but immediately commits the current transaction and starts a new one, so the overwrite happens in two transactions and is not atomic.

    • "delete" - DELETE FROM ... - deletes all rows from the table. Slow relative to the other methods.

  • index (bool) – True to store the DataFrame index as a column in the table, otherwise False to ignore it.

  • dtype (dict[str, str] | None) – Dictionary of column names and Redshift types to cast. Useful when you have columns with undetermined or mixed data types. (e.g. {"col name": "VARCHAR(10)", "col2 name": "FLOAT"})

  • diststyle (Literal['AUTO', 'EVEN', 'ALL', 'KEY']) – Redshift distribution styles. Must be in [“AUTO”, “EVEN”, “ALL”, “KEY”]. https://docs.aws.amazon.com/redshift/latest/dg/t_Distributing_data.html

  • distkey (str | None) – Specifies a column name or positional number for the distribution key.

  • sortstyle (Literal['COMPOUND', 'INTERLEAVED']) – Sorting can be “COMPOUND” or “INTERLEAVED”. https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html

  • sortkey (list[str] | None) – List of columns to be sorted.

  • primary_keys (list[str] | None) – Primary keys.

  • varchar_lengths_default (int) – The size that will be set for all VARCHAR columns not specified with varchar_lengths.

  • varchar_lengths (dict[str, int] | None) – Dict of VARCHAR length by column. (e.g. {"col1": 10, "col5": 200}).

  • use_column_names (bool) – If True, use the DataFrame column names when generating the INSERT SQL query. E.g. if the DataFrame has two columns col1 and col3 and use_column_names is True, data will only be inserted into the database columns col1 and col3.

  • lock (bool) – True to execute LOCK command inside the transaction to force serializable isolation.

  • chunksize (int) – Number of rows which are inserted with each SQL query. Defaults to inserting 200 rows per query.

  • commit_transaction (bool) – Whether to commit the transaction. True by default.

  • precombine_key (str | None) – When a primary_keys match occurs during an upsert, the values of this column are compared between source and target, and the row with the larger value is kept. Only applies when mode = "upsert" (see the upsert sketch in Examples below).

  • add_new_columns (bool) – If True, it automatically adds the new DataFrame columns into the target table.

Return type:

None

Examples

Writing to Redshift using a Glue Catalog Connection

>>> import awswrangler as wr
>>> with wr.redshift.connect("MY_GLUE_CONNECTION") as con:
...     wr.redshift.to_sql(
...         df=df,
...         table="my_table",
...         schema="public",
...         con=con,
...     )
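
Upserting into an existing table, as a sketch: the id and last_updated columns are placeholders for your own primary key and precombine columns.

>>> import awswrangler as wr
>>> with wr.redshift.connect("MY_GLUE_CONNECTION") as con:
...     wr.redshift.to_sql(
...         df=df,
...         table="my_table",
...         schema="public",
...         con=con,
...         mode="upsert",
...         primary_keys=["id"],  # match rows on this column (placeholder)
...         precombine_key="last_updated",  # on a match, keep the row with the larger value (placeholder)
...     )

Overwriting a table while casting a column and capping VARCHAR sizes, also a sketch with placeholder column names and types.

>>> with wr.redshift.connect("MY_GLUE_CONNECTION") as con:
...     wr.redshift.to_sql(
...         df=df,
...         table="my_table",
...         schema="public",
...         con=con,
...         mode="overwrite",
...         overwrite_method="truncate",  # keeps the table definition; note the two-transaction behaviour described above
...         dtype={"col2": "FLOAT"},  # explicit Redshift type for a mixed-type column (placeholder)
...         varchar_lengths={"col1": 10},  # per-column VARCHAR lengths (placeholder)
...     )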