awswrangler.postgresql.to_sql
- awswrangler.postgresql.to_sql(df: DataFrame, con: pg8000.Connection, table: str, schema: str, mode: Literal['append', 'overwrite', 'upsert'] = 'append', overwrite_method: Literal['drop', 'cascade', 'truncate', 'truncate cascade'] = 'drop', index: bool = False, dtype: dict[str, str] | None = None, varchar_lengths: dict[str, int] | None = None, use_column_names: bool = False, chunksize: int = 200, upsert_conflict_columns: list[str] | None = None, insert_conflict_columns: list[str] | None = None, commit_transaction: bool = True) → None
Write records stored in a DataFrame into PostgreSQL.
Note
This function has arguments which can be configured globally through wr.config or environment variables:
chunksize
Check out the Global Configurations Tutorial for details.
- Parameters:
  - df (DataFrame) – Pandas DataFrame.
  - con (pg8000.Connection) – Use pg8000.connect() to use credentials directly or wr.postgresql.connect() to fetch it from the Glue Catalog.
  - table (str) – Table name.
  - schema (str) – Schema name.
  - mode (Literal['append', 'overwrite', 'upsert']) – Append, overwrite or upsert.
    - append: Inserts new records into the table.
    - overwrite: Drops the table and recreates it.
    - upsert: Performs an upsert, checking for conflicts on the columns given by upsert_conflict_columns and setting the new values on conflict. Note that upsert_conflict_columns is required for this mode.
  - overwrite_method (Literal['drop', 'cascade', 'truncate', 'truncate cascade']) – Drop, cascade, truncate, or truncate cascade. Only applicable in overwrite mode.
    - "drop" – DROP ... RESTRICT – drops the table. Fails if there are any views that depend on it.
    - "cascade" – DROP ... CASCADE – drops the table and all views that depend on it.
    - "truncate" – TRUNCATE ... RESTRICT – truncates the table. Fails if any of the tables have foreign-key references from tables that are not listed in the command.
    - "truncate cascade" – TRUNCATE ... CASCADE – truncates the table and all tables that have foreign-key references to any of the named tables.
  - index (bool) – True to store the DataFrame index as a column in the table; False to ignore it.
  - dtype (dict[str, str] | None) – Dictionary of column names and PostgreSQL types to cast to. Useful when you have columns with undetermined or mixed data types (e.g. {'col name': 'TEXT', 'col2 name': 'FLOAT'}).
  - varchar_lengths (dict[str, int] | None) – Dict of VARCHAR length by column (e.g. {"col1": 10, "col5": 200}).
  - use_column_names (bool) – If set to True, the column names of the DataFrame are used when generating the INSERT SQL query. E.g. if the DataFrame has two columns col1 and col3 and use_column_names is True, data will only be inserted into the database columns col1 and col3.
  - chunksize (int) – Number of rows inserted with each SQL query. Defaults to 200 rows per query.
  - upsert_conflict_columns (list[str] | None) – Only supported when mode is set to upsert. Conflicts on the given columns are checked when evaluating the upsert.
  - insert_conflict_columns (list[str] | None) – Only supported when mode is set to append. Conflicts on the given columns are checked when evaluating the insert with ON CONFLICT DO NOTHING.
  - commit_transaction (bool) – Whether to commit the transaction. True by default.
- Return type:
None
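To make the chunksize parameter concrete: rows are written in batches, with one INSERT statement issued per batch. A minimal sketch of that batching in plain Python (purely illustrative, not awswrangler's actual implementation):

```python
# Illustrative only: split a list of rows into batches of `chunksize`,
# mirroring how one INSERT is issued per batch of rows.
rows = [(i, f"value-{i}") for i in range(7)]
chunksize = 3

batches = [rows[i:i + chunksize] for i in range(0, len(rows), chunksize)]

print(len(batches))      # 3 batches (sizes 3, 3, 1)
print(len(batches[-1]))  # 1
```

A larger chunksize means fewer round-trips to the database per DataFrame, at the cost of larger individual statements.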
Examples
Writing to PostgreSQL using a Glue Catalog Connection

```python
>>> import awswrangler as wr
>>> with wr.postgresql.connect("MY_GLUE_CONNECTION") as con:
...     wr.postgresql.to_sql(
...         df=df,
...         table="my_table",
...         schema="public",
...         con=con,
...     )
```
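Both upsert mode and append mode with insert_conflict_columns correspond to PostgreSQL's INSERT ... ON CONFLICT clause. The sketch below illustrates those semantics with the standard-library sqlite3 module (which supports the same clause) so it runs without a database server; the table and column names are illustrative, and the SQL shown is a conceptual stand-in rather than the exact statements awswrangler emits:

```python
import sqlite3

# In-memory stand-in for the target table; in PostgreSQL this would be a
# schema-qualified table with a unique constraint on the conflict column(s).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, value TEXT)")
con.execute("INSERT INTO my_table VALUES (1, 'old')")

# mode="upsert" with upsert_conflict_columns=["id"] behaves roughly like:
con.execute(
    "INSERT INTO my_table (id, value) VALUES (?, ?) "
    "ON CONFLICT (id) DO UPDATE SET value = excluded.value",
    (1, "new"),
)

# mode="append" with insert_conflict_columns=["id"] behaves roughly like:
con.execute(
    "INSERT INTO my_table (id, value) VALUES (?, ?) "
    "ON CONFLICT (id) DO NOTHING",
    (1, "ignored"),
)

print(con.execute("SELECT value FROM my_table WHERE id = 1").fetchone()[0])
# prints "new": the upsert replaced 'old'; the DO NOTHING insert was a no-op.
```

Note that in both cases the conflict columns must carry a unique index or constraint in the target table for ON CONFLICT to apply.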