awswrangler.mysql.to_sql

awswrangler.mysql.to_sql(df: DataFrame, con: pymysql.connections.Connection, table: str, schema: str, mode: Literal['append', 'overwrite', 'upsert_replace_into', 'upsert_duplicate_key', 'upsert_distinct', 'ignore'] = 'append', index: bool = False, dtype: dict[str, str] | None = None, varchar_lengths: dict[str, int] | None = None, use_column_names: bool = False, chunksize: int = 200, cursorclass: type['Cursor'] | None = None) None

Write records stored in a DataFrame into MySQL.

Note

This function has arguments that can be configured globally through wr.config or environment variables:

  • chunksize

Check out the Global Configurations Tutorial for details.
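
For example, chunksize can be set globally instead of per call. A minimal sketch (assumes awswrangler is installed; the WR_CHUNKSIZE environment variable follows awswrangler's WR_ prefix convention described in the Global Configurations Tutorial):

```python
import awswrangler as wr

# Set the global default chunk size; subsequent to_sql calls use it
# unless chunksize is passed explicitly.
wr.config.chunksize = 100
```

The same value can instead come from the environment before the process starts, e.g. export WR_CHUNKSIZE=100.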

Parameters:
  • df (DataFrame) – Pandas DataFrame

  • con (Connection) – Use pymysql.connect() to use credentials directly or wr.mysql.connect() to fetch them from the Glue Catalog.

  • table (str) – Table name

  • schema (str) – Schema name

  • mode (Literal['append', 'overwrite', 'upsert_replace_into', 'upsert_duplicate_key', 'upsert_distinct', 'ignore']) –

    Supports the following modes:

    • append: Inserts new records into the table.

    • overwrite: Drops the table and recreates it.

    • upsert_duplicate_key: Performs an upsert using the ON DUPLICATE KEY UPDATE clause. Requires the table schema to have defined keys; otherwise duplicate records will be inserted.

    • upsert_replace_into: Performs an upsert using the REPLACE INTO clause. Less efficient, and still requires the table schema to have keys, or else duplicate records will be inserted.

    • upsert_distinct: Inserts new records, including duplicates, then recreates the table and inserts DISTINCT records from the old table. This is the least efficient approach but handles scenarios where there are no keys on the table.

    • ignore: Inserts new records into the table using the INSERT IGNORE clause.

  • index (bool) – True to store the DataFrame index as a column in the table, otherwise False to ignore it.

  • dtype (dict[str, str] | None) – Dictionary of column names and MySQL types to be cast. Useful when you have columns with undetermined or mixed data types. (e.g. {'col name': 'TEXT', 'col2 name': 'FLOAT'})

  • varchar_lengths (dict[str, int] | None) – Dict of VARCHAR lengths by column. (e.g. {"col1": 10, "col5": 200}).

  • use_column_names (bool) – If set to True, the column names of the DataFrame are used to generate the INSERT SQL query. E.g. if the DataFrame has two columns col1 and col3 and use_column_names is True, data will only be inserted into the database columns col1 and col3.

  • chunksize (int) – Number of rows which are inserted with each SQL query. Defaults to inserting 200 rows per query.

  • cursorclass (type[Cursor] | None) – Cursor class to use, e.g. SSCursor; defaults to pymysql.cursors.Cursor. See https://pymysql.readthedocs.io/en/latest/modules/cursors.html

Return type:

None
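
The chunksize parameter determines how many rows are grouped into each INSERT statement. A minimal sketch of the partitioning arithmetic (illustrative only, not awswrangler's internal implementation):

```python
import pandas as pd

# A small illustrative DataFrame of 5 rows
df = pd.DataFrame({"id": range(5), "name": list("abcde")})
chunksize = 2

# Split into ceil(5 / 2) == 3 batches, as a chunked writer would
chunks = [df.iloc[i:i + chunksize] for i in range(0, len(df), chunksize)]
print([len(c) for c in chunks])  # [2, 2, 1]
```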

Examples

Writing to MySQL using a Glue Catalog Connection

>>> import awswrangler as wr
>>> with wr.mysql.connect("MY_GLUE_CONNECTION") as con:
...     wr.mysql.to_sql(
...         df=df,
...         table="my_table",
...         schema="test",
...         con=con
...     )
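
Writing with an upsert through a direct pymysql connection. A sketch only: the host, credentials, database, and table below are placeholders, and upsert_duplicate_key assumes my_table defines a primary or unique key:

```python
import awswrangler as wr
import pandas as pd
import pymysql

df = pd.DataFrame({"id": [1, 2], "name": ["foo", "boo"]})

# Placeholder connection details; requires a reachable MySQL server
con = pymysql.connect(host="localhost", user="user", password="***", database="test")
try:
    wr.mysql.to_sql(
        df=df,
        table="my_table",
        schema="test",
        con=con,
        mode="upsert_duplicate_key",  # relies on a key defined on my_table
    )
finally:
    con.close()
```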