9 - Redshift - Append, Overwrite and Upsert¶
awswrangler’s copy/to_sql
function has three different mode
options for Redshift.
1 - append
2 - overwrite
3 - upsert
[ ]:
# Install the optional modules first
!pip install 'awswrangler[redshift]'
[2]:
from datetime import date
import pandas as pd
import awswrangler as wr
con = wr.redshift.connect("aws-sdk-pandas-redshift")
Enter your bucket name:¶
[3]:
import getpass
bucket = getpass.getpass()
path = f"s3://{bucket}/stage/"
···········································
Enter your IAM ROLE ARN:¶
[4]:
iam_role = getpass.getpass()
····················································································
Creating the table (Overwriting if it exists)¶
[10]:
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"], "date": [date(2020, 1, 1), date(2020, 1, 2)]})
wr.redshift.copy(
df=df,
path=path,
con=con,
schema="public",
table="my_table",
mode="overwrite",
iam_role=iam_role,
primary_keys=["id"],
)
wr.redshift.read_sql_table(table="my_table", schema="public", con=con)
[10]:
id | value | date | |
---|---|---|---|
0 | 2 | boo | 2020-01-02 |
1 | 1 | foo | 2020-01-01 |
Appending¶
[11]:
df = pd.DataFrame({"id": [3], "value": ["bar"], "date": [date(2020, 1, 3)]})
wr.redshift.copy(
df=df, path=path, con=con, schema="public", table="my_table", mode="append", iam_role=iam_role, primary_keys=["id"]
)
wr.redshift.read_sql_table(table="my_table", schema="public", con=con)
[11]:
id | value | date | |
---|---|---|---|
0 | 1 | foo | 2020-01-01 |
1 | 2 | boo | 2020-01-02 |
2 | 3 | bar | 2020-01-03 |
Upserting¶
[12]:
df = pd.DataFrame({"id": [2, 3], "value": ["xoo", "bar"], "date": [date(2020, 1, 2), date(2020, 1, 3)]})
wr.redshift.copy(
df=df, path=path, con=con, schema="public", table="my_table", mode="upsert", iam_role=iam_role, primary_keys=["id"]
)
wr.redshift.read_sql_table(table="my_table", schema="public", con=con)
[12]:
id | value | date | |
---|---|---|---|
0 | 1 | foo | 2020-01-01 |
1 | 2 | xoo | 2020-01-02 |
2 | 3 | bar | 2020-01-03 |
Cleaning Up¶
[13]:
with con.cursor() as cursor:
cursor.execute("DROP TABLE public.my_table")
con.close()