awswrangler.athena.delete_from_iceberg_table

awswrangler.athena.delete_from_iceberg_table(df: DataFrame, database: str, table: str, merge_cols: list[str], temp_path: str | None = None, keep_files: bool = True, data_source: str | None = None, s3_output: str | None = None, workgroup: str = 'primary', encryption: str | None = None, kms_key: str | None = None, dtype: dict[str, str] | None = None, boto3_session: Session | None = None, s3_additional_kwargs: dict[str, Any] | None = None, catalog_id: str | None = None) → None

Delete rows from an Iceberg table.

Creates a temporary external table, writes the staged files to it, and then deletes every row of the Iceberg table that matches a row in the temporary table.
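The matching step described above can be pictured as an Athena MERGE INTO statement with a delete action. The following is a minimal sketch under stated assumptions, not necessarily the query the library issues: the helper `build_delete_merge_sql` and the staging table name `my_table_staging` are hypothetical, and the merge columns are assumed to be combined with AND.

```python
from __future__ import annotations


def build_delete_merge_sql(
    database: str, table: str, temp_table: str, merge_cols: list[str]
) -> str:
    """Build an illustrative MERGE INTO ... DELETE statement.

    Hypothetical helper for illustration; not part of awswrangler.
    """
    if not merge_cols:
        raise ValueError("merge_cols must contain at least one column")
    # Assumption: a row matches only when every merge column is equal
    # in both the Iceberg table (target) and the staging table (source).
    on_clause = " AND ".join(
        f'target."{col}" = source."{col}"' for col in merge_cols
    )
    return (
        f'MERGE INTO "{database}"."{table}" target\n'
        f'USING "{database}"."{temp_table}" source\n'
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN DELETE"
    )


# "my_table_staging" is a made-up name standing in for the temporary table.
sql = build_delete_merge_sql("my_database", "my_table", "my_table_staging", ["id"])
print(sql)
```

Only the columns named in merge_cols take part in the match, so any other columns in df would not affect which rows are deleted under this model.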

Parameters:
  • df (DataFrame) – pandas DataFrame containing the key values (the merge_cols columns) that identify the rows to delete from the Iceberg table.

  • database (str) – Database name.

  • table (str) – Table name.

  • merge_cols (list[str]) –

    List of columns used to determine which rows of the Iceberg table are deleted. See the Athena MERGE INTO statement documentation.

  • temp_path (str | None) – S3 path to temporarily store the DataFrame.

  • keep_files (bool) – Whether staging files produced by Athena are retained. True by default.

  • data_source (str | None) – Data source / catalog name. If None, 'AwsDataCatalog' is used by default.

  • s3_output (str | None) – Amazon S3 path used for query execution.

  • workgroup (str) – Athena workgroup name.

  • encryption (str | None) – Valid values: [None, "SSE_S3", "SSE_KMS"]. Note: "CSE_KMS" is not supported.

  • kms_key (str | None) – For SSE-KMS, this is the KMS key ARN or ID.

  • dtype (dict[str, str] | None) – Dictionary of column names and the Athena/Glue types to cast them to. Useful when columns have undetermined or mixed data types (e.g. {'col name': 'bigint', 'col2 name': 'int'}).

  • boto3_session (Session | None) – The default boto3 session is used if boto3_session is None.

  • s3_additional_kwargs (dict[str, Any] | None) – Forwarded to botocore requests. e.g. `s3_additional_kwargs={"RequestPayer": "requester"}`

  • catalog_id (str | None) – The ID of the Data Catalog which contains the database and table. If none is provided, the AWS account ID is used by default.
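The encryption and kms_key parameters above interact, so it can help to check them before making a call. The following is a hedged sketch of such a pre-flight check: `check_encryption_args` is a hypothetical helper, not part of awswrangler, and the rule that "SSE_KMS" needs a key is an assumption based on the kms_key description.

```python
from __future__ import annotations

# Values listed in the encryption parameter description above.
VALID_ENCRYPTION = (None, "SSE_S3", "SSE_KMS")


def check_encryption_args(encryption: str | None, kms_key: str | None) -> None:
    """Raise ValueError for argument combinations that look invalid.

    Hypothetical helper for illustration; not part of awswrangler.
    """
    if encryption not in VALID_ENCRYPTION:
        raise ValueError(
            f"Unsupported encryption {encryption!r}; "
            f"expected one of {VALID_ENCRYPTION}"
        )
    # Assumption: SSE-KMS is only usable when a KMS key ARN or ID is given.
    if encryption == "SSE_KMS" and not kms_key:
        raise ValueError("kms_key is required when encryption='SSE_KMS'")


check_encryption_args("SSE_S3", None)  # passes silently
check_encryption_args("SSE_KMS", "alias/my-key")  # "alias/my-key" is a placeholder
```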

Return type:

None

Examples

>>> import awswrangler as wr
>>> import pandas as pd
>>> df = pd.DataFrame({"id": [1, 2, 3], "col": ["foo", "bar", "baz"]})
>>> wr.athena.to_iceberg(
...     df=df,
...     database="my_database",
...     table="my_table",
...     temp_path="s3://bucket/temp/",
... )
>>> df_delete = pd.DataFrame({"id": [1, 3]})
>>> wr.athena.delete_from_iceberg_table(
...     df=df_delete,
...     database="my_database",
...     table="my_table",
...     merge_cols=["id"],
... )
>>> wr.athena.read_sql_table(table="my_table", database="my_database")
    id  col
0   2   bar
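When merge_cols lists more than one column, the choice of columns changes which rows are removed. The sketch below is a local, pure-Python model of that behaviour and does not call AWS. It assumes a table row is deleted only when it equals some row of df on every merge column; `rows_remaining` and the sample data are hypothetical, and SQL NULL comparison semantics are not modelled.

```python
def rows_remaining(table_rows, delete_rows, merge_cols):
    """Return the rows of table_rows that match no row of delete_rows.

    Two rows match when they are equal on all columns in merge_cols.
    Hypothetical local model; not part of awswrangler.
    """
    keys_to_delete = {
        tuple(row[col] for col in merge_cols) for row in delete_rows
    }
    return [
        row
        for row in table_rows
        if tuple(row[col] for col in merge_cols) not in keys_to_delete
    ]


table_rows = [
    {"id": 1, "region": "eu", "col": "foo"},
    {"id": 1, "region": "us", "col": "bar"},
    {"id": 2, "region": "eu", "col": "baz"},
]
delete_rows = [{"id": 1, "region": "eu"}]

# Composite key: only the (1, "eu") row is removed.
by_id_and_region = rows_remaining(table_rows, delete_rows, ["id", "region"])
# Single key: every row with id == 1 is removed.
by_id_only = rows_remaining(table_rows, delete_rows, ["id"])

print(by_id_and_region)
print(by_id_only)
```

In this model, matching on ["id", "region"] leaves two rows, while matching on ["id"] alone leaves one, so a narrower merge_cols list deletes more broadly.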