awswrangler.s3.to_iceberg

awswrangler.s3.to_iceberg(df: DataFrame, table_bucket_arn: str, namespace: str, table_name: str, mode: Literal['append', 'overwrite'] = 'append', index: bool = False, dtype: dict[str, str] | None = None, boto3_session: Session | None = None) → None

Write a Pandas DataFrame to an S3 Table via PyIceberg.

If the table does not exist, it is automatically created with a schema inferred from the DataFrame.

This function requires the pyiceberg package. Install it with pip install awswrangler[pyiceberg].

By default, the S3 Tables REST endpoint is used. To use the AWS Glue Iceberg REST endpoint instead, set wr.config.s3tables_catalog_endpoint_url (e.g. "https://glue.<region>.amazonaws.com/iceberg"). See "Integrating S3 Tables with AWS analytics services" for the required Glue Data Catalog and Lake Formation setup.
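
As a minimal sketch of switching endpoints, the snippet below reads the region from a table bucket ARN, builds the Glue Iceberg REST URL in the form quoted above, and assigns it to wr.config.s3tables_catalog_endpoint_url. The helper names region_from_table_bucket_arn, glue_iceberg_endpoint, and use_glue_endpoint are hypothetical and not part of awswrangler, and the ARN layout is assumed to match the one used in the example on this page.

```python
def region_from_table_bucket_arn(table_bucket_arn: str) -> str:
    """Return the region field of an ARN shaped like
    arn:aws:s3tables:<region>:<account-id>:bucket/<name> (hypothetical helper)."""
    parts = table_bucket_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3tables" or not parts[3]:
        raise ValueError(f"Not an S3 table bucket ARN: {table_bucket_arn!r}")
    return parts[3]


def glue_iceberg_endpoint(region: str) -> str:
    """Build the Glue Iceberg REST endpoint URL for a region (hypothetical helper)."""
    return f"https://glue.{region}.amazonaws.com/iceberg"


def use_glue_endpoint(table_bucket_arn: str) -> str:
    """Point later to_iceberg calls at the Glue endpoint and return the URL."""
    # Imported lazily so the two helpers above can be used without awswrangler.
    import awswrangler as wr

    url = glue_iceberg_endpoint(region_from_table_bucket_arn(table_bucket_arn))
    wr.config.s3tables_catalog_endpoint_url = url
    return url
```

Deriving the region from the ARN keeps the endpoint and the table bucket in the same region; the Glue Data Catalog and Lake Formation setup mentioned above is still required before writes through this endpoint can succeed.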

Parameters:
  • df (pd.DataFrame) – Pandas DataFrame to write.

  • table_bucket_arn (str) – The ARN of the S3 table bucket.

  • namespace (str) – The namespace of the table.

  • table_name (str) – The name of the table to write to.

  • mode (str, optional) – Write mode. "append" (default) adds rows to the table. "overwrite" replaces all existing data.

  • index (bool, optional) – If True, include the DataFrame index as a column. Default is False.

  • dtype (dict[str, str], optional) – Dictionary mapping column names to the Athena/Glue types they should be cast to (e.g. {"col_name": "bigint", "col2_name": "int"}).

  • boto3_session (boto3.Session, optional) – Boto3 Session. If None, the default boto3 session is used.

Return type:

None

Examples

>>> import awswrangler as wr
>>> import pandas as pd
>>> wr.s3.to_iceberg(
...     df=pd.DataFrame({"col": [1, 2, 3]}),
...     table_bucket_arn="arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket",
...     namespace="my_namespace",
...     table_name="my_table",
... )
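
The optional parameters can be combined in a single call. The sketch below is one possible way to do that: a hypothetical helper, iceberg_write_kwargs, assembles the keyword arguments (choosing between "append" and "overwrite", passing index, and supplying a dtype mapping), and write_frame forwards them to wr.s3.to_iceberg. Both helper names and their replace, keep_index, and casts arguments are illustrative assumptions, not part of the library.

```python
from typing import Any, Dict, Optional


def iceberg_write_kwargs(
    df: Any,
    table_bucket_arn: str,
    namespace: str,
    table_name: str,
    *,
    replace: bool = False,
    keep_index: bool = False,
    casts: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for wr.s3.to_iceberg (hypothetical helper)."""
    return {
        "df": df,
        "table_bucket_arn": table_bucket_arn,
        "namespace": namespace,
        "table_name": table_name,
        # "overwrite" replaces all existing data; "append" adds rows.
        "mode": "overwrite" if replace else "append",
        # When True, the DataFrame index is written as a column.
        "index": keep_index,
        # Column name -> Athena/Glue type, e.g. {"col": "bigint"}.
        "dtype": casts,
    }


def write_frame(df: Any, **options: Any) -> None:
    """Write df to the example table; needs awswrangler[pyiceberg] and AWS credentials."""
    # Imported lazily so iceberg_write_kwargs can be used without awswrangler.
    import awswrangler as wr

    wr.s3.to_iceberg(
        **iceberg_write_kwargs(
            df,
            "arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket",
            "my_namespace",
            "my_table",
            **options,
        )
    )
```

With these helpers, write_frame(df) appends rows, while write_frame(df, replace=True, casts={"col": "bigint"}) replaces the table contents and casts col to bigint.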