awswrangler.athena.to_iceberg

awswrangler.athena.to_iceberg(df: DataFrame, database: str, table: str, temp_path: str | None = None, index: bool = False, table_location: str | None = None, keep_files: bool = True, data_source: str | None = None, workgroup: str | None = None, encryption: str | None = None, kms_key: str | None = None, boto3_session: Session | None = None, s3_additional_kwargs: Dict[str, Any] | None = None) → None

Insert a DataFrame into an Athena Iceberg table using INSERT INTO … SELECT. The Iceberg table is created if it does not exist.

Creates a temporary external table, writes the staged files, and inserts the data via INSERT INTO … SELECT.
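The flow described above ends in a statement of the form INSERT INTO … SELECT. As a rough illustration of that statement shape only, the final step might look like the sketch below. The helper name build_insert_select and the staging table name are hypothetical, and the SQL that awswrangler actually issues may differ.

```python
def build_insert_select(database: str, table: str, temp_table: str) -> str:
    """Return an INSERT INTO ... SELECT statement that copies a staging table into the target.

    Hypothetical helper for illustration; it is not part of awswrangler.
    """
    # Identifiers are wrapped in double quotes, as in Athena DML statements.
    return (
        f'INSERT INTO "{database}"."{table}" '
        f'SELECT * FROM "{database}"."{temp_table}"'
    )


# "temp_table_abc123" is a made-up name standing in for the temporary external table.
sql = build_insert_select("my_database", "my_table", "temp_table_abc123")
print(sql)
```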

Note

The following arguments are not supported in distributed mode with engine EngineEnum.RAY:

  • boto3_session

  • s3_additional_kwargs

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • database

  • workgroup

Check out the Global Configurations Tutorial for details.
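As a minimal sketch of the two routes mentioned in this note, the snippet below sets database and workgroup once, either through environment variables or through wr.config. The variable names WR_DATABASE and WR_WORKGROUP are assumed to follow the WR_<ARGUMENT> convention described in the Global Configurations Tutorial, the values are placeholders, and set_global_defaults is a hypothetical helper.

```python
import os

# Option 1: environment variables, set before awswrangler is imported.
# The names assume the WR_<ARGUMENT> convention; the values are placeholders.
os.environ["WR_DATABASE"] = "my_database"
os.environ["WR_WORKGROUP"] = "primary"


def set_global_defaults(database: str, workgroup: str) -> None:
    """Option 2: set the same defaults programmatically through wr.config.

    Hypothetical helper; awswrangler is imported inside the function so that
    this snippet can be loaded without the library installed.
    """
    import awswrangler as wr

    wr.config.database = database
    wr.config.workgroup = workgroup
```

With either option in place, database and workgroup should no longer need to be passed on each call.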

Parameters:
  • df (pd.DataFrame) – Pandas DataFrame.

  • database (str) – AWS Glue/Athena database name. It is only the origin database from which the query will be launched. You can still use and mix several databases by writing the full table name within the SQL (e.g. database.table).

  • table (str) – AWS Glue/Athena table name.

  • temp_path (str, optional) – Amazon S3 location to store temporary results. The workgroup configuration will be used if not provided.

  • index (bool) – Whether to write the DataFrame index as a column. False by default.

  • table_location (str, optional) – Amazon S3 location for the table. Will only be used to create a new table if it does not exist.

  • keep_files (bool) – Whether staging files produced by Athena are retained. ‘True’ by default.

  • data_source (str, optional) – Data Source / Catalog name. If None, ‘AwsDataCatalog’ will be used by default.

  • workgroup (str, optional) – Athena workgroup.

  • encryption (str, optional) – Valid values: [None, ‘SSE_S3’, ‘SSE_KMS’]. Notice: ‘CSE_KMS’ is not supported.

  • kms_key (str, optional) – For SSE-KMS, this is the KMS key ARN or ID.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session is None.

  • s3_additional_kwargs (Optional[Dict[str, Any]]) – Forwarded to botocore requests, e.g. s3_additional_kwargs={'RequestPayer': 'requester'}

Return type:

None

Examples

Insert into an existing Iceberg table

>>> import awswrangler as wr
>>> import pandas as pd
>>> wr.athena.to_iceberg(
...     df=pd.DataFrame({'col': [1, 2, 3]}),
...     database='my_database',
...     table='my_table',
...     temp_path='s3://bucket/temp/',
... )

Create Iceberg table and insert data (table doesn’t exist, requires table_location)

>>> import awswrangler as wr
>>> import pandas as pd
>>> wr.athena.to_iceberg(
...     df=pd.DataFrame({'col': [1, 2, 3]}),
...     database='my_database',
...     table='my_table2',
...     table_location='s3://bucket/my_table2/',
...     temp_path='s3://bucket/temp/',
... )
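
Insert with SSE-KMS encryption (sketch)

The encryption and kms_key parameters above accept a small set of values, so it can help to validate them before calling to_iceberg. The following is a minimal sketch under stated assumptions: encryption_kwargs is a hypothetical helper rather than part of awswrangler, the key ARN is a placeholder, and the rule that kms_key must accompany 'SSE_KMS' is an assumption of this sketch.

```python
from typing import Any, Dict, Optional

# Values listed for the `encryption` parameter; 'CSE_KMS' is not supported.
SUPPORTED_ENCRYPTION = (None, "SSE_S3", "SSE_KMS")


def encryption_kwargs(
    encryption: Optional[str] = None, kms_key: Optional[str] = None
) -> Dict[str, Any]:
    """Validate encryption settings and return keyword arguments for to_iceberg.

    Hypothetical helper for illustration; it is not part of awswrangler.
    """
    if encryption not in SUPPORTED_ENCRYPTION:
        raise ValueError(
            f"Unsupported encryption {encryption!r}; expected one of {SUPPORTED_ENCRYPTION}"
        )
    # Assumption of this sketch: SSE_KMS is only useful together with a key.
    if encryption == "SSE_KMS" and kms_key is None:
        raise ValueError("kms_key is required when encryption='SSE_KMS'")
    return {"encryption": encryption, "kms_key": kms_key}


# Placeholder ARN; substitute a real KMS key ARN or ID.
kwargs = encryption_kwargs(
    "SSE_KMS", kms_key="arn:aws:kms:us-east-1:111122223333:key/hypothetical-key-id"
)
print(kwargs)

# The result can then be unpacked into the call, for example:
# wr.athena.to_iceberg(df=df, database='my_database', table='my_table',
#                      temp_path='s3://bucket/temp/', **kwargs)
```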