awswrangler.athena.to_iceberg

awswrangler.athena.to_iceberg(df: DataFrame, database: str, table: str, temp_path: str | None = None, index: bool = False, table_location: str | None = None, keep_files: bool = True, data_source: str | None = None, workgroup: str | None = None, encryption: str | None = None, kms_key: str | None = None, boto3_session: Session | None = None, s3_additional_kwargs: Dict[str, Any] | None = None) → None

Insert a DataFrame into an Athena Iceberg table using INSERT INTO … SELECT. The Iceberg table is created if it does not exist.

Creates a temporary external table, writes the staged files, and inserts the data via INSERT INTO … SELECT.
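As a rough illustration of the final step, the sketch below assembles the kind of statement the description implies. The helper name `build_insert_select` and the staging table name are hypothetical, and the SQL that the library actually issues may differ in quoting and column handling.

```python
def build_insert_select(database: str, table: str, staging_table: str) -> str:
    """Return an INSERT INTO ... SELECT statement that copies a staging table.

    Illustrative only: the names and quoting here are assumptions,
    not awswrangler internals.
    """
    target = f'"{database}"."{table}"'
    source = f'"{database}"."{staging_table}"'
    return f"INSERT INTO {target} SELECT * FROM {source}"


# "temp_table_example" stands in for the temporary external table.
sql = build_insert_select("my_database", "my_table", "temp_table_example")
print(sql)
```

The point of the staging table is that Athena can read the files written to S3 and copy them into the Iceberg table in a single query.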

Note

The following arguments are not supported in distributed mode with engine EngineEnum.RAY:

boto3_session
s3_additional_kwargs

Note

This function has arguments which can be configured globally through wr.config or environment variables:

database
workgroup

Check out the Global Configurations Tutorial for details.
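A minimal sketch of the environment-variable route follows. It assumes that awswrangler reads `WR_`-prefixed variables (`WR_DATABASE`, `WR_WORKGROUP`) when it is imported; the wrapper `insert_with_global_defaults`, the workgroup name, and the bucket path are hypothetical placeholders.

```python
import os

# Assumption: awswrangler picks up WR_-prefixed environment variables
# at import time, so they are set before the import happens.
os.environ["WR_DATABASE"] = "my_database"
os.environ["WR_WORKGROUP"] = "primary"


def insert_with_global_defaults(df, table: str) -> None:
    """Call to_iceberg without passing database or workgroup explicitly."""
    # Imported lazily so the variables above are already in place.
    import awswrangler as wr

    # database and workgroup are expected to come from the global config.
    wr.athena.to_iceberg(df=df, table=table, temp_path="s3://bucket/temp/")
```

The same defaults can be set in code with `wr.config.database = "my_database"` and `wr.config.workgroup = "primary"`. Values passed explicitly to the function take precedence over the global ones.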

Parameters:

df (pd.DataFrame) – Pandas DataFrame.
database (str) – AWS Glue/Athena database name. This is only the origin database from which the query is launched. You can still use and mix several databases by writing the full table name within the SQL (e.g. database.table).
table (str) – AWS Glue/Athena table name.
temp_path (str, optional) – Amazon S3 location to store temporary results. The workgroup configuration will be used if not provided.
index (bool) – Whether to write the DataFrame index as a column.
table_location (str, optional) – Amazon S3 location for the table. Will only be used to create a new table if it does not exist.
keep_files (bool) – Whether staging files produced by Athena are retained. ‘True’ by default.
data_source (str, optional) – Data Source / Catalog name. If None, ‘AwsDataCatalog’ will be used by default.
workgroup (str, optional) – Athena workgroup.
encryption (str, optional) – Valid values: [None, ‘SSE_S3’, ‘SSE_KMS’]. Notice: ‘CSE_KMS’ is not supported.
kms_key (str, optional) – For SSE-KMS, this is the KMS key ARN or ID.
boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session will be used if boto3_session is None.
s3_additional_kwargs (Optional[Dict[str, Any]]) – Forwarded to botocore requests, e.g. s3_additional_kwargs={‘RequestPayer’: ‘requester’}
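The encryption and kms_key rules above can be checked before any call reaches AWS. The helper below is a hypothetical sketch, not part of awswrangler; it assumes that a key must be supplied whenever SSE_KMS is selected.

```python
from typing import Any, Dict, Optional

# Values listed as valid in the parameter description above.
SUPPORTED_ENCRYPTION = (None, "SSE_S3", "SSE_KMS")


def encryption_kwargs(
    encryption: Optional[str], kms_key: Optional[str] = None
) -> Dict[str, Any]:
    """Validate encryption settings and return them as keyword arguments."""
    if encryption not in SUPPORTED_ENCRYPTION:
        # Covers 'CSE_KMS', which the documentation marks as unsupported.
        raise ValueError(f"Unsupported encryption: {encryption!r}")
    if encryption == "SSE_KMS" and not kms_key:
        # Assumption of this sketch: SSE_KMS needs a KMS key ARN or ID.
        raise ValueError("SSE_KMS requires kms_key (KMS key ARN or ID)")
    kwargs: Dict[str, Any] = {"encryption": encryption}
    if encryption == "SSE_KMS":
        kwargs["kms_key"] = kms_key
    return kwargs
```

The result can be unpacked into the call, for example `wr.athena.to_iceberg(df=df, database='my_database', table='my_table', **encryption_kwargs('SSE_S3'))`.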

Return type:

None

Examples

Insert into an existing Iceberg table

>>> import awswrangler as wr
>>> import pandas as pd
>>> wr.athena.to_iceberg(
...     df=pd.DataFrame({'col': [1, 2, 3]}),
...     database='my_database',
...     table='my_table',
...     temp_path='s3://bucket/temp/',
... )

Create Iceberg table and insert data (table doesn’t exist, requires table_location)

>>> import awswrangler as wr
>>> import pandas as pd
>>> wr.athena.to_iceberg(
...     df=pd.DataFrame({'col': [1, 2, 3]}),
...     database='my_database',
...     table='my_table2',
...     table_location='s3://bucket/my_table2/',
...     temp_path='s3://bucket/temp/',
... )