awswrangler.s3.to_deltalake

awswrangler.s3.to_deltalake(df: DataFrame, path: str, index: bool = False, mode: Literal['error', 'append', 'overwrite', 'ignore'] = 'append', dtype: dict[str, str] | None = None, partition_cols: list[str] | None = None, overwrite_schema: bool = False, lock_dynamodb_table: str | None = None, s3_allow_unsafe_rename: bool = False, boto3_session: Session | None = None, s3_additional_kwargs: dict[str, str] | None = None) → None

Write a DataFrame to S3 as a DeltaLake table.

This function requires the deltalake package.

Warning

This API is experimental and may change in future AWS SDK for Pandas releases.

Parameters:
  • df (pandas.DataFrame) – Pandas DataFrame

  • path (str) – S3 path for a directory where the DeltaLake table will be stored.

  • index (bool) – If True, store the DataFrame index in the table; if False, ignore it.

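  • mode (str, optional) – How to handle an existing table: append (default) adds the new data, overwrite replaces the table contents, ignore writes nothing if the table already exists, and error raises if the table already exists.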

  • dtype (dict[str, str], optional) – Dictionary mapping column names to the Athena/Glue types they should be cast to. Useful when you have columns with undetermined or mixed data types. (e.g. {'col name': 'bigint', 'col2 name': 'int'})

  • partition_cols (list[str], optional) – List of columns to partition the table by. Only required when creating a new table.

  • overwrite_schema (bool) – If True, allows updating the schema of the table.

  • lock_dynamodb_table (str | None) –

    DynamoDB table to use as a locking provider. A locking mechanism is needed to prevent unsafe concurrent writes to a delta lake directory when writing to S3. If you don’t want to use a locking mechanism, you can choose to set s3_allow_unsafe_rename to True.

    For information on how to set up the lock table, please check this page.

  • s3_allow_unsafe_rename (bool) – Allows using the default S3 backend without support for concurrent writers.

  • boto3_session (boto3.Session, optional) – Boto3 Session. If None, the default boto3 session is used.

  • s3_additional_kwargs (dict[str, str], optional) – Forwarded to the Delta Table class for the storage options of the S3 backend.
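
The choice between lock_dynamodb_table and s3_allow_unsafe_rename described above can be sketched as a small helper. This is a minimal illustration, not part of the library: the names locking_kwargs and write_delta are hypothetical, and the sketch assumes that exactly one of the two options is passed per call.

```python
from typing import Any, Dict, Optional


def locking_kwargs(lock_table: Optional[str]) -> Dict[str, Any]:
    """Pick the locking-related arguments for wr.s3.to_deltalake.

    Hypothetical helper: uses the DynamoDB lock table when one is given,
    otherwise opts in to unsafe renames (no protection for concurrent writers).
    """
    if lock_table:
        return {"lock_dynamodb_table": lock_table}
    return {"s3_allow_unsafe_rename": True}


def write_delta(df, path: str, lock_table: Optional[str] = None) -> None:
    """Write df to the Delta table at path with the chosen locking option."""
    # Imported lazily so the helper above can be used without awswrangler installed.
    import awswrangler as wr

    wr.s3.to_deltalake(df=df, path=path, **locking_kwargs(lock_table))
```

Falling back to s3_allow_unsafe_rename=True is only reasonable when a single writer touches the table at a time; with concurrent writers, a lock table is the safer option.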

Examples

Writing a Pandas DataFrame into a DeltaLake table in S3.

>>> import awswrangler as wr
>>> import pandas as pd
>>> wr.s3.to_deltalake(
...     df=pd.DataFrame({"col": [1, 2, 3]}),
...     path="s3://bucket/prefix/",
...     lock_dynamodb_table="my-lock-table",
... )
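
Replacing the contents of an existing table, rather than appending to it, might look like the following sketch. The wrapper name overwrite_table is hypothetical, and the bucket, prefix, and lock table name are placeholders. Setting overwrite_schema=True is an assumption that the new DataFrame may not match the schema already stored at the path.

```python
def overwrite_table(df, path: str, lock_table: str) -> None:
    """Replace the Delta table at path with the rows in df."""
    # Imported lazily so that defining this function does not require awswrangler.
    import awswrangler as wr

    wr.s3.to_deltalake(
        df=df,
        path=path,
        mode="overwrite",          # replace existing data instead of appending
        overwrite_schema=True,     # allow the table schema to be updated
        lock_dynamodb_table=lock_table,
    )


if __name__ == "__main__":
    import pandas as pd

    overwrite_table(
        df=pd.DataFrame({"col": [1, 2, 3]}),
        path="s3://bucket/prefix/",   # placeholder path
        lock_table="my-lock-table",   # placeholder DynamoDB table name
    )
```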

See also

deltalake.DeltaTable

Create a DeltaTable instance with the deltalake library.

deltalake.write_deltalake

Write to a DeltaLake table.