awswrangler.s3.to_deltalake
- awswrangler.s3.to_deltalake(df: DataFrame, path: str, index: bool = False, mode: Literal['error', 'append', 'overwrite', 'ignore'] = 'append', dtype: dict[str, str] | None = None, partition_cols: list[str] | None = None, schema_mode: Literal['overwrite', 'merge'] | None = None, lock_dynamodb_table: str | None = None, s3_allow_unsafe_rename: bool = False, boto3_session: Session | None = None, s3_additional_kwargs: dict[str, str] | None = None) → None
Write a DataFrame to S3 as a DeltaLake table.
This function requires the deltalake package.
Warning
This API is experimental and may change in future AWS SDK for Pandas releases.
- Parameters:
  - df (DataFrame) – Pandas DataFrame.
  - path (str) – S3 path for a directory where the DeltaLake table will be stored.
  - index (bool) – True to store the DataFrame index in the file, otherwise False to ignore it.
  - mode (Literal['error', 'append', 'overwrite', 'ignore']) – append (default), overwrite, ignore, or error.
  - dtype (dict[str, str] | None) – Dictionary of column names and Athena/Glue types to be cast. Useful when you have columns with undetermined or mixed data types. (e.g. {'col name': 'bigint', 'col2 name': 'int'})
  - partition_cols (list[str] | None) – List of columns to partition the table by. Only required when creating a new table.
  - schema_mode (Literal['overwrite', 'merge'] | None) – If set to "overwrite", allows replacing the schema of the table. Set to "merge" to merge with the existing schema.
  - lock_dynamodb_table (str | None) – DynamoDB table to use as a locking provider. A locking mechanism is needed to prevent unsafe concurrent writes to a Delta Lake directory when writing to S3. If you don't want to use a locking mechanism, you can choose to set s3_allow_unsafe_rename to True. For information on how to set up the lock table, see the Delta Lake documentation.
  - s3_allow_unsafe_rename (bool) – Allows using the default S3 backend without support for concurrent writers (see the second example below).
  - boto3_session (Session | None) – If None, the default boto3 session is used.
  - s3_additional_kwargs (dict[str, str] | None) – Forwarded to the DeltaTable class for the storage options of the S3 backend.
- Return type:
None
Examples
Writing a Pandas DataFrame into a DeltaLake table in S3.
>>> import awswrangler as wr
>>> import pandas as pd
>>> wr.s3.to_deltalake(
...     df=pd.DataFrame({"col": [1, 2, 3]}),
...     path="s3://bucket/prefix/",
...     lock_dynamodb_table="my-lock-table",
... )
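A second, hedged sketch: an overwrite that partitions the table, casts a column to an Athena/Glue type via dtype, and opts out of locking with s3_allow_unsafe_rename. The bucket, prefix, and column names are placeholders, and skipping the lock table is only safe when there is a single writer.
>>> import awswrangler as wr
>>> import pandas as pd
>>> wr.s3.to_deltalake(
...     df=pd.DataFrame({"col": [1, 2, 3], "part": ["a", "a", "b"]}),
...     path="s3://bucket/prefix/",
...     mode="overwrite",
...     partition_cols=["part"],
...     dtype={"col": "bigint"},
...     s3_allow_unsafe_rename=True,
... )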
See also
deltalake.DeltaTable
Create a DeltaTable instance with the deltalake library.
deltalake.write_deltalake
Write to a DeltaLake table.
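To verify a write, the table can be read back with the deltalake package referenced above. A minimal sketch, assuming the same placeholder path and AWS credentials available in the environment:
>>> from deltalake import DeltaTable
>>> DeltaTable("s3://bucket/prefix/").to_pandas()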