awswrangler.s3.read_deltalake
- awswrangler.s3.read_deltalake(path: str, version: int | None = None, partitions: list[tuple[str, str, Any]] | None = None, columns: list[str] | None = None, without_files: bool = False, dtype_backend: Literal['numpy_nullable', 'pyarrow'] = 'numpy_nullable', use_threads: bool = True, boto3_session: Session | None = None, s3_additional_kwargs: dict[str, str] | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None) → DataFrame
Load the data of a Delta Lake table from an S3 path.
This function requires the deltalake package. See the How to load a Delta table guide for loading instructions.
Note
This function has arguments which can be configured globally through wr.config or environment variables:
dtype_backend
Check out the Global Configurations Tutorial for details.
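As a rough illustration of the global configuration mentioned in the note, the sketch below sets the backend through an environment variable. The variable name `WR_DTYPE_BACKEND` is an assumption based on the library's `WR_` prefix convention for configuration names, and the helper `set_dtype_backend_env` is hypothetical; check the Global Configurations Tutorial before relying on either.

```python
import os

# The two values accepted by the dtype_backend argument.
ALLOWED_BACKENDS = ("numpy_nullable", "pyarrow")


def set_dtype_backend_env(backend: str) -> None:
    """Hypothetical helper: set the default dtype backend via the environment.

    Assumes awswrangler reads WR_DTYPE_BACKEND when it is first imported,
    so this should be called before `import awswrangler`.
    """
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"dtype_backend must be one of {ALLOWED_BACKENDS}, got {backend!r}")
    os.environ["WR_DTYPE_BACKEND"] = backend


# Alternative once awswrangler is already imported (not executed here):
#   import awswrangler as wr
#   wr.config.dtype_backend = "pyarrow"
```

A value passed explicitly as `dtype_backend=` in a call would normally be expected to take the place of the global default for that call.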
- Parameters:
path (str) – The path of the DeltaTable.
version (int | None) – The version of the DeltaTable.
partitions (list[tuple[str, str, Any]] | None) – A list of partition filters; see help(DeltaTable.files_by_partitions) for the filter syntax.
columns (list[str] | None) – The columns to project. This can be a list of column names to include (order and duplicates are preserved).
without_files (bool) – If True, load the table without tracking files (memory-friendly). Some append-only applications might not need to track files.
dtype_backend (Literal['numpy_nullable', 'pyarrow']) – Which dtype backend to use for the resulting DataFrame. With "numpy_nullable", nullable dtypes are used for all dtypes that have a nullable implementation; with "pyarrow", pyarrow is used for all dtypes. The dtype backends are still experimental. The "pyarrow" backend is only supported with pandas 2.0 or above.
use_threads (bool) – True to enable concurrent requests, False to disable multiple threads. When enabled, os.cpu_count() is used as the maximum number of threads.
boto3_session (Session | None) – Boto3 Session. If None, the default boto3 session is used.
s3_additional_kwargs (dict[str, str] | None) – Forwarded to the DeltaTable class as the storage options of the S3 backend.
pyarrow_additional_kwargs (dict[str, Any] | None) – Forwarded to the PyArrow to_pandas method.
- Return type:
DataFrame
- Returns:
DataFrame with the results.
See also
deltalake.DeltaTable
Create a DeltaTable instance with the deltalake library.
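Examples
A minimal sketch of a typical call, assuming awswrangler and deltalake are installed and AWS credentials are available. The helper names `build_read_kwargs` and `load_table`, as well as the bucket, column, and partition names in the usage note, are hypothetical and serve only as illustration.

```python
from __future__ import annotations

from typing import Any


def build_read_kwargs(
    version: int | None = None,
    partitions: list[tuple[str, str, Any]] | None = None,
    columns: list[str] | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: keep only the optional arguments that were provided."""
    candidates = {"version": version, "partitions": partitions, "columns": columns}
    return {name: value for name, value in candidates.items() if value is not None}


def load_table(path: str, **kwargs: Any):
    """Read a Delta table from S3 with wr.s3.read_deltalake."""
    # Imported lazily so the helper above can be used without awswrangler installed.
    import awswrangler as wr

    return wr.s3.read_deltalake(path=path, **kwargs)
```

A call could then look like `load_table("s3://example-bucket/delta/orders/", **build_read_kwargs(version=3, partitions=[("year", "=", "2024")], columns=["order_id", "amount"]))`. Each partition filter is a (column, operator, value) tuple; see help(DeltaTable.files_by_partitions) for the supported operators.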