awswrangler.s3.from_iceberg

awswrangler.s3.from_iceberg(table_bucket_arn: str, namespace: str, table_name: str, columns: list[str] | None = None, row_filter: str | None = None, snapshot_id: int | None = None, limit: int | None = None, dtype_backend: Literal['numpy_nullable', 'pyarrow'] = 'numpy_nullable', boto3_session: Session | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None) → DataFrame

Read an S3 Table into a Pandas DataFrame via PyIceberg.

This function requires the pyiceberg package. Install it with pip install awswrangler[pyiceberg].

By default, the S3 Tables REST endpoint is used. To use the AWS Glue Iceberg REST endpoint instead, set wr.config.s3tables_catalog_endpoint_url (e.g. "https://glue.<region>.amazonaws.com/iceberg"). See Integrating S3 Tables with AWS analytics services for the required Glue Data Catalog and Lake Formation setup.
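The Glue endpoint URL follows the pattern "https://glue.<region>.amazonaws.com/iceberg", and the region also appears in the table bucket ARN. As a minimal sketch, the URL could be derived from the ARN so the two stay in sync. The helper name glue_iceberg_endpoint is hypothetical and not part of awswrangler, and the sketch assumes the standard "aws" partition (other partitions use different domain suffixes).

```python
def glue_iceberg_endpoint(table_bucket_arn: str) -> str:
    """Build the Glue Iceberg REST endpoint URL for the region in the ARN.

    Hypothetical helper, not part of awswrangler. Assumes an ARN of the
    form arn:aws:s3tables:<region>:<account-id>:bucket/<name>.
    """
    parts = table_bucket_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3tables":
        raise ValueError(f"Not an S3 table bucket ARN: {table_bucket_arn!r}")
    partition, region = parts[1], parts[3]
    if partition != "aws":
        # Assumption: only the standard partition uses the amazonaws.com suffix.
        raise ValueError(f"Unsupported partition in this sketch: {partition!r}")
    if not region:
        raise ValueError("The ARN does not contain a region")
    return f"https://glue.{region}.amazonaws.com/iceberg"


url = glue_iceberg_endpoint(
    "arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket"
)
print(url)
```

The resulting string could then be assigned to wr.config.s3tables_catalog_endpoint_url before calling from_iceberg.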

Parameters:

table_bucket_arn (str) – The ARN of the S3 table bucket.
namespace (str) – The namespace of the table.
table_name (str) – The name of the table to read.
columns (list[str], optional) – List of column names to read. If None, all columns are read.
row_filter (str, optional) – A row filter expression (e.g. "col > 5"). If None, all rows are read.
snapshot_id (int, optional) – A specific snapshot ID to read. If None, the latest snapshot is read.
limit (int, optional) – Maximum number of rows to return. If None, all rows are returned.
dtype_backend (str, optional) – Backend used for the dtypes of the returned DataFrame: "numpy_nullable" (default) or "pyarrow".
boto3_session (boto3.Session, optional) – Boto3 Session. If None, the default boto3 session is used.
pyarrow_additional_kwargs (dict[str, Any], optional) – Additional keyword arguments forwarded to PyArrow’s to_pandas() method.
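Because every read option defaults to None, a caller can collect only the options it actually needs and pass them through as keyword arguments. The following is a rough sketch of that pattern; build_read_kwargs and read_table are hypothetical helpers, not part of awswrangler, and the parameter names are taken from the signature above.

```python
from __future__ import annotations

from typing import Any


def build_read_kwargs(
    table_bucket_arn: str,
    namespace: str,
    table_name: str,
    columns: list[str] | None = None,
    row_filter: str | None = None,
    snapshot_id: int | None = None,
    limit: int | None = None,
) -> dict[str, Any]:
    """Collect from_iceberg keyword arguments, omitting options left as None."""
    kwargs: dict[str, Any] = {
        "table_bucket_arn": table_bucket_arn,
        "namespace": namespace,
        "table_name": table_name,
    }
    optional = {
        "columns": columns,
        "row_filter": row_filter,
        "snapshot_id": snapshot_id,
        "limit": limit,
    }
    # Leaving a key out is equivalent to passing None (the documented default).
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


def read_table(**kwargs: Any):
    """Forward the collected arguments to awswrangler (needs AWS access)."""
    import awswrangler as wr  # imported lazily so the helper above has no dependency

    return wr.s3.from_iceberg(**kwargs)


read_kwargs = build_read_kwargs(
    "arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket",
    "my_namespace",
    "my_table",
    row_filter="amount > 50.0",
    limit=100,
)
```

Calling read_table(**read_kwargs) would then perform the same read as the row filtering example in the Examples section.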

Returns:

DataFrame with the table data.

Return type:

pd.DataFrame

Examples

Reading an entire table:

>>> import awswrangler as wr
>>> df = wr.s3.from_iceberg(
...     table_bucket_arn="arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket",
...     namespace="my_namespace",
...     table_name="my_table",
... )

Reading with row filtering and limit:

>>> df = wr.s3.from_iceberg(
...     table_bucket_arn="arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket",
...     namespace="my_namespace",
...     table_name="my_table",
...     row_filter="amount > 50.0",
...     limit=100,
... )

Reading via the Glue Iceberg REST endpoint:

>>> wr.config.s3tables_catalog_endpoint_url = "https://glue.us-east-1.amazonaws.com/iceberg"
>>> df = wr.s3.from_iceberg(
...     table_bucket_arn="arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket",
...     namespace="my_namespace",
...     table_name="my_table",
... )