awswrangler.s3.from_iceberg
- awswrangler.s3.from_iceberg(table_bucket_arn: str, namespace: str, table_name: str, columns: list[str] | None = None, row_filter: str | None = None, snapshot_id: int | None = None, limit: int | None = None, dtype_backend: Literal['numpy_nullable', 'pyarrow'] = 'numpy_nullable', boto3_session: Session | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None) → DataFrame
Read an S3 Table into a Pandas DataFrame via PyIceberg.
This function requires the pyiceberg package. Install it with pip install awswrangler[pyiceberg].
By default, the S3 Tables REST endpoint is used. To use the AWS Glue Iceberg REST endpoint instead, set wr.config.s3tables_catalog_endpoint_url (e.g. "https://glue.<region>.amazonaws.com/iceberg"). See Integrating S3 Tables with AWS analytics services for the required Glue Data Catalog and Lake Formation setup.
- Parameters:
table_bucket_arn (str) – The ARN of the S3 table bucket.
namespace (str) – The namespace of the table.
table_name (str) – The name of the table to read.
columns (list[str], optional) – List of column names to read. If None, all columns are read.
row_filter (str, optional) – A row filter expression (e.g. "col > 5"). If None, all rows are read.
snapshot_id (int, optional) – A specific snapshot ID to read. If None, the latest snapshot is read.
limit (int, optional) – Maximum number of rows to return. If None, all rows are returned.
dtype_backend (str, optional) – Which dtype_backend to use: "numpy_nullable" or "pyarrow".
boto3_session (boto3.Session, optional) – Boto3 Session. If None, the default boto3 session is used.
pyarrow_additional_kwargs (dict[str, Any], optional) – Additional keyword arguments forwarded to PyArrow’s to_pandas() method.
- Returns:
DataFrame with the table data.
- Return type:
pd.DataFrame
Examples
Reading an entire table:
>>> import awswrangler as wr
>>> df = wr.s3.from_iceberg(
...     table_bucket_arn="arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket",
...     namespace="my_namespace",
...     table_name="my_table",
... )
Reading with row filtering and limit:
>>> df = wr.s3.from_iceberg(
...     table_bucket_arn="arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket",
...     namespace="my_namespace",
...     table_name="my_table",
...     row_filter="amount > 50.0",
...     limit=100,
... )
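The columns, snapshot_id, and dtype_backend parameters can be combined with row_filter in the same call. The following is a minimal sketch only: build_read_kwargs and read_orders are hypothetical helpers, the column names order_id and amount are placeholders, and the snapshot ID must come from your own table. awswrangler is imported lazily so that the arguments can be inspected without AWS access.

```python
from __future__ import annotations

from typing import Any


def build_read_kwargs(table_bucket_arn: str, snapshot_id: int | None = None) -> dict[str, Any]:
    """Assemble keyword arguments for wr.s3.from_iceberg (hypothetical helper)."""
    return {
        "table_bucket_arn": table_bucket_arn,
        "namespace": "my_namespace",
        "table_name": "my_table",
        # Placeholder column names: only these columns are read.
        "columns": ["order_id", "amount"],
        "row_filter": "amount > 50.0",
        # None reads the latest snapshot; an int pins the read to that snapshot.
        "snapshot_id": snapshot_id,
        "dtype_backend": "pyarrow",
    }


def read_orders(table_bucket_arn: str, snapshot_id: int | None = None):
    """Read the projected, filtered rows; needs awswrangler[pyiceberg] and AWS credentials."""
    import awswrangler as wr  # imported here so build_read_kwargs works without it

    return wr.s3.from_iceberg(**build_read_kwargs(table_bucket_arn, snapshot_id))
```

Calling read_orders("arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket") would read the latest snapshot; passing snapshot_id pins the read to an earlier table state.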
Reading via the Glue Iceberg REST endpoint:
>>> wr.config.s3tables_catalog_endpoint_url = "https://glue.us-east-1.amazonaws.com/iceberg"
>>> df = wr.s3.from_iceberg(
...     table_bucket_arn="arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket",
...     namespace="my_namespace",
...     table_name="my_table",
... )
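Rather than hard-coding the region in the Glue endpoint URL, it can be derived from the table bucket ARN. This is a sketch under stated assumptions: glue_iceberg_endpoint is a hypothetical helper and not part of awswrangler, and it assumes the standard aws partition, where the endpoint follows the "https://glue.<region>.amazonaws.com/iceberg" pattern shown above.

```python
def glue_iceberg_endpoint(table_bucket_arn: str) -> str:
    """Build the Glue Iceberg REST endpoint URL for the region in an S3 table bucket ARN.

    Hypothetical helper; assumes ARNs of the form
    arn:aws:s3tables:<region>:<account-id>:bucket/<name>.
    """
    parts = table_bucket_arn.split(":")
    # Expected fields: arn, partition, service, region, account ID, resource.
    if len(parts) < 6 or parts[0] != "arn" or parts[2] != "s3tables" or not parts[3]:
        raise ValueError(f"Not an S3 table bucket ARN: {table_bucket_arn!r}")
    region = parts[3]
    return f"https://glue.{region}.amazonaws.com/iceberg"


arn = "arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket"
endpoint = glue_iceberg_endpoint(arn)
```

The resulting string can then be assigned to wr.config.s3tables_catalog_endpoint_url before calling wr.s3.from_iceberg.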