RayReadParquetSettings

class awswrangler.typing.RayReadParquetSettings

Bases: dict

Typed dictionary defining the settings for distributing reading calls using Ray.

Attributes

parallelism

bulk_read

Set to True to enable faster reading of a large number of Parquet files.

Attributes Documentation

parallelism: NotRequired[int]
bulk_read: NotRequired[bool]

Set to True to enable faster reading of a large number of Parquet files. Performance improves because the file metadata is not gathered on a single node. The drawback is that no schema resolution is performed, so this option should only be used when the Parquet files all have a uniform schema.
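As a rough illustration of how these settings might be assembled, the sketch below builds a plain dictionary with the two keys documented above. The helper name build_ray_args and its files_are_uniform argument are hypothetical and not part of awswrangler. The commented-out call at the end assumes that wr.s3.read_parquet accepts these settings through a ray_args parameter, and the S3 path is a placeholder.

```python
from typing import Any, Dict, Optional


def build_ray_args(
    files_are_uniform: bool, parallelism: Optional[int] = None
) -> Dict[str, Any]:
    """Hypothetical helper: build a dict shaped like RayReadParquetSettings."""
    settings: Dict[str, Any] = {}
    # bulk_read skips schema resolution, so enable it only for uniform files.
    if files_are_uniform:
        settings["bulk_read"] = True
    # Both keys are NotRequired, so leave parallelism out unless a value is given.
    if parallelism is not None:
        settings["parallelism"] = parallelism
    return settings


settings = build_ray_args(files_are_uniform=True, parallelism=8)
print(settings)

# Assumed usage (requires awswrangler with Ray support; the path is a placeholder):
# import awswrangler as wr
# df = wr.s3.read_parquet(path="s3://my-bucket/my-prefix/", ray_args=settings)
```

Because both keys are NotRequired, an empty dictionary is also a valid value, in which case the library's defaults would apply.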