RayReadParquetSettings

class awswrangler.typing.RayReadParquetSettings

Bases: RaySettings

Typed dictionary defining the settings for distributing read calls using Ray.

Attributes

parallelism

override_num_blocks

bulk_read

Set to True to enable faster reading of a large number of Parquet files.

Attributes Documentation

parallelism: int
override_num_blocks: int
bulk_read: bool

Set to True to enable faster reading of a large number of Parquet files. Performance improves because the file metadata is not gathered on a single node. The drawback is that no schema resolution is performed, so this option should only be used when all the Parquet files are uniform.
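
As a rough illustration of the attributes listed above, the sketch below builds a settings dictionary with the three documented keys. To stay runnable without awswrangler installed, it defines a local stand-in class, `RayReadParquetSettingsSketch`, which is hypothetical and only mirrors the documented key names and types. Treating every key as optional (`total=False`) and the helper `build_settings` are assumptions made for this example, not something this page states.

```python
from typing import TypedDict


class RayReadParquetSettingsSketch(TypedDict, total=False):
    """Hypothetical local stand-in mirroring the documented keys.

    This is NOT the library class; optional keys are an assumption.
    """

    parallelism: int
    override_num_blocks: int
    bulk_read: bool


def build_settings(files_are_uniform: bool) -> RayReadParquetSettingsSketch:
    """Enable bulk_read only when all Parquet files are known to be uniform.

    bulk_read skips schema resolution, so it is left off for files
    that may differ from one another.
    """
    return {"bulk_read": files_are_uniform}


# Uniform files: bulk reading can be switched on.
settings = build_settings(files_are_uniform=True)
print(settings)
```

With awswrangler installed, a plain dictionary of this shape would presumably be passed wherever the library accepts a `RayReadParquetSettings` value (for example a `ray_args` argument on a Parquet read call, which is an assumption not confirmed by this page). Because a typed dictionary is an ordinary `dict` at runtime, no special constructor is needed.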