awswrangler.data_quality.create_recommendation_ruleset
- awswrangler.data_quality.create_recommendation_ruleset(database: str, table: str, iam_role_arn: str, name: str | None = None, catalog_id: str | None = None, connection_name: str | None = None, additional_options: dict[str, Any] | None = None, number_of_workers: int = 5, timeout: int = 2880, client_token: str | None = None, boto3_session: Session | None = None) → DataFrame
Create recommendation Data Quality ruleset.
Note
This function has arguments which can be configured globally through wr.config or environment variables:
catalog_id
database
Check out the Global Configurations Tutorial for details.
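As a rough sketch of the note above, the two configurable arguments can be supplied through the environment instead of at each call. The variable names WR_DATABASE and WR_CATALOG_ID assume the usual WR_ prefix plus the upper-cased argument name, and the helper create_ruleset_with_global_config, the database name, and the account id are hypothetical placeholders. The import is deferred so that the variables are already set when awswrangler reads its configuration.

```python
import os

# Assumed naming scheme: "WR_" + upper-cased argument name.
# The values below are placeholders, not real resources.
os.environ["WR_DATABASE"] = "my_database"
os.environ["WR_CATALOG_ID"] = "123456789012"


def create_ruleset_with_global_config(table, iam_role_arn):
    """Hypothetical helper: relies on the global config for database/catalog_id."""
    # Imported here so the environment variables above are set before
    # awswrangler loads its configuration.
    import awswrangler as wr

    # Equivalent in-code alternative: wr.config.database = "my_database"
    return wr.data_quality.create_recommendation_ruleset(
        table=table,
        iam_role_arn=iam_role_arn,
    )
```

An explicit database or catalog_id argument passed at the call site would normally take precedence over the global value.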
- Parameters:
database (str) – Glue database name.
table (str) – Glue table name.
iam_role_arn (str) – IAM Role ARN.
name (str | None) – Ruleset name.
catalog_id (str | None) – Glue Catalog id.
connection_name (str | None) – Glue connection name.
additional_options (dict[str, Any] | None) – Additional options for the table. Supported keys:
  pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
  catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
number_of_workers (int) – The number of G.1X workers to be used in the run. The default is 5.
timeout (int) – The timeout for a run in minutes. The default is 2880 (48 hours).
client_token (str | None) – Random id used for idempotency. Is automatically generated if not provided.
boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.
- Return type:
DataFrame
- Returns:
Data frame with recommended ruleset details.
Examples
>>> import awswrangler as wr
>>> df_recommended_ruleset = wr.data_quality.create_recommendation_ruleset(
...     database="database",
...     table="table",
...     iam_role_arn="arn:...",
... )
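The optional parameters can be combined in the same call. The following is a minimal sketch, not part of the library: build_recommendation_kwargs and recommend_ruleset are hypothetical helpers, and the predicate string is an illustrative placeholder. It restricts the recommendation run to matching partitions through the pushDownPredicate key and passes the worker count and timeout explicitly.

```python
from __future__ import annotations

from typing import Any


def build_recommendation_kwargs(
    database: str,
    table: str,
    iam_role_arn: str,
    partition_predicate: str | None = None,
    number_of_workers: int = 5,
    timeout: int = 2880,
) -> dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for the call."""
    kwargs: dict[str, Any] = {
        "database": database,
        "table": table,
        "iam_role_arn": iam_role_arn,
        "number_of_workers": number_of_workers,  # G.1X workers for the run
        "timeout": timeout,  # minutes
    }
    if partition_predicate is not None:
        # Filter on partitions without listing and reading every file.
        kwargs["additional_options"] = {"pushDownPredicate": partition_predicate}
    return kwargs


def recommend_ruleset(**kwargs: Any):
    """Hypothetical wrapper: runs the recommendation and returns the DataFrame."""
    import awswrangler as wr  # deferred so the helper above works without AWS access

    return wr.data_quality.create_recommendation_ruleset(**kwargs)
```

A call might then look like recommend_ruleset(**build_recommendation_kwargs("database", "table", "arn:...", partition_predicate="year='2023'", timeout=120)). Keeping the argument assembly separate from the AWS call makes the options easy to inspect before a run is started.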