awswrangler.data_quality.evaluate_ruleset
- awswrangler.data_quality.evaluate_ruleset(name: str | list[str], iam_role_arn: str, number_of_workers: int = 5, timeout: int = 2880, database: str | None = None, table: str | None = None, catalog_id: str | None = None, connection_name: str | None = None, additional_options: dict[str, str] | None = None, additional_run_options: dict[str, str | bool] | None = None, client_token: str | None = None, boto3_session: Session | None = None) → DataFrame
Evaluate one or more AWS Glue Data Quality rulesets.
Note
This function has arguments which can be configured globally through wr.config or environment variables:
  - catalog_id
  - database
Check out the Global Configurations Tutorial for details.
- Parameters:
  - name (str | list[str]) – Ruleset name or list of ruleset names.
  - iam_role_arn (str) – IAM role ARN.
  - number_of_workers (int) – The number of G.1X workers to use in the run. The default is 5.
  - timeout (int) – The timeout for a run, in minutes. The default is 2880 (48 hours).
  - database (str | None) – Glue database name. If not provided, the database associated with the ruleset is used.
  - table (str | None) – Glue table name. If not provided, the table associated with the ruleset is used.
  - catalog_id (str | None) – Glue Catalog ID.
  - connection_name (str | None) – Glue connection name.
  - additional_options (dict[str, str] | None) – Additional options for the table. Supported keys:
    - pushDownPredicate: filters on partitions without having to list and read all the files in your dataset.
    - catalogPartitionPredicate: uses server-side partition pruning with the partition indexes in the Glue Data Catalog.
  - additional_run_options (dict[str, str | bool] | None) – Additional run options. Supported keys:
    - CloudWatchMetricsEnabled: whether to enable CloudWatch metrics.
    - ResultsS3Prefix: Amazon S3 prefix under which to store the results.
  - client_token (str | None) – Random ID used for idempotency. Generated automatically if not provided.
  - boto3_session (Session | None) – Boto3 session to use. The default boto3 session is used if boto3_session is None.
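The shapes of `name`, `timeout` and the two option dictionaries are easiest to see side by side. The sketch below uses a hypothetical helper, `build_evaluation_kwargs`, that only assembles keyword arguments for `evaluate_ruleset` and makes no AWS calls. The ruleset names, role ARN, S3 prefix and the `year='2023'` predicate (which assumes a table partitioned by a `year` column) are all placeholders.

```python
from __future__ import annotations

from typing import Any


def build_evaluation_kwargs(
    names: str | list[str],
    iam_role_arn: str,
    partition_predicate: str | None = None,
    results_s3_prefix: str | None = None,
    enable_cloudwatch_metrics: bool = False,
    timeout_hours: float = 48,
) -> dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for
    wr.data_quality.evaluate_ruleset without calling AWS."""
    kwargs: dict[str, Any] = {
        # `name` accepts a single ruleset name or a list of names.
        "name": names,
        "iam_role_arn": iam_role_arn,
        # `timeout` is in minutes; 48 hours -> 2880, the documented default.
        "timeout": int(timeout_hours * 60),
    }
    if partition_predicate is not None:
        # additional_options is dict[str, str]; pushDownPredicate filters
        # partitions before the files are listed and read.
        kwargs["additional_options"] = {"pushDownPredicate": partition_predicate}
    run_options: dict[str, str | bool] = {}
    if enable_cloudwatch_metrics:
        run_options["CloudWatchMetricsEnabled"] = True
    if results_s3_prefix is not None:
        run_options["ResultsS3Prefix"] = results_s3_prefix
    if run_options:
        # additional_run_options is dict[str, str | bool].
        kwargs["additional_run_options"] = run_options
    return kwargs


kwargs = build_evaluation_kwargs(
    ["ruleset_a", "ruleset_b"],  # hypothetical ruleset names
    "arn:aws:iam::111122223333:role/GlueDataQualityRole",  # placeholder ARN
    partition_predicate="year='2023'",  # assumes a `year` partition column
    results_s3_prefix="s3://my-bucket/dq-results/",  # placeholder prefix
)
print(kwargs)
```

The resulting dictionary would then be passed as `wr.data_quality.evaluate_ruleset(**kwargs)`. Building the option dictionaries only when they are needed leaves them out of the call entirely otherwise, so the function's own defaults apply.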
- Return type:
DataFrame
- Returns:
Data frame with ruleset evaluation results.
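The Returns entry does not list the columns of the result. A common follow-up is to keep only the rules that did not pass. The sketch below assumes the frame carries Glue's rule-result fields as columns, in particular `Name` and `Result` (with values such as `PASS`, `FAIL` or `ERROR`); check `df.columns` in your version before relying on these names. The sample frame is a hand-built stand-in, not real output.

```python
import pandas as pd


def failed_rules(results: pd.DataFrame) -> pd.DataFrame:
    """Return only the rules whose outcome is not PASS.

    Assumes Glue rule-result style columns "Name" and "Result".
    """
    mask = results["Result"] != "PASS"
    return results.loc[mask, ["Name", "Result"]].reset_index(drop=True)


# Hand-built stand-in for evaluate_ruleset output (illustrative only).
sample = pd.DataFrame(
    {
        "Name": ["Rule_1", "Rule_2"],
        "Result": ["PASS", "FAIL"],
    }
)
failures = failed_rules(sample)
print(failures)
```

Applied to the `df_ruleset_results` frame from the example below, an empty result would mean every rule passed.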
Examples
>>> import awswrangler as wr
>>> import pandas as pd
>>>
>>> path = "s3://my-bucket/my-prefix/"  # placeholder S3 dataset location
>>> glue_data_quality_role = "arn:aws:iam::111122223333:role/GlueDataQualityRole"  # placeholder role ARN
>>>
>>> df = pd.DataFrame({"c0": [0, 1, 2], "c1": [0, 1, 2], "c2": [0, 0, 1]})
>>> wr.s3.to_parquet(df, path, dataset=True, database="database", table="table")
>>> wr.data_quality.create_ruleset(
...     name="ruleset",
...     database="database",
...     table="table",
...     dqdl_rules="Rules = [ RowCount between 1 and 3 ]",
... )
>>> df_ruleset_results = wr.data_quality.evaluate_ruleset(
...     name="ruleset",
...     iam_role_arn=glue_data_quality_role,
... )
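The `client_token` parameter exists for idempotency, which matters mostly when a call is retried. The sketch below is a hypothetical wrapper, `evaluate_with_retries`, that generates one token up front and reuses it on every attempt, so a retried request carries the same token rather than a fresh one. How Glue treats repeated tokens is determined by the service, not by this code, and `flaky_evaluate` is only a local stand-in for `wr.data_quality.evaluate_ruleset` so the example runs without AWS.

```python
from __future__ import annotations

import uuid
from typing import Any, Callable


def evaluate_with_retries(
    evaluate: Callable[..., Any],
    max_attempts: int = 3,
    **kwargs: Any,
) -> Any:
    """Hypothetical wrapper: retry an evaluation while reusing one client token."""
    # Generate the token once so every attempt sends the same value.
    token = kwargs.pop("client_token", None) or str(uuid.uuid4())
    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            return evaluate(client_token=token, **kwargs)
        except Exception as exc:  # narrow this to transient errors in real code
            last_error = exc
    raise RuntimeError(f"evaluation failed after {max_attempts} attempts") from last_error


# Stand-in for wr.data_quality.evaluate_ruleset: fails once, then succeeds.
seen_tokens: list[str] = []


def flaky_evaluate(**kw: Any) -> str:
    seen_tokens.append(kw["client_token"])
    if len(seen_tokens) < 2:
        raise ConnectionError("transient failure")
    return "ok"


result = evaluate_with_retries(
    flaky_evaluate,
    name="ruleset",
    iam_role_arn="arn:aws:iam::111122223333:role/GlueDataQualityRole",  # placeholder
)
print(result, seen_tokens)
```

In real use, `wr.data_quality.evaluate_ruleset` would be passed as `evaluate` with its usual keyword arguments. Catching every `Exception` keeps the sketch short; a production version would retry only on errors known to be transient.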