awswrangler.data_quality.create_ruleset
- awswrangler.data_quality.create_ruleset(name: str, database: str, table: str, df_rules: DataFrame | None = None, dqdl_rules: str | None = None, description: str = '', client_token: str | None = None, boto3_session: Session | None = None) -> None
Create Data Quality ruleset.
Note
This function has arguments which can be configured globally through wr.config or environment variables:
database
Check out the Global Configurations Tutorial for details.
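As a rough illustration of the note above, the snippet below sets a default database through an environment variable. It assumes that `WR_DATABASE` is the variable backing `wr.config.database` and that it is read when awswrangler is imported; check the Global Configurations Tutorial to confirm both points for your version.

```python
import os

# Assumption: WR_DATABASE backs wr.config.database and is read at import time,
# so it is set before awswrangler is imported.
os.environ["WR_DATABASE"] = "database"

# With awswrangler installed and AWS credentials available, the call could then
# omit the database argument (left commented out so the snippet runs offline):
#
# import awswrangler as wr
# wr.data_quality.create_ruleset(
#     name="ruleset",
#     table="table",
#     dqdl_rules="Rules = [ RowCount between 1 and 3 ]",
# )
#
# The in-code alternative would be: wr.config.database = "database"
```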
- Parameters:
name (str) – Ruleset name.
database (str) – Glue database name.
table (str) – Glue table name.
df_rules (pandas.DataFrame, optional) – Data frame with rule_type, parameter, and expression columns.
dqdl_rules (str, optional) – Data Quality Definition Language definition.
description (str, optional) – Ruleset description. Defaults to an empty string.
client_token (str, optional) – Random ID used for idempotency. Generated automatically if not provided.
boto3_session (boto3.Session, optional) – Boto3 Session. If None, the default boto3 session is used.
Examples
>>> import awswrangler as wr
>>> import pandas as pd
>>>
>>> # `path` is the S3 location of the dataset and must be defined beforehand.
>>> df = pd.DataFrame({"c0": [0, 1, 2], "c1": [0, 1, 2], "c2": [0, 0, 1]})
>>> wr.s3.to_parquet(df, path, dataset=True, database="database", table="table")
>>> wr.data_quality.create_ruleset(
...     name="ruleset",
...     database="database",
...     table="table",
...     dqdl_rules="Rules = [ RowCount between 1 and 3 ]",
... )

>>> import awswrangler as wr
>>> import pandas as pd
>>>
>>> # `path` is the S3 location of the dataset and must be defined beforehand.
>>> df = pd.DataFrame({"c0": [0, 1, 2], "c1": [0, 1, 2], "c2": [0, 0, 1]})
>>> df_rules = pd.DataFrame({
...     "rule_type": ["RowCount", "IsComplete", "Uniqueness"],
...     "parameter": [None, '"c0"', '"c0"'],
...     "expression": ["between 1 and 6", None, "> 0.95"],
... })
>>> wr.s3.to_parquet(df, path, dataset=True, database="database", table="table")
>>> wr.data_quality.create_ruleset(
...     name="ruleset",
...     database="database",
...     table="table",
...     df_rules=df_rules,
... )
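To make the relationship between the two input styles concrete, here is a minimal sketch of how rows shaped like `df_rules` might correspond to a `dqdl_rules` string. It is not the library's implementation: `render_dqdl` is a hypothetical helper, and the rows are plain dictionaries of the kind `df_rules.to_dict("records")` would be expected to produce, so the snippet runs without pandas or AWS access.

```python
def render_dqdl(rows):
    """Join rule rows into a DQDL string (hypothetical helper, for illustration only)."""
    rules = []
    for row in rows:
        # Each rule is "<rule_type> [<parameter>] [<expression>]"; missing parts are skipped.
        parts = [row["rule_type"], row.get("parameter"), row.get("expression")]
        rules.append(" ".join(part for part in parts if part))
    return "Rules = [ " + ", ".join(rules) + " ]"


# Same rules as in the df_rules example above, written as plain dictionaries.
rows = [
    {"rule_type": "RowCount", "parameter": None, "expression": "between 1 and 6"},
    {"rule_type": "IsComplete", "parameter": '"c0"', "expression": None},
    {"rule_type": "Uniqueness", "parameter": '"c0"', "expression": "> 0.95"},
]

dqdl = render_dqdl(rows)
print(dqdl)
# Rules = [ RowCount between 1 and 6, IsComplete "c0", Uniqueness "c0" > 0.95 ]
```

A string of this form could be passed as `dqdl_rules` instead of supplying `df_rules`. Note that column names are already quoted inside the `parameter` values, which is why the sketch adds no quoting of its own.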