awswrangler.data_quality.create_ruleset
- awswrangler.data_quality.create_ruleset(name: str, database: str, table: str, df_rules: DataFrame | None = None, dqdl_rules: str | None = None, description: str = '', client_token: str | None = None, boto3_session: Session | None = None) → None
Create Data Quality ruleset.
Note
This function has arguments which can be configured globally through wr.config or environment variables:
database
Check out the Global Configurations Tutorial for details.
- Parameters:
name (str) – Ruleset name.
database (str) – Glue database name.
table (str) – Glue table name.
df_rules (DataFrame | None) – Data frame with rule_type, parameter, and expression columns.
dqdl_rules (str | None) – Data Quality Definition Language definition.
description (str) – Ruleset description.
client_token (str | None) – Random id used for idempotency. Is automatically generated if not provided.
boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.
- Return type:
None
Examples
>>> import awswrangler as wr
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({"c0": [0, 1, 2], "c1": [0, 1, 2], "c2": [0, 0, 1]})
>>> wr.s3.to_parquet(df, path, dataset=True, database="database", table="table")
>>> wr.data_quality.create_ruleset(
...     name="ruleset",
...     database="database",
...     table="table",
...     dqdl_rules="Rules = [ RowCount between 1 and 3 ]",
... )

>>> import awswrangler as wr
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({"c0": [0, 1, 2], "c1": [0, 1, 2], "c2": [0, 0, 1]})
>>> df_rules = pd.DataFrame({
...     "rule_type": ["RowCount", "IsComplete", "Uniqueness"],
...     "parameter": [None, '"c0"', '"c0"'],
...     "expression": ["between 1 and 6", None, "> 0.95"],
... })
>>> wr.s3.to_parquet(df, path, dataset=True, database="database", table="table")
>>> wr.data_quality.create_ruleset(
...     name="ruleset",
...     database="database",
...     table="table",
...     df_rules=df_rules,
... )
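
The rule_type, parameter, and expression columns of df_rules correspond to the pieces of a DQDL rule, so a rules frame and a dqdl_rules string can describe the same ruleset. The sketch below is only an illustration of that correspondence, not the library's implementation: rules_to_dqdl is a hypothetical helper, and it works on plain (rule_type, parameter, expression) tuples so that it runs without pandas or AWS access.

```python
from typing import Iterable, Optional, Tuple

# One rule row: (rule_type, parameter, expression); the last two may be missing.
Rule = Tuple[str, Optional[str], Optional[str]]


def rules_to_dqdl(rules: Iterable[Rule]) -> str:
    """Hypothetical helper: render rule rows as a single DQDL string."""
    rendered = []
    for rule_type, parameter, expression in rules:
        # Keep only non-empty string parts, so None (or NaN) values are skipped.
        parts = [p for p in (rule_type, parameter, expression) if isinstance(p, str) and p]
        rendered.append(" ".join(parts))
    return "Rules = [ " + ", ".join(rendered) + " ]"


# Same rows as the df_rules example above, written as tuples.
rows = [
    ("RowCount", None, "between 1 and 6"),
    ("IsComplete", '"c0"', None),
    ("Uniqueness", '"c0"', "> 0.95"),
]
dqdl = rules_to_dqdl(rows)
print(dqdl)
```

With a pandas frame, the rows could presumably be taken from df_rules.itertuples(index=False), provided the columns are in the order rule_type, parameter, expression.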