awswrangler.data_quality.create_ruleset

awswrangler.data_quality.create_ruleset(name: str, database: str, table: str, df_rules: DataFrame | None = None, dqdl_rules: str | None = None, description: str = '', client_token: str | None = None, boto3_session: Session | None = None) → None

Create a Data Quality ruleset.

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • database

Check out the Global Configurations Tutorial for details.

Parameters:
  • name (str) – Ruleset name.

  • database (str) – Glue database name.

  • table (str) – Glue table name.

  • df_rules (DataFrame | None) – Data frame with rule_type, parameter, and expression columns.

  • dqdl_rules (str | None) – Data Quality Definition Language definition.

  • description (str) – Ruleset description.

  • client_token (str | None) – Random id used for idempotency. It is generated automatically if not provided.

  • boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.

Return type:

None

Examples

>>> import awswrangler as wr
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({"c0": [0, 1, 2], "c1": [0, 1, 2], "c2": [0, 0, 1]})
>>> wr.s3.to_parquet(df, path, dataset=True, database="database", table="table")
>>> wr.data_quality.create_ruleset(
...     name="ruleset",
...     database="database",
...     table="table",
...     dqdl_rules="Rules = [ RowCount between 1 and 3 ]",
... )

>>> import awswrangler as wr
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({"c0": [0, 1, 2], "c1": [0, 1, 2], "c2": [0, 0, 1]})
>>> df_rules = pd.DataFrame({
...     "rule_type": ["RowCount", "IsComplete", "Uniqueness"],
...     "parameter": [None, '"c0"', '"c0"'],
...     "expression": ["between 1 and 6", None, "> 0.95"],
... })
>>> wr.s3.to_parquet(df, path, dataset=True, database="database", table="table")
>>> wr.data_quality.create_ruleset(
...     name="ruleset",
...     database="database",
...     table="table",
...     df_rules=df_rules,
... )
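
The two examples describe rules in different forms: a single DQDL string, or a data frame with rule_type, parameter, and expression columns. The sketch below illustrates how such rows could correspond to a DQDL string. It assumes each row is rendered as "rule_type parameter expression" with empty cells skipped; `rules_to_dqdl` is a hypothetical helper written for this illustration, not part of awswrangler, and the library's own conversion may differ. Plain tuples stand in for data frame rows so the snippet runs without pandas or AWS access.

```python
def rules_to_dqdl(rules):
    """Join (rule_type, parameter, expression) triples into one DQDL string.

    Hypothetical helper for illustration only; not part of awswrangler.
    """
    rendered = []
    for rule_type, parameter, expression in rules:
        # Skip the parts that are None, mirroring the None cells in df_rules.
        parts = [part for part in (rule_type, parameter, expression) if part is not None]
        rendered.append(" ".join(parts))
    return "Rules = [ " + ", ".join(rendered) + " ]"


# Same rules as the df_rules example, written as plain tuples.
rules = [
    ("RowCount", None, "between 1 and 6"),
    ("IsComplete", '"c0"', None),
    ("Uniqueness", '"c0"', "> 0.95"),
]
dqdl = rules_to_dqdl(rules)
print(dqdl)
```

Under these assumptions, the resulting string has the same shape as the value passed to dqdl_rules in the first example.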