awswrangler.catalog.sanitize_dataframe_columns_names
- awswrangler.catalog.sanitize_dataframe_columns_names(df: DataFrame, handle_duplicate_columns: str | None = 'warn') -> DataFrame
Normalize all column names to be compatible with Amazon Athena.
https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html
Possible transformations:
- Strip accents
- Remove non-alphanumeric characters
Note
After transformation, some column names might no longer be unique. For example, the columns [“A”, “a”] will be sanitized to [“a”, “a”].
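To make the note above concrete, here is a minimal standard-library sketch of how such transformations can collapse distinct names into the same one. The helper `sketch_sanitize` is hypothetical and is not awswrangler's implementation; in particular, replacing non-alphanumeric characters with underscores and lowercasing the result are assumptions made for illustration.

```python
import re
import unicodedata


def sketch_sanitize(name: str) -> str:
    """Rough stand-in for the sanitization rules; NOT awswrangler's code."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: non-alphanumeric characters become underscores and the
    # result is lowercased.
    return re.sub(r"[^A-Za-z0-9_]", "_", stripped).lower()


columns = ["A", "a", "Café", "cafe"]
sanitized = [sketch_sanitize(c) for c in columns]
print(sanitized)  # ['a', 'a', 'cafe', 'cafe']
```

Four distinct input names become two distinct output names, which is the situation handle_duplicate_columns is meant to address.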
- Parameters:
df (DataFrame) – Original Pandas DataFrame.
handle_duplicate_columns (str | None) – How to handle duplicate columns. Can be “warn”, “drop”, or “rename”. “drop” will drop all but the first duplicated column. “rename” will rename all duplicated columns with an incremental number. Defaults to “warn”.
- Return type:
DataFrame
- Returns:
Original Pandas DataFrame with column names normalized.
Examples
>>> import awswrangler as wr
>>> import pandas as pd
>>> df_normalized = wr.catalog.sanitize_dataframe_columns_names(df=pd.DataFrame({"A": [1, 2]}))
>>> df_normalized_drop = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4]}), handle_duplicate_columns="drop"
... )
>>> df_normalized_rename = wr.catalog.sanitize_dataframe_columns_names(
...     df=pd.DataFrame({"A": [1, 2], "a": [3, 4], "a_1": [4, 6]}), handle_duplicate_columns="rename"
... )
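The “drop” and “rename” strategies described under Parameters can be pictured on a plain list of already-sanitized names. The sketch below uses only the standard library; `sketch_drop` and `sketch_rename` are hypothetical helpers, and the suffix scheme (`_1`, `_2`, ... choosing the first unused number) is an assumption, so the exact names awswrangler produces may differ.

```python
from typing import List


def sketch_drop(names: List[str]) -> List[str]:
    """Keep only the first occurrence of each name (illustrative only)."""
    seen = set()
    kept = []
    for name in names:
        if name not in seen:
            seen.add(name)
            kept.append(name)
    return kept


def sketch_rename(names: List[str]) -> List[str]:
    """Append an incremental suffix to repeated names (illustrative only)."""
    used = set(names)  # every name already present must stay untouched
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            # First occurrence keeps its name.
            seen.add(name)
            result.append(name)
            continue
        # Assumption: pick the smallest suffix that does not collide.
        i = 1
        while f"{name}_{i}" in used:
            i += 1
        new_name = f"{name}_{i}"
        used.add(new_name)
        result.append(new_name)
    return result


print(sketch_drop(["a", "a", "b"]))       # ['a', 'b']
print(sketch_rename(["a", "a", "a_1"]))   # ['a', 'a_2', 'a_1']
```

The second call mirrors the “rename” example above, where a pre-existing `a_1` column means a naive `_1` suffix would itself create a new duplicate.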