awswrangler.catalog.extract_athena_types

awswrangler.catalog.extract_athena_types(df: DataFrame, index: bool = False, partition_cols: list[str] | None = None, dtype: dict[str, str] | None = None, file_format: str = 'parquet') → tuple[dict[str, str], dict[str, str]]

Extract column and partition types (Amazon Athena) from a Pandas DataFrame.

https://docs.aws.amazon.com/athena/latest/ug/data-types.html

Parameters:

df (DataFrame) – Pandas DataFrame.
index (bool) – Whether to treat the DataFrame index as a column.
partition_cols (list[str] | None) – List of partition column names.
dtype (dict[str, str] | None) – Dictionary mapping column names to the Athena/Glue types they should be cast to. Useful when you have columns with undetermined or mixed data types (e.g. {'col name': 'bigint', 'col2 name': 'int'}).
file_format (str) – File format used to decide where the index column is placed: "parquet" | "csv".
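To make the interplay of partition_cols and dtype concrete, here is a simplified, pure-Python model of how inferred types could be overridden and then split into column and partition dictionaries. It is a sketch under stated assumptions, not awswrangler's implementation: split_athena_types and _PANDAS_TO_ATHENA are hypothetical names, and the type table covers only a few common pandas dtypes.

```python
from __future__ import annotations

# Hypothetical, simplified pandas-dtype -> Athena-type table (an assumption;
# awswrangler's real mapping is more complete and also inspects object columns).
_PANDAS_TO_ATHENA = {
    "int32": "int",
    "int64": "bigint",
    "float64": "double",
    "bool": "boolean",
    "object": "string",
    "datetime64[ns]": "timestamp",
}


def split_athena_types(
    pandas_dtypes: dict[str, str],
    partition_cols: list[str] | None = None,
    dtype: dict[str, str] | None = None,
) -> tuple[dict[str, str], dict[str, str]]:
    """Map pandas dtype names to Athena types, then split off the partitions."""
    partition_cols = partition_cols or []
    dtype = dtype or {}
    athena: dict[str, str] = {}
    for name, pd_type in pandas_dtypes.items():
        # An explicit dtype override wins; otherwise fall back to the table.
        athena[name] = dtype.get(name) or _PANDAS_TO_ATHENA[pd_type]
    # Regular columns exclude partitions; partitions keep partition_cols order.
    columns_types = {k: v for k, v in athena.items() if k not in partition_cols}
    partitions_types = {k: athena[k] for k in partition_cols}
    return columns_types, partitions_types


cols, parts = split_athena_types(
    {"col0": "int64", "col1": "object", "par0": "object"},
    partition_cols=["par0"],
    dtype={"col1": "date"},  # force col1 to Athena 'date' instead of 'string'
)
# cols == {"col0": "bigint", "col1": "date"}; parts == {"par0": "string"}
```

The override is applied before the split, so a dtype entry can retype a partition column as well as a regular column.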

Return type:

tuple[dict[str, str], dict[str, str]]

Returns:

columns_types: Dictionary with keys as column names and values as data types (e.g. {'col0': 'bigint', 'col1': 'double'}).
partitions_types: Dictionary with keys as partition names and values as data types (e.g. {'col2': 'date'}).
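Partition columns are returned separately because Athena's CREATE EXTERNAL TABLE statement declares them in a PARTITIONED BY clause rather than in the main column list. The hypothetical render_create_table helper below is a minimal sketch of that layout, assuming Parquet storage and a placeholder S3 location; awswrangler's own wr.catalog.create_parquet_table accepts both dictionaries directly, so this is only meant to show where each one ends up.

```python
def render_create_table(
    table: str,
    columns_types: dict[str, str],
    partitions_types: dict[str, str],
    location: str,
) -> str:
    """Build an illustrative Athena DDL string (hypothetical helper)."""
    # Regular columns go inside the parentheses after the table name.
    cols = ",\n  ".join(f"`{n}` {t}" for n, t in columns_types.items())
    ddl = f"CREATE EXTERNAL TABLE `{table}` (\n  {cols}\n)"
    # Partition columns must appear only in PARTITIONED BY, never above.
    if partitions_types:
        parts = ", ".join(f"`{n}` {t}" for n, t in partitions_types.items())
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += f"\nSTORED AS PARQUET\nLOCATION '{location}'"
    return ddl


# "s3://bucket/prefix/" is a placeholder location, not a real bucket.
ddl = render_create_table(
    "my_table", {"col0": "bigint", "col1": "double"}, {"col2": "date"}, "s3://bucket/prefix/"
)
print(ddl)
```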

Examples

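>>> import awswrangler as wr
>>> import pandas as pd
>>> df = pd.DataFrame({"col0": [1, 2], "par0": ["a", "b"], "par1": [1, 2]})
>>> columns_types, partitions_types = wr.catalog.extract_athena_types(
...     df=df, index=False, partition_cols=["par0", "par1"], file_format="csv"
... )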