awswrangler.catalog.extract_athena_types
awswrangler.catalog.extract_athena_types(df: DataFrame, index: bool = False, partition_cols: list[str] | None = None, dtype: dict[str, str] | None = None, file_format: str = 'parquet') → tuple[dict[str, str], dict[str, str]]
Extract column and partition types (Amazon Athena) from a Pandas DataFrame.
https://docs.aws.amazon.com/athena/latest/ug/data-types.html
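To make the idea of mapping pandas dtypes to Athena types concrete, here is a minimal sketch, assuming a hand-written lookup table keyed by pandas dtype names. `PANDAS_TO_ATHENA` and `athena_type_for` are hypothetical names; the table covers only a few common dtypes, and awswrangler's own inference handles more cases and may produce different results.

```python
# Hypothetical, simplified lookup from pandas dtype names to Athena type names.
# This is NOT awswrangler's implementation; it only illustrates the idea.
PANDAS_TO_ATHENA: dict[str, str] = {
    "int8": "tinyint",
    "int16": "smallint",
    "int32": "int",
    "int64": "bigint",
    "float32": "float",
    "float64": "double",
    "bool": "boolean",
    "string": "string",
    "datetime64[ns]": "timestamp",
}


def athena_type_for(dtype_name: str) -> str:
    """Return an Athena type name for a pandas dtype name (hypothetical helper)."""
    try:
        return PANDAS_TO_ATHENA[dtype_name]
    except KeyError:
        # In this sketch, dtypes such as "object" (mixed or undetermined values)
        # cannot be resolved from the dtype name alone.
        raise ValueError(f"cannot infer an Athena type for dtype {dtype_name!r}") from None


print(athena_type_for("int64"))    # bigint
print(athena_type_for("float64"))  # double
```

Columns whose type cannot be inferred this way are the kind of case the `dtype` parameter is meant to cover.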
- Parameters:
  - df (DataFrame) – Pandas DataFrame.
  - index (bool) – Whether to consider the DataFrame index as a column.
  - partition_cols (list[str] | None) – List of partition column names.
  - dtype (dict[str, str] | None) – Dictionary of column names and Athena/Glue types to cast them to. Useful when you have columns with undetermined or mixed data types (e.g. {'col name': 'bigint', 'col2 name': 'int'}).
  - file_format (str) – File format that determines where the index column is placed: "parquet" | "csv".
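The interplay of `partition_cols` and `dtype` can be sketched as follows, assuming the per-column Athena types have already been inferred. `split_athena_types` is a hypothetical helper, not part of awswrangler: it applies `dtype` overrides on top of the inferred types and then splits the result into regular columns and partitions. Index handling (and therefore `file_format`) is deliberately left out.

```python
from __future__ import annotations

# Hypothetical sketch: split inferred Athena types into
# (columns_types, partitions_types), letting `dtype` overrides win.


def split_athena_types(
    inferred: dict[str, str],
    partition_cols: list[str] | None = None,
    dtype: dict[str, str] | None = None,
) -> tuple[dict[str, str], dict[str, str]]:
    partition_cols = partition_cols or []
    dtype = dtype or {}
    # User-supplied casts replace the inferred types for the named columns.
    resolved = {col: dtype.get(col, typ) for col, typ in inferred.items()}
    # Regular columns keep their original order; partitions follow partition_cols.
    columns_types = {c: t for c, t in resolved.items() if c not in partition_cols}
    partitions_types = {c: resolved[c] for c in partition_cols}
    return columns_types, partitions_types


columns_types, partitions_types = split_athena_types(
    {"col0": "bigint", "col1": "double", "col2": "string"},
    partition_cols=["col2"],
    dtype={"col2": "date"},
)
print(columns_types)     # {'col0': 'bigint', 'col1': 'double'}
print(partitions_types)  # {'col2': 'date'}
```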
- Return type:
  tuple[dict[str, str], dict[str, str]]
- Returns:
  columns_types: Dictionary with column names as keys and data types as values (e.g. {'col0': 'bigint', 'col1': 'double'}).
  partitions_types: Dictionary with partition names as keys and data types as values (e.g. {'col2': 'date'}).
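One way to consume the two returned dictionaries is to render an Athena `CREATE EXTERNAL TABLE` statement from them. The sketch below is a minimal illustration, assuming Parquet storage and a placeholder S3 location; `create_table_ddl` is a hypothetical helper, not an awswrangler function.

```python
from __future__ import annotations

# Hypothetical helper: render a CREATE EXTERNAL TABLE statement from
# columns_types and partitions_types. Assumes Parquet storage.


def create_table_ddl(
    table: str,
    columns_types: dict[str, str],
    partitions_types: dict[str, str],
    location: str,
) -> str:
    cols = ",\n".join(f"  `{name}` {typ}" for name, typ in columns_types.items())
    ddl = f"CREATE EXTERNAL TABLE `{table}` (\n{cols}\n)"
    # Partition columns are declared separately from the regular columns.
    if partitions_types:
        parts = ",\n".join(f"  `{name}` {typ}" for name, typ in partitions_types.items())
        ddl += f"\nPARTITIONED BY (\n{parts}\n)"
    ddl += f"\nSTORED AS PARQUET\nLOCATION '{location}'"
    return ddl


print(create_table_ddl(
    "my_table",
    {"col0": "bigint", "col1": "double"},
    {"col2": "date"},
    "s3://example-bucket/my_table/",  # placeholder location
))
```

Keeping partitions in a separate dictionary maps directly onto the separate `PARTITIONED BY` clause.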
Examples
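>>> import awswrangler as wr
>>> import pandas as pd
>>> df = pd.DataFrame({"col0": [1, 2], "col1": [1.5, 2.5], "par0": ["a", "b"], "par1": ["x", "y"]})
>>> columns_types, partitions_types = wr.catalog.extract_athena_types(
...     df=df, index=False, partition_cols=["par0", "par1"], file_format="csv"
... )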