awswrangler.catalog.extract_athena_types
- awswrangler.catalog.extract_athena_types(df: DataFrame, index: bool = False, partition_cols: Optional[List[str]] = None, dtype: Optional[Dict[str, str]] = None, file_format: str = 'parquet') → Tuple[Dict[str, str], Dict[str, str]]
Extract column and partition types (Amazon Athena) from a pandas DataFrame.
https://docs.aws.amazon.com/athena/latest/ug/data-types.html
- Parameters
df (pandas.DataFrame) – Pandas DataFrame.
index (bool) – Whether to treat the DataFrame index as a column.
partition_cols (List[str], optional) – List of partition column names.
dtype (Dict[str, str], optional) – Dictionary mapping column names to the Athena/Glue types they should be cast to. Useful when a column has an undetermined or mixed data type (e.g. {'col name': 'bigint', 'col2 name': 'int'}).
file_format (str, optional) – File format used to decide where the index column is placed: "parquet" | "csv".
- Returns
columns_types: Dictionary with column names as keys and data types as values (e.g. {'col0': 'bigint', 'col1': 'double'}).
partitions_types: Dictionary with partition names as keys and data types as values (e.g. {'col2': 'date'}).
- Return type
Tuple[Dict[str, str], Dict[str, str]]
Examples
>>> import awswrangler as wr
>>> columns_types, partitions_types = wr.catalog.extract_athena_types(
...     df=df, index=False, partition_cols=["par0", "par1"], file_format="csv"
... )
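The example above assumes that `df` already exists. A minimal end-to-end sketch follows, with a small hypothetical DataFrame. The column names (`col0`, `col1`, `col2`, `par0`), the helper names, and the `dtype` override for the all-null column are illustrative assumptions, not part of the library. The inferred types depend on the installed awswrangler and pandas versions, so no output is shown here.

```python
import datetime
from typing import Dict, Tuple

import pandas as pd


def build_example_frame() -> pd.DataFrame:
    """Build a hypothetical frame: two typed columns, one all-null column, one partition column."""
    return pd.DataFrame(
        {
            "col0": [1, 2, 3],  # integers
            "col1": [0.5, 1.5, 2.5],  # floats
            "col2": [None, None, None],  # no values, so its type is undetermined
            "par0": [datetime.date(2024, 1, 1)] * 3,  # used as the partition column
        }
    )


def extract_example_types() -> Tuple[Dict[str, str], Dict[str, str]]:
    """Call extract_athena_types on the example frame and return both dictionaries."""
    # Imported inside the function so the frame helper works without awswrangler installed.
    import awswrangler as wr

    return wr.catalog.extract_athena_types(
        df=build_example_frame(),
        index=False,
        partition_cols=["par0"],
        # Assumption: "col2" holds only nulls, so its type is supplied explicitly.
        dtype={"col2": "string"},
        file_format="parquet",
    )
```

Calling `extract_example_types()` returns the `(columns_types, partitions_types)` tuple described under Returns; the partition column is expected to appear only in the second dictionary.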