awswrangler.timestream.unload

awswrangler.timestream.unload(sql: str, path: str, unload_format: Literal['CSV', 'PARQUET'] | None = None, compression: Literal['GZIP', 'NONE'] | None = None, partition_cols: list[str] | None = None, encryption: Literal['SSE_KMS', 'SSE_S3'] | None = None, kms_key_id: str | None = None, field_delimiter: str | None = ',', escaped_by: str | None = '\\', chunked: bool | int = False, keep_files: bool = False, use_threads: bool | int = True, boto3_session: Session | None = None, s3_additional_kwargs: dict[str, str] | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None) → DataFrame | Iterator[DataFrame]

Unload query results to Amazon S3 and read the results as a Pandas DataFrame.

https://docs.aws.amazon.com/timestream/latest/developerguide/export-unload.html

Note

This function has arguments that can be configured globally through wr.config or environment variables.

Check out the Global Configurations Tutorial for details.

Note

The following arguments are not supported in distributed mode with engine EngineEnum.RAY:

  • boto3_session

  • s3_additional_kwargs

Parameters:
  • sql (str) – SQL query

  • path (str) – S3 path to write stage files (e.g. s3://bucket_name/any_name/)

  • unload_format (str, optional) – Format of the unloaded S3 objects from the query. Valid values: “CSV”, “PARQUET”. Case sensitive. Defaults to “PARQUET”

  • compression (str, optional) – Compression of the unloaded S3 objects from the query. Valid values: “GZIP”, “NONE”. Defaults to “GZIP”

  • partition_cols (List[str], optional) – Specifies the partition keys for the unload operation

  • encryption (str, optional) – Encryption of the unloaded S3 objects from the query. Valid values: “SSE_KMS”, “SSE_S3”. Defaults to “SSE_S3”

  • kms_key_id (str, optional) – Specifies the key ID for an AWS Key Management Service (AWS KMS) key to be used to encrypt data files on Amazon S3

  • field_delimiter (str, optional) – A single ASCII character used to separate fields in the output file, such as a pipe character (|), a comma (,), or a tab (\t). Only used with CSV format

  • escaped_by (str, optional) – The character that should be treated as an escape character in the data file written to the S3 bucket. Only used with CSV format

  • chunked (Union[int, bool]) – If passed, the data is split into an iterable of DataFrames (memory friendly). If True, awswrangler iterates over the data by file in the most efficient way without any guarantee of chunk size. If an integer is passed, awswrangler iterates over the data in chunks of that many rows. See the chunked example below.

  • keep_files (bool) – Whether to keep the stage files on S3

  • use_threads (bool, int) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if None

  • s3_additional_kwargs (Dict[str, str], optional) – Forwarded to botocore requests.

  • pyarrow_additional_kwargs (Dict[str, Any], optional) – Forwarded to the to_pandas method when converting from PyArrow tables to a Pandas DataFrame. Valid values include "split_blocks", "self_destruct", "ignore_metadata". e.g. pyarrow_additional_kwargs={'split_blocks': True}.

Returns:

Result as Pandas DataFrame(s).

Return type:

Union[pandas.DataFrame, Iterator[pandas.DataFrame]]

Examples

Unload and read as Parquet (default).

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dimension FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
... )
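
Unload and read in chunks (memory friendly). With chunked=True an iterator of DataFrames is returned instead of a single frame. A minimal sketch, reusing the same illustrative table and bucket as above.

>>> import awswrangler as wr
>>> dfs = wr.timestream.unload(
...     sql="SELECT time, measure, dimension FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     chunked=True,
... )
>>> for df in dfs:  # each chunk is a pandas.DataFrame
...     print(df.shape)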

Unload and read partitioned Parquet. Note: partition columns must be at the end of the table.

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dim1, dim2 FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     partition_cols=["dim2"],
... )
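
Unload with SSE-KMS encryption on the staged S3 objects. A minimal sketch; the key ID below is a placeholder, not a real key.

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dim1, dim2 FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     encryption="SSE_KMS",
...     kms_key_id="your-kms-key-id",
... )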

Unload and read as CSV.

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dimension FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     unload_format="CSV",
... )
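
Unload and read as uncompressed CSV with a custom field delimiter. A minimal sketch, reusing the same illustrative table as above; the bucket path is a placeholder.

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dimension FROM database.mytable",
...     path="s3://bucket/extracted_csv_files/",
...     unload_format="CSV",
...     compression="NONE",
...     field_delimiter="|",
... )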