awswrangler.timestream.unload

awswrangler.timestream.unload(sql: str, path: str, unload_format: Literal['CSV', 'PARQUET'] | None = None, compression: Literal['GZIP', 'NONE'] | None = None, partition_cols: list[str] | None = None, encryption: Literal['SSE_KMS', 'SSE_S3'] | None = None, kms_key_id: str | None = None, field_delimiter: str | None = ',', escaped_by: str | None = '\\', chunked: bool | int = False, keep_files: bool = False, use_threads: bool | int = True, boto3_session: Session | None = None, s3_additional_kwargs: dict[str, str] | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None) → DataFrame | Iterator[DataFrame]

Unload query results to Amazon S3 and read the results as a Pandas DataFrame.

https://docs.aws.amazon.com/timestream/latest/developerguide/export-unload.html

Note

This function has arguments that can be configured globally through wr.config or environment variables.

Check out the Global Configurations Tutorial for details.

Note

The following arguments are not supported in distributed mode with engine EngineEnum.RAY:

  • boto3_session

  • s3_additional_kwargs

Parameters:
  • sql (str) – SQL query

  • path (str) – S3 path to write stage files (e.g. s3://bucket_name/any_name/)

  • unload_format (str, optional) – Format of the unloaded S3 objects from the query. Valid values: “CSV”, “PARQUET”. Case sensitive. Defaults to “PARQUET”

  • compression (str, optional) – Compression of the unloaded S3 objects from the query. Valid values: “GZIP”, “NONE”. Defaults to “GZIP”

  • partition_cols (List[str], optional) – Specifies the partition keys for the unload operation

  • encryption (str, optional) – Encryption of the unloaded S3 objects from the query. Valid values: “SSE_KMS”, “SSE_S3”. Defaults to “SSE_S3”

  • kms_key_id (str, optional) – Specifies the key ID for an AWS Key Management Service (AWS KMS) key to be used to encrypt data files on Amazon S3

  • field_delimiter (str, optional) – A single ASCII character used to separate fields in the output file, such as a pipe character (|), a comma (,), or a tab (\t). Only used with CSV format

  • escaped_by (str, optional) – The character that should be treated as an escape character in the data file written to the S3 bucket. Only used with CSV format

  • chunked (Union[int, bool]) – If passed, the data is split into an iterable of DataFrames (memory friendly). If True, awswrangler iterates over the data by file in the most efficient way without any guarantee of chunk size. If an integer is passed, awswrangler iterates over the data in chunks of that many rows. See the chunked example below.

  • keep_files (bool) – Whether to keep the stage files on S3

  • use_threads (bool, int) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if None

  • s3_additional_kwargs (Dict[str, str], optional) – Forwarded to botocore requests.

  • pyarrow_additional_kwargs (Dict[str, Any], optional) – Forwarded to the to_pandas method when converting from PyArrow tables to a Pandas DataFrame. Valid values include "split_blocks", "self_destruct", "ignore_metadata". e.g. pyarrow_additional_kwargs={'split_blocks': True}.

Returns:

Result as Pandas DataFrame(s).

Return type:

Union[pandas.DataFrame, Iterator[pandas.DataFrame]]

Examples

Unload and read as Parquet (default).

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dimension FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
... )
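
Unload and read in chunks (memory friendly). With chunked=True an iterator of DataFrames is returned instead of a single frame. A minimal sketch, reusing the same illustrative table and bucket as above.

>>> import awswrangler as wr
>>> dfs = wr.timestream.unload(
...     sql="SELECT time, measure, dimension FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     chunked=True,
... )
>>> for df in dfs:  # each chunk is a pandas.DataFrame
...     print(df.shape)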

Unload and read partitioned Parquet. Note: partition columns must be at the end of the table.

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dim1, dim2 FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     partition_cols=["dim2"],
... )
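
Unload with SSE-KMS encryption on the staged S3 objects. A minimal sketch; the key ID below is a placeholder, not a real key.

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dim1, dim2 FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     encryption="SSE_KMS",
...     kms_key_id="your-kms-key-id",
... )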

Unload and read as CSV.

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dimension FROM database.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     unload_format="CSV",
... )
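
Unload and read as uncompressed CSV with a custom field delimiter. A minimal sketch, reusing the same illustrative table as above; the bucket path is a placeholder.

>>> import awswrangler as wr
>>> df = wr.timestream.unload(
...     sql="SELECT time, measure, dimension FROM database.mytable",
...     path="s3://bucket/extracted_csv_files/",
...     unload_format="CSV",
...     compression="NONE",
...     field_delimiter="|",
... )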