awswrangler.redshift.unload_to_files

awswrangler.redshift.unload_to_files(sql: str, path: str, con: redshift_connector.Connection, iam_role: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, aws_session_token: str | None = None, region: str | None = None, unload_format: Literal['CSV', 'PARQUET'] | None = None, parallel: bool = True, max_file_size: float | None = None, kms_key_id: str | None = None, manifest: bool = False, partition_cols: list[str] | None = None, boto3_session: boto3.Session | None = None) → None

Unload the result of a Redshift query to Parquet or CSV files on S3 (through the UNLOAD command).

https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

Note

When use_threads=True, the number of threads spawned is obtained from os.cpu_count().

Parameters:
  • sql (str) – SQL query.

  • path (str) – S3 path to write stage files (e.g. s3://bucket_name/any_name/)

  • con (redshift_connector.Connection) – Use redshift_connector.connect() to use credentials directly or wr.redshift.connect() to fetch it from the Glue Catalog.

  • iam_role (str, optional) – AWS IAM role with the related permissions.

  • aws_access_key_id (str, optional) – The access key for your AWS account.

  • aws_secret_access_key (str, optional) – The secret key for your AWS account.

  • aws_session_token (str, optional) – The session key for your AWS account. This is only needed when you are using temporary credentials.

  • region (str, optional) – Specifies the AWS Region where the target Amazon S3 bucket is located. REGION is required for UNLOAD to an Amazon S3 bucket that isn’t in the same AWS Region as the Amazon Redshift cluster. By default, UNLOAD assumes that the target Amazon S3 bucket is located in the same AWS Region as the Amazon Redshift cluster.

  • unload_format (str, optional) – Format of the unloaded S3 objects from the query. Valid values: “CSV”, “PARQUET”. Case sensitive. Defaults to PARQUET.

  • parallel (bool) – Whether to unload to multiple files in parallel. Defaults to True. By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. If parallel is False, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used.

  • max_file_size (float, optional) – Specifies the maximum size (MB) of files that UNLOAD creates in Amazon S3. Specify a decimal value between 5.0 MB and 6200.0 MB. If None, the default maximum file size is 6200.0 MB.

  • kms_key_id (str, optional) – Specifies the key ID for an AWS Key Management Service (AWS KMS) key to be used to encrypt data files on Amazon S3.

  • manifest (bool) – Whether to unload a manifest file on S3.

  • partition_cols (List[str], optional) – Specifies the partition keys for the unload operation.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session receives None.

Return type:

None

Examples

>>> import awswrangler as wr
>>> con = wr.redshift.connect("MY_GLUE_CONNECTION")
>>> wr.redshift.unload_to_files(
...     sql="SELECT * FROM public.mytable",
...     path="s3://bucket/extracted_parquet_files/",
...     con=con,
...     iam_role="arn:aws:iam::XXX:role/XXX"
... )
>>> con.close()
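
The parameters above can also be combined to change the unload format, partition the output, and cap file sizes. A minimal sketch, assuming a hypothetical event_date column in the queried table (the bucket and role ARN are placeholders):

>>> import awswrangler as wr
>>> con = wr.redshift.connect("MY_GLUE_CONNECTION")
>>> wr.redshift.unload_to_files(
...     sql="SELECT * FROM public.mytable",
...     path="s3://bucket/extracted_csv_files/",
...     con=con,
...     iam_role="arn:aws:iam::XXX:role/XXX",
...     unload_format="CSV",             # CSV instead of the default PARQUET
...     partition_cols=["event_date"],   # hypothetical partition column
...     max_file_size=512.0,             # cap each unloaded file at ~512 MB
...     manifest=True,                   # also write a manifest file
... )
>>> con.close()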
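
When an IAM role is not available, static or temporary credentials can be passed instead, and region is needed when the target bucket is not in the same Region as the cluster. A minimal sketch with placeholder credentials, bucket, and a hypothetical id column, using parallel=False so the output is written serially in ORDER BY order:

>>> import awswrangler as wr
>>> con = wr.redshift.connect("MY_GLUE_CONNECTION")
>>> wr.redshift.unload_to_files(
...     sql="SELECT * FROM public.mytable ORDER BY id",
...     path="s3://bucket-in-us-west-2/extracted_parquet_files/",
...     con=con,
...     aws_access_key_id="ASIA...",      # placeholder temporary credentials
...     aws_secret_access_key="...",
...     aws_session_token="...",
...     region="us-west-2",               # bucket Region differs from the cluster
...     parallel=False,                   # write serially, sorted by the ORDER BY clause
... )
>>> con.close()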