awswrangler.timestream.batch_load

awswrangler.timestream.batch_load(df: DataFrame, path: str, database: str, table: str, time_col: str, dimensions_cols: list[str], measure_cols: list[str], measure_name_col: str, report_s3_configuration: TimestreamBatchLoadReportS3Configuration, time_unit: Literal['MILLISECONDS', 'SECONDS', 'MICROSECONDS', 'NANOSECONDS'] = 'MILLISECONDS', record_version: int = 1, timestream_batch_load_wait_polling_delay: float = 2, keep_files: bool = False, use_threads: bool | int = True, boto3_session: Session | None = None, s3_additional_kwargs: dict[str, str] | None = None) dict[str, Any]

Batch load a Pandas DataFrame into an Amazon Timestream table.

Note

The supplied column names (time, dimension, measure) MUST match those in the Timestream table.
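A quick pre-flight check that every column named in the call exists in the DataFrame can catch typos early. The sketch below is minimal: `missing_columns` is a hypothetical helper, not part of awswrangler, and it checks only the DataFrame side, not the Timestream table schema.

```python
import pandas as pd


def missing_columns(df, time_col, dimensions_cols, measure_cols, measure_name_col):
    """Return the column names intended for batch_load that are absent from df (hypothetical helper)."""
    requested = [time_col, *dimensions_cols, *measure_cols, measure_name_col]
    return [col for col in requested if col not in df.columns]


# Sample frame that is missing the "location" dimension column.
df = pd.DataFrame(
    {
        "time": [1704067200000],
        "region": ["us-east-1"],
        "cpu_utilization": [12.0],
        "measure_name": ["utilization"],
    }
)
missing = missing_columns(df, "time", ["region", "location"], ["cpu_utilization"], "measure_name")
print(missing)  # → ['location']
```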

Note

The following arguments are not supported in distributed mode with engine EngineEnum.RAY:

  • boto3_session

  • s3_additional_kwargs

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • database

  • timestream_batch_load_wait_polling_delay

Check out the Global Configurations Tutorial for details.
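Here is a sketch of the environment-variable route. It assumes awswrangler's convention of naming these variables `WR_` followed by the upper-cased setting name, and sets them before awswrangler is imported so they act as global defaults. The values are placeholders, and the equivalent `wr.config` assignments are shown as comments.

```python
import os

# Set before `import awswrangler` so the values are picked up as global defaults.
os.environ["WR_DATABASE"] = "sample_db"
os.environ["WR_TIMESTREAM_BATCH_LOAD_WAIT_POLLING_DELAY"] = "5"

# Equivalent in-process configuration (after importing awswrangler):
# import awswrangler as wr
# wr.config.database = "sample_db"
# wr.config.timestream_batch_load_wait_polling_delay = 5

print(os.environ["WR_DATABASE"], os.environ["WR_TIMESTREAM_BATCH_LOAD_WAIT_POLLING_DELAY"])
```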

Parameters:
  • df (pandas.DataFrame) – Pandas DataFrame.

  • path (str) – S3 prefix to which the data is written before being batch loaded.

  • database (str) – Amazon Timestream database name.

  • table (str) – Amazon Timestream table name.

  • time_col (str) – Column name with the time data. It must contain integer (long) values representing the time elapsed since the Unix epoch, expressed in the unit given by time_unit.

  • dimensions_cols (List[str]) – List of column names with the dimensions data.

  • measure_cols (List[str]) – List of column names with the measure data.

  • measure_name_col (str) – Column name with the measure name.

  • report_s3_configuration (TimestreamBatchLoadReportS3Configuration) – Configuration of the S3 bucket where the batch load error report is stored. See https://docs.aws.amazon.com/timestream/latest/developerguide/API_ReportS3Configuration.html for the supported keys. Example: {'BucketName': 'error-report-bucket-name'}

  • time_unit (str, optional) – Time unit of the values in the time column: 'MILLISECONDS', 'SECONDS', 'MICROSECONDS', or 'NANOSECONDS'. MILLISECONDS by default.

  • record_version (int, optional) – Version number assigned to the loaded records. 1 by default.

  • timestream_batch_load_wait_polling_delay (float, optional) – Time, in seconds, to wait between two attempts to poll the status of the batch load task. 2 by default.

  • keep_files (bool, optional) – Whether to keep the intermediate files written to path after the operation. False by default, in which case they are deleted.

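  • use_threads (Union[bool, int], optional) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.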

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if None.

  • s3_additional_kwargs (dict[str, str], optional) – Forwarded to S3 botocore requests.

Returns:

A dictionary of the batch load task response.

Return type:

Dict[str, Any]
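The timestream_batch_load_wait_polling_delay parameter sets how long the function sleeps between status checks of the batch load task before it returns the task response. The sketch below shows roughly how such a wait loop works. It is not awswrangler's implementation: `wait_for_task` and `get_status` are hypothetical names, and treating SUCCEEDED, FAILED, and PROGRESS_STOPPED as the states that end the wait is an assumption.

```python
import time
from typing import Callable, Iterable


def wait_for_task(
    get_status: Callable[[], str],
    polling_delay: float = 2.0,
    final_states: Iterable[str] = ("SUCCEEDED", "FAILED", "PROGRESS_STOPPED"),
) -> str:
    """Call get_status() until it returns a final state, sleeping polling_delay seconds in between."""
    final = set(final_states)
    while True:
        status = get_status()
        if status in final:
            return status
        time.sleep(polling_delay)


# With boto3, get_status could wrap, for example:
#   client.describe_batch_load_task(TaskId=task_id)["BatchLoadTaskDescription"]["TaskStatus"]
# Here a fake status source is used so no AWS calls are made.
statuses = iter(["CREATED", "IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
seen = []


def fake_status() -> str:
    status = next(statuses)
    seen.append(status)
    return status


result = wait_for_task(fake_status, polling_delay=0)
print(result)  # → SUCCEEDED
```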

Examples

>>> import awswrangler as wr
>>> response = wr.timestream.batch_load(
>>>     df=df,
>>>     path='s3://bucket/path/',
>>>     database='sample_db',
>>>     table='sample_table',
>>>     time_col='time',
>>>     dimensions_cols=['region', 'location'],
>>>     measure_cols=['memory_utilization', 'cpu_utilization'],
>>>     measure_name_col='measure_name',
>>>     report_s3_configuration={'BucketName': 'error-report-bucket-name'},
>>> )
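The example above assumes an existing `df`. The self-contained sketch below builds a DataFrame with the columns that example names, plus a report configuration that also sets the optional ObjectKeyPrefix field of ReportS3Configuration. The column values, the `measure_name` column, and the bucket and prefix names are placeholders. The actual call is left commented out because it needs AWS credentials, an existing Timestream database and table, and an S3 bucket.

```python
import time

import pandas as pd

now_ms = int(time.time() * 1000)  # current time as epoch milliseconds

df = pd.DataFrame(
    {
        # Integer epoch milliseconds, matching the default time_unit="MILLISECONDS".
        "time": [now_ms - 2000, now_ms - 1000, now_ms],
        "region": ["us-east-1", "us-east-1", "us-west-2"],
        "location": ["az1", "az2", "az1"],
        "memory_utilization": [40.5, 52.3, 61.0],
        "cpu_utilization": [12.0, 33.1, 25.4],
        # Placeholder measure name shared by all rows.
        "measure_name": ["utilization"] * 3,
    }
)

report_s3_configuration = {
    "BucketName": "error-report-bucket-name",
    "ObjectKeyPrefix": "timestream/batch-load-errors/",  # optional
}

load_kwargs = dict(
    df=df,
    path="s3://bucket/path/",
    database="sample_db",
    table="sample_table",
    time_col="time",
    dimensions_cols=["region", "location"],
    measure_cols=["memory_utilization", "cpu_utilization"],
    measure_name_col="measure_name",
    report_s3_configuration=report_s3_configuration,
)

# Requires AWS credentials and an existing Timestream table:
# import awswrangler as wr
# response = wr.timestream.batch_load(**load_kwargs)
```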