awswrangler.emr_serverless.run_job

awswrangler.emr_serverless.run_job(application_id: str, execution_role_arn: str, job_driver_args: dict[str, Any] | SparkSubmitJobArgs | HiveRunJobArgs, job_type: Literal['Spark', 'Hive'] = 'Spark', wait: bool = True, configuration_overrides: dict[str, Any] | None = None, tags: dict[str, str] | None = None, execution_timeout: int | None = None, name: str | None = None, emr_serverless_job_wait_polling_delay: float = 5, boto3_session: Session | None = None) → str | dict[str, Any]

Run an EMR Serverless job.

https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • emr_serverless_job_wait_polling_delay

Check out the Global Configurations Tutorial for details.
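
As a rough illustration of the two configuration routes named above, the sketch below sets the polling delay once through an environment variable and once through wr.config. The environment variable name is an assumption based on the library's usual WR_<UPPERCASE_ARGUMENT_NAME> convention, and both helper names are hypothetical.

```python
import os

# Assumed name, following the WR_<UPPERCASE_ARGUMENT_NAME> convention.
POLLING_DELAY_ENV_VAR = "WR_EMR_SERVERLESS_JOB_WAIT_POLLING_DELAY"


def set_polling_delay_via_env(seconds: float) -> None:
    """Hypothetical helper: export the delay as an environment variable.

    Environment-based settings are normally read when awswrangler is
    imported, so call this before the first import of the library.
    """
    os.environ[POLLING_DELAY_ENV_VAR] = str(seconds)


def set_polling_delay_via_config(seconds: float) -> None:
    """Hypothetical helper: set the delay on the global config object."""
    import awswrangler as wr  # imported lazily; requires awswrangler

    wr.config.emr_serverless_job_wait_polling_delay = seconds
```

A value passed explicitly to run_job is the most direct option when only a single call needs a different delay.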

Warning

This API is experimental and may change in future AWS SDK for Pandas releases.

Parameters:
  • application_id (str) – The id of the application on which to run the job.

  • execution_role_arn (str) – The execution role ARN for the job run.

  • job_driver_args (dict[str, Any] | SparkSubmitJobArgs | HiveRunJobArgs) – The job driver arguments for the job run.

  • job_type (Literal['Spark', 'Hive']) – Type of the job: “Spark” or “Hive”. Defaults to “Spark”.

  • wait (bool) – Whether to wait for the job to complete. Defaults to True.

  • configuration_overrides (dict[str, Any] | None) – The configuration overrides for the job run.

  • tags (dict[str, str] | None) – Key/value collection of tags to attach to the job run, e.g. {“foo”: “boo”, “bar”: “xoo”}.

  • execution_timeout (int | None) – The maximum duration of the job run, in minutes. A job run that exceeds this duration is cancelled automatically.

  • name (str | None) – Name of the job.

  • emr_serverless_job_wait_polling_delay (float) – Time to wait between polling attempts, in seconds.

  • boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.
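
The shape of job_driver_args is not spelled out in the parameter list above. The sketch below assumes the keys mirror the sparkSubmit and hive structures of the EMR Serverless StartJobRun API (entryPoint, entryPointArguments, sparkSubmitParameters; query, initQueryFile, parameters). The helper names are hypothetical and are not part of awswrangler.

```python
from __future__ import annotations

from typing import Any


def spark_driver_args(
    entry_point: str,
    arguments: list[str] | None = None,
    submit_parameters: str | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: build job_driver_args for job_type='Spark'."""
    driver: dict[str, Any] = {"entryPoint": entry_point}
    # Optional keys are only added when a value was supplied.
    if arguments:
        driver["entryPointArguments"] = list(arguments)
    if submit_parameters:
        driver["sparkSubmitParameters"] = submit_parameters
    return driver


def hive_driver_args(
    query: str,
    init_query_file: str | None = None,
    parameters: str | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: build job_driver_args for job_type='Hive'."""
    driver: dict[str, Any] = {"query": query}
    if init_query_file:
        driver["initQueryFile"] = init_query_file
    if parameters:
        driver["parameters"] = parameters
    return driver
```

Either dictionary can then be passed as job_driver_args together with the matching job_type.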

Return type:

str | dict[str, Any]

Returns:

The job run ID if wait=False; otherwise, the job run details.
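
Because the return type depends on wait, callers may want to branch on it. The following is a minimal sketch, assuming awswrangler is installed and AWS credentials are available. The function names submit_job and summarize_result are hypothetical, and the exact keys of the details dictionary are deliberately not assumed.

```python
from __future__ import annotations

from typing import Any


def submit_job(
    application_id: str,
    execution_role_arn: str,
    job_driver_args: dict[str, Any],
    wait: bool = False,
) -> str | dict[str, Any]:
    """Hypothetical wrapper around run_job for a Spark job."""
    import awswrangler as wr  # imported lazily; requires awswrangler

    return wr.emr_serverless.run_job(
        application_id=application_id,
        execution_role_arn=execution_role_arn,
        job_driver_args=job_driver_args,
        job_type="Spark",
        wait=wait,
    )


def summarize_result(result: str | dict[str, Any]) -> str:
    """Describe a run_job return value without assuming specific keys."""
    if isinstance(result, str):
        # wait=False: only the job run ID is returned.
        return f"submitted job run {result}"
    # wait=True: a dictionary with the job run details is returned.
    return f"job run details with keys: {sorted(result)}"
```

With wait=False the call returns as soon as the job run is submitted, so tracking its completion is left to the caller.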