awswrangler.emr_serverless.run_job

awswrangler.emr_serverless.run_job(application_id: str, execution_role_arn: str, job_driver_args: dict[str, Any] | SparkSubmitJobArgs | HiveRunJobArgs, job_type: Literal['Spark', 'Hive'] = 'Spark', wait: bool = True, configuration_overrides: dict[str, Any] | None = None, tags: dict[str, str] | None = None, execution_timeout: int | None = None, name: str | None = None, emr_serverless_job_wait_polling_delay: float = 5, boto3_session: Session | None = None) → str | dict[str, Any]

Run an EMR Serverless job.

https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • emr_serverless_job_wait_polling_delay

Check out the Global Configurations Tutorial for details.
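
A minimal sketch of the two routes mentioned in the note. The environment variable name assumes the library's usual convention of a WR_ prefix followed by the upper-cased argument name, and that the variable is read when awswrangler is imported; the value 10 and the helper name are illustrative only.

```python
import os

# Route 1: environment variable.
# Assumption: the name follows the WR_<ARGUMENT_NAME_IN_UPPER_CASE> convention
# and must be set before awswrangler is imported.
os.environ["WR_EMR_SERVERLESS_JOB_WAIT_POLLING_DELAY"] = "10"


def set_polling_delay(delay: float) -> None:
    """Route 2: set the value on wr.config (hypothetical helper name)."""
    # Imported lazily so this file can be loaded without awswrangler installed.
    import awswrangler as wr

    wr.config.emr_serverless_job_wait_polling_delay = delay
```

A value passed explicitly to run_job is the most direct way to control the delay for a single call.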

Warning

This API is experimental and may change in future AWS SDK for Pandas releases.

Parameters:
  • application_id (str) – The id of the application on which to run the job.

  • execution_role_arn (str) – The execution role ARN for the job run.

  • job_driver_args (Union[Dict[str, Any], SparkSubmitJobArgs, HiveRunJobArgs]) – The job driver arguments for the job run.

  • job_type (str, optional) – Type of the job: “Spark” or “Hive”. Defaults to “Spark”.

  • wait (bool, optional) – Whether to wait for the job to complete. Defaults to True.

  • configuration_overrides (Dict[str, Any], optional) – The configuration overrides for the job run.

  • tags (Dict[str, str], optional) – Key/value pairs to attach as tags to the job run, e.g. {“foo”: “boo”, “bar”: “xoo”}.

  • execution_timeout (int, optional) – The maximum duration of the job run. If the job run exceeds this duration, it is automatically cancelled.

  • name (str, optional) – Name of the job.

  • emr_serverless_job_wait_polling_delay (float, optional) – Time to wait between polling attempts. Defaults to 5.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.
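
The parameters above can be combined as in the following sketch. It assumes that the job_driver_args keys for a Spark job follow the sparkSubmit shape of the EMR Serverless StartJobRun API (entryPoint, entryPointArguments, sparkSubmitParameters). Both helper functions, the S3 path, the job name, and the tag values are hypothetical placeholders, not part of awswrangler.

```python
from typing import Any, Dict, List, Optional


def spark_driver_args(
    entry_point: str,
    arguments: Optional[List[str]] = None,
    submit_parameters: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Spark job_driver_args dict (hypothetical helper)."""
    # Assumption: key names mirror the sparkSubmit job driver of StartJobRun.
    args: Dict[str, Any] = {"entryPoint": entry_point}
    if arguments:
        args["entryPointArguments"] = list(arguments)
    if submit_parameters:
        args["sparkSubmitParameters"] = submit_parameters
    return args


def submit_spark_job(
    application_id: str, execution_role_arn: str, driver_args: Dict[str, Any]
) -> Any:
    """Submit a job without waiting; needs awswrangler and AWS credentials."""
    # Imported lazily so the builder above can be used and tested offline.
    import awswrangler as wr

    return wr.emr_serverless.run_job(
        application_id=application_id,
        execution_role_arn=execution_role_arn,
        job_driver_args=driver_args,
        job_type="Spark",
        wait=False,                 # return immediately instead of polling
        name="example-job",         # placeholder name
        tags={"project": "demo"},   # placeholder tags
    )


# Placeholder script location and arguments, for illustration only.
driver_args = spark_driver_args(
    "s3://my-bucket/scripts/job.py",
    arguments=["--date", "2024-01-01"],
    submit_parameters="--conf spark.executor.memory=4g",
)
```

Keeping the dict construction separate from the submission call makes the driver arguments easy to inspect or unit-test before any AWS request is made.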

Returns:

The job run ID if wait=False; otherwise the job run details.

Return type:

Union[str, Dict[str, Any]]
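
Because the return type depends on wait, calling code may want to branch on it. The sketch below shows one way to do that; split_run_job_result is a hypothetical helper, and the sample values are made up. It makes no assumption about which keys the details dict contains.

```python
from typing import Any, Dict, Optional, Tuple, Union


def split_run_job_result(
    result: Union[str, Dict[str, Any]],
) -> Tuple[Optional[str], Optional[Dict[str, Any]]]:
    """Return (job_id, details); exactly one of the two is not None."""
    if isinstance(result, str):
        # wait=False: only the job run ID is available.
        return result, None
    # wait=True: the job run details are available.
    return None, result


# Made-up values standing in for the two possible return shapes.
job_id, details = split_run_job_result("00example123")
no_id, run_details = split_run_job_result({"example": "details"})
```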