awswrangler.emr.submit_spark_step

awswrangler.emr.submit_spark_step(cluster_id: str, path: str, args: list[str] | None = None, deploy_mode: Literal['cluster', 'client'] = 'cluster', docker_image: str | None = None, name: str = 'my-step', action_on_failure: Literal['TERMINATE_JOB_FLOW', 'TERMINATE_CLUSTER', 'CANCEL_AND_WAIT', 'CONTINUE'] = 'CONTINUE', region: str | None = None, boto3_session: Session | None = None) → str

Submit Spark Step.

Parameters:
  • cluster_id (str) – Cluster ID.

  • path (str) – Script path. (e.g. s3://bucket/app.py)

  • args (List[str], optional) – CLI args to use with the script. (e.g. args=["--name", "hello-world"])

  • deploy_mode (str) – "cluster" | "client"

  • docker_image (str, optional) – e.g. "{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{IMAGE_NAME}:{TAG}"

  • name (str, optional) – Step name.

  • action_on_failure (str) – 'TERMINATE_JOB_FLOW', 'TERMINATE_CLUSTER', 'CANCEL_AND_WAIT', 'CONTINUE'

  • region (str, optional) – Region name to use instead of the one inferred from the boto3.Session. (e.g. us-east-1)

  • boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.

Returns:

Step ID.

Return type:

str

Examples

>>> import awswrangler as wr
>>> step_id = wr.emr.submit_spark_step(
>>>     cluster_id="cluster-id",
>>>     path="s3://bucket/emr/app.py"
>>> )
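
A fuller sketch is shown below, assuming placeholder values for the cluster ID, S3 path, and ECR image (none of these values come from this documentation); it passes CLI args to the script and runs it from a Docker image in cluster deploy mode.

>>> import awswrangler as wr
>>> step_id = wr.emr.submit_spark_step(
>>>     cluster_id="j-XXXXXXXXXXXXX",  # placeholder EMR cluster ID
>>>     path="s3://bucket/emr/app.py",  # placeholder script location
>>>     args=["--name", "hello-world"],  # forwarded to the script as CLI arguments
>>>     deploy_mode="cluster",
>>>     docker_image="111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",  # placeholder ECR image
>>>     name="spark-app-step",
>>>     action_on_failure="CONTINUE",
>>> )

The returned step ID can then be used to track the step, for example with wr.emr.get_step_state(cluster_id, step_id) if your awswrangler version provides it.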