awswrangler.athena.run_spark_calculation¶
- awswrangler.athena.run_spark_calculation(code: str, workgroup: str, session_id: str | None = None, coordinator_dpu_size: int = 1, max_concurrent_dpus: int = 5, default_executor_dpu_size: int = 1, additional_configs: dict[str, Any] | None = None, spark_properties: dict[str, Any] | None = None, idle_timeout: int = 15, boto3_session: Session | None = None) → dict[str, Any] ¶
Execute a Spark calculation and wait for completion.
- Parameters:
code (str) – A string that contains the code for the calculation.
workgroup (str) – Athena workgroup name. Must be Spark-enabled.
session_id (str, optional) – The session ID. If not provided, a new session is started.
coordinator_dpu_size (int, optional) – The number of DPUs to use for the coordinator. A coordinator is a special executor that orchestrates processing work and manages other executors in a notebook session. The default is 1.
max_concurrent_dpus (int, optional) – The maximum number of DPUs that can run concurrently. The default is 5.
default_executor_dpu_size (int, optional) – The default number of DPUs to use for executors. The default is 1.
additional_configs (Dict[str, Any], optional) – Contains additional engine parameter mappings in the form of key-value pairs.
spark_properties (Dict[str, Any], optional) – Contains SparkProperties in the form of key-value pairs. Specifies custom jar files and Spark properties for use cases like cluster encryption, table formats, and general Spark tuning.
idle_timeout (int, optional) – The idle timeout in minutes for the session. The default is 15.
boto3_session (boto3.Session(), optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.
- Returns:
Calculation response
- Return type:
Dict[str, Any]
Examples
>>> import awswrangler as wr
>>> result = wr.athena.run_spark_calculation(
...     code="print(spark)",
...     workgroup="...",
... )
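If no session_id is passed, each call starts a fresh session, which adds startup latency. A sketch of reusing one session across several calculations, assuming a Spark-enabled workgroup exists (the workgroup name and the calculation code below are placeholders):

```python
import awswrangler as wr

# Start a Spark-enabled Athena session once; create_spark_session
# returns the session ID (workgroup name is a placeholder).
session_id = wr.athena.create_spark_session(
    workgroup="my-spark-workgroup",
    max_concurrent_dpus=10,
)

# Reuse the same session for multiple calculations to avoid
# paying session-startup cost on every call.
for code in ["spark.range(10).count()", "spark.sql('SELECT 1').show()"]:
    result = wr.athena.run_spark_calculation(
        code=code,
        workgroup="my-spark-workgroup",
        session_id=session_id,
    )
```

The session stays alive between calls until idle_timeout elapses, so interactive workloads benefit from passing an explicit session_id rather than relying on the implicit per-call session.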