awswrangler.athena.create_spark_session¶
- awswrangler.athena.create_spark_session(workgroup: str, coordinator_dpu_size: int = 1, max_concurrent_dpus: int = 5, default_executor_dpu_size: int = 1, additional_configs: dict[str, Any] | None = None, spark_properties: dict[str, Any] | None = None, notebook_version: str | None = None, idle_timeout: int = 15, boto3_session: Session | None = None) → str¶
Create a session and wait until it is ready to accept calculations.
- Parameters:
  - workgroup (str) – Athena workgroup name. Must be Spark-enabled.
  - coordinator_dpu_size (int) – The number of DPUs to use for the coordinator. A coordinator is a special executor that orchestrates processing work and manages other executors in a notebook session. The default is 1.
  - max_concurrent_dpus (int) – The maximum number of DPUs that can run concurrently. The default is 5.
  - default_executor_dpu_size (int) – The default number of DPUs to use for executors. The default is 1.
  - additional_configs (dict[str, Any] | None) – Contains additional engine parameter mappings in the form of key-value pairs.
  - spark_properties (dict[str, Any] | None) – Contains SparkProperties in the form of key-value pairs. Specifies custom jar files and Spark properties for use cases like cluster encryption, table formats, and general Spark tuning.
  - notebook_version (str | None) – The notebook version. This value is supplied automatically for notebook sessions in the Athena console and is not required for programmatic session access. The only valid notebook version is "Athena notebook version 1". If you specify a value for NotebookVersion, you must also specify a value for NotebookId.
  - idle_timeout (int) – The idle timeout in minutes for the session. The default is 15.
  - boto3_session (Session | None) – The default boto3 session is used if boto3_session receives None.
- Return type:
str
- Returns:
Session ID
Examples
>>> import awswrangler as wr
>>> session_id = wr.athena.create_spark_session(workgroup="...", max_concurrent_dpus=10)