awswrangler.athena.create_spark_session

awswrangler.athena.create_spark_session(workgroup: str, coordinator_dpu_size: int = 1, max_concurrent_dpus: int = 5, default_executor_dpu_size: int = 1, additional_configs: dict[str, Any] | None = None, spark_properties: dict[str, Any] | None = None, idle_timeout: int = 15, boto3_session: Session | None = None) → str

Create a session and wait until it is ready to accept calculations.

Parameters:
  • workgroup (str) – Athena workgroup name. Must be Spark-enabled.

  • coordinator_dpu_size (int, optional) – The number of DPUs to use for the coordinator. A coordinator is a special executor that orchestrates processing work and manages other executors in a notebook session. The default is 1.

  • max_concurrent_dpus (int, optional) – The maximum number of DPUs that can run concurrently. The default is 5.

  • default_executor_dpu_size (int, optional) – The default number of DPUs to use for executors. The default is 1.

  • additional_configs (Dict[str, Any], optional) – Contains additional engine parameter mappings in the form of key-value pairs.

  • spark_properties (Dict[str, Any], optional) – Contains Spark properties in the form of key-value pairs. Specifies custom jar files and Spark properties for use cases such as cluster encryption, table formats, and general Spark tuning.

  • idle_timeout (int, optional) – The idle timeout in minutes for the session. The default is 15.

  • boto3_session (boto3.Session(), optional) – Boto3 Session. If None, the default boto3 session is used.

Returns:

Session ID

Return type:

str

Examples

>>> import awswrangler as wr
>>> session_id = wr.athena.create_spark_session(workgroup="...", max_concurrent_dpus=10)
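
A session can also be created with custom engine settings by passing spark_properties and idle_timeout alongside the DPU options. The workgroup name and the property value below are illustrative placeholders, not defaults:

>>> import awswrangler as wr
>>> session_id = wr.athena.create_spark_session(
...     workgroup="my-spark-workgroup",  # assumed Spark-enabled workgroup
...     max_concurrent_dpus=10,
...     idle_timeout=30,  # stop the session after 30 idle minutes
...     spark_properties={"spark.sql.shuffle.partitions": "200"},
... )

The returned session ID can then be passed to subsequent calculation calls against the same session.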