AWS SDK for pandas

41 - Apache Spark on Amazon Athena

Amazon Athena makes it easy to interactively run data analytics and exploration using Apache Spark without the need to plan for, configure, or manage resources. Running Apache Spark applications on Athena means submitting Spark code for processing and receiving the results directly without the need for additional configuration.

More details are available in the Athena User Guide.

Run a Spark calculation

For this tutorial, you will need a Spark-enabled Athena workgroup. For the steps to create one, visit Getting started with Apache Spark on Amazon Athena.

[ ]:
import awswrangler as wr

workgroup: str = "my-spark-workgroup"

result = wr.athena.run_spark_calculation(
    code="print(spark)",
    workgroup=workgroup,
)
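Calculations are not limited to one-liners: `code` accepts a multi-line PySpark snippet as a single string, and the injected `spark` session object is available inside it. A small sketch of building such a snippet (the DataFrame logic in the string is illustrative, not from the original tutorial):

```python
import textwrap

# Illustrative PySpark code to submit via wr.athena.run_spark_calculation;
# `spark` refers to the SparkSession provided inside the Athena session.
spark_code = textwrap.dedent(
    """\
    df = spark.range(100).withColumnRenamed("id", "n")
    print(df.count())
    """
)
```

The resulting string can then be passed as the `code` argument in place of `"print(spark)"`.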

Create and re-use a session

It is possible to create a session and re-use it, running multiple calculations with the same resources. To create a session, use:

[ ]:
session_id: str = wr.athena.create_spark_session(
    workgroup=workgroup,
)

Now, to run calculations within the session, pass its session_id:

[ ]:
result = wr.athena.run_spark_calculation(
    code="print(spark)",
    workgroup=workgroup,
    session_id=session_id,
)
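Sessions hold compute resources until they time out, so it is good practice to end them when you are done. A hedged sketch of one way to do this, assuming awswrangler does not expose a session-termination helper and falling back to the underlying Athena TerminateSession API via boto3 (the function name and region are assumptions for illustration):

```python
def terminate_athena_session(session_id: str, region_name: str) -> None:
    """Hypothetical helper: end an Athena Spark session to release its
    resources, using the Athena TerminateSession API directly."""
    import boto3  # awswrangler already depends on boto3

    athena = boto3.client("athena", region_name=region_name)
    athena.terminate_session(SessionId=session_id)
```

For example, `terminate_athena_session(session_id, "us-east-1")` would end the session created above.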