AWS SDK for pandas

41 - Apache Spark on Amazon Athena

Amazon Athena makes it easy to interactively run data analytics and exploration using Apache Spark without the need to plan for, configure, or manage resources. Running Apache Spark applications on Athena means submitting Spark code for processing and receiving the results directly without the need for additional configuration.

More details are available in the Athena User Guide.

Run a Spark calculation

For this tutorial, you will need a Spark-enabled Athena workgroup. For the steps to create one, visit Getting started with Apache Spark on Amazon Athena.

[ ]:
import awswrangler as wr

workgroup: str = "my-spark-workgroup"

result = wr.athena.run_spark_calculation(
    code="print(spark)",
    workgroup=workgroup,
)
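Calculations are not limited to one-liners: `code` accepts a multi-line PySpark snippet as a single string, and the injected `spark` session object is available inside it. A small sketch of building such a snippet (the DataFrame logic in the string is illustrative, not from the original tutorial):

```python
import textwrap

# Illustrative PySpark code to submit via wr.athena.run_spark_calculation;
# `spark` refers to the SparkSession provided inside the Athena session.
spark_code = textwrap.dedent(
    """\
    df = spark.range(100).withColumnRenamed("id", "n")
    print(df.count())
    """
)
```

The resulting string can then be passed as the `code` argument in place of `"print(spark)"`.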

Create and re-use a session

It is possible to create a session and re-use it, running multiple calculations with the same resources. To create a session, use:

[ ]:
session_id: str = wr.athena.create_spark_session(
    workgroup=workgroup,
)

Now, to run calculations within the session, pass its session_id:

[ ]:
result = wr.athena.run_spark_calculation(
    code="print(spark)",
    workgroup=workgroup,
    session_id=session_id,
)
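Sessions hold compute resources until they time out, so it is good practice to end them when you are done. A hedged sketch of one way to do this, assuming awswrangler does not expose a session-termination helper and falling back to the underlying Athena TerminateSession API via boto3 (the function name and region are assumptions for illustration):

```python
def terminate_athena_session(session_id: str, region_name: str) -> None:
    """Hypothetical helper: end an Athena Spark session to release its
    resources, using the Athena TerminateSession API directly."""
    import boto3  # awswrangler already depends on boto3

    athena = boto3.client("athena", region_name=region_name)
    athena.terminate_session(SessionId=session_id)
```

For example, `terminate_athena_session(session_id, "us-east-1")` would end the session created above.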