AWS Data Wrangler runs on Python 3.7, 3.8, 3.9 and 3.10, and on several platforms (AWS Lambda, AWS Glue Python Shell, EMR, EC2, on-premises, Amazon SageMaker, local, etc).

Some good practices to follow for the options below are:

  • Use new and isolated Virtual Environments for each project (venv).

  • On Notebooks, always restart your kernel after installations.


If you want to use awswrangler to connect to Microsoft SQL Server, some additional configuration is needed. Please have a look at the corresponding section below.

PyPI (pip)

>>> pip install awswrangler


Conda

>>> conda install -c conda-forge awswrangler

AWS Lambda Layer

Managed Layer


There is a one week minimum delay between version release and layers being available in the AWS Lambda console.

AWS Data Wrangler is available as an AWS Lambda Managed layer in all AWS commercial regions.

It can be accessed in the AWS Lambda console directly:

AWS Managed Lambda Layer

Or via its ARN: arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python<python-version>:<layer-version>.

For example: arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python37:1.
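Since the ARN follows a fixed pattern, it can also be assembled programmatically. A minimal sketch (the helper name is illustrative, not part of the library):

```python
def wrangler_layer_arn(region: str, python_version: str, layer_version: int) -> str:
    """Assemble the managed layer ARN from its fixed pattern.

    python_version is digits only, e.g. "38" for Python 3.8.
    """
    return (
        f"arn:aws:lambda:{region}:336392948345:"
        f"layer:AWSDataWrangler-Python{python_version}:{layer_version}"
    )

print(wrangler_layer_arn("us-east-1", "37", 1))
```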

The full list of ARNs is available here.

Custom Layer

You can also create your own Lambda layer with these instructions:

1 - Go to GitHub’s release section and download the zipped layer for the desired version. Alternatively, you can download the zip from the public artifacts bucket.

2 - Go to the AWS Lambda console, open the layer section (left side) and click create layer.

3 - Set the name and Python version, upload the zip file you downloaded and press create.

4 - Go to your Lambda function and select your new layer!
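Steps 2–3 can also be done through the API. A hedged sketch using boto3's `publish_layer_version` (it expects the zip to already be in an S3 bucket you own; the layer name is a placeholder):

```python
def layer_zip_key(version: str, py_version: str) -> str:
    """Object key of a release layer zip, following the release naming convention."""
    return f"awswrangler-layer-{version}-py{py_version}.zip"

def publish_wrangler_layer(bucket: str, version: str, py_version: str = "3.8") -> str:
    """Publish a layer zip already uploaded to your own S3 bucket as a new layer version."""
    import boto3  # deferred import so layer_zip_key works without boto3 installed

    resp = boto3.client("lambda").publish_layer_version(
        LayerName="aws-data-wrangler",  # placeholder layer name
        Content={"S3Bucket": bucket, "S3Key": layer_zip_key(version, py_version)},
        CompatibleRuntimes=[f"python{py_version}"],
    )
    return resp["LayerVersionArn"]
```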

Serverless Application Repository (SAR)

Starting version 2.12.0, AWS Data Wrangler layers are also available in the AWS Serverless Application Repository (SAR).

The app deploys the Lambda layer version in your own AWS account and region via a CloudFormation stack. This option provides the ability to use semantic versions (i.e. library version) instead of Lambda layer versions.

AWS Data Wrangler Layer Apps:

  • Layer for Python 3.7.x runtimes

  • Layer for Python 3.8.x runtimes

  • Layer for Python 3.9.x runtimes

Here is an example of how to create and use the AWS Data Wrangler Lambda layer in your CDK app:

from aws_cdk import core, aws_lambda, aws_sam as sam

class DataWranglerApp(core.Construct):
  def __init__(self, scope: core.Construct, id_: str):
    super().__init__(scope, id_)

    wrangler_layer = sam.CfnApplication(
        self,
        "wrangler-layer",
        location=sam.CfnApplication.ApplicationLocationProperty(
            application_id="arn:aws:serverlessrepo:us-east-1:336392948345:applications/aws-data-wrangler-layer-py3-8",
            semantic_version="2.15.1",  # Get the latest version from the GitHub releases page
        ),
    )

    wrangler_layer_arn = wrangler_layer.get_att("Outputs.WranglerLayer38Arn").to_string()
    wrangler_layer_version = aws_lambda.LayerVersion.from_layer_version_arn(self, "wrangler-layer-version", wrangler_layer_arn)


AWS Glue Python Shell Jobs


Glue Python Shell runs on Python 3.6, for which support was dropped in version 2.15.0 of AWS Data Wrangler. Please use version 2.14.0 of the library or earlier.

1 - Go to GitHub’s release page and download the wheel file (.whl) related to the desired version. Alternatively, you can download the wheel from the public artifacts bucket.

2 - Upload the wheel file to the Amazon S3 location of your choice.

3 - Go to your Glue Python Shell job and point to the S3 wheel file in the Python library path field.
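Step 2 can be scripted with boto3; the sketch below assumes the wheel naming from the public artifacts and an arbitrary "glue/" prefix in your own bucket:

```python
def wheel_s3_uri(bucket: str, version: str) -> str:
    """S3 URI to paste into the job's "Python library path" field."""
    return f"s3://{bucket}/glue/awswrangler-{version}-py3-none-any.whl"  # "glue/" prefix is arbitrary

def upload_wheel(bucket: str, version: str, local_path: str) -> str:
    """Upload the downloaded wheel and return the URI for the job configuration."""
    import boto3  # deferred import so wheel_s3_uri works without boto3 installed

    key = f"glue/awswrangler-{version}-py3-none-any.whl"
    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```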

Official Glue Python Shell Reference

AWS Glue PySpark Jobs


AWS Data Wrangler has compiled dependencies (C/C++) so support is only available for Glue PySpark Jobs >= 2.0.

Go to your Glue PySpark job and create a new Job parameters key/value:

  • Key: --additional-python-modules

  • Value: pyarrow==2,awswrangler

To install a specific version, set the value for the above Job parameter as follows:

  • Value: cython==0.29.21,pg8000==1.21.0,pyarrow==2,pandas==1.3.0,awswrangler==2.15.1


Pyarrow 3 is not currently supported in Glue PySpark Jobs, which is why an installation of pyarrow 2 is required.
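The same parameter can be set when creating the job through the API. A hedged sketch using boto3's Glue client (the role ARN and script location are placeholders):

```python
def wrangler_job_args(version: str = None) -> dict:
    """Default arguments installing pyarrow 2 plus awswrangler (optionally pinned)."""
    modules = f"pyarrow==2,awswrangler=={version}" if version else "pyarrow==2,awswrangler"
    return {"--additional-python-modules": modules}

def create_wrangler_job(name: str, role_arn: str, script_location: str) -> None:
    """Create a Glue PySpark job with the modules above preinstalled."""
    import boto3  # deferred import so wrangler_job_args works without boto3 installed

    boto3.client("glue").create_job(
        Name=name,
        Role=role_arn,
        Command={"Name": "glueetl", "ScriptLocation": script_location, "PythonVersion": "3"},
        GlueVersion="2.0",  # compiled dependencies need Glue PySpark >= 2.0
        DefaultArguments=wrangler_job_args("2.15.1"),
    )
```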

Official Glue PySpark Reference

Public Artifacts

Lambda zipped layers and Python wheels are stored in a publicly accessible S3 bucket for all versions.

  • Bucket: aws-data-wrangler-public-artifacts

  • Prefix: releases/<version>/

    • Lambda layer: awswrangler-layer-<version>-py<py-version>.zip

    • Python wheel: awswrangler-<version>-py3-none-any.whl

For example: s3://aws-data-wrangler-public-artifacts/releases/2.15.1/
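The naming scheme above can be captured in a small path helper, for example (function name is illustrative):

```python
def artifact_uri(version: str, py_version: str = None) -> str:
    """Full S3 URI of a release artifact: a Lambda layer zip if py_version is given, else the wheel."""
    prefix = f"s3://aws-data-wrangler-public-artifacts/releases/{version}/"
    if py_version is not None:
        return f"{prefix}awswrangler-layer-{version}-py{py_version}.zip"
    return f"{prefix}awswrangler-{version}-py3-none-any.whl"
```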

Amazon SageMaker Notebook

Run this command in any Python 3 notebook cell and then make sure to restart the kernel before importing the awswrangler package.

>>> !pip install awswrangler

Amazon SageMaker Notebook Lifecycle

Open the AWS SageMaker console, go to the lifecycle section and use the below snippet to configure AWS Data Wrangler for all compatible SageMaker kernels (Reference).


#!/bin/bash

set -e

# This script installs a single pip package in all SageMaker conda environments, apart from the JupyterSystemEnv which
# is a system environment reserved for Jupyter.
# Note this may timeout if the package installations in all environments take longer than 5 mins, consider using
# "nohup" to run this as a background process in that case.

sudo -u ec2-user -i <<'EOF'

PACKAGE=awswrangler

# Note that "base" is a special environment name, include it there as well.
for env in base /home/ec2-user/anaconda3/envs/*; do
    source /home/ec2-user/anaconda3/bin/activate $(basename "$env")
    if [ $env = 'JupyterSystemEnv' ]; then
        continue
    fi
    nohup pip install --upgrade "$PACKAGE" &
    source /home/ec2-user/anaconda3/bin/deactivate
done

EOF

EMR Cluster

Despite not being a distributed library, AWS Data Wrangler could be used to complement Big Data pipelines.

  • Configure Python 3 as the default interpreter for PySpark on your cluster configuration [ONLY REQUIRED FOR EMR < 6]

    [
      {
        "Classification": "spark-env",
        "Configurations": [
          {
            "Classification": "export",
            "Properties": {
              "PYSPARK_PYTHON": "/usr/bin/python3"
            }
          }
        ]
      }
    ]
  • Keep the bootstrap script below on S3 and reference it on your cluster.

    • For EMR Release < 6

      #!/usr/bin/env bash
      set -ex
      sudo pip-3.6 install pyarrow==2 awswrangler
    • For EMR Release >= 6

      #!/usr/bin/env bash
      set -ex
      sudo pip install pyarrow==2 awswrangler


Make sure to freeze the library version in the bootstrap for production environments (e.g. awswrangler==2.15.1)


Pyarrow 3 is not currently supported in the default EMR image, which is why an installation of pyarrow 2 is required.

From Source

>>> git clone https://github.com/awslabs/aws-data-wrangler.git
>>> cd aws-data-wrangler
>>> pip install .

Notes for Microsoft SQL Server

awswrangler uses pyodbc to interact with Microsoft SQL Server. Installing that package requires the ODBC header files, which can be installed with one of the following commands (apt for Debian/Ubuntu, yum for RHEL-based systems):

>>> sudo apt install unixodbc-dev
>>> sudo yum install unixODBC-devel

After installing these header files you can either install just pyodbc, or install awswrangler with the sqlserver extra, which also installs pyodbc:

>>> pip install pyodbc
>>> pip install awswrangler[sqlserver]

Finally, you also need the correct ODBC Driver for SQL Server. See Microsoft's documentation for how to install it in your environment.
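With the header files, pyodbc and the ODBC driver in place, a connection can be opened with a standard ODBC connection string. A minimal sketch (server, database and credentials are placeholders; "ODBC Driver 17 for SQL Server" is one common driver name, yours may differ):

```python
def mssql_connection_string(server: str, database: str, uid: str, pwd: str,
                            driver: str = "ODBC Driver 17 for SQL Server") -> str:
    """Build an ODBC connection string suitable for pyodbc.connect()."""
    return (f"DRIVER={{{driver}}};SERVER={server};DATABASE={database};"
            f"UID={uid};PWD={pwd}")

# Usage (requires pyodbc and the ODBC driver to be installed):
# import pyodbc
# con = pyodbc.connect(mssql_connection_string("myserver.example.com", "mydb", "user", "secret"))
```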

If you want to connect to Microsoft SQL Server from AWS Lambda, you can build a separate layer including the needed ODBC drivers and pyodbc.

If you maintain your own environment, you need to take care of the above steps yourself. Because of this limitation, usage in combination with Glue jobs is restricted, and you need to rely on the functionality Glue provides itself.

Notes for SPARQL support

To be able to use SPARQL, either install just SPARQLWrapper, or install awswrangler with the sparql extra, which also installs SPARQLWrapper:

>>> pip install SPARQLWrapper
>>> pip install awswrangler[sparql]