5. Move dependencies to optional
Date: 2023-03-15
Status
Accepted
Context
AWS SDK for pandas relies on external dependencies in some of its modules. These include redshift-connector, gremlinpython and pymysql, to cite a few.
In versions 2.x and below, most of these packages were set as required, meaning they were installed regardless of whether the user actually needed them. As the number of dependencies grew, this introduced two major issues:
Security risk: Unused dependencies increase the attack surface to manage. Users must scan them and keep them up to date even though they do not need them.
Dependency hell: Users must resolve dependencies for packages that they are not using. The resulting version conflicts can block critical updates, such as security patches and major bug fixes.
Decision
A breaking change is introduced in version 3.x where the number of required dependencies is reduced to the most important ones, namely:
boto3
pandas
numpy
pyarrow
typing-extensions
Consequences
All other dependencies are moved to optional and must be installed by the user separately using pip install awswrangler[dependency]. For instance, the command to use the Redshift APIs is pip install awswrangler[redshift]. Failing to do so raises an exception informing the user that the package is missing and how to install it.
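The "missing package" behaviour described above can be illustrated with a minimal sketch. It is not the library's actual implementation: the decorator name `requires`, the demo functions, and the placeholder module name `not_a_real_module_for_demo` are all hypothetical, chosen only to show how a guarded function might fail with an installation hint when its optional dependency is absent.

```python
import importlib.util
from functools import wraps
from typing import Any, Callable


def requires(module_name: str, extra: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical guard: raise an install hint if `module_name` is not installed."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # find_spec returns None when a top-level module cannot be located.
            if importlib.util.find_spec(module_name) is None:
                raise ModuleNotFoundError(
                    f"Missing optional dependency '{module_name}'. "
                    f"Install it with: pip install awswrangler[{extra}]"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# 'json' ships with Python, so this guarded function runs normally.
@requires("json", extra="demo")
def available() -> str:
    return "ok"


# Placeholder module name (assumed not to exist) standing in for an
# optional package that the user has not installed.
@requires("not_a_real_module_for_demo", extra="redshift")
def unavailable() -> str:
    return "unreachable"
```

The check runs at call time rather than at import time, so in this sketch importing the top-level package would still succeed without any extras installed. Only calling a guarded function triggers the error.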