An AWS Professional Service open source initiative | aws-proserve-opensource@amazon.com
Quick Start
pip install awswrangler
import awswrangler as wr
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on the Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table"
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Getting a Redshift connection from the Glue Catalog and retrieving data from Redshift Spectrum
con = wr.redshift.connect("my-glue-connection")
df = wr.redshift.read_sql_query("SELECT * FROM external_schema.my_table", con=con)
con.close()
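Because the connection above must be closed explicitly, the connect/close pair is easy to leak if the query raises. Wrapping the connection in contextlib.closing guarantees cleanup. A minimal sketch with a stand-in connection object (FakeConnection is hypothetical, used here only so the pattern runs without AWS; in practice you would pass the object returned by wr.redshift.connect):

```python
from contextlib import closing

class FakeConnection:
    """Stand-in for the connection object returned by wr.redshift.connect()."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

con = FakeConnection()
with closing(con):
    # e.g. df = wr.redshift.read_sql_query("SELECT ...", con=con)
    pass

print(con.closed)  # True: close() ran even without an explicit call
```

The same pattern applies to any DB-API-style connection that exposes a close() method.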
# Amazon Timestream Write
df = pd.DataFrame({
    "time": [datetime.now(), datetime.now()],
    "my_dimension": ["foo", "boo"],
    "measure": [1.0, 1.1],
})
rejected_records = wr.timestream.write(
    df,
    database="sampleDB",
    table="sampleTable",
    time_col="time",
    measure_col="measure",
    dimensions_cols=["my_dimension"],
)

# Amazon Timestream Query
wr.timestream.query("""
SELECT time, measure_value::double, my_dimension
FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
""")
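Timestream exposes each measure under a type-suffixed column, which is why the query above selects measure_value::double. A local pandas sketch (no AWS calls, with made-up sample values shaped like a typical query result) of renaming that column back to something friendlier:

```python
import pandas as pd

# Illustrative result frame shaped like a Timestream query response.
result = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01 00:00:02", "2021-01-01 00:00:01"]),
    "measure_value::double": [1.1, 1.0],
    "my_dimension": ["boo", "foo"],
})

# Rename the type-suffixed measure column for downstream code.
result = result.rename(columns={"measure_value::double": "measure"})
print(list(result.columns))  # ['time', 'measure', 'my_dimension']
```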
Read The Docs
- What is AWS Data Wrangler?
- Install
- Tutorials
- 1 - Introduction
- 2 - Sessions
- 3 - Amazon S3
- 4 - Parquet Datasets
- 5 - Glue Catalog
- 6 - Amazon Athena
- 7 - Redshift, MySQL, PostgreSQL, SQL Server and Oracle
- 8 - Redshift - COPY & UNLOAD
- 9 - Redshift - Append, Overwrite and Upsert
- 10 - Parquet Crawler
- 11 - CSV Datasets
- 12 - CSV Crawler
- 13 - Merging Datasets on S3
- 14 - Schema Evolution
- 15 - EMR
- 16 - EMR & Docker
- 17 - Partition Projection
- 18 - QuickSight
- 19 - Amazon Athena Cache
- 20 - Spark Table Interoperability
- 21 - Global Configurations
- 22 - Writing Partitions Concurrently
- 23 - Flexible Partitions Filter (PUSH-DOWN)
- 24 - Athena Query Metadata
- 25 - Redshift - Loading Parquet files with Spectrum
- 26 - Amazon Timestream
- 27 - Amazon Timestream - Example 2
- 28 - Amazon DynamoDB
- 29 - S3 Select
- 30 - Data Api
- 31 - OpenSearch
- 32 - AWS Lake Formation - Glue Governed tables
- 33 - Amazon Neptune
- API Reference
- Community Resources
- Logging
- Who uses AWS Data Wrangler?
- License
- Contributing
- Legacy Docs (pre-1.0.0)
