API Reference
Amazon S3
Copy a list of S3 objects to another S3 directory.
Delete Amazon S3 objects from a received S3 prefix or list of S3 object paths.
Describe Amazon S3 objects from a received S3 prefix or list of S3 object paths.
Check if an object exists on S3.
Download a file from a received S3 path to a local file.
Get the bucket region name.
List Amazon S3 buckets.
List Amazon S3 objects from a prefix.
List Amazon S3 objects from a prefix.
Merge a source dataset into a target dataset.
Read CSV file(s) from a received S3 prefix or list of S3 object paths.
Read Excel file(s) from a received S3 path.
Read fixed-width formatted file(s) from a received S3 prefix or list of S3 object paths.
Read JSON file(s) from a received S3 prefix or list of S3 object paths.
Read Parquet file(s) from an S3 prefix or list of S3 object paths.
Read Apache Parquet file(s) metadata from an S3 prefix or list of S3 object paths.
Read an Apache Parquet table registered in the AWS Glue Catalog.
Read ORC file(s) from an S3 prefix or list of S3 object paths.
Read Apache ORC file(s) metadata from an S3 prefix or list of S3 object paths.
Read an Apache ORC table registered in the AWS Glue Catalog.
Load Delta Lake table data from an S3 path.
Filter the contents of Amazon S3 objects based on a SQL statement.
Get the size (ContentLength) in bytes of Amazon S3 objects from a received S3 prefix or list of S3 object paths.
Infer and store Parquet metadata in the AWS Glue Catalog.
Write a CSV file or dataset to Amazon S3.
Write an Excel file to Amazon S3.
Write a JSON file to Amazon S3.
Write a Parquet file or dataset to Amazon S3.
Write an ORC file or dataset to Amazon S3.
Write a DataFrame to S3 as a Delta Lake table.
Upload a local file to a received S3 path.
Wait until Amazon S3 objects exist.
Wait until Amazon S3 objects no longer exist.
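A minimal round trip with the S3 module; the bucket, prefix, database, and table names below are placeholders:

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "bar"]})

# Write a Parquet dataset and register it in the Glue Catalog
# ("my-bucket", "my_db", and "my_table" are placeholders).
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/my-prefix/",
    dataset=True,
    database="my_db",
    table="my_table",
)

# Read the dataset back into a DataFrame.
df2 = wr.s3.read_parquet(path="s3://my-bucket/my-prefix/", dataset=True)
```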
AWS Glue Catalog
Add a column to an AWS Glue Catalog table.
Add partitions (metadata) to a CSV table in the AWS Glue Catalog.
Add partitions (metadata) to a Parquet table in the AWS Glue Catalog.
Create a CSV table (metadata only) in the AWS Glue Catalog.
Create a database in the AWS Glue Catalog.
Create a JSON table (metadata only) in the AWS Glue Catalog.
Create a Parquet table (metadata only) in the AWS Glue Catalog.
Get a Pandas DataFrame with all listed databases.
Delete a column from an AWS Glue Catalog table.
Delete a database from the AWS Glue Catalog.
Delete the specified partitions from an AWS Glue Catalog table.
Delete all partitions from an AWS Glue Catalog table.
Delete a Glue table if it exists.
Check if the table exists.
Drop all repeated columns (duplicated names).
Extract column and partition types (Amazon Athena) from a Pandas DataFrame.
Get all column comments.
Get all column parameters.
Get all partitions from a table in the AWS Glue Catalog.
Get an iterator of databases.
Get all partitions from a table in the AWS Glue Catalog.
Get all partitions from a table in the AWS Glue Catalog.
Get a table description.
Get a table's location in the Glue Catalog.
Get the total number of versions.
Get all parameters.
Get all columns and types from a table.
Get all versions.
Get an iterator of tables.
Overwrite all existing parameters.
Convert a column name to be compatible with Amazon Athena and the AWS Glue Catalog.
Normalize all column names to be compatible with Amazon Athena.
Convert a table name to be compatible with Amazon Athena and the AWS Glue Catalog.
Get a Pandas DataFrame of tables filtered by a search string.
Get table details as a Pandas DataFrame.
Get a DataFrame of tables filtered by a search term, prefix, or suffix.
Insert or update the received parameters.
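A short sketch of common catalog operations; database and table names are placeholders:

```python
import awswrangler as wr

# Create a database in the Glue Catalog.
wr.catalog.create_database(name="my_db")

# Check whether a table is registered.
exists = wr.catalog.does_table_exist(database="my_db", table="my_table")

# Browse registered tables as a Pandas DataFrame.
df_tables = wr.catalog.tables(limit=10)
```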
Amazon Athena
Create the default Athena bucket if it doesn't exist.
Create a session and wait until it is ready to accept calculations.
Create a new table populated with the results of a SELECT query.
Generate the query that created a table (EXTERNAL_TABLE) or a view (VIRTUAL_TABLE).
Get the data type of all queried columns.
Fetch query execution details.
From the specified query execution IDs, return a DataFrame of query execution details.
Get AWS Athena SQL query results as a Pandas DataFrame.
Get the named query statement string from a query ID.
Return information about the workgroup with the specified name.
Fetch the list of query execution IDs run in the specified workgroup, or the primary workgroup if none is specified.
Execute any SQL query on AWS Athena and return the results as a Pandas DataFrame.
Extract a full AWS Athena table and return the results as a Pandas DataFrame.
Run Hive's metastore consistency check: 'MSCK REPAIR TABLE table;'.
Execute a Spark calculation and wait for completion.
Generate the query that created it: 'SHOW CREATE TABLE table;'.
Start a SQL query against AWS Athena.
Stop a query execution.
Insert into an Athena Iceberg table using INSERT INTO.
Delete rows from an Iceberg table.
Write query results from a SELECT statement to the specified data format using UNLOAD.
Wait for the query to end.
Create a SQL statement with the name statement_name to be run at a later time.
List the prepared statements in the specified workgroup.
Delete the prepared statement with the specified name from the specified workgroup.
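A minimal query sketch; the database and table names are placeholders:

```python
import awswrangler as wr

# Run a query and fetch the results as a DataFrame. ctas_approach=True
# (the default) stages the results as Parquet for faster reads.
df = wr.athena.read_sql_query(
    "SELECT * FROM my_table LIMIT 10",
    database="my_db",
    ctas_approach=True,
)

# Or pull a whole table.
df_full = wr.athena.read_sql_table(table="my_table", database="my_db")
```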
Amazon Redshift
Return a redshift_connector connection from the Glue Catalog or Secrets Manager.
Return a temporary redshift_connector connection (no password required).
Load a Pandas DataFrame as a table on Amazon Redshift using Parquet files on S3 as a stage.
Load files from S3 to a table on Amazon Redshift (through the COPY command).
Return a DataFrame corresponding to the result set of the query string.
Return a DataFrame corresponding to the table.
Write records stored in a DataFrame into Redshift.
Load a Pandas DataFrame from an Amazon Redshift query result using Parquet files on S3 as a stage.
Unload Parquet files to S3 from a Redshift query result (through the UNLOAD command).
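A sketch of the staged COPY/read pattern; the Glue connection name, bucket, and table are placeholders:

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"id": [1, 2]})

# "my-glue-connection" is a placeholder Glue Catalog connection name.
con = wr.redshift.connect("my-glue-connection")

# Stage the DataFrame as Parquet on S3 and COPY it into Redshift.
wr.redshift.copy(
    df=df,
    path="s3://my-bucket/stage/",
    con=con,
    table="my_table",
    schema="public",
)

df2 = wr.redshift.read_sql_query("SELECT * FROM public.my_table", con=con)
con.close()
```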
PostgreSQL
Return a pg8000 connection from a Glue Catalog connection.
Return a DataFrame corresponding to the result set of the query string.
Return a DataFrame corresponding to the table.
Write records stored in a DataFrame into PostgreSQL.
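The MySQL, Microsoft SQL Server, and Oracle modules below expose the same connect / read_sql_query / read_sql_table / to_sql surface, so this PostgreSQL sketch (with a hypothetical Glue connection name) carries over to those sections as well:

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"id": [1, 2]})

# "my-glue-connection" is a placeholder Glue Catalog connection.
con = wr.postgresql.connect("my-glue-connection")
wr.postgresql.to_sql(df=df, con=con, table="my_table", schema="public")
df2 = wr.postgresql.read_sql_query("SELECT * FROM public.my_table", con=con)
con.close()
```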
MySQL
Return a pymysql connection from a Glue Catalog connection or Secrets Manager.
Return a DataFrame corresponding to the result set of the query string.
Return a DataFrame corresponding to the table.
Write records stored in a DataFrame into MySQL.
Microsoft SQL Server
Return a pyodbc connection from a Glue Catalog connection.
Return a DataFrame corresponding to the result set of the query string.
Return a DataFrame corresponding to the table.
Write records stored in a DataFrame into Microsoft SQL Server.
Oracle
Return an oracledb connection from a Glue Catalog connection.
Return a DataFrame corresponding to the result set of the query string.
Return a DataFrame corresponding to the table.
Write records stored in a DataFrame into Oracle Database.
Data API Redshift
Provides access to a Redshift cluster via the Data API.
Create a Redshift Data API connection.
Run an SQL query on a RedshiftDataApi connection and return the result as a DataFrame.
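A sketch of querying through the Data API; the cluster identifier, database, and user are placeholders:

```python
import awswrangler as wr

# Connect via the Redshift Data API (no direct network access needed).
con = wr.data_api.redshift.connect(
    cluster_id="my-cluster",
    database="dev",
    db_user="awsuser",
)
df = wr.data_api.redshift.read_sql_query("SELECT 1 AS col", con=con)
```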
Data API RDS
Provides access to the RDS Data API.
Create an RDS Data API connection.
Run an SQL query on an RdsDataApi connection and return the result as a DataFrame.
Insert data using an SQL query on a Data API connection.
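The RDS variant is analogous; the ARNs and database name below are placeholders:

```python
import awswrangler as wr

# Connect via the RDS Data API using a Secrets Manager secret.
con = wr.data_api.rds.connect(
    resource_arn="arn:aws:rds:us-east-1:111111111111:cluster:my-cluster",
    database="mydb",
    secret_arn="arn:aws:secretsmanager:us-east-1:111111111111:secret:my-secret",
)
df = wr.data_api.rds.read_sql_query("SELECT * FROM my_table", con=con)
```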
AWS Glue Data Quality
Create a recommendation Data Quality ruleset.
Create a Data Quality ruleset.
Evaluate a Data Quality ruleset.
Get a Data Quality ruleset.
Update a Data Quality ruleset.
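A hedged sketch of creating and evaluating a ruleset; the DQDL rule, names, and role ARN are placeholders:

```python
import awswrangler as wr

# Create a ruleset from a DQDL rule string for an existing Glue table.
wr.data_quality.create_ruleset(
    name="my-ruleset",
    database="my_db",
    table="my_table",
    dqdl_rules="Rules = [ RowCount > 0 ]",
)

# Evaluate it and get the rule results as a DataFrame.
df_results = wr.data_quality.evaluate_ruleset(
    name="my-ruleset",
    iam_role_arn="arn:aws:iam::111111111111:role/my-glue-dq-role",
)
```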
OpenSearch
Create a secure connection to the specified Amazon OpenSearch domain.
Create an Amazon OpenSearch Serverless collection.
Create an index.
Delete an index.
Index all documents from a CSV file to an OpenSearch index.
Index all documents to an OpenSearch index.
Index all documents from a DataFrame to an OpenSearch index.
Index all documents from a JSON file to an OpenSearch index.
Return results matching a query DSL as a pandas DataFrame.
Return results matching a SQL query as a pandas DataFrame.
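A sketch of indexing and searching; the domain endpoint and index name are placeholders:

```python
import pandas as pd
import awswrangler as wr

# The domain endpoint is a placeholder.
client = wr.opensearch.connect(host="my-domain.us-east-1.es.amazonaws.com")

df = pd.DataFrame({"_id": [1, 2], "title": ["foo", "bar"]})
wr.opensearch.index_df(client, df=df, index="my-index")

# Search with a standard query DSL body.
hits = wr.opensearch.search(
    client,
    index="my-index",
    search_body={"query": {"match": {"title": "foo"}}},
)
```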
Amazon Neptune
Create a connection to a Neptune cluster.
Return the results of a Gremlin traversal as a pandas DataFrame.
Return the results of an openCypher traversal as a pandas DataFrame.
Return the results of a SPARQL query as a pandas DataFrame.
Flatten the lists and dictionaries of the input DataFrame.
Write records stored in a DataFrame into Amazon Neptune.
Write records stored in a DataFrame into Amazon Neptune.
Write records into Amazon Neptune using the Neptune Bulk Loader.
Load files from S3 into Amazon Neptune using the Neptune Bulk Loader.
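A minimal traversal sketch; the cluster endpoint is a placeholder:

```python
import awswrangler as wr

# The endpoint is a placeholder for a real Neptune cluster.
client = wr.neptune.connect(
    host="my-cluster.cluster-xxxx.us-east-1.neptune.amazonaws.com",
    port=8182,
)

# Run an openCypher traversal and get a DataFrame back.
df = wr.neptune.execute_opencypher(client, "MATCH (n) RETURN n LIMIT 5")
```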
DynamoDB
Delete all items in the specified DynamoDB table.
Run a PartiQL statement against a DynamoDB table.
Get the DynamoDB table object for the specified table name.
Write all items from a CSV file to a DynamoDB table.
Write all items from a DataFrame to a DynamoDB table.
Insert all items into the specified DynamoDB table.
Write all items from a JSON file to a DynamoDB table.
Read items from a given DynamoDB table.
Read data from a DynamoDB table via a PartiQL query.
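A write/read sketch; "my-table" is a placeholder for an existing table whose key schema matches the DataFrame:

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"pk": ["a", "b"], "value": [1, 2]})

# Write all rows as items.
wr.dynamodb.put_df(df=df, table_name="my-table")

# Scan them back (a full scan must be allowed explicitly).
df2 = wr.dynamodb.read_items(table_name="my-table", allow_full_scan=True)
```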
Amazon Timestream
Batch load a Pandas DataFrame into an Amazon Timestream table.
Batch load files from S3 into an Amazon Timestream table.
Create a new Timestream database.
Create a new Timestream table.
Delete a given Timestream database.
Delete a given Timestream table.
List all databases in Timestream.
List tables in Timestream.
Run a query and retrieve the result as a Pandas DataFrame.
Wait for the Timestream batch load task to complete.
Store a Pandas DataFrame into an Amazon Timestream table.
Unload query results to Amazon S3.
Unload query results to Amazon S3 and read the results as a Pandas DataFrame.
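A write/query sketch; the database and table are placeholders and must already exist:

```python
from datetime import datetime

import pandas as pd
import awswrangler as wr

df = pd.DataFrame({
    "time": [datetime.now()],
    "dim0": ["sensor-1"],
    "measure": [1.0],
})

# Map the DataFrame columns onto the Timestream record model.
wr.timestream.write(
    df=df,
    database="my_db",
    table="my_table",
    time_col="time",
    measure_col="measure",
    dimensions_cols=["dim0"],
)

df2 = wr.timestream.query('SELECT * FROM "my_db"."my_table" LIMIT 10')
```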
AWS Clean Rooms
Execute a Clean Rooms protected SQL query and return the results as a Pandas DataFrame.
Wait for the Clean Rooms protected query to end.
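A hedged sketch of running a protected query, assuming a read_sql_query entry point that takes a membership ID and an S3 output location; every identifier below is a placeholder and the exact parameters may vary by library version:

```python
import awswrangler as wr

# Membership ID, bucket, and prefix are placeholders.
df = wr.cleanrooms.read_sql_query(
    sql="SELECT COUNT(*) FROM my_table",
    membership_id="membership-id",
    output_bucket="my-bucket",
    output_prefix="cleanrooms/results/",
)
```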
Amazon EMR
Build the Step structure (dictionary).
Build the Step structure (dictionary).
Create an EMR cluster with an instance fleet configuration.
Get the EMR cluster state.
Get the EMR step state.
Update internal ECR credentials.
Submit a Spark step.
Submit a new job to the EMR cluster.
Submit a list of steps.
Terminate an EMR cluster.
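A sketch of the cluster lifecycle; the subnet ID and S3 script path are placeholders (create_cluster accepts many more options):

```python
import awswrangler as wr

# Launch a cluster in a placeholder subnet.
cluster_id = wr.emr.create_cluster(subnet_id="subnet-00000000000000000")

# Submit a step and poll its state.
step_id = wr.emr.submit_step(
    cluster_id=cluster_id,
    command="spark-submit s3://my-bucket/jobs/my_job.py",
)
state = wr.emr.get_step_state(cluster_id=cluster_id, step_id=step_id)

wr.emr.terminate_cluster(cluster_id=cluster_id)
```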
Amazon EMR Serverless
Create an EMR Serverless application.
Run an EMR Serverless job.
Wait for the EMR Serverless job to finish.
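A hedged sketch, assuming a Spark job driver passed as a dictionary; the release label, role ARN, and script path are placeholders and the exact argument shape may vary by version:

```python
import awswrangler as wr

# Create an application (names/labels are placeholders).
application_id = wr.emr_serverless.create_application(
    name="my-app",
    release_label="emr-6.10.0",
)

# Run a Spark job against it.
wr.emr_serverless.run_job(
    application_id=application_id,
    execution_role_arn="arn:aws:iam::111111111111:role/my-emr-serverless-role",
    job_driver_args={"entryPoint": "s3://my-bucket/jobs/my_job.py"},
    job_type="Spark",
)
```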
Amazon CloudWatch Logs
Run a query against AWS CloudWatch Logs Insights and convert the results to a Pandas DataFrame.
Run a query against AWS CloudWatch Logs Insights and wait for the results.
Run a query against AWS CloudWatch Logs Insights.
Wait until the query ends.
List the log streams for the specified log group and return the results as a Pandas DataFrame.
List log events from the specified log group.
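A minimal Logs Insights sketch; the log group name is a placeholder:

```python
import awswrangler as wr

# Run a Logs Insights query and get the results as a DataFrame.
df = wr.cloudwatch.read_logs(
    query="fields @timestamp, @message | sort @timestamp desc | limit 10",
    log_group_names=["/aws/lambda/my-function"],
)
```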
Amazon QuickSight
Cancel an ongoing ingestion of data into SPICE.
Create a QuickSight data source pointing to an Athena workgroup.
Create a QuickSight dataset.
Create and start a new SPICE ingestion on a dataset.
Delete all dashboards.
Delete all data sources.
Delete all datasets.
Delete all templates.
Delete a dashboard.
Delete a data source.
Delete a dataset.
Delete a template.
Describe a QuickSight dashboard by name or ID.
Describe a QuickSight data source by name or ID.
Describe a QuickSight data source's permissions by name or ID.
Describe a QuickSight dataset by name or ID.
Describe a QuickSight ingestion by ID.
Get a QuickSight dashboard ID given a name, failing if more than one ID is associated with the name.
Get QuickSight dashboard IDs given a name.
Get a QuickSight data source ARN given a name, failing if more than one ARN is associated with the name.
Get QuickSight data source ARNs given a name.
Get a QuickSight data source ID given a name, failing if more than one ID is associated with the name.
Get QuickSight data source IDs given a name.
Get a QuickSight dataset ID given a name, failing if more than one ID is associated with the name.
Get QuickSight dataset IDs given a name.
Get a QuickSight template ID given a name, failing if more than one ID is associated with the name.
Get QuickSight template IDs given a name.
List dashboards in an AWS account.
List summaries of all QuickSight data sources.
List summaries of all QuickSight datasets.
List all QuickSight groups.
List all QuickSight group memberships.
List IAM policy assignments in the current Amazon QuickSight account.
List all IAM policy assignments.
List the history of SPICE ingestions for a dataset.
List all QuickSight templates.
Return a list of all Amazon QuickSight users belonging to this account.
List the Amazon QuickSight groups that an Amazon QuickSight user belongs to.
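A short sketch; the data source name is a placeholder and "primary" is the default Athena workgroup:

```python
import awswrangler as wr

# Create a data source backed by an Athena workgroup.
wr.quicksight.create_athena_data_source(
    name="my-data-source",
    workgroup="primary",
)

# List existing dashboards in the account.
dashboards = wr.quicksight.list_dashboards()
```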
AWS STS
Get the account ID.
Get the current user/role ARN.
Get the current user/role name.
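These are simple identity lookups against the current credentials:

```python
import awswrangler as wr

account_id = wr.sts.get_account_id()
arn = wr.sts.get_current_identity_arn()
name = wr.sts.get_current_identity_name()
```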
AWS Secrets Manager
Get a secret value.
Get a JSON secret value.
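Secret names below are placeholders:

```python
import awswrangler as wr

# Fetch a raw secret value, or parse a JSON-encoded one into a dict.
value = wr.secretsmanager.get_secret(name="my-secret")
parsed = wr.secretsmanager.get_secret_json(name="my-json-secret")
```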
Amazon Chime
Send a message to an existing Chime chat room.
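The webhook URL below is a placeholder for a real Chime incoming webhook:

```python
import awswrangler as wr

# Post a plain-text message to a chat room via its webhook.
wr.chime.post_message(
    webhook="https://hooks.chime.aws/incomingwebhooks/xxxx?token=yyyy",
    message="Pipeline finished successfully.",
)
```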
Typing
Typed dictionary defining the settings for the Glue table.
Typed dictionary defining the settings for using CTAS (Create Table As Statement).
Typed dictionary defining the settings for using UNLOAD.
Typed dictionary defining the settings for using cached Athena results.
Typed dictionary defining the settings for Athena partition projection.
Report configuration for a batch load task.
Configuration for Arrow file decryption.
Configuration for Arrow file encryption.
Typed dictionary defining the settings for distributing calls using Ray.
Typed dictionary defining the settings for distributing read calls using Ray.
Typed dictionary defining the dictionary returned by S3 write functions.
Named tuple defining the return value of the
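These typed dictionaries are passed to the corresponding read/write calls. As a hedged illustration (the database name and cache lifetime are placeholders), the Athena result-cache settings can be supplied like this:

```python
import awswrangler as wr
from awswrangler.typing import AthenaCacheSettings

# Reuse Athena results cached within the last 15 minutes.
df = wr.athena.read_sql_query(
    "SELECT * FROM my_table",
    database="my_db",
    athena_cache_settings=AthenaCacheSettings(max_cache_seconds=900),
)
```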
Global Configurations
Reset one or all (if None is received) configuration values.
Load all configurations into a Pandas DataFrame.
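Global configuration values apply as defaults to every call that accepts the corresponding parameter:

```python
import awswrangler as wr

# Set a global default instead of passing it to every call.
wr.config.ctas_approach = False

# Inspect all current settings as a DataFrame.
df_config = wr.config.to_pandas()

# Reset a single value, or everything with wr.config.reset().
wr.config.reset("ctas_approach")
```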
Engine and Memory Format
Execution engine configuration class.
Memory format configuration class.
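A sketch of switching both settings for distributed runs (assumes the optional Ray and Modin dependencies are installed):

```python
import awswrangler as wr

# Route execution through Ray and hold DataFrames as Modin frames.
wr.engine.set("ray")
wr.memory_format.set("modin")
```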
Distributed - Ray
Connect to an existing Ray cluster or start one and connect to it.
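A minimal sketch using the initialize_ray entry point listed above; with address=None a local Ray cluster is started:

```python
import awswrangler as wr

# Start (or attach to) a Ray cluster for distributed execution.
wr.distributed.ray.initialize_ray(address=None)
```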