awswrangler.neptune.bulk_load

awswrangler.neptune.bulk_load(client: NeptuneClient, df: DataFrame, path: str, iam_role: str, neptune_load_wait_polling_delay: float = 0.25, load_parallelism: Literal['LOW', 'MEDIUM', 'HIGH', 'OVERSUBSCRIBE'] = 'HIGH', parser_configuration: BulkLoadParserConfiguration | None = None, update_single_cardinality_properties: Literal['TRUE', 'FALSE'] = 'FALSE', queue_request: Literal['TRUE', 'FALSE'] = 'FALSE', dependencies: list[str] | None = None, keep_files: bool = False, use_threads: bool | int = True, boto3_session: Session | None = None, s3_additional_kwargs: dict[str, str] | None = None) → None

Write records into Amazon Neptune using the Neptune Bulk Loader.

The DataFrame is first written to S3 and then loaded into Neptune by the Bulk Loader.

Note

This function has arguments which can be configured globally through wr.config or environment variables:

  • neptune_load_wait_polling_delay

Check out the Global Configurations Tutorial for details.
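
As a rough sketch of the global configuration mentioned above, the polling delay could be set once through an environment variable rather than on every call. The variable name is an assumption based on awswrangler's usual WR_<ARGUMENT_NAME> convention and should be confirmed against the Global Configurations Tutorial.

```python
import os

# Assumption: awswrangler maps each configurable argument to an environment
# variable named WR_<ARGUMENT_NAME_IN_UPPERCASE> and reads it at import time,
# so the variable has to be set before `import awswrangler`.
ENV_VAR = "WR_NEPTUNE_LOAD_WAIT_POLLING_DELAY"

# Check the load status once per second instead of the 0.25 s default.
os.environ[ENV_VAR] = "1.0"

# In-process alternative (requires awswrangler to be installed):
#   import awswrangler as wr
#   wr.config.neptune_load_wait_polling_delay = 1.0
```

A longer delay means fewer status requests against the cluster, at the cost of noticing completion slightly later.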

Note

The following arguments are not supported in distributed mode with engine EngineEnum.RAY:

  • boto3_session

  • s3_additional_kwargs

Parameters:
  • client (NeptuneClient) – Instance of the neptune client to use

  • df (DataFrame) – Pandas DataFrame to write to Neptune.

  • path (str) – S3 Path that the Neptune Bulk Loader will load data from.

  • iam_role (str) – The Amazon Resource Name (ARN) for an IAM role to be assumed by the Neptune DB instance for access to the S3 bucket. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.

  • neptune_load_wait_polling_delay (float) – Interval in seconds for how often the function will check if the Neptune bulk load has completed.

  • load_parallelism (str) – Specifies the number of threads used by Neptune’s bulk load process. One of 'LOW', 'MEDIUM', 'HIGH' or 'OVERSUBSCRIBE'; defaults to 'HIGH'.

  • parser_configuration (BulkLoadParserConfiguration, optional) – An optional dictionary-like object with additional parser configuration values. Each of the child parameters is also optional: namedGraphUri, baseUri and allowEmptyStrings.

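  • update_single_cardinality_properties (str) – An optional parameter that controls how the bulk loader treats a new value for single-cardinality vertex or edge properties. With "FALSE" (the default) a new value is treated as an error; with "TRUE" the loader replaces the existing value with the new one.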

  • queue_request (str) –

    An optional flag parameter that indicates whether the load request can be queued up or not.

    If omitted or set to "FALSE", the load request will fail if another load job is already running.

  • dependencies (list[str], optional) – An optional parameter that can make a queued load request contingent on the successful completion of one or more previous jobs in the queue.

  • keep_files (bool) – Whether to keep stage files or delete them. False by default.

  • use_threads (bool | int) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used instead.

  • boto3_session (boto3.Session, optional) – Boto3 Session. The default boto3 session is used if boto3_session is None.

  • s3_additional_kwargs (dict[str, str], optional) – Forwarded to botocore requests, e.g. s3_additional_kwargs={'ServerSideEncryption': 'aws:kms', 'SSEKMSKeyId': 'YOUR_KMS_KEY_ARN'}
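
The string-valued options above are easy to mistype, so one possible approach is to validate them before calling bulk_load. The helper below is hypothetical and not part of awswrangler; the allowed values are copied from the signature, and the rule that dependencies requires queue_request="TRUE" is a choice made by this sketch rather than documented library behaviour. The load ID is a placeholder.

```python
from typing import Any, Dict, List, Optional

# Allowed values copied from the bulk_load signature.
LOAD_PARALLELISM = ("LOW", "MEDIUM", "HIGH", "OVERSUBSCRIBE")
FLAGS = ("TRUE", "FALSE")


def build_bulk_load_options(
    load_parallelism: str = "HIGH",
    queue_request: str = "FALSE",
    dependencies: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate optional bulk_load arguments before the call."""
    if load_parallelism not in LOAD_PARALLELISM:
        raise ValueError(f"load_parallelism must be one of {LOAD_PARALLELISM}")
    if queue_request not in FLAGS:
        raise ValueError(f"queue_request must be one of {FLAGS}")
    # Choice made by this sketch: dependencies only make sense for queued loads.
    if dependencies and queue_request != "TRUE":
        raise ValueError('dependencies requires queue_request="TRUE"')

    options: Dict[str, Any] = {
        "load_parallelism": load_parallelism,
        "queue_request": queue_request,
    }
    if dependencies:
        options["dependencies"] = list(dependencies)
    return options


# Queue this load behind an earlier job (placeholder load ID).
options = build_bulk_load_options(
    load_parallelism="MEDIUM",
    queue_request="TRUE",
    dependencies=["<load-id-of-earlier-job>"],
)
# The result can then be unpacked into the real call, for example:
#   wr.neptune.bulk_load(client=client, df=frame, path=..., iam_role=..., **options)
```

Failing early on a bad literal avoids writing stage files to S3 for a request that would be rejected anyway.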

Examples

>>> import awswrangler as wr
>>> import pandas as pd
>>> client = wr.neptune.connect("MY_NEPTUNE_ENDPOINT", 8182)
>>> frame = pd.DataFrame([{"~id": "0", "~labels": ["version"], "~properties": {"type": "version"}}])
>>> wr.neptune.bulk_load(
...     client=client,
...     df=frame,
...     path="s3://my-bucket/stage-files/",
...     iam_role="arn:aws:iam::XXX:role/XXX"
... )