awswrangler.opensearch.index_json

awswrangler.opensearch.index_json(client: opensearchpy.OpenSearch, path: str, index: str, doc_type: str | None = None, boto3_session: boto3.Session | None = Session(region_name=None), json_path: str | None = None, use_threads: bool | int = False, **kwargs: Any) → Any

Index all documents from a JSON file to an OpenSearch index.

The JSON file should be in JSON Lines text format (newline-delimited JSON, https://jsonlines.org/). If the file is a single large JSON document, provide json_path.
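To make the two accepted input shapes concrete, here is a minimal sketch that uses only the standard library to write the same documents both ways. The file names, the sample documents, and the "documents" key are hypothetical and chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample documents; any JSON-serializable dicts would do.
docs = [
    {"id": 1, "title": "first"},
    {"id": 2, "title": "second"},
]


def write_json_lines(path, documents):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            f.write(json.dumps(doc) + "\n")


def write_single_json(path, documents, key="documents"):
    """Write a single JSON object that nests the documents under `key`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({key: documents}, f)


# Write both variants into a scratch directory.
out_dir = Path(tempfile.mkdtemp())
jsonl_path = out_dir / "docs.jsonl"
nested_path = out_dir / "docs_nested.json"
write_json_lines(jsonl_path, docs)
write_single_json(nested_path, docs)
```

The first file could be passed as path on its own. The second holds one large JSON object, so it would presumably need a json_path such as "$.documents" (an assumed expression matching this hypothetical layout) to locate the documents.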

Parameters:
  • client (OpenSearch) – instance of opensearchpy.OpenSearch to use.

  • path (str) – S3 or local path to the JSON file that contains the documents.

  • index (str) – Name of the index.

  • doc_type (str, optional) – Name of the document type (for Elasticsearch versions 5.x and earlier).

  • json_path (str, optional) – JsonPath expression that specifies the explicit path to a single named element in a hierarchical JSON data structure.

  • boto3_session (boto3.Session(), optional) – Boto3 Session used to access S3 if an S3 path is provided. The default boto3 Session is used if boto3_session is None.

  • use_threads (bool, int) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.

  • **kwargs – Keyword arguments forwarded to index_documents(), which executes the operation.

Returns:

Response payload; see https://opensearch.org/docs/opensearch/rest-api/document-apis/bulk/#response.

Return type:

Dict[str, Any]

Examples

Writing contents of a JSON file

>>> import awswrangler as wr
>>> client = wr.opensearch.connect(host='DOMAIN-ENDPOINT')
>>> wr.opensearch.index_json(
...     client=client,
...     path='docs.json',
...     index='sample-index1'
... )
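
Indexing from a single large JSON file

When the input is one large JSON document rather than JSON Lines, json_path is needed to locate the documents. The following is a sketch under stated assumptions, not a definitive recipe: the wrapper name index_nested_json is hypothetical, and the default expression "$.documents" assumes the documents sit in a top-level "documents" array.

```python
def index_nested_json(client, path, index,
                      json_path="$.documents", use_threads=True):
    """Index documents nested inside one large JSON file.

    Hypothetical helper: assumes the documents live in a top-level
    "documents" array, so "$.documents" selects them. Adjust the
    expression to match the actual file layout.
    """
    # Imported lazily so this sketch can be loaded without awswrangler.
    import awswrangler as wr

    return wr.opensearch.index_json(
        client=client,
        path=path,                # local path or s3:// URI
        index=index,
        json_path=json_path,      # where the documents are in the file
        use_threads=use_threads,  # True: up to os.cpu_count() threads
    )
```

With a client obtained from wr.opensearch.connect, as in the example above, a call might look like index_nested_json(client, 'docs.json', 'sample-index1').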