awswrangler.opensearch.index_csv
- awswrangler.opensearch.index_csv(client: opensearchpy.OpenSearch, path: str, index: str, doc_type: str | None = None, pandas_kwargs: dict[str, Any] | None = None, use_threads: bool | int = False, **kwargs: Any) → Any
Index all documents from a CSV file to an OpenSearch index.
- Parameters:
client (OpenSearch) – instance of opensearchpy.OpenSearch to use.
path (str) – S3 or local path to the CSV file that contains the documents.
index (str) – Name of the index.
doc_type (str, optional) – Name of the document type (for Elasticsearch versions 5.x and earlier).
pandas_kwargs (Dict[str, Any], optional) – Dictionary of arguments forwarded to pandas.read_csv(), e.g. pandas_kwargs={'sep': '|', 'na_values': ['null', 'none']}. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html. Note: the following parameter value is enforced: skip_blank_lines=True.
use_threads (bool, int) – True to enable concurrent requests, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used instead.
**kwargs – Keyword arguments forwarded to index_documents(), which is used to execute the operation.
- Returns:
Response payload. See https://opensearch.org/docs/opensearch/rest-api/document-apis/bulk/#response.
- Return type:
Dict[str, Any]
Examples
Writing contents of CSV file
>>> import awswrangler as wr
>>> client = wr.opensearch.connect(host='DOMAIN-ENDPOINT')
>>> wr.opensearch.index_csv(
...     client=client,
...     path='docs.csv',
...     index='sample-index1'
... )
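Interpreting use_threads

The use_threads parameter accepts either a bool or an int, as described in the parameter list. The sketch below shows one way those rules could map to a worker count. The helper name resolve_thread_count is hypothetical and is not part of awswrangler, and clamping integers below 1 up to 1 is an assumption of this sketch rather than documented behaviour.

```python
import os
from typing import Union


def resolve_thread_count(use_threads: Union[bool, int]) -> int:
    """Hypothetical helper: map a use_threads value to a worker count."""
    # bool is a subclass of int in Python, so it has to be checked first.
    if isinstance(use_threads, bool):
        if not use_threads:
            return 1  # False: no concurrent requests
        # True: use os.cpu_count(); it can return None, so fall back to 1.
        return os.cpu_count() or 1
    # Plain integer: the caller picked the thread count explicitly.
    # Assumption of this sketch: values below 1 are clamped to 1.
    return max(1, use_threads)


print(resolve_thread_count(False))
print(resolve_thread_count(4))
print(resolve_thread_count(True))
```

Checking for bool before int matters here: isinstance(True, int) is True in Python, so reversing the order would treat True as the integer 1.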
Writing contents of CSV file using pandas_kwargs
>>> import awswrangler as wr
>>> client = wr.opensearch.connect(host='DOMAIN-ENDPOINT')
>>> wr.opensearch.index_csv(
...     client=client,
...     path='docs.csv',
...     index='sample-index1',
...     pandas_kwargs={'sep': '|', 'na_values': ['null', 'none']}
... )
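Preparing a matching CSV file

To try the pandas_kwargs example above, it can help to have a small file in the expected layout. The following is a minimal sketch using only the standard library. The column names and rows are made up for illustration, and the file is written to a temporary directory rather than to the working directory.

```python
import csv
import os
import tempfile

# Hypothetical sample documents; None stands for a missing value.
rows = [
    {"id": 1, "title": "first doc", "author": "alice"},
    {"id": 2, "title": "second doc", "author": None},
]

path = os.path.join(tempfile.mkdtemp(), "docs.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    # delimiter="|" corresponds to pandas_kwargs={'sep': '|'}.
    writer = csv.DictWriter(f, fieldnames=["id", "title", "author"], delimiter="|")
    writer.writeheader()
    for row in rows:
        # Missing values are written as the literal 'null', which
        # na_values=['null', 'none'] tells pandas to read as missing.
        writer.writerow({k: ("null" if v is None else v) for k, v in row.items()})

with open(path, encoding="utf-8") as f:
    content = f.read()
print(content)
```

The resulting path can then be passed as the path argument together with the same pandas_kwargs shown in the example.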