awswrangler.s3.list_vectors

awswrangler.s3.list_vectors(*, return_data: bool = False, return_metadata: bool = False, max_items: int | None = None, chunked: bool | int = False, vector_bucket: str | None = None, vector_bucket_arn: str | None = None, index: str | None = None, index_arn: str | None = None, use_threads: bool | int = True, boto3_session: Session | None = None) → DataFrame | Iterator[DataFrame]

List all vectors in an index. When use_threads enables concurrency, the listing is split into parallel segments (at most 16).
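As a rough illustration of segmented listing, the sketch below splits the work into at most 16 segments and drains each segment's pages in a thread pool, then applies a max_items-style cap. The names `fake_fetch`, `list_segmented` and `PageFetcher` are hypothetical stand-ins, not awswrangler internals; a real fetcher would presumably issue paginated list requests scoped to one segment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Optional

import pandas as pd

# Hypothetical fetcher signature: (segment_index, segment_count) -> pages of keys.
PageFetcher = Callable[[int, int], Iterator[List[str]]]


def fake_fetch(segment_index: int, segment_count: int) -> Iterator[List[str]]:
    """Toy fetcher: 100 keys partitioned by index modulo segment_count, pages of 7."""
    keys = [f"v{i}" for i in range(100) if i % segment_count == segment_index]
    for start in range(0, len(keys), 7):
        yield keys[start:start + 7]


def list_segmented(fetch: PageFetcher, threads: int, max_items: Optional[int] = None) -> pd.DataFrame:
    """Drain every segment concurrently and return one DataFrame of keys."""
    segment_count = max(1, min(16, threads))  # at most 16 segments

    def drain(segment_index: int) -> List[str]:
        # Follow all pages of one segment sequentially.
        keys: List[str] = []
        for page in fetch(segment_index, segment_count):
            keys.extend(page)
        return keys

    with ThreadPoolExecutor(max_workers=segment_count) as pool:
        per_segment = list(pool.map(drain, range(segment_count)))

    keys = [key for segment in per_segment for key in segment]
    if max_items is not None:
        keys = keys[:max_items]  # cap applied across all segments
    return pd.DataFrame({"key": keys})


df = list_segmented(fake_fetch, threads=4)
print(len(df))  # 100
```

This sketch only truncates after every segment has been drained; an actual implementation would likely stop fetching as soon as the cap is reached.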

Parameters:
  • return_data (bool) – Whether to include each vector's data (the vector column).

  • return_metadata (bool) – Whether to include each vector's metadata (the metadata column).

  • max_items (int | None) – Optional cap on total vectors returned across all pages/segments.

  • chunked (bool | int) –

    Batching (memory-friendly). If truthy, returns an iterator of DataFrames instead of a single DataFrame:

    • True — yield one DataFrame per underlying API page.

    • INTEGER — yield DataFrames of exactly this many rows (final frame may be shorter).

    Chunked streaming is single-segment (sequential) regardless of use_threads.

  • vector_bucket / vector_bucket_arn / index / index_arn (str | None) – Target index.

  • use_threads (bool | int) – Concurrency for parallel-segment listing. Ignored when chunked is truthy.

  • boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.
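The chunked=INTEGER behaviour (frames of a fixed row count regardless of how many rows each API page returns) can be sketched as a generator that re-slices page-sized DataFrames. `rechunk` is a hypothetical helper written for illustration; it is not awswrangler's implementation.

```python
from typing import Iterable, Iterator, List

import pandas as pd


def rechunk(pages: Iterable[pd.DataFrame], size: int) -> Iterator[pd.DataFrame]:
    """Yield frames of exactly `size` rows; only the last one may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    pending: List[pd.DataFrame] = []
    pending_rows = 0
    for page in pages:
        pending.append(page)
        pending_rows += len(page)
        # Emit full-size frames as soon as enough rows are buffered.
        while pending_rows >= size:
            merged = pd.concat(pending, ignore_index=True)
            yield merged.iloc[:size].reset_index(drop=True)
            rest = merged.iloc[size:].reset_index(drop=True)
            pending = [rest] if len(rest) else []
            pending_rows = len(rest)
    # Flush the (shorter) final frame, if any rows remain.
    if pending_rows > 0:
        yield pd.concat(pending, ignore_index=True)


# Three pages of 3, 5 and 4 rows re-sliced into frames of 5 rows.
pages = [pd.DataFrame({"key": [f"k{i}" for i in range(a, b)]}) for a, b in [(0, 3), (3, 8), (8, 12)]]
print([len(f) for f in rechunk(pages, 5)])  # [5, 5, 2]
```

Buffering at most one page beyond `size` rows keeps memory bounded, which is the point of chunked streaming.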

Return type:

DataFrame | Iterator[DataFrame]

Returns:

A DataFrame with a key column, plus vector (when return_data is set) and metadata (when return_metadata is set) columns; or an iterator of such DataFrames when chunked is truthy.
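Because the return type depends on chunked, calling code may want to handle both shapes the same way. A minimal sketch, using the hypothetical helpers `as_frames` and `collect_keys`; the bucket and index names in the commented call are placeholders.

```python
from typing import Iterator, List, Union

import pandas as pd


def as_frames(result: Union[pd.DataFrame, Iterator[pd.DataFrame]]) -> Iterator[pd.DataFrame]:
    """Yield DataFrames whether `result` is a single frame or an iterator of frames."""
    if isinstance(result, pd.DataFrame):
        yield result
    else:
        yield from result


def collect_keys(result: Union[pd.DataFrame, Iterator[pd.DataFrame]]) -> List[str]:
    """Gather the key column from either return shape."""
    keys: List[str] = []
    for frame in as_frames(result):
        keys.extend(frame["key"].tolist())
    return keys


# Real call (needs AWS credentials; names are placeholders, not run here):
# import awswrangler as wr
# keys = collect_keys(wr.s3.list_vectors(vector_bucket="my-bucket", index="my-index", chunked=1000))
```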