awswrangler.s3.query_vectors

awswrangler.s3.query_vectors(*, query_vector: list[float] | ndarray[Any, Any] | None = None, query_text: str | None = None, top_k: int = 10, filter: dict[str, Any] | None = None, return_distance: bool = True, return_metadata: bool = True, bedrock_model_id: str | None = None, bedrock_model_kwargs: dict[str, Any] | None = None, vector_bucket: str | None = None, vector_bucket_arn: str | None = None, index: str | None = None, index_arn: str | None = None, boto3_session: Session | None = None) → DataFrame

Approximate-nearest-neighbour query against an Amazon S3 Vectors index.

Parameters:
  • query_vector (list[float] | ndarray[Any, Any] | None) – Pre-computed query embedding.

  • query_text (str | None) – Text to embed via Bedrock (requires bedrock_model_id).

  • top_k (int) – Number of nearest neighbours to return (1-100).

  • filter (dict[str, Any] | None) – Metadata filter (MongoDB-style operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists, $and, $or).

  • return_distance (bool) – Whether to include each result’s distance.

  • return_metadata (bool) – Whether to include each result’s metadata.

  • bedrock_model_id (str | None) – ID of the Bedrock embedding model used to embed query_text.

  • bedrock_model_kwargs (dict[str, Any] | None) – Additional Bedrock embedding configuration when using query_text.

  • vector_bucket / vector_bucket_arn / index / index_arn (str | None) – Target index.

  • boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.
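
The constraints listed above (top_k in the range 1-100, query_text requiring bedrock_model_id) can be checked locally before any request is made. The following is a minimal sketch only: build_query_kwargs is a hypothetical helper, not part of awswrangler, and the rule that exactly one of query_vector and query_text should be supplied is an assumption rather than documented behaviour.

```python
from typing import Any, Optional, Sequence


def build_query_kwargs(
    *,
    query_vector: Optional[Sequence[float]] = None,
    query_text: Optional[str] = None,
    top_k: int = 10,
    bedrock_model_id: Optional[str] = None,
    **target: Any,
) -> dict:
    """Hypothetical helper: validate arguments before calling query_vectors."""
    # Assumption (not stated in the reference): pass exactly one query input.
    if (query_vector is None) == (query_text is None):
        raise ValueError("Pass exactly one of query_vector or query_text")
    # Documented: query_text requires bedrock_model_id.
    if query_text is not None and bedrock_model_id is None:
        raise ValueError("query_text requires bedrock_model_id")
    # Documented: top_k must lie in the range 1-100.
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")

    # Target-index arguments (vector_bucket, index, ...) pass through unchanged.
    kwargs = {"top_k": top_k, **target}
    if query_vector is not None:
        kwargs["query_vector"] = list(query_vector)
    else:
        kwargs["query_text"] = query_text
        kwargs["bedrock_model_id"] = bedrock_model_id
    return kwargs


kwargs = build_query_kwargs(
    query_vector=[0.1, 0.2, 0.3],
    top_k=5,
    vector_bucket="my-bucket",
    index="my-index",
)
# The validated dict could then be passed on: wr.s3.query_vectors(**kwargs)
```

Validating up front like this surfaces argument mistakes without a network round trip; the service may apply further checks of its own.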

Return type:

DataFrame

Returns:

DataFrame with columns key and (optionally) distance, metadata. The configured distance metric is exposed via df.attrs['distance_metric'].

Examples

>>> import awswrangler as wr
>>> df = wr.s3.query_vectors(
...     query_vector=[0.1, 0.2, 0.3],
...     top_k=5,
...     filter={"genre": {"$eq": "documentary"}},
...     vector_bucket="my-bucket",
...     index="my-index",
... )
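
The filter argument accepts the operators listed under Parameters, so conditions can presumably be combined with $and. The sketch below only builds such a dictionary. The helper all_of and the metadata keys year and archived are hypothetical, and the list-of-conditions shape for $and follows the usual MongoDB-style convention rather than anything stated in this reference.

```python
from typing import Any


def all_of(*conditions: dict) -> dict:
    """Hypothetical helper: require every condition to match via $and."""
    return {"$and": list(conditions)}


# "genre" comes from the example above; "year" and "archived" are
# illustrative metadata keys, not fields defined by the library.
metadata_filter: dict[str, Any] = all_of(
    {"genre": {"$in": ["documentary", "drama"]}},
    {"year": {"$gte": 2000}},
    {"archived": {"$exists": False}},
)
```

The resulting dictionary would be passed as filter=metadata_filter in the call shown above.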