awswrangler.s3.query_vectors
- awswrangler.s3.query_vectors(*, query_vector: list[float] | ndarray[Any, Any] | None = None, query_text: str | None = None, top_k: int = 10, filter: dict[str, Any] | None = None, return_distance: bool = True, return_metadata: bool = True, bedrock_model_id: str | None = None, bedrock_model_kwargs: dict[str, Any] | None = None, vector_bucket: str | None = None, vector_bucket_arn: str | None = None, index: str | None = None, index_arn: str | None = None, boto3_session: Session | None = None) → DataFrame
Approximate-nearest-neighbour query against an Amazon S3 Vectors index.
- Parameters:
query_vector (list[float] | ndarray[Any, Any] | None) – Pre-computed query embedding.
query_text (str | None) – Text to embed via Bedrock (requires bedrock_model_id).
top_k (int) – Number of nearest neighbours to return (1-100).
filter (dict[str, Any] | None) – Metadata filter (MongoDB-style operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists, $and, $or).
return_distance (bool) – Whether to include each result’s distance.
return_metadata (bool) – Whether to include each result’s metadata.
bedrock_model_id (str | None) – Bedrock embedding model to use when query_text is given.
bedrock_model_kwargs (dict[str, Any] | None) – Additional Bedrock embedding configuration when using query_text.
vector_bucket / vector_bucket_arn / index / index_arn (str | None) – Target index.
boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.
- Return type:
DataFrame
- Returns:
DataFrame with columns key and (optionally) distance, metadata. The configured distance metric is exposed via df.attrs['distance_metric'].
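As a rough illustration of working with the returned frame, the sketch below reads the distance metric from df.attrs, expands the metadata column into regular columns, and ranks rows by distance. It uses a hand-built stand-in frame rather than a live call; the row values, the assumption that each metadata cell is a dict, and the "cosine" metric value are illustrative and not taken from the library.

```python
import pandas as pd

# Stand-in for the frame returned by wr.s3.query_vectors. The rows, the
# dict-valued metadata cells, and the "cosine" metric are assumptions.
df = pd.DataFrame(
    {
        "key": ["doc-1", "doc-2", "doc-3"],
        "distance": [0.12, 0.34, 0.08],
        "metadata": [
            {"genre": "documentary", "year": 2021},
            {"genre": "documentary", "year": 2019},
            {"genre": "documentary", "year": 2023},
        ],
    }
)
df.attrs["distance_metric"] = "cosine"

# The metric is carried on the frame itself, not in a column.
metric = df.attrs.get("distance_metric")

# Expand the metadata dicts into one column per metadata field.
metadata_cols = pd.json_normalize(df["metadata"].tolist())
flat = pd.concat([df.drop(columns="metadata"), metadata_cols], axis=1)

# A smaller distance means a closer match, so sort ascending.
ranked = flat.sort_values("distance").reset_index(drop=True)
print(metric)
print(ranked)
```

If return_distance or return_metadata is set to False, the corresponding column would be absent, so code like this should check df.columns before using it.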
Examples
>>> import awswrangler as wr
>>> df = wr.s3.query_vectors(
...     query_vector=[0.1, 0.2, 0.3],
...     top_k=5,
...     filter={"genre": {"$eq": "documentary"}},
...     vector_bucket="my-bucket",
...     index="my-index",
... )
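The constraints stated in the parameter list (top_k between 1 and 100, query_text requiring bedrock_model_id, and the set of supported filter operators) can be checked locally before any request is made. The following is a minimal sketch of such a check; build_query_kwargs and unsupported_operators are hypothetical helpers, not part of awswrangler, and the year metadata field is an illustrative assumption.

```python
from __future__ import annotations

from typing import Any

# Operators named in the `filter` parameter description.
SUPPORTED_OPERATORS = frozenset(
    {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin", "$exists", "$and", "$or"}
)


def unsupported_operators(node: Any) -> set[str]:
    """Recursively collect '$'-prefixed keys that are not documented operators."""
    found: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(key, str) and key.startswith("$") and key not in SUPPORTED_OPERATORS:
                found.add(key)
            found |= unsupported_operators(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            found |= unsupported_operators(item)
    return found


def build_query_kwargs(
    *,
    vector_bucket: str,
    index: str,
    query_vector: list[float] | None = None,
    query_text: str | None = None,
    bedrock_model_id: str | None = None,
    top_k: int = 10,
    filter: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate locally, then return kwargs for query_vectors."""
    if not 1 <= top_k <= 100:
        raise ValueError(f"top_k must be between 1 and 100, got {top_k}")
    if query_text is not None and bedrock_model_id is None:
        raise ValueError("query_text requires bedrock_model_id")
    bad = unsupported_operators(filter)
    if bad:
        raise ValueError(f"unsupported filter operators: {sorted(bad)}")
    kwargs = {
        "query_vector": query_vector,
        "query_text": query_text,
        "bedrock_model_id": bedrock_model_id,
        "top_k": top_k,
        "filter": filter,
        "vector_bucket": vector_bucket,
        "index": index,
    }
    # Drop unset optional arguments so the library defaults apply.
    return {name: value for name, value in kwargs.items() if value is not None}


kwargs = build_query_kwargs(
    vector_bucket="my-bucket",
    index="my-index",
    query_vector=[0.1, 0.2, 0.3],
    top_k=5,
    # Combine two conditions; "year" is an assumed metadata field.
    filter={"$and": [{"genre": {"$eq": "documentary"}}, {"year": {"$gte": 2020}}]},
)
print(kwargs)
```

The resulting dictionary can then be passed on as wr.s3.query_vectors(**kwargs). Checking locally only catches the constraints documented here; the service may reject a query for other reasons.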