awswrangler.s3.put_vectors_from_df

Insert all rows of a DataFrame into an Amazon S3 Vectors index.

Exactly one embedding source must be provided: either vector_column (precomputed embeddings), or text_column together with bedrock_model_id (embeddings are generated via Amazon Bedrock at write time).
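The argument rule above can be sketched as a small check. This is an illustrative sketch only, not the library's implementation: the helper name check_embedding_source is hypothetical, and raising ValueError is an assumption (the exception the library actually raises is not stated here).

```python
from typing import Optional


def check_embedding_source(
    vector_column: Optional[str] = None,
    text_column: Optional[str] = None,
    bedrock_model_id: Optional[str] = None,
) -> str:
    """Hypothetical helper: return which embedding source the arguments select.

    Returns "precomputed" or "bedrock"; raises ValueError (an assumed
    exception type) for combinations the description rules out.
    """
    if vector_column is not None and text_column is not None:
        raise ValueError("vector_column and text_column are mutually exclusive")
    if vector_column is not None:
        return "precomputed"
    if text_column is not None:
        if bedrock_model_id is None:
            raise ValueError("text_column requires bedrock_model_id")
        return "bedrock"
    raise ValueError("provide vector_column, or text_column with bedrock_model_id")
```

Running a check like this before building any batches would surface a bad argument combination before any request is sent.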

Parameters:

df (DataFrame) – Input DataFrame.
key_column (str) – Column containing the per-row vector key (string).
vector_column (str | None) – Column containing the precomputed embedding (list[float] / np.ndarray per row).
metadata_columns (list[str] | None) – Columns to attach as metadata (filterable or non-filterable). None means “all columns except key_column, vector_column and text_column”. Because text_column is excluded by default, list it explicitly here to keep it (e.g. for RAG citations). NaN / pd.NA / None cells are dropped per row.
text_column (str | None) – Column containing input text to embed via Bedrock. Mutually exclusive with vector_column.
bedrock_model_id (str | None) – Bedrock embedding model ID (e.g. "amazon.titan-embed-text-v2:0"). Used together with text_column.
bedrock_model_kwargs (dict[str, Any] | None) – Optional model-specific kwargs for the Bedrock embedding model (e.g. {"dimensions": 256}).
vector_bucket / vector_bucket_arn / index / index_arn – Target index.
use_threads (bool | int) – Concurrency for batched put calls and for parallel Bedrock embedding.
boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.

Return type:

None

Examples

Precomputed vectors:

>>> import awswrangler as wr
>>> wr.s3.put_vectors_from_df(
...     df=my_df,
...     key_column="id",
...     vector_column="embedding",
...     vector_bucket="my-bucket",
...     index="my-index",
... )

Embed-on-write via Bedrock Titan:

>>> wr.s3.put_vectors_from_df(
...     df=my_df,
...     key_column="id",
...     text_column="content",
...     bedrock_model_id="amazon.titan-embed-text-v2:0",
...     vector_bucket="my-bucket",
...     index="my-index",
... )
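
Default metadata selection (illustrative sketch, not the library's code). The two helpers below, default_metadata_columns and row_metadata, are hypothetical names introduced to show the behaviour described for metadata_columns=None: the key, vector and text columns are excluded, and null cells are dropped per row. The null check uses only the standard library, so it covers None and float NaN; handling pd.NA as well would presumably need pandas.isna.

```python
import math
from typing import Any, Dict, List, Optional


def default_metadata_columns(
    columns: List[str],
    key_column: str,
    vector_column: Optional[str] = None,
    text_column: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: columns treated as metadata when metadata_columns is None."""
    excluded = {key_column, vector_column, text_column}
    # Preserve the original column order.
    return [c for c in columns if c not in excluded]


def row_metadata(row: Dict[str, Any], metadata_columns: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: build one row's metadata, dropping null cells."""

    def is_null(value: Any) -> bool:
        # Stdlib-only check: None and float NaN (pd.NA is not covered here).
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {c: row[c] for c in metadata_columns if not is_null(row[c])}


columns = ["id", "content", "source", "page"]

# text_column is excluded by default.
print(default_metadata_columns(columns, key_column="id", text_column="content"))
# ['source', 'page']

# Listing "content" explicitly keeps it; the NaN "page" cell is dropped.
row = {"id": "a", "content": "hello", "source": "manual", "page": float("nan")}
print(row_metadata(row, ["content", "source", "page"]))
# {'content': 'hello', 'source': 'manual'}
```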