awswrangler.s3.put_vectors_from_df

awswrangler.s3.put_vectors_from_df(df: DataFrame, *, key_column: str, vector_column: str | None = None, metadata_columns: list[str] | None = None, text_column: str | None = None, bedrock_model_id: str | None = None, bedrock_model_kwargs: dict[str, Any] | None = None, vector_bucket: str | None = None, vector_bucket_arn: str | None = None, index: str | None = None, index_arn: str | None = None, use_threads: bool | int = True, boto3_session: Session | None = None) → None

Insert all rows of a DataFrame into an Amazon S3 Vectors index.

Exactly one input mode must be provided: either vector_column (precomputed embeddings) or text_column together with bedrock_model_id (embeddings computed with Amazon Bedrock at write time).
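The input-mode rule can be expressed as a small validation function. This is a sketch of the documented contract, not awswrangler's own code: `resolve_embedding_source` is a hypothetical name, and treating a bedrock_model_id passed alongside vector_column as simply unused is an assumption.

```python
from typing import Optional


def resolve_embedding_source(
    vector_column: Optional[str],
    text_column: Optional[str],
    bedrock_model_id: Optional[str],
) -> str:
    """Return "vector" or "bedrock" for the chosen input mode (hypothetical helper)."""
    if vector_column is not None and text_column is not None:
        raise ValueError("vector_column and text_column are mutually exclusive")
    if vector_column is not None:
        # Assumption: a bedrock_model_id passed with vector_column is ignored, not rejected.
        return "vector"
    if text_column is not None:
        if bedrock_model_id is None:
            raise ValueError("text_column requires bedrock_model_id")
        return "bedrock"
    raise ValueError("provide vector_column, or text_column with bedrock_model_id")
```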

Parameters:
  • df (DataFrame) – Input DataFrame.

  • key_column (str) – Column containing the per-row vector key (string).

  • vector_column (str | None) – Column containing the precomputed embedding (list[float] / np.ndarray per row).

  • metadata_columns (list[str] | None) – Columns to attach to each vector as filterable/non-filterable metadata. None means “all columns except key_column, vector_column and text_column”, so text_column is excluded by default; pass it explicitly here to keep it (e.g. for RAG citations). NaN / pd.NA / None cells are dropped per row.

  • text_column (str | None) – Column containing input text to embed via Bedrock. Mutually exclusive with vector_column.

  • bedrock_model_id (str | None) – Bedrock embedding model ID (e.g. "amazon.titan-embed-text-v2:0"). Required when text_column is used.

  • bedrock_model_kwargs (dict[str, Any] | None) – Optional model-specific kwargs for the Bedrock embedding model (e.g. {"dimensions": 256}).

  • vector_bucket / vector_bucket_arn / index / index_arn (str | None) – Target index, identified by its vector bucket (name or ARN) and the index itself (name or ARN).

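  • use_threads (bool | int) – Concurrency for batched put calls and for parallel Bedrock embedding. True uses os.cpu_count() as the maximum number of threads, False disables multithreading, and an integer sets the number of threads explicitly.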

  • boto3_session (Session | None) – The default boto3 session will be used if boto3_session is None.
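To make the key, vector and metadata parameters concrete, the sketch below turns DataFrame rows into per-vector payload dicts and splits them into batches. It is illustrative only, not awswrangler's implementation: `rows_to_vectors`, `batched` and `_is_missing` are hypothetical names, and the `{"key", "data": {"float32": ...}, "metadata"}` shape assumes the request format of the boto3 `s3vectors` client's `put_vectors` call. It applies the documented defaults (text_column excluded from metadata unless listed, null cells dropped per row).

```python
from typing import Any, Iterator, Optional

import pandas as pd


def _is_missing(value: Any) -> bool:
    # Only scalars can be NaN / pd.NA / None; list-valued cells are kept as-is.
    return bool(pd.api.types.is_scalar(value) and pd.isna(value))


def rows_to_vectors(
    df: pd.DataFrame,
    key_column: str,
    vector_column: str,
    metadata_columns: Optional[list[str]] = None,
    text_column: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Build one payload dict per row (hypothetical helper, assumed payload shape)."""
    if metadata_columns is None:
        # Documented default: every column except key, vector and text columns.
        excluded = {key_column, vector_column, text_column}
        metadata_columns = [c for c in df.columns if c not in excluded]
    vectors = []
    for row in df.to_dict(orient="records"):
        # Drop NaN / pd.NA / None cells so they never reach the metadata map.
        metadata = {c: row[c] for c in metadata_columns if not _is_missing(row[c])}
        vectors.append(
            {
                "key": str(row[key_column]),
                "data": {"float32": [float(x) for x in row[vector_column]]},
                "metadata": metadata,
            }
        )
    return vectors


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` items (batch size is an assumption)."""
    for start in range(0, len(items), size):
        yield items[start : start + size]
```

Assuming the boto3 `s3vectors` client, each batch could then be sent with `client.put_vectors(vectorBucketName=..., indexName=..., vectors=batch)`; with use_threads enabled, those calls would run concurrently. The per-request vector limit is not stated on this page, so pick the batch size from the service quotas.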

Return type:

None

Examples

Pre-computed vectors:

>>> import awswrangler as wr
>>> wr.s3.put_vectors_from_df(
...     df=my_df,
...     key_column="id",
...     vector_column="embedding",
...     vector_bucket="my-bucket",
...     index="my-index",
... )
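The example above assumes an existing `my_df`. A minimal way to build a DataFrame of that shape, plus a pre-flight check that every vector matches the index dimension, might look like the sketch below. `make_example_df`, `check_dimensions` and the value of `DIMENSION` are hypothetical; use the dimension your index was actually created with.

```python
import numpy as np
import pandas as pd

DIMENSION = 4  # assumed index dimension for this toy example


def make_example_df(n_rows: int, dimension: int, seed: int = 0) -> pd.DataFrame:
    """Toy DataFrame shaped like `my_df`: key, embedding and one metadata column."""
    rng = np.random.default_rng(seed)
    vectors = rng.random((n_rows, dimension)).astype(np.float32)
    return pd.DataFrame(
        {
            "id": [f"doc-{i}" for i in range(n_rows)],
            # One float32 NumPy array per row, stored in an object column.
            "embedding": list(vectors),
            "category": ["a" if i % 2 == 0 else "b" for i in range(n_rows)],
        }
    )


def check_dimensions(df: pd.DataFrame, vector_column: str, dimension: int) -> None:
    """Raise ValueError listing rows whose vector length differs from `dimension`."""
    lengths = df[vector_column].map(len)
    bad = df.loc[lengths != dimension, vector_column].index.tolist()
    if bad:
        raise ValueError(f"rows {bad} do not have {dimension} components")
```

Running `check_dimensions(my_df, "embedding", DIMENSION)` before the write surfaces malformed rows locally instead of as a failed put call.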

Embed-on-write via Bedrock Titan:

>>> wr.s3.put_vectors_from_df(
...     df=my_df,
...     key_column="id",
...     text_column="content",
...     bedrock_model_id="amazon.titan-embed-text-v2:0",
...     vector_bucket="my-bucket",
...     index="my-index",
... )
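For embed-on-write, each text must be turned into a vector by Bedrock before the put. The sketch below shows one way to call Titan Text Embeddings V2 directly through a `bedrock-runtime` client's `invoke_model`, whose request body takes `inputText` (plus options such as `dimensions`) and whose response carries an `embedding` list. It is not necessarily how awswrangler does it: merging bedrock_model_kwargs into the request body is an assumption, the loop is serial rather than parallel, and `build_titan_body` / `embed_texts` are hypothetical names.

```python
import json
from typing import Any, Optional


def build_titan_body(text: str, model_kwargs: Optional[dict[str, Any]] = None) -> str:
    """JSON body for Titan Text Embeddings V2: inputText plus optional model kwargs."""
    body: dict[str, Any] = {"inputText": text}
    body.update(model_kwargs or {})  # e.g. {"dimensions": 256}
    return json.dumps(body)


def embed_texts(
    client: Any,
    texts: list[str],
    model_id: str = "amazon.titan-embed-text-v2:0",
    model_kwargs: Optional[dict[str, Any]] = None,
) -> list[list[float]]:
    """Embed each text with one invoke_model call; `client` is a bedrock-runtime client."""
    vectors = []
    for text in texts:
        response = client.invoke_model(
            modelId=model_id,
            body=build_titan_body(text, model_kwargs),
            contentType="application/json",
            accept="application/json",
        )
        # The response body is a stream of JSON bytes containing "embedding".
        payload = json.loads(response["body"].read())
        vectors.append(payload["embedding"])
    return vectors
```

With real credentials this would be called as `embed_texts(boto3.client("bedrock-runtime"), my_df["content"].tolist(), model_kwargs={"dimensions": 256})`; the resulting lists have the shape expected in a vector_column.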