awswrangler.s3.put_vectors_from_df
- awswrangler.s3.put_vectors_from_df(df: DataFrame, *, key_column: str, vector_column: str | None = None, metadata_columns: list[str] | None = None, text_column: str | None = None, bedrock_model_id: str | None = None, bedrock_model_kwargs: dict[str, Any] | None = None, vector_bucket: str | None = None, vector_bucket_arn: str | None = None, index: str | None = None, index_arn: str | None = None, use_threads: bool | int = True, boto3_session: Session | None = None) -> None
Insert all rows of a DataFrame into an Amazon S3 Vectors index.
Either vector_column (precomputed embeddings) or text_column + bedrock_model_id (embed via Amazon Bedrock on the fly) must be provided.
- Parameters:
  - df (DataFrame) – Input DataFrame.
  - key_column (str) – Column containing the per-row vector key (string).
  - vector_column (str | None) – Column containing the precomputed embedding (list[float] / np.ndarray per row).
  - metadata_columns (list[str] | None) – Columns to attach as filterable/non-filterable metadata. None means “all columns except key_column, vector_column and text_column”. Note that text_column is excluded by default; pass it explicitly here (e.g. for RAG citations) to keep it. NaN / pd.NA / None cells are dropped per row.
  - text_column (str | None) – Column containing input text to embed via Bedrock. Mutually exclusive with vector_column.
  - bedrock_model_id (str | None) – Bedrock embedding model ID (e.g. "amazon.titan-embed-text-v2:0").
  - bedrock_model_kwargs (dict[str, Any] | None) – Optional model-specific kwargs for the Bedrock embedding model (e.g. {"dimensions": 256}).
  - vector_bucket / vector_bucket_arn / index / index_arn (str | None) – Target vector bucket and index, given by name or by ARN.
  - use_threads (bool | int) – Concurrency for batched put calls and for parallel Bedrock embedding.
  - boto3_session (Session | None) – The default boto3 session is used if boto3_session is None.
- Return type:
None
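The input rules in the parameter list (the either/or embedding source, the default metadata_columns selection, and per-row dropping of missing cells) can be sketched in plain pandas. This is a minimal illustration of the documented behavior under those stated rules, not awswrangler's own code; the helper names check_embedding_source, default_metadata_columns, _is_missing and row_metadata are hypothetical.

```python
from typing import Any, Optional

import pandas as pd


def check_embedding_source(
    vector_column: Optional[str],
    text_column: Optional[str],
    bedrock_model_id: Optional[str],
) -> str:
    """Hypothetical helper: return "vector" or "text" per the documented either/or rule."""
    if vector_column is not None and text_column is not None:
        raise ValueError("vector_column and text_column are mutually exclusive")
    if vector_column is not None:
        return "vector"  # precomputed embeddings
    if text_column is not None and bedrock_model_id is not None:
        return "text"  # embed via Bedrock on the fly
    raise ValueError("pass vector_column, or text_column together with bedrock_model_id")


def default_metadata_columns(
    df: pd.DataFrame,
    key_column: str,
    vector_column: Optional[str] = None,
    text_column: Optional[str] = None,
) -> list[str]:
    """Hypothetical helper: metadata_columns=None means every column except key/vector/text."""
    excluded = {key_column, vector_column, text_column}
    return [col for col in df.columns if col not in excluded]


def _is_missing(value: Any) -> bool:
    # Only scalars can be NaN / pd.NA / None; list-like cells count as present.
    return value is None or (pd.api.types.is_scalar(value) and bool(pd.isna(value)))


def row_metadata(record: dict[str, Any], metadata_columns: list[str]) -> dict[str, Any]:
    """Hypothetical helper: build one row's metadata, dropping that row's missing cells."""
    return {col: record[col] for col in metadata_columns if not _is_missing(record[col])}


df = pd.DataFrame(
    {
        "id": ["a", "b"],
        "embedding": [[0.1, 0.2], [0.3, 0.4]],
        "content": ["x", "y"],
        "genre": ["news", None],
        "year": [2020, float("nan")],
    }
)
mode = check_embedding_source("embedding", None, None)  # "vector"
cols = default_metadata_columns(df, key_column="id", vector_column="embedding")
print(mode, cols)  # vector ['content', 'genre', 'year']
records = df.to_dict(orient="records")
print([row_metadata(r, cols) for r in records])
```

Because text_column is not set in this vector-mode call, "content" is treated as an ordinary column and kept as metadata; it would only be excluded by default if it were passed as text_column.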
Examples
Pre-computed vectors:
>>> import awswrangler as wr
>>> wr.s3.put_vectors_from_df(
...     df=my_df,
...     key_column="id",
...     vector_column="embedding",
...     vector_bucket="my-bucket",
...     index="my-index",
... )
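The pre-computed example assumes an existing my_df. A minimal sketch of building one with pandas and NumPy, assuming a toy 4-dimensional index (in practice the embedding length presumably has to match the dimension the target index was created with). The data and the sanity checks are hypothetical pre-flight steps, not part of awswrangler.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three rows with 4-dimensional embeddings.
rng = np.random.default_rng(seed=0)
my_df = pd.DataFrame(
    {
        "id": ["doc-1", "doc-2", "doc-3"],  # key_column: one string key per row
        "embedding": rng.random((3, 4)).tolist(),  # vector_column: one list[float] per row
        "genre": ["news", "blog", None],  # extra column, picked up as metadata by default
    }
)

# Hypothetical pre-flight checks (not part of awswrangler):
# keys are unique strings and every embedding has the same length.
assert my_df["id"].is_unique
assert my_df["id"].map(lambda key: isinstance(key, str)).all()
dims = {len(vec) for vec in my_df["embedding"]}
assert len(dims) == 1, f"inconsistent embedding dimensions: {dims}"
```

With metadata_columns left as None, "genre" would be attached as metadata, and the None in the last row would simply be dropped for that row.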
Embed-on-write via Bedrock Titan:
>>> wr.s3.put_vectors_from_df(
...     df=my_df,
...     key_column="id",
...     text_column="content",
...     bedrock_model_id="amazon.titan-embed-text-v2:0",
...     vector_bucket="my-bucket",
...     index="my-index",
... )
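use_threads is documented as a bool | int controlling concurrency for the batched put calls and for parallel Bedrock embedding. One common way to interpret such a flag is sketched here with concurrent.futures. The mapping (True to CPU count, False to serial, an int to that many workers), the batch size, and the put_batch callable are assumptions for illustration; put_batch stands in for whatever per-batch write the library actually performs.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar, Union

T = TypeVar("T")


def worker_count(use_threads: Union[bool, int]) -> int:
    """Assumed mapping: True -> CPU count, False -> 1 (serial), int -> that many workers."""
    # bool is a subclass of int, so test the two bool values first.
    if use_threads is True:
        return os.cpu_count() or 1
    if use_threads is False:
        return 1
    return max(int(use_threads), 1)


def batched(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def put_in_batches(
    items: Sequence[T],
    put_batch: Callable[[Sequence[T]], int],
    batch_size: int,
    use_threads: Union[bool, int] = True,
) -> int:
    """Run put_batch on every chunk concurrently; return the total it reports."""
    chunks = batched(items, batch_size)
    with ThreadPoolExecutor(max_workers=worker_count(use_threads)) as pool:
        # sum() consumes pool.map's results before the executor shuts down.
        return sum(pool.map(put_batch, chunks))


# Usage with a stand-in writer that just reports how many rows it "wrote".
written = put_in_batches(list(range(10)), put_batch=len, batch_size=3, use_threads=2)
print(written)  # 10
```

Threads rather than processes fit this workload because each batch mostly waits on network I/O (the write request or the Bedrock call), during which Python releases the GIL.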