awswrangler.dynamodb.read_items
- awswrangler.dynamodb.read_items(table_name: str, index_name: str | None = None, partition_values: Sequence[Any] | None = None, sort_values: Sequence[Any] | None = None, filter_expression: ConditionBase | str | None = None, key_condition_expression: ConditionBase | str | None = None, expression_attribute_names: dict[str, str] | None = None, expression_attribute_values: dict[str, Any] | None = None, consistent: bool = False, columns: Sequence[str] | None = None, allow_full_scan: bool = False, max_items_evaluated: int | None = None, dtype_backend: Literal['numpy_nullable', 'pyarrow'] = 'numpy_nullable', as_dataframe: bool = True, chunked: bool = False, use_threads: bool | int = True, boto3_session: boto3.Session | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None) → pd.DataFrame | Iterator[pd.DataFrame] | _ItemsListType | Iterator[_ItemsListType]
Read items from given DynamoDB table.
This function aims to gracefully handle (some of) the complexity of read actions available in Boto3 towards a DynamoDB table, abstracting it away while providing a single, unified entry point.
Under the hood, it wraps all the four available read actions: get_item, batch_get_item, query and scan.
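As a rough mental model of how the arguments relate to the four read actions, the sketch below picks an action name from the arguments alone. It is a simplified illustration inferred from the parameter descriptions on this page, not the library's source: pick_read_action is a hypothetical helper, and ValueError stands in for the library's own exception types.

```python
from __future__ import annotations

from typing import Any, Sequence


def pick_read_action(
    partition_values: Sequence[Any] | None = None,
    key_condition_expression: str | None = None,
    filter_expression: str | None = None,
    allow_full_scan: bool = False,
    max_items_evaluated: int | None = None,
) -> str:
    """Hypothetical helper: guess which DynamoDB read action fits the arguments."""
    if partition_values:
        # Exact keys are known: one key suits get_item, several suit batch_get_item.
        return "get_item" if len(partition_values) == 1 else "batch_get_item"
    if key_condition_expression is not None:
        # A key condition narrows the read to matching partitions.
        return "query"
    if filter_expression is not None or allow_full_scan or max_items_evaluated is not None:
        # Nothing identifies keys, so the table (or index) has to be scanned.
        return "scan"
    # Assumption: the real function raises its own exception type here.
    raise ValueError("Arguments are not informative enough to choose a read action.")


print(pick_read_action(partition_values=["my-value"]))
print(pick_read_action(key_condition_expression="key_1 = :v1"))
print(pick_read_action(allow_full_scan=True))
```

The ordering reflects cost: key lookups are the cheapest reads, and a scan is the fallback when nothing identifies the keys.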
Warning
To avoid a potentially costly Scan operation, please make sure to pass arguments such as partition_values or max_items_evaluated. Note that filter_expression is applied AFTER a Scan, so it does not reduce the number of items read.
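To make the warning concrete, here is a small in-memory simulation of the order of operations. It does not call DynamoDB: simulate_scan and the sample data are hypothetical, and the limit argument only mimics the role of max_items_evaluated.

```python
from __future__ import annotations

from typing import Any, Callable


def simulate_scan(
    items: list[dict[str, Any]],
    keep: Callable[[dict[str, Any]], bool],
    limit: int | None = None,
) -> tuple[list[dict[str, Any]], int]:
    """Hypothetical stand-in for a Scan: read items first, filter afterwards."""
    # The limit caps how many items are evaluated, not how many are returned.
    evaluated = items if limit is None else items[:limit]
    # The filter only runs on items that have already been read.
    returned = [item for item in evaluated if keep(item)]
    return returned, len(evaluated)


table = [{"id": i, "color": "red" if i % 5 == 0 else "blue"} for i in range(100)]


def is_red(item: dict[str, Any]) -> bool:
    return item["color"] == "red"


matches, read_count = simulate_scan(table, is_red)
print(len(matches), read_count)  # 20 100

limited, limited_read = simulate_scan(table, is_red, limit=10)
print(len(limited), limited_read)  # 2 10
```

In the unlimited case all 100 items are read to return 20; with a limit of 10, only 10 items are read and 2 of them pass the filter.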
Note
Number of Parallel Scan segments is based on the use_threads argument. A parallel scan with a large number of workers could consume all the provisioned throughput of the table or index. See: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan
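In a DynamoDB Parallel Scan, each worker issues its own Scan request carrying a distinct Segment out of TotalSegments. The sketch below only builds those per-worker arguments; parallel_scan_kwargs is a hypothetical helper, and how read_items derives the segment count from use_threads is an assumption not shown here.

```python
from __future__ import annotations


def parallel_scan_kwargs(total_segments: int) -> list[dict[str, int]]:
    """Hypothetical helper: build one set of Scan arguments per worker."""
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    # Segment is zero-based; every worker shares the same TotalSegments.
    return [
        {"Segment": segment, "TotalSegments": total_segments}
        for segment in range(total_segments)
    ]


# Each dictionary would be merged into the arguments of one Scan request.
for kwargs in parallel_scan_kwargs(4):
    print(kwargs)
```

Because every segment consumes read capacity concurrently, passing a small integer as use_threads is one way to keep a scan from exhausting provisioned throughput.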
Note
If max_items_evaluated is specified, then use_threads=False is enforced. This is because it’s not possible to limit the number of items in a Query/Scan operation across threads.
Note
The following arguments are not supported in distributed mode with engine EngineEnum.RAY:
boto3_session
dtype_backend
- Parameters:
table_name (str) – DynamoDB table name.
index_name (str | None) – Name of the secondary global or local index on the table. Defaults to None.
partition_values (Sequence[Any] | None) – Partition key values to retrieve. Defaults to None.
sort_values (Sequence[Any] | None) – Sort key values to retrieve. Defaults to None.
filter_expression (ConditionBase | str | None) – Filter expression as string or combinations of boto3.dynamodb.conditions.Attr conditions. Defaults to None.
key_condition_expression (ConditionBase | str | None) – Key condition expression as string or combinations of boto3.dynamodb.conditions.Key conditions. Defaults to None.
expression_attribute_names (dict[str, str] | None) – Mapping of placeholder and target attributes. Defaults to None.
expression_attribute_values (dict[str, Any] | None) – Mapping of placeholder and target values. Defaults to None.
consistent (bool) – If True, ensure that the performed read operation is strongly consistent, otherwise eventually consistent. Defaults to False.
columns (Sequence[str] | None) – Attributes to retain in the returned items. Defaults to None (all attributes).
allow_full_scan (bool) – If True, allow full table scan without any filtering. Defaults to False.
max_items_evaluated (int | None) – Limit the number of items evaluated in case of query or scan operations. Defaults to None (all matching items). When set, use_threads is enforced to False.
dtype_backend (Literal[‘numpy_nullable’, ‘pyarrow’]) –
Which dtype backend to use for the returned DataFrame. With “numpy_nullable”, nullable dtypes are used for every dtype that has a nullable implementation; with “pyarrow”, PyArrow-backed dtypes are used for all dtypes.
The dtype backends are still experimental. The “pyarrow” backend is only supported with Pandas 2.0 or above.
as_dataframe (bool) – If True, return items as pd.DataFrame, otherwise as a list of dictionaries. Defaults to True.
chunked (bool) – If True, an iterable of DataFrames/lists is returned. Defaults to False.
use_threads (bool | int) – Used for Parallel Scan requests. True (default) to enable concurrency, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.
boto3_session (boto3.Session | None) – The default boto3 session will be used if boto3_session is None.
pyarrow_additional_kwargs (dict[str, Any] | None) – Forwarded to the to_pandas method when converting PyArrow tables to Pandas DataFrames. Valid values include “split_blocks”, “self_destruct”, “ignore_metadata”. e.g. pyarrow_additional_kwargs={‘split_blocks’: True}.
- Raises:
exceptions.InvalidArgumentType – When the specified table also has a sort key but only the partition values are specified.
exceptions.InvalidArgumentCombination – When both partition and sort value sequences are specified but have different lengths, or when the provided parameters are not informative enough to proceed with a read operation.
- Returns:
A DataFrame containing the retrieved items, or a list of dictionaries of returned items. Alternatively, the return type can be an iterable of either type when chunked=True.
- Return type:
pd.DataFrame | list[dict[str, Any]] | Iterable[pd.DataFrame] | Iterable[list[dict[str, Any]]]
Examples
Reading 5 arbitrary items from a table
>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(table_name='my-table', max_items_evaluated=5)
Strongly-consistent reading of a given partition value from a table
>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(table_name='my-table', partition_values=['my-value'], consistent=True)
Reading items pairwise-identified by partition and sort values, from a table with a composite primary key
>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     partition_values=['pv_1', 'pv_2'],
...     sort_values=['sv_1', 'sv_2']
... )
Reading items while retaining only specified attributes, automatically handling possible collision with DynamoDB reserved keywords
>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     partition_values=['my-value'],
...     columns=['connection', 'other_col']  # connection is a reserved keyword, managed under the hood!
... )
Reading all items from a table explicitly allowing full scan
>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(table_name='my-table', allow_full_scan=True)
Reading items matching a KeyConditionExpression expressed with boto3.dynamodb.conditions.Key
>>> import awswrangler as wr
>>> from boto3.dynamodb.conditions import Key
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     key_condition_expression=(Key('key_1').eq('val_1') & Key('key_2').eq('val_2'))
... )
Same as above, but with KeyConditionExpression as string
>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     key_condition_expression='key_1 = :v1 and key_2 = :v2',
...     expression_attribute_values={':v1': 'val_1', ':v2': 'val_2'},
... )
Reading items matching a FilterExpression expressed with boto3.dynamodb.conditions.Attr. Note that FilterExpression is applied AFTER a Scan operation.
>>> import awswrangler as wr
>>> from boto3.dynamodb.conditions import Attr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     filter_expression=Attr('my_attr').eq('this-value')
... )
Same as above, but with FilterExpression as string
>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     filter_expression='my_attr = :v',
...     expression_attribute_values={':v': 'this-value'}
... )
Reading items involving an attribute which collides with DynamoDB reserved keywords
>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     filter_expression='#operator = :v',
...     expression_attribute_names={'#operator': 'operator'},
...     expression_attribute_values={':v': 'this-value'}
... )
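Consuming chunked results as plain Python items

With chunked=True and as_dataframe=False, the function returns an iterable of lists of dictionaries, so items can be processed chunk by chunk instead of all at once. The sketch below is a minimal illustration of that pattern: count_chunked_items is a hypothetical helper, and in-memory stand-in data replaces the real call so the snippet runs without AWS credentials.

```python
from __future__ import annotations

from typing import Any, Iterable


def count_chunked_items(chunks: Iterable[list[dict[str, Any]]]) -> int:
    """Hypothetical helper: consume chunks lazily and count the items seen."""
    total = 0
    for chunk in chunks:
        # Each chunk is a list of items; only one chunk is handled at a time.
        total += len(chunk)
    return total


# With AWS credentials and an existing table, the chunks could come from:
#   import awswrangler as wr
#   chunks = wr.dynamodb.read_items(
#       table_name='my-table', allow_full_scan=True,
#       as_dataframe=False, chunked=True,
#   )
# Stand-in data keeps this sketch runnable offline.
fake_chunks = iter([[{"pk": "a"}, {"pk": "b"}], [{"pk": "c"}]])
print(count_chunked_items(fake_chunks))  # 3
```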