awswrangler.dynamodb.read_items

awswrangler.dynamodb.read_items(table_name: str, index_name: str | None = None, partition_values: Sequence[Any] | None = None, sort_values: Sequence[Any] | None = None, filter_expression: ConditionBase | str | None = None, key_condition_expression: ConditionBase | str | None = None, expression_attribute_names: dict[str, str] | None = None, expression_attribute_values: dict[str, Any] | None = None, consistent: bool = False, columns: Sequence[str] | None = None, allow_full_scan: bool = False, max_items_evaluated: int | None = None, dtype_backend: Literal['numpy_nullable', 'pyarrow'] = 'numpy_nullable', as_dataframe: bool = True, chunked: bool = False, use_threads: bool | int = True, boto3_session: boto3.Session | None = None, pyarrow_additional_kwargs: dict[str, Any] | None = None) → pd.DataFrame | Iterator[pd.DataFrame] | _ItemsListType | Iterator[_ItemsListType]

Read items from given DynamoDB table.

This function aims to gracefully handle (some of) the complexity of read actions available in Boto3 towards a DynamoDB table, abstracting it away while providing a single, unified entry point.

Under the hood, it wraps all four available read actions: get_item, batch_get_item, query and scan.

Warning

To avoid a potentially costly Scan operation, make sure to pass arguments such as partition_values or max_items_evaluated. Note that filter_expression is applied AFTER a Scan, so it reduces the items returned but not the items read.
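The difference between "items evaluated" and "items returned" can be seen in a toy model. The sketch below does not talk to DynamoDB; `simulated_scan` is a hypothetical helper that mimics the ordering described in the warning (read first, filter afterwards), under the assumption that the evaluation cap behaves like a simple prefix of the table.

```python
from typing import Any, Callable, Optional

Item = dict[str, Any]


def simulated_scan(
    items: list[Item],
    predicate: Callable[[Item], bool],
    max_items_evaluated: Optional[int] = None,
) -> tuple[int, list[Item]]:
    """Toy model of a Scan: cap the items evaluated, then apply the filter."""
    # Step 1: the scan reads items (optionally capped), regardless of the filter.
    evaluated = items if max_items_evaluated is None else items[:max_items_evaluated]
    # Step 2: the filter only trims what is returned to the caller.
    returned = [item for item in evaluated if predicate(item)]
    return len(evaluated), returned


# In-memory stand-in for a table: every tenth item is "active".
table = [
    {"pk": f"id-{i}", "status": "active" if i % 10 == 0 else "inactive"}
    for i in range(1000)
]


def is_active(item: Item) -> bool:
    return item["status"] == "active"


# Filter only: every item is still evaluated.
evaluated, matches = simulated_scan(table, is_active)

# Filter plus a cap: only the first 50 items are evaluated.
capped_evaluated, capped_matches = simulated_scan(table, is_active, max_items_evaluated=50)
```

In this model the filter alone still evaluates all 1000 items to return 100, which is why the warning recommends key-based arguments or max_items_evaluated rather than relying on filter_expression to limit cost.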

Note

The number of Parallel Scan segments is based on the use_threads argument. A parallel scan with a large number of workers could consume all the provisioned throughput of the table or index. See: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan
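For context, the DynamoDB Scan API expresses a parallel scan through the Segment and TotalSegments request parameters, with one request stream per segment. The sketch below only builds the per-segment request parameters; `build_segment_requests` is a hypothetical helper, and whether this library constructs its requests in exactly this shape is an assumption.

```python
from typing import Any


def build_segment_requests(table_name: str, total_segments: int) -> list[dict[str, Any]]:
    """Build one set of Scan parameters per parallel segment (illustrative only)."""
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {
            "TableName": table_name,
            "Segment": segment,  # zero-based index of this worker's segment
            "TotalSegments": total_segments,  # same value in every request
        }
        for segment in range(total_segments)
    ]


# Four workers would each scan one of four segments.
segment_requests = build_segment_requests("my-table", 4)
```

Each segment is read concurrently, so throughput consumption grows with the number of segments. Passing an integer to use_threads is one way to keep that number small.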

Note

If max_items_evaluated is specified, then use_threads=False is enforced. This is because it’s not possible to limit the number of items in a Query/Scan operation across threads.
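The interaction between use_threads and max_items_evaluated can be summarised as a small decision function. This is a sketch of the documented behaviour, not the library's internal code; `resolve_thread_count` is a hypothetical name, and falling back to 1 when os.cpu_count() returns None is an assumption.

```python
import os
from typing import Optional, Union


def resolve_thread_count(
    use_threads: Union[bool, int],
    max_items_evaluated: Optional[int] = None,
) -> int:
    """Model how many worker threads a read would use (illustrative only)."""
    # A cap on evaluated items cannot be shared across threads,
    # so it forces single-threaded execution.
    if max_items_evaluated is not None:
        return 1
    # Check the booleans first: bool is a subclass of int in Python.
    if use_threads is True:
        return os.cpu_count() or 1  # assumption: fall back to 1 if unknown
    if use_threads is False:
        return 1
    # An explicit integer is used as given (never below one thread).
    return max(1, int(use_threads))
```

For example, `resolve_thread_count(8, max_items_evaluated=5)` yields 1 in this model, mirroring the note above.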

Note

The following arguments are not supported in distributed mode with engine EngineEnum.RAY:

boto3_session
dtype_backend

Parameters:

table_name (str) – DynamoDB table name.
index_name (str | None) – Name of the global or local secondary index on the table. Defaults to None.
partition_values (Sequence[Any] | None) – Partition key values to retrieve. Defaults to None.
sort_values (Sequence[Any] | None) – Sort key values to retrieve. Defaults to None.
filter_expression (ConditionBase | str | None) – Filter expression as string or combinations of boto3.dynamodb.conditions.Attr conditions. Defaults to None.
key_condition_expression (ConditionBase | str | None) – Key condition expression as string or combinations of boto3.dynamodb.conditions.Key conditions. Defaults to None.
expression_attribute_names (dict[str, str] | None) – Mapping of placeholder and target attributes. Defaults to None.
expression_attribute_values (dict[str, Any] | None) – Mapping of placeholder and target values. Defaults to None.
consistent (bool) – If True, ensure that the performed read operation is strongly consistent, otherwise eventually consistent. Defaults to False.
columns (Sequence[str] | None) – Attributes to retain in the returned items. Defaults to None (all attributes).
allow_full_scan (bool) – If True, allow full table scan without any filtering. Defaults to False.
max_items_evaluated (int | None) – Limit the number of items evaluated in case of query or scan operations. Defaults to None (all matching items). When set, use_threads is enforced to False.
dtype_backend (Literal['numpy_nullable', 'pyarrow']) –
Which dtype backend to use for the returned DataFrame. When “numpy_nullable” is set, nullable dtypes are used for all dtypes that have a nullable implementation; when “pyarrow” is set, pyarrow is used for all dtypes.

The dtype_backends are still experimental. The “pyarrow” backend is only supported with Pandas 2.0 or above.
as_dataframe (bool) – If True, return items as pd.DataFrame, otherwise as a list of dicts. Defaults to True.
chunked (bool) – If True, an iterable of DataFrames/lists is returned. Defaults to False.
use_threads (bool | int) – Used for Parallel Scan requests. True (default) to enable concurrency, False to disable multiple threads. If enabled, os.cpu_count() is used as the maximum number of threads. If an integer is provided, that number of threads is used.
boto3_session (boto3.Session | None) – The default boto3 session will be used if boto3_session is None.
pyarrow_additional_kwargs (dict[str, Any] | None) – Forwarded to the to_pandas method converting from PyArrow tables to Pandas DataFrame. Valid values include “split_blocks”, “self_destruct”, “ignore_metadata”. e.g. pyarrow_additional_kwargs={'split_blocks': True}.

Raises:

exceptions.InvalidArgumentType – When the specified table also has a sort key but only the partition values are specified.
exceptions.InvalidArgumentCombination – When both partition and sort value sequences are specified but they have different lengths, or when the provided parameters are not informative enough to proceed with a read operation.
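Because partition and sort values are matched pairwise, a caller may want to check the two sequences before issuing a read. The helper below is a minimal sketch of such a pre-check; `pair_key_values` is a hypothetical name, and it raises a plain ValueError rather than the library exceptions listed above.

```python
from collections.abc import Sequence
from typing import Any, Optional


def pair_key_values(
    partition_values: Sequence[Any],
    sort_values: Optional[Sequence[Any]] = None,
) -> list[tuple[Any, ...]]:
    """Pair partition and sort values positionally, rejecting mismatched lengths."""
    if sort_values is None:
        # Partition-only lookup: one single-element key per value.
        return [(pv,) for pv in partition_values]
    if len(partition_values) != len(sort_values):
        raise ValueError(
            f"Got {len(partition_values)} partition values "
            f"but {len(sort_values)} sort values; lengths must match."
        )
    # Element i of each sequence together identifies item i.
    return list(zip(partition_values, sort_values))


pairs = pair_key_values(["pv_1", "pv_2"], ["sv_1", "sv_2"])
```

Here `pairs` holds the composite keys in the order they were supplied, which makes the pairwise matching explicit before any request is made.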

Returns:

A DataFrame containing the retrieved items, or a list of dictionaries of the returned items. Alternatively, the return type can be an iterable of either type when chunked=True.

Return type:

pd.DataFrame | list[dict[str, Any]] | Iterable[pd.DataFrame] | Iterable[list[dict[str, Any]]]
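When as_dataframe=False, the result is either a list of items or, with chunked=True, an iterable of such lists. The sketch below shows one way to consume both shapes uniformly; `iter_items` is a hypothetical helper, and the inputs are stand-in data rather than real query results.

```python
from collections.abc import Iterable, Iterator
from typing import Any, Union

Item = dict[str, Any]


def iter_items(result: Union[list[Item], Iterable[list[Item]]]) -> Iterator[Item]:
    """Yield individual items from a chunked or non-chunked result."""
    for element in result:
        if isinstance(element, dict):
            # Non-chunked: the result itself is a list of items.
            yield element
        else:
            # Chunked: each element is a list of items.
            yield from element


# Stand-ins for the two documented shapes.
non_chunked = [{"id": 1}, {"id": 2}, {"id": 3}]
chunked = iter([[{"id": 1}, {"id": 2}], [{"id": 3}]])

flat_from_list = list(iter_items(non_chunked))
flat_from_chunks = list(iter_items(chunked))
```

Both calls produce the same flat sequence of items, so downstream code does not need to branch on the chunked flag.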

Examples

Reading 5 arbitrary items from a table

>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(table_name='my-table', max_items_evaluated=5)

Strongly-consistent reading of a given partition value from a table

>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(table_name='my-table', partition_values=['my-value'], consistent=True)

Reading items pairwise-identified by partition and sort values, from a table with a composite primary key

>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     partition_values=['pv_1', 'pv_2'],
...     sort_values=['sv_1', 'sv_2']
... )

Reading items while retaining only specified attributes, automatically handling possible collision with DynamoDB reserved keywords

>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     partition_values=['my-value'],
...     columns=['connection', 'other_col'] # connection is a reserved keyword, managed under the hood!
... )

Reading all items from a table explicitly allowing full scan

>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(table_name='my-table', allow_full_scan=True)

Reading items matching a KeyConditionExpression expressed with boto3.dynamodb.conditions.Key

>>> import awswrangler as wr
>>> from boto3.dynamodb.conditions import Key
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     key_condition_expression=(Key('key_1').eq('val_1') & Key('key_2').eq('val_2'))
... )

Same as above, but with KeyConditionExpression as string

>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     key_condition_expression='key_1 = :v1 and key_2 = :v2',
...     expression_attribute_values={':v1': 'val_1', ':v2': 'val_2'},
... )

Reading items matching a FilterExpression expressed with boto3.dynamodb.conditions.Attr. Note that FilterExpression is applied AFTER a Scan operation.

>>> import awswrangler as wr
>>> from boto3.dynamodb.conditions import Attr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     filter_expression=Attr('my_attr').eq('this-value')
... )

Same as above, but with FilterExpression as string

>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     filter_expression='my_attr = :v',
...     expression_attribute_values={':v': 'this-value'}
... )

Reading items involving an attribute which collides with DynamoDB reserved keywords

>>> import awswrangler as wr
>>> df = wr.dynamodb.read_items(
...     table_name='my-table',
...     filter_expression='#operator = :v',
...     expression_attribute_names={'#operator': 'operator'},
...     expression_attribute_values={':v': 'this-value'}
... )
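When several attributes are involved, writing the placeholders by hand becomes error-prone. The sketch below generates a string expression together with matching name and value mappings for simple equality conditions; `build_equality_filter` is a hypothetical helper, and it does not handle nested attribute paths or operators other than equality.

```python
from typing import Any


def build_equality_filter(
    conditions: dict[str, Any],
) -> tuple[str, dict[str, str], dict[str, Any]]:
    """Build an expression plus placeholder mappings for attr == value conditions."""
    if not conditions:
        raise ValueError("At least one condition is required")
    names: dict[str, str] = {}
    values: dict[str, Any] = {}
    clauses: list[str] = []
    for index, (attribute, value) in enumerate(conditions.items()):
        name_placeholder = f"#n{index}"  # stands in for the attribute name
        value_placeholder = f":v{index}"  # stands in for the compared value
        names[name_placeholder] = attribute
        values[value_placeholder] = value
        clauses.append(f"{name_placeholder} = {value_placeholder}")
    return " AND ".join(clauses), names, values


expression, attribute_names, attribute_values = build_equality_filter(
    {"operator": "this-value", "status": "active"}
)
```

The three results could then be passed as filter_expression, expression_attribute_names and expression_attribute_values respectively. Since every attribute name goes through a placeholder, reserved keywords such as operator need no special handling.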