2023-04-12 10:00:00+00:00

When running a price crawler that fetches updates for millions of parts daily, writing those updates to your database can quickly become a bottleneck. If your DynamoDB table's key design concentrates writes on a few keys, you will hit ProvisionedThroughputExceededException errors even if the table's total Write Capacity Units (WCUs) are high, because each physical partition has its own throughput ceiling (1,000 WCUs per second).

To persist data efficiently, you must distribute writes evenly across partition keys and leverage write batching.


1. Avoiding Hot Partitions

DynamoDB distributes data across physical partitions based on a hash of the partition key. If your crawler writes thousands of updates for the same part number consecutively, all of those writes land on the same physical partition (a hot partition) and get throttled. We solve this by adding random salt prefixes to partition keys, or by shuffling the write queue so that consecutive writes are spread across partitions.
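Both techniques can be sketched in a few lines. This is a minimal illustration, not our production code: the shard count, the "#" separator, and the function names are assumptions chosen for the example. Note the trade-off with random salts: a reader no longer knows which salt a given write used, so it has to query every salted variant of the key.

```python
import random

NUM_SHARDS = 10  # hypothetical shard count; tune to your write volume


def salted_key(part_number, num_shards=NUM_SHARDS):
    """Return the partition key with a random salt prefix, e.g. '7#ABC-123'."""
    shard = random.randrange(num_shards)
    return f"{shard}#{part_number}"


def all_salted_keys(part_number, num_shards=NUM_SHARDS):
    """Return every salted variant of a key; readers must query all of them."""
    return [f"{shard}#{part_number}" for shard in range(num_shards)]


def shuffle_queue(items):
    """Return a shuffled copy of the write queue so that consecutive
    writes are unlikely to target the same partition key."""
    shuffled = list(items)
    random.shuffle(shuffled)
    return shuffled
```

Shuffling is the cheaper option when each part number appears only a handful of times per run, since it leaves the key schema untouched. Salting is worth the extra read cost only when a single key receives more writes than one partition can absorb.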

2. Leveraging Batch Writing

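Writing items individually incurs one HTTP round trip per item. Instead, we use BatchWriteItem to bundle up to 25 put or delete requests into a single API call. Batching does not reduce the WCUs each item consumes, but it cuts network overhead and lets each client make full use of the capacity that is provisioned: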

# Batch writing to DynamoDB using Boto3
import boto3

db_resource = boto3.resource("dynamodb")

def batch_write_parts(table_name, items):
    # batch_writer() buffers puts, sends them in batches of up to 25,
    # and resends unprocessed items automatically
    with db_resource.Table(table_name).batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
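To make it clearer what batch_writer() does on our behalf, here is a rough sketch of the same idea written against the low-level BatchWriteItem call: split the items into chunks of 25, send each chunk, and resend whatever comes back in UnprocessedItems with exponential backoff. This is not Boto3's actual implementation. The function names, the retry count, and the delay values are assumptions for illustration, and the `client` argument is expected to be a `boto3.client("dynamodb")`, which requires items in DynamoDB attribute-value format (for example `{"part_number": {"S": "ABC-123"}}`).

```python
import time

MAX_BATCH_SIZE = 25  # BatchWriteItem accepts at most 25 requests per call


def chunked(items, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_write_with_retry(client, table_name, items,
                           max_attempts=5, base_delay=0.05):
    """Write `items` in chunks of 25 and resend unprocessed requests.

    Returns the write requests that were still unprocessed after
    `max_attempts` tries (an empty list on full success).
    """
    failed = []
    for chunk in chunked(items):
        pending = {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
        for attempt in range(max_attempts):
            response = client.batch_write_item(RequestItems=pending)
            pending = response.get("UnprocessedItems", {})
            if not pending:
                break
            # Back off before resending; throttled partitions need time to recover
            time.sleep(base_delay * (2 ** attempt))
        else:
            # Loop finished without a break: some requests never went through
            failed.extend(pending.get(table_name, []))
    return failed
```

In practice the batch_writer() version above is the simpler choice. A hand-rolled loop like this is mainly useful when you want to log or re-queue the requests that never succeeded.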

Combining batch writing with proper partition key selection lets us import millions of daily price updates with minimal write throttling.