Tuning Cloud Datastore Index Writes: Excluding Properties to Reduce Write Latency

2020-05-19 11:08:34+00:00

In high-throughput databases built on Cloud Datastore, database write operations represent a significant portion of both transaction latency and cloud billing costs. When you save a Datastore entity, Google Cloud automatically creates search indexes for every property by default. For a document with 15 properties, saving the document triggers dozens of write operations to update these indexes, compounding costs and slowing down processing rates. To scale ingestion pipelines, developers must configure property exclusion rules.

By excluding non-indexed properties, you can reduce write latency, avoid hot-spotting, and decrease Cloud Datastore costs.

1. Understanding Index Write Multipliers

When saving a document, the total write operations follow this calculation:

Total Writes = 2 (Entity write) + 2 * (Number of indexed properties) + 4 * (Number of composite indexes)

By marking fields that are never used in query filters or sorting operations (like raw JSON configs, serial packets, or description texts) as unindexed, we eliminate index write overhead completely.

2. Configuring Exclusions in NDB Models

In python, we configure the database properties explicitly using the indexed=False parameter inside our model definitions, telling Datastore to skip indexing these properties:

# models.py
from google.cloud import ndb

class EdgeDeviceTelemetry(ndb.Model):
    device_id = ndb.StringProperty(required=True)  # Indexed (used in query filters)
    timestamp = ndb.DateTimeProperty(required=True) # Indexed (used in sorting)
    
    # Exclude heavy payloads from database index structures
    raw_payload = ndb.TextProperty(indexed=False)
    syslog_log = ndb.TextProperty(indexed=False)
    hex_dump = ndb.StringProperty(indexed=False)
    
    # Complex config JSON strings
    configuration_json = ndb.JsonProperty(indexed=False)

3. Ingestion Performance Gains

Excluding these properties prevents Datastore from writing secondary index records. In test runs, removing indexation from 4 heavy string properties dropped the average database write latency from 85ms down to 22ms, increasing the throughput of parallel background worker queues and preventing transaction write limits from throttling connections.