When you change a Cloud Datastore model's index settings (for example, indexing a previously excluded property), Datastore does not reindex existing entities automatically. The new index covers only entities created or modified *after* the change was deployed. To bring legacy data into the index, you must rewrite every existing entity of that kind. Running those writes synchronously in a single request over a large dataset will hit execution timeouts.
A cursor-based batch migrator written in Python lets us reindex millions of entities safely in the background.
1. Implementing the Cursor-Based Batch Reindexer
We write a background migration script that queries entities in batches of 1,000 using database cursors to track progress:
# batch_reindexer.py
import time

from google.cloud import ndb


class LegacyData(ndb.Model):
    old_field = ndb.StringProperty()
    # Newly indexed field
    category = ndb.StringProperty(indexed=True)


def reindex_batch(cursor=None, batch_size=1000):
    client = ndb.Client()
    with client.context():
        # Query for entities that need reindexing
        query = LegacyData.query()
        # Loop instead of recursing: at 1,000 entities per batch, about a
        # million entities would hit Python's default recursion limit.
        while True:
            entities, next_cursor, more = query.fetch_page(
                batch_size, start_cursor=cursor
            )
            if entities:
                # Rewrite each entity unchanged to force index recalculation
                for entity in entities:
                    entity.put()
                print(f"Reindexed {len(entities)} records.")
            if not (more and next_cursor):
                break
            # Continue from where this page stopped
            cursor = next_cursor
            # Sleep briefly to avoid database write rate limits
            time.sleep(0.5)
    print("Reindexing complete!")
2. Safe Migration Loops
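Because every page returns a cursor marking where the query stopped, the migration can be paused and resumed between runs, as long as the latest cursor is stored somewhere durable. Small batches keep each run well under server timeouts, and the short pause between batches limits write load, which helps keep regular ingestion from slowing down.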
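One way to make pause-and-resume concrete is to checkpoint the cursor after every batch. The sketch below is a minimal illustration, not part of the original script. The names `run_migration`, `load_checkpoint`, `save_checkpoint`, and the JSON checkpoint file are all hypothetical, and the batch function is passed in as a callable so the loop can be exercised without a Datastore connection. It assumes the cursor can be reduced to a string token; with ndb this would presumably be the cursor's URL-safe form, produced by `cursor.urlsafe()` and rebuilt with `ndb.Cursor(urlsafe=...)`.

```python
import json
import os
import time
from typing import Callable, Optional, Tuple

# Hypothetical contract: takes a cursor token (None means "start from the
# beginning"), processes one batch, and returns (next_token, more).
BatchFn = Callable[[Optional[str]], Tuple[Optional[str], bool]]


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no run has started yet."""
    if not os.path.exists(path):
        return {"cursor": None, "done": False}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_checkpoint(path: str, token: Optional[str], done: bool) -> None:
    """Write via a temp file and rename so a crash cannot leave a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"cursor": token, "done": done}, fh)
    os.replace(tmp_path, path)


def run_migration(process_batch: BatchFn, checkpoint_path: str,
                  max_seconds: float = 300.0, pause: float = 0.5) -> bool:
    """Process batches until finished or out of time; True means finished."""
    state = load_checkpoint(checkpoint_path)
    if state["done"]:
        return True  # A previous run already completed the migration.
    token = state["cursor"]
    deadline = time.monotonic() + max_seconds
    while True:
        token, more = process_batch(token)
        done = not more or token is None
        # Persist progress after every batch so a restart loses at most one.
        save_checkpoint(checkpoint_path, token, done)
        if done:
            return True
        if time.monotonic() >= deadline:
            return False  # Paused; the next run resumes from the checkpoint.
        time.sleep(pause)
```

A scheduler could then call `run_migration` repeatedly with a time budget shorter than the platform's timeout until it returns `True`. Because the checkpoint is written after the batch is processed, a crash between those two steps replays one batch on restart. That is harmless here, since rewriting an unchanged entity a second time has no further effect.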