Extract, transform, load (ETL) jobs that ingest files from external partners are usually run as background batch scripts. Keeping dedicated server instances running for these scripts is costly, especially when the ingestion jobs run only once a day. Serverless Cloud Functions offer an efficient alternative: they execute code in response to events (such as file uploads to Cloud Storage) and scale down to zero when idle, which keeps operating costs low.
By writing serverless functions in Python, you can build event-driven ETL pipelines that ingest files automatically.
1. Processing Uploaded Files via Cloud Functions
The following Python function is triggered whenever a file is uploaded to Cloud Storage. It downloads the file, parses it as CSV, and saves each record to a relational database:
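# main.py
import csv
import os
import tempfile

import sqlalchemy
from google.cloud import storage

# Placeholder connection string; replace it with the address of a database
# that the deployed function can reach.
engine = sqlalchemy.create_engine("postgresql://user:pass@localhost/db")


def handle_csv_upload(data, context):
    """Cloud Function triggered by file uploads to Cloud Storage."""
    bucket_name = data['bucket']
    file_name = data['name']

    # Download the file to temporary storage
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(file_name)
    fd, temp_local_filename = tempfile.mkstemp()
    os.close(fd)  # mkstemp returns an open descriptor; close it so it does not leak
    try:
        blob.download_to_filename(temp_local_filename)

        # Parse CSV contents (the csv module expects files opened with newline='')
        with open(temp_local_filename, mode='r', newline='') as f:
            reader = csv.DictReader(f)
            for row in reader:
                # Process and write records; write_to_db is not defined in
                # this listing and is assumed to insert one row via `engine`
                write_to_db(row)
    finally:
        # Remove the temporary file so repeated invocations do not accumulate files
        os.remove(temp_local_filename)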
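The listing calls write_to_db without defining it. One possible shape for that helper is sketched below, under several assumptions: it uses sqlite3 from the standard library instead of the PostgreSQL engine so that it runs standalone, it takes the connection as an explicit argument, and the table name partner_records, the columns record_id and amount, and the load_csv wrapper are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
CREATE_SQL = (
    "CREATE TABLE IF NOT EXISTS partner_records "
    "(record_id TEXT PRIMARY KEY, amount REAL)"
)
INSERT_SQL = (
    "INSERT OR REPLACE INTO partner_records (record_id, amount) VALUES (?, ?)"
)


def write_to_db(conn, row):
    """Validate one CSV row and upsert it, so reloading a file adds no duplicates."""
    record_id = (row.get("record_id") or "").strip()
    if not record_id:
        raise ValueError(f"missing record_id in row: {row!r}")
    amount = float(row["amount"])  # a non-numeric string raises ValueError
    conn.execute(INSERT_SQL, (record_id, amount))


def load_csv(conn, text_stream):
    """Parse a CSV stream and write every row in a single transaction."""
    reader = csv.DictReader(text_stream)
    count = 0
    with conn:  # commits on success, rolls back if any row fails
        for row in reader:
            write_to_db(conn, row)
            count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_SQL)
    sample = io.StringIO("record_id,amount\nA1,10.5\nA2,3\n")
    print(load_csv(conn, sample))
```

Writing the whole file in one transaction means a malformed row leaves no partial load behind, and the upsert keeps a re-uploaded file from duplicating records. Whether a bad row should reject the whole file or only be skipped is a design choice that depends on the partner's data.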
2. Scaling Serverless Ingestion
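Because each upload to Cloud Storage triggers its own function invocation, the platform starts additional function instances as uploads arrive. The pipeline can therefore process hundreds of concurrent file uploads without manual server provisioning, subject to the platform's instance limits and the number of connections the database accepts.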
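With many invocations running in parallel, and because an event-driven function can occasionally be invoked more than once for the same event, it helps to make ingestion safe to repeat. The sketch below records each uploaded object before processing it and skips objects that were already recorded. It is an illustration under assumptions: sqlite3 stands in for the production database, the processed_files table and the claim_file helper are hypothetical, and the event payload is assumed to carry a generation field alongside bucket and name.

```python
import sqlite3

# Hypothetical bookkeeping table: one row per object version already ingested.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS processed_files (
    bucket TEXT NOT NULL,
    name TEXT NOT NULL,
    generation TEXT NOT NULL,
    PRIMARY KEY (bucket, name, generation)
)
"""


def claim_file(conn, event):
    """Return True if this invocation is the first to claim the uploaded object."""
    key = (event["bucket"], event["name"], str(event.get("generation", "")))
    try:
        with conn:  # commit the claim, or roll back if the insert fails
            conn.execute(
                "INSERT INTO processed_files (bucket, name, generation) "
                "VALUES (?, ?, ?)",
                key,
            )
    except sqlite3.IntegrityError:
        return False  # this object version was already recorded
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_SQL)
    event = {"bucket": "partner-uploads", "name": "orders.csv", "generation": "1"}
    print(claim_file(conn, event))  # first delivery
    print(claim_file(conn, event))  # repeated delivery of the same event
```

A handler would call claim_file first and return early when it yields False. The primary key makes the database the arbiter, so two instances racing on the same file cannot both claim it.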