In marketing intelligence, comparing different campaigns or products across standard AIDA metrics (Awareness, Interest, Desire, Action) is essential. However, finding historical "comparable comps" is hard because different campaigns operate at vastly different scales (e.g. comparing a massive global launch with a niche release). To compare them accurately, we must normalize the data and find nearest neighbors algorithmically.
In this post, we'll design an automated comp-engine in FastAPI using scikit-learn's StandardScaler and NearestNeighbors to find the closest historical matches in AIDA space.
The Comp-Engine Architecture
To build a robust mathematical search engine, our FastAPI service intercepts incoming marketing metrics and executes a clean scikit-learn pipeline:
- Data Ingestion: Accepts a target campaign name and a list of historical campaigns.
- Feature Standardization: Uses
StandardScalerto center and scale features (Awareness, Interest, Desire) to a standard normal distribution. This is critical because if one metric has a scale of 0-1,000,000 and another is 0-10, the larger scale will dominate the distance metric completely. - Euclidean Proximity Search: Fits a
NearestNeighborsindex using Euclidean distance to find theKclosest neighbors.
Step 1: Standardizing Features and Calculating KNN
We'll create a dedicated Python helper class, AidaKNN, to handle standardizing dataframes and querying the index:
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
import logging
class AidaKNN:
def __init__(self, data: list[dict], target_model: str, k: int = 4):
self.data = data
self.target_model = target_model
self.k = k
self.features = ["awareness", "interest", "desire"]
def calculate(self) -> dict | None:
df = pd.DataFrame(self.data)
# Verify required columns exist
if "model" not in df.columns or not all(f in df.columns for f in self.features):
logging.error("Missing required columns in input dataset")
return None
# Standardize features (Mean=0, StdDev=1)
X = df[self.features].values
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Get target index
if self.target_model not in df["model"].values:
logging.error(f"Target model '{self.target_model}' not found")
return None
target_idx = df.index[df["model"] == self.target_model][0]
# Fit KNN using Euclidean metric
nn = NearestNeighbors(n_neighbors=min(self.k + 1, len(df)), metric="euclidean")
nn.fit(X_scaled)
# Query index
distances, indices = nn.kneighbors(X_scaled[target_idx].reshape(1, -1))
# Flatten results
distances = distances.ravel().tolist()
indices = indices.ravel().tolist()
# Build response payload
results = {
"target": {"model": self.target_model, "coords": X_scaled[target_idx].tolist()},
"neighbors": []
}
# Skip target itself (index 0) and capture neighbors
for idx, dist in zip(indices[1:], distances[1:]):
results["neighbors"].append({
"model": df.at[idx, "model"],
"distance": round(dist, 4),
"coords": X_scaled[idx].tolist()
})
return results
Step 2: Defining the FastAPI API View
Now, we map this comp-engine to a secure, high-performance FastAPI endpoint, accepting raw data payloads and returning standardized coordinates for visualization:
from fastapi import APIRouter, HTTPException, Query
from typing import Any
router = APIRouter()
@router.post("/api/v1/funnel/knn")
async def calculate_aida_knn(
data: list[dict[str, Any]],
target_model: str = Query(..., description="Target model name to search comps for"),
k: int = Query(4, description="Number of historical comps to return")
):
try:
# Instantiate calculator
engine = AidaKNN(data=data, target_model=target_model, k=k)
result = engine.calculate()
if result is None:
raise HTTPException(
status_code=400,
detail=f"Failed to compile comps. Verify target '{target_model}' is in dataset."
)
return result
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
Funnels Math Best Practices
- Always Standardize: Never compute raw Euclidean distances on non-standardized features. Large scale metrics (like Awareness counts) will overwhelm smaller, high-converting stages (like Desire).
- Filter Nulls: Ensure your data cleansing pipeline drops rows with missing metrics before fitting the Scaler to avoid outputting
NaNdistance matrices. - Coordinate Returning: Return both scaled and raw coordinates to let frontend mapping charts (like Radar charts or PCA scatters) display the relative distance properly.
Implementing a standardization and KNN comp matching inside FastAPI lets your business intelligence platform instantly uncover closest historical comps with scientific precision.