Algorithmic Funnels: Implementing AIDA KNN with scikit-learn in FastAPI

In marketing intelligence, comparing campaigns or products across the standard AIDA metrics (Awareness, Interest, Desire, Action) is essential. Finding historical comparables ("comps") is hard, however, because campaigns operate at vastly different scales (for example, a massive global launch versus a niche release). To compare them fairly, we normalize the data and find nearest neighbors algorithmically.

In this post, we'll design an automated comp-engine in FastAPI using scikit-learn's StandardScaler and NearestNeighbors to find the closest historical matches in AIDA space.

The Comp-Engine Architecture

Our FastAPI service receives incoming marketing metrics and runs them through a short scikit-learn pipeline:

Data Ingestion: Accepts a target campaign name and a list of historical campaigns.
Feature Standardization: Uses StandardScaler to center each feature (Awareness, Interest, Desire) at mean 0 and scale it to unit variance. This changes the scale of each distribution, not its shape, so a skewed metric stays skewed. Standardizing is critical because if one metric ranges over 0-1,000,000 and another over 0-10, the larger scale dominates the Euclidean distance almost completely.
Euclidean Proximity Search: Fits a NearestNeighbors index using Euclidean distance to find the K closest neighbors.
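
To see the scale problem concretely, here is a minimal pure-Python sketch. The campaign names and numbers are invented for illustration, and only two features are used for brevity. The zscore_columns helper is a stand-in that mirrors what StandardScaler computes (it divides by the population standard deviation), assuming no column is constant.

```python
import math
import statistics

# Hypothetical campaigns: (awareness, desire) on very different scales.
campaigns = {
    "target": (500_000.0, 2.0),
    "same_reach": (510_000.0, 9.0),   # close in awareness, far in desire
    "same_funnel": (600_000.0, 2.1),  # farther in awareness, close in desire
    "small": (100_000.0, 5.0),
    "large": (900_000.0, 6.0),
}


def zscore_columns(rows):
    """Center each column at 0 and divide by its population std deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]  # assumes std > 0
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]


def nearest(points, names, target):
    """Return the name of the point closest to `target` (Euclidean)."""
    t = points[names.index(target)]
    candidates = [
        (math.dist(t, p), n) for p, n in zip(points, names) if n != target
    ]
    return min(candidates)[1]


names = list(campaigns)
raw_rows = list(campaigns.values())

raw_nearest = nearest(raw_rows, names, "target")
scaled_nearest = nearest(zscore_columns(raw_rows), names, "target")

print(raw_nearest)     # same_reach: the awareness gap decides everything
print(scaled_nearest)  # same_funnel: both features now carry weight
```

On raw values the 10,000-unit awareness gap is the only thing that matters; after standardization the 7-point desire gap counts for far more than it, and the nearest neighbor changes.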

Step 1: Standardizing Features and Calculating KNN

We'll create a dedicated Python helper class, AidaKNN, to handle standardizing dataframes and querying the index:

import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
import logging

class AidaKNN:
    def __init__(self, data: list[dict], target_model: str, k: int = 4):
        self.data = data
        self.target_model = target_model
        self.k = k
        self.features = ["awareness", "interest", "desire"]

    def calculate(self) -> dict | None:
        df = pd.DataFrame(self.data)
        
        # Verify required columns exist
        if "model" not in df.columns or not all(f in df.columns for f in self.features):
            logging.error("Missing required columns in input dataset")
            return None

        # Standardize features (Mean=0, StdDev=1)
        X = df[self.features].values
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)

        # Get target index
        if self.target_model not in df["model"].values:
            logging.error(f"Target model '{self.target_model}' not found")
            return None
        target_idx = df.index[df["model"] == self.target_model][0]

        # Fit KNN using Euclidean metric
        nn = NearestNeighbors(n_neighbors=min(self.k + 1, len(df)), metric="euclidean")
        nn.fit(X_scaled)

        # Query index
        distances, indices = nn.kneighbors(X_scaled[target_idx].reshape(1, -1))
        
        # Flatten results
        distances = distances.ravel().tolist()
        indices = indices.ravel().tolist()

        # Build response payload
        results = {
            "target": {"model": self.target_model, "coords": X_scaled[target_idx].tolist()},
            "neighbors": []
        }
        
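        # Drop the target by index, not by position: with duplicate feature
        # rows the query point is not guaranteed to come back first
        neighbors = [(i, d) for i, d in zip(indices, distances) if i != target_idx]
        for idx, dist in neighbors[: self.k]:
            results["neighbors"].append({
                "model": df.at[idx, "model"],
                "distance": round(dist, 4),
                "coords": X_scaled[idx].tolist()
            })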
            
        return results
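
As a sanity check on the pipeline, the sketch below runs the same StandardScaler and NearestNeighbors steps on a small invented dataset and compares the result with a brute-force NumPy computation. The numbers and the variable names (X, target, k) are illustrative assumptions, not data from a real campaign.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one row per campaign,
# columns are awareness, interest, desire.
X = np.array([
    [900_000.0, 40.0, 7.0],
    [850_000.0, 42.0, 6.5],
    [12_000.0, 38.0, 7.2],
    [15_000.0, 12.0, 2.0],
    [400_000.0, 25.0, 4.0],
])

# Standardize: each column ends up with mean 0 and unit variance.
X_scaled = StandardScaler().fit_transform(X)

target = 0  # row index of the campaign we want comps for
k = 2       # number of comps

# Ask for k + 1 neighbors because the target row is part of the index.
nn = NearestNeighbors(n_neighbors=k + 1, metric="euclidean").fit(X_scaled)
distances, indices = nn.kneighbors(X_scaled[target].reshape(1, -1))

# Brute-force reference: distance from the target to every row, sorted.
brute = np.linalg.norm(X_scaled - X_scaled[target], axis=1)
order = np.argsort(brute)[: k + 1]

print(indices.ravel(), distances.ravel())
print(order, brute[order])
```

If the two printed lines disagree on a dataset without tied distances, something upstream (column order, scaling, or the target index) is likely off.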

Step 2: Defining the FastAPI Endpoint

Now we expose the comp engine through a FastAPI endpoint. It accepts the raw rows in the request body, takes the target name and K as query parameters, and returns standardized coordinates for visualization:

from fastapi import APIRouter, HTTPException, Query
from typing import Any

# AidaKNN is the helper class from Step 1; import it from wherever you defined it.

router = APIRouter()

@router.post("/api/v1/funnel/knn")
async def calculate_aida_knn(
    data: list[dict[str, Any]],
    target_model: str = Query(..., description="Target model name to search comps for"),
    k: int = Query(4, ge=1, description="Number of historical comps to return")
):
    try:
        # Instantiate calculator
        engine = AidaKNN(data=data, target_model=target_model, k=k)
        result = engine.calculate()
        
        if result is None:
            raise HTTPException(
                status_code=400,
                detail=f"Failed to compile comps. Verify target '{target_model}' is in dataset."
            )
            
        return result
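    except HTTPException:
        # Re-raise as-is; the generic handler below would otherwise turn the 400 into a 500
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))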
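
The endpoint accepts untyped dictionaries, so a missing or non-numeric metric is only caught deep inside the calculation. One option is to validate each row with a Pydantic model first. The sketch below is an assumption about what such a schema could look like: CampaignRow, validate_rows, and the non-negative constraint are hypothetical names and rules, not part of the original service.

```python
from pydantic import BaseModel, Field, ValidationError


class CampaignRow(BaseModel):
    """Hypothetical schema for one historical campaign."""
    model: str
    # Assumption: funnel metrics are counts or rates and cannot be negative.
    awareness: float = Field(ge=0)
    interest: float = Field(ge=0)
    desire: float = Field(ge=0)


def validate_rows(raw_rows: list[dict]) -> list[CampaignRow]:
    """Parse raw dictionaries; raises ValidationError on the first bad row."""
    return [CampaignRow(**row) for row in raw_rows]


good = [{"model": "A", "awareness": 1000, "interest": 40, "desire": 7}]
bad = [{"model": "B", "awareness": 1000, "interest": 40}]  # desire missing

rows = validate_rows(good)
print(rows[0].model, rows[0].desire)

try:
    validate_rows(bad)
except ValidationError as exc:
    print("rejected:", len(exc.errors()), "error(s)")
```

Declaring the body parameter as a list of such models, instead of list[dict[str, Any]], lets FastAPI reject malformed payloads with a 422 response before the handler runs.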

Funnels Math Best Practices

Always Standardize: Never compute raw Euclidean distances on non-standardized features. Large-scale metrics (like Awareness counts) will overwhelm smaller-scale stages (like Desire), no matter how informative those stages are.
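Filter Nulls: Ensure your data cleansing pipeline drops rows with missing metrics before fitting the scaler. StandardScaler ignores NaN values when fitting and passes them through on transform, but NearestNeighbors will typically reject NaN input with a ValueError, which the endpoint above would surface as a 500.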
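Return Both Coordinate Sets: Return raw as well as scaled coordinates so frontend charts (like radar charts or PCA scatter plots) can show original units alongside relative distance. Note that the AidaKNN class above returns only the scaled coordinates, so the raw values need to be added to the payload.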
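
The last two practices can be sketched together. The helper below is a hypothetical prepare function (its name and the "_scaled" column suffix are assumptions): it drops rows with missing metrics, then keeps raw and standardized values side by side.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

FEATURES = ["awareness", "interest", "desire"]


def prepare(rows: list[dict]) -> pd.DataFrame:
    """Drop incomplete rows, then add standardized copies of each feature."""
    df = pd.DataFrame(rows)
    # Remove rows with any missing metric; reset the index so that row
    # labels match the positional indices returned by a neighbor search.
    clean = df.dropna(subset=FEATURES).reset_index(drop=True)
    scaled = StandardScaler().fit_transform(
        clean[FEATURES].to_numpy(dtype=float)
    )
    # Keep the raw columns untouched and store scaled values next to them.
    for i, feature in enumerate(FEATURES):
        clean[f"{feature}_scaled"] = scaled[:, i]
    return clean


# Invented sample rows; campaign "B" is missing its interest metric.
rows = [
    {"model": "A", "awareness": 900_000, "interest": 40, "desire": 7.0},
    {"model": "B", "awareness": 12_000, "interest": None, "desire": 7.2},
    {"model": "C", "awareness": 15_000, "interest": 12, "desire": 2.0},
    {"model": "D", "awareness": 400_000, "interest": 25, "desire": 4.0},
]

prepared = prepare(rows)
print(prepared[["model", "awareness", "awareness_scaled"]])
```

Dropping rows before fitting also means the incomplete campaign does not influence the means and variances used for scaling.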

Implementing standardization and KNN comp matching inside FastAPI lets your business intelligence platform surface the closest historical comps on demand, using a distance measure that treats every funnel stage on a comparable scale.