Polymorphic Crawler Design: Structuring Reusable Request Adapters for Diverse Vendor APIs

2023-05-25 16:51:25+00:00

When you aggregate inventory data from ten or more electronics distributors, you quickly find that no two APIs look alike. One provider returns XML, another returns nested JSON; one names the field partNumber while another calls it manufacturer_part. Writing a separate code path for each distributor duplicates logic and makes the system fragile, because every schema change has to be patched in several places.

The solution is to design a Polymorphic Crawler Engine using the Adapter Pattern to translate diverse API payloads into a single, unified internal schema.
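To make the translation step concrete, here is a minimal sketch that maps one JSON payload and one XML payload onto the same internal record. Both payload shapes, the vendor field names other than partNumber and manufacturer_part, and the function names are invented for illustration; real distributor responses will differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads; neither shape comes from a real distributor API.
JSON_PAYLOAD = '{"result": {"partNumber": "LM317T", "maker": "ExampleCorp", "stock": "120"}}'
XML_PAYLOAD = (
    "<item><manufacturer_part>LM317T</manufacturer_part>"
    "<brand>ExampleCorp</brand><qty>85</qty></item>"
)

def from_json_vendor(raw: str) -> dict:
    """Translate the (assumed) nested JSON payload into the unified record."""
    data = json.loads(raw)["result"]
    return {
        "partNum": data["partNumber"],
        "manufacturer": data["maker"],
        "inStock": int(data["stock"]),
    }

def from_xml_vendor(raw: str) -> dict:
    """Translate the (assumed) XML payload into the same unified record."""
    root = ET.fromstring(raw)
    return {
        "partNum": root.findtext("manufacturer_part"),
        "manufacturer": root.findtext("brand"),
        "inStock": int(root.findtext("qty")),
    }
```

Both functions return dictionaries with identical keys, so downstream code never needs to know which wire format a record came from.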

1. Defining the Base Crawler Interface

We write an abstract base class in Python that defines the execution contract for all distributor crawlers:

from abc import ABC, abstractmethod

class BaseCrawler(ABC):
    @abstractmethod
    async def search_part(self, part_number: str) -> list:
        '''Query the supplier API and return standardized product records'''
        pass
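
The value of the abstract base class is that Python refuses to instantiate a crawler that has not implemented search_part. The sketch below demonstrates this; IncompleteCrawler, StubCrawler, and can_instantiate are hypothetical names used only for this illustration.

```python
import asyncio
from abc import ABC, abstractmethod

class BaseCrawler(ABC):
    @abstractmethod
    async def search_part(self, part_number: str) -> list:
        """Query the supplier API and return standardized product records."""

class IncompleteCrawler(BaseCrawler):
    """Hypothetical subclass that forgets to implement search_part."""

class StubCrawler(BaseCrawler):
    """Hypothetical subclass that fulfils the contract with canned data."""
    async def search_part(self, part_number: str) -> list:
        return [{"partNum": part_number, "manufacturer": "ExampleCorp",
                 "inStock": 0, "prices": []}]

def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds; abstract classes raise TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False

if __name__ == "__main__":
    print(can_instantiate(BaseCrawler))        # abstract base class
    print(can_instantiate(IncompleteCrawler))  # missing search_part
    print(can_instantiate(StubCrawler))        # complete implementation
    print(asyncio.run(StubCrawler().search_part("LM317T")))
```

A missing method therefore fails at construction time rather than at the first search.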

2. Implementing Concrete Adapters

Each distributor gets its own subclass, which translates the standard query into a vendor-specific HTTP request and parses the raw response into a unified dictionary format containing partNum, manufacturer, inStock, and prices:

class MouserCrawler(BaseCrawler):
    def __init__(self, api_key):
        self.api_key = api_key

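    async def search_part(self, part_number: str) -> list:
        # 1. Execute Mouser-specific HTTP request.
        #    _fetch_records is a placeholder for that call and is not shown here.
        records = await self._fetch_records(part_number)
        # 2. Parse Mouser-specific nesting into the unified schema.
        #    parse_mouser_prices is a helper assumed to be defined elsewhere.
        return [{
            "partNum": record["ManufacturerPartNumber"],
            "manufacturer": record["Manufacturer"],
            "inStock": int(record["AvailabilityInStock"]),
            "prices": parse_mouser_prices(record["PriceBreaks"])
        } for record in records]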
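
The parsing step can be tested without any network access by isolating it in a pure function. The following is a sketch under stated assumptions: the record keys are the ones used in the adapter above, but the shape of each PriceBreaks entry (a "Quantity" number and a "Price" string with a leading currency symbol), the output keys minQty and unitPrice, and the sample data are all assumptions, not confirmed details of the Mouser API.

```python
def parse_mouser_prices(price_breaks: list) -> list:
    """Convert price breaks to numeric form.

    Assumed input shape: [{"Quantity": 10, "Price": "$0.45"}, ...]
    """
    return [
        {
            "minQty": int(pb["Quantity"]),
            # Strip the assumed "$" prefix and thousands separators.
            "unitPrice": float(pb["Price"].lstrip("$").replace(",", "")),
        }
        for pb in price_breaks
    ]

def normalize_mouser_record(record: dict) -> dict:
    """Map one raw record onto the unified schema."""
    return {
        "partNum": record["ManufacturerPartNumber"],
        "manufacturer": record["Manufacturer"],
        "inStock": int(record["AvailabilityInStock"]),
        "prices": parse_mouser_prices(record["PriceBreaks"]),
    }

# Invented sample record for illustration only.
SAMPLE_RECORD = {
    "ManufacturerPartNumber": "LM317T",
    "Manufacturer": "ExampleCorp",
    "AvailabilityInStock": "1200",
    "PriceBreaks": [
        {"Quantity": 1, "Price": "$0.52"},
        {"Quantity": 100, "Price": "$0.41"},
    ],
}

if __name__ == "__main__":
    print(normalize_mouser_record(SAMPLE_RECORD))
```

Keeping normalization separate from the HTTP call lets search_part stay a thin wrapper: fetch, then map each raw record through the normalizer.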

This design makes adding or modifying a supplier simple: you write a new subclass without touching the orchestration logic.
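
The orchestration logic itself is not shown in this post, so the following is only one possible sketch of it. It queries every registered crawler concurrently with asyncio.gather and merges the results, skipping any vendor that raises. The names search_all, FakeVendorA, and FakeVendorB are hypothetical stand-ins.

```python
import asyncio
from abc import ABC, abstractmethod

class BaseCrawler(ABC):
    @abstractmethod
    async def search_part(self, part_number: str) -> list:
        """Query the supplier API and return standardized product records."""

class FakeVendorA(BaseCrawler):
    """Hypothetical crawler that returns one canned record."""
    async def search_part(self, part_number: str) -> list:
        return [{"partNum": part_number, "manufacturer": "ExampleCorp",
                 "inStock": 5, "prices": []}]

class FakeVendorB(BaseCrawler):
    """Hypothetical crawler that simulates a vendor outage."""
    async def search_part(self, part_number: str) -> list:
        raise RuntimeError("vendor unavailable")

async def search_all(crawlers: list, part_number: str) -> list:
    """Run all crawlers concurrently and merge their records."""
    results = await asyncio.gather(
        *(c.search_part(part_number) for c in crawlers),
        return_exceptions=True,  # failures come back as values, not raises
    )
    merged = []
    for result in results:
        if isinstance(result, Exception):
            continue  # one failing vendor should not sink the whole search
        merged.extend(result)
    return merged

if __name__ == "__main__":
    print(asyncio.run(search_all([FakeVendorA(), FakeVendorB()], "LM317T")))
```

Because search_all depends only on the BaseCrawler contract, registering a new distributor means appending one more instance to the list.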