When pulling data from major distributors such as DigiKey, Arrow, and Avnet, static API keys are rarely sufficient. High-volume enterprise endpoints typically require OAuth2 authentication, using either the client-credentials grant or an authorization flow that issues refreshable access tokens.
Managing these tokens across multiple async workers requires thread-safe caches and proactive refresh loops to prevent request failures during ingestion.
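As a minimal sketch of the shared-cache idea for asyncio workers: the class below lets many coroutines share one token and ensures that only one of them performs a refresh. The name AsyncTokenCache, the fetch_token callable, and the 300-second margin are assumptions made for illustration; they do not come from any distributor's SDK.

```python
import asyncio
import time


class AsyncTokenCache:
    """Shares one access token among asyncio workers (hypothetical helper).

    Create the instance inside a running event loop so the lock binds to it.
    """

    def __init__(self, fetch_token, refresh_margin=300):
        # fetch_token: async callable returning (access_token, expires_in_seconds)
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin
        self._lock = asyncio.Lock()
        self._token = None
        self._expires_at = 0.0

    def _is_fresh(self):
        # Treat the token as stale refresh_margin seconds before it expires,
        # so the refresh happens before requests start failing.
        return (
            self._token is not None
            and time.monotonic() < self._expires_at - self._refresh_margin
        )

    async def get(self):
        if self._is_fresh():
            return self._token
        async with self._lock:
            # Re-check: another worker may have refreshed while we waited.
            if self._is_fresh():
                return self._token
            token, expires_in = await self._fetch_token()
            self._token = token
            self._expires_at = time.monotonic() + expires_in
            return self._token
```

Each worker calls await cache.get() before sending a request. The second freshness check inside the lock is what stops a burst of workers from each triggering a refresh when the token goes stale.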
1. Managing the DigiKey OAuth2 Flow
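DigiKey uses an OAuth2 flow in which a Refresh Token is exchanged for a short-lived Access Token. DigiKey's documentation lists a 30-minute lifetime, not 24 hours, and the exact value is returned in the expires_in field of each token response. Before each outbound request, the crawler checks whether the cached access token is missing or close to expiring and refreshes it if so: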
# Thread-safe OAuth2 token refresh in Python
import threading
import time

import requests


class DigiKeyTokenManager:
    def __init__(self, client_id, client_secret, refresh_token):
        self.client_id = client_id
        self.client_secret = client_secret
        self.refresh_token = refresh_token
        self.access_token = None
        self.expires_at = 0
        # Serializes refreshes so concurrent worker threads do not race
        self._lock = threading.Lock()

    def get_valid_token(self):
        with self._lock:
            # Reuse the cached token until 5 minutes before it expires
            if self.access_token and time.time() < self.expires_at - 300:
                return self.access_token

            # Token is missing or about to expire; exchange the refresh
            # token for a new access token
            url = "https://api.digikey.com/v1/oauth2/token"
            payload = {
                "client_id": self.client_id,
                "client_secret": self.client_secret,
                "refresh_token": self.refresh_token,
                "grant_type": "refresh_token",
            }
            res = requests.post(url, data=payload, timeout=30)
            res.raise_for_status()  # fail loudly on a rejected refresh
            data = res.json()
            self.access_token = data["access_token"]
            # Save the updated refresh token, keeping the old one if none is returned
            self.refresh_token = data.get("refresh_token", self.refresh_token)
            self.expires_at = time.time() + int(data["expires_in"])
            return self.access_token
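
The intercept step described above could look roughly like the following sketch. The helper authorized_get is hypothetical. It assumes a requests.Session-like object, a manager exposing get_valid_token() and expires_at as in the class above, and DigiKey's Authorization and X-DIGIKEY-Client-Id request headers. The target URL is left to the caller.

```python
def authorized_get(session, token_manager, url, client_id, params=None):
    """GET `url` with a bearer token, retrying once if the server returns 401."""

    def send():
        headers = {
            "Authorization": f"Bearer {token_manager.get_valid_token()}",
            "X-DIGIKEY-Client-Id": client_id,
        }
        return session.get(url, headers=headers, params=params, timeout=30)

    response = send()
    if response.status_code == 401:
        # The server rejected a token the cache still considered valid.
        # Mark the cache as expired so the retry performs a refresh.
        token_manager.expires_at = 0
        response = send()
    return response
```

Retrying only once keeps a genuinely revoked credential from looping forever; a second 401 is returned to the caller to handle.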
2. Secure Key Storage
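We store these credentials in a managed secret store such as AWS Systems Manager (SSM) Parameter Store or AWS Secrets Manager rather than committing them to configuration files. This keeps the keys out of version control and makes rotation straightforward. Because each refresh returns an updated refresh token, the crawler must also write the new value back to the store.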
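
A minimal sketch of reading and updating these values in SSM Parameter Store follows. The /crawler/digikey prefix and the function names are assumptions for illustration. The ssm argument is expected to be a boto3 SSM client, and only its get_parameters and put_parameter calls are used.

```python
# Usage (requires boto3 and AWS credentials):
#   import boto3
#   ssm = boto3.client("ssm")
#   creds = load_credentials(ssm)

CREDENTIAL_KEYS = ("client_id", "client_secret", "refresh_token")


def load_credentials(ssm, prefix="/crawler/digikey"):
    """Fetch the three credentials as a dict keyed by short name."""
    names = [f"{prefix}/{key}" for key in CREDENTIAL_KEYS]
    result = ssm.get_parameters(Names=names, WithDecryption=True)
    if result.get("InvalidParameters"):
        raise KeyError(f"Missing SSM parameters: {result['InvalidParameters']}")
    # Strip the prefix so callers see "client_id" rather than the full path.
    return {p["Name"].rsplit("/", 1)[-1]: p["Value"] for p in result["Parameters"]}


def save_refresh_token(ssm, refresh_token, prefix="/crawler/digikey"):
    """Persist a rotated refresh token as an encrypted SecureString."""
    ssm.put_parameter(
        Name=f"{prefix}/refresh_token",
        Value=refresh_token,
        Type="SecureString",
        Overwrite=True,
    )
```

Calling save_refresh_token after every successful refresh means a restarted worker picks up the latest token instead of one that has already been rotated out.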