How We Built Our Edge Processing Pipeline

By Alex Rivera · March 28, 2026 · 12 min read

Table of Contents

  1. Why Edge Processing?
  2. Architecture Overview
  3. The 16-Stage Pipeline
  4. Caching Strategy
  5. Lessons Learned

When we set out to build ListenLayer's analytics infrastructure, we faced a fundamental choice: process events in the cloud (like most analytics tools) or process them at the edge. We chose the edge, and it changed everything.

Why Edge Processing?

Traditional analytics follows a simple path: browser sends event to server, server stores it, server processes it later. This works, but it introduces latency for real-time features and requires sending raw, unenriched data to the cloud.

The edge is where the browser meets the server. It's the perfect place to enrich, classify, and route events before they ever reach your data warehouse.

By processing at the edge, we can:

  1. Enrich events with identity, geo, and classification data before they ever reach the warehouse.
  2. Apply consent decisions and IP anonymization at the first hop, before raw data leaves the edge.
  3. Evaluate triggers and data actions in real time, without waiting on a cloud round-trip.

Architecture Overview

Our edge layer runs on Cloudflare Workers — JavaScript functions deployed to 300+ data centers worldwide. Each event is processed by the nearest Worker, typically within 20km of the visitor.

Browser → Cloudflare Edge (50ms) → GCP Data Warehouse
           ├── Validate & consent check
           ├── Identity resolution
           ├── Page/form classification (KV cache)
           ├── Trigger evaluation
           ├── Data action matching
           └── Forward enriched event to GCP
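
In code, the flow above can be sketched as a single handler that runs each step in order. This is a minimal illustration, not ListenLayer's actual Worker; `validateEvent`, `resolveConsent`, and `enrich` are hypothetical names.

```javascript
// Minimal sketch of the edge handler flow (illustrative names, not the real API).
function validateEvent(event) {
  // Validate: drop anything without the minimal schema.
  return Boolean(event && typeof event.name === "string");
}

function resolveConsent(event) {
  // Consent: resolve the tracking mode before any enrichment happens.
  return event.consent ? "full" : "anonymous";
}

function enrich(event, geo) {
  // Identity, geo, classification, etc. would layer on here.
  return { ...event, country: geo?.country ?? null };
}

function handleEvent(event, geo) {
  if (!validateEvent(event)) return { dropped: true };
  const mode = resolveConsent(event);
  const enriched = enrich(event, geo);
  return { dropped: false, mode, enriched };   // then forward to GCP
}
```

In a real Worker, `geo` would come from the incoming request (Cloudflare populates `request.cf` with fields like `country`), and the final step would forward the enriched event to the warehouse.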

The Worker has access to several storage layers:

  1. An in-memory, per-isolate cache for hot data.
  2. Cloudflare KV for settings, classifications, and registry lookups.
  3. R2 object storage as a fallback when KV is empty.

The 16-Stage Pipeline

Every event flows through a composable pipeline of 16 stages. Each stage reads from the pipeline context, does its work, and writes its results back. Stages can be skipped, short-circuited, or run in a different order for edge-generated events.

Stage 0:  Settings (load account config from 3-tier cache)
Stage 1:  Validate (schema check, drop malformed)
Stage 2:  Consent (resolve tracking mode, IP anonymization)
Stage 3:  Identity (device ID, person association)
Stage 3b: Network (geo enrichment, device detection)
Stage 3c: Engagement (multi-page/deep engagement tracking)
Stage 3d: PII Unlock (vault unlock on consent upgrade)
Stage 3e: Person (LLM-powered person detection)
Stage 4:  Company Detection (IP → company reveal)
Stage 5:  Forms (registry lookup, classification merge)
Stage 5b: Page Classification (AI classification, content extraction)
Stage 5c: Traffic Source (rule-based source classification)
Stage 6:  Ecommerce (item calculations, subscription tracking)
Stage 6b: Custom Variables (condition + table rules)
Stage 7:  Triggers + Data Actions (rule evaluation, delivery)
Stage 8:  Forward (GCP or Preview, response to SDK)
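
The stage contract behind this list can be sketched as follows: each stage is a pure function from context to context, and the runner short-circuits on a dropped event. Stage bodies here are illustrative only, not ListenLayer's real implementations.

```javascript
// Composability sketch: each stage takes a context and returns a new one.
const stages = [
  function settings(ctx) {
    // Stage 0: load account config (mocked as a default object here).
    return { ...ctx, settings: ctx.settings ?? {} };
  },
  function validate(ctx) {
    // Stage 1: mark malformed events as dropped.
    const ok = typeof ctx.event?.name === "string";
    return ok ? ctx : { ...ctx, dropped: true };
  },
  function consent(ctx) {
    // Stage 2: resolve the tracking mode from the consent signal.
    return { ...ctx, trackingMode: ctx.event.consent ? "full" : "anonymous" };
  },
  // ...stages 3 through 8 follow the same shape...
];

function runPipeline(ctx, stages) {
  for (const stage of stages) {
    if (ctx.dropped) break;   // short-circuit once an event is dropped
    ctx = stage(ctx);
  }
  return ctx;
}
```

Because every stage has the same signature, skipping a stage or reordering the list for edge-generated events is just array manipulation.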

Caching Strategy

Performance at the edge depends on minimizing KV reads. Our caching strategy uses three tiers:

  1. L1 — In-memory (per-isolate): 5-30 minute TTL depending on data type. Near-instant access. Validated via hash comparison on each request.
  2. L2 — Cloudflare KV: Populated by the settings-sync worker. Sub-millisecond reads. The primary data source for most stages.
  3. L3 — R2 (fallback): Gzipped JSON files. Only read when KV is empty (first request after deployment or settings change).
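
The three tiers behave like a read-through chain. This sketch mocks KV and R2 as Maps; real KV/R2 bindings are async (`get()` returns a Promise) and would be awaited, and warming L1 on a miss is an assumption for illustration.

```javascript
// Read-through sketch of the three cache tiers.
const l1 = new Map();   // L1: per-isolate memory

function readConfig(key, kv, r2) {
  if (l1.has(key)) return l1.get(key);   // L1 hit: near-instant
  let value = kv.get(key);               // L2: Cloudflare KV
  if (value === undefined) {
    value = r2.get(key);                 // L3: R2 fallback (gzipped JSON in practice)
  }
  if (value !== undefined) l1.set(key, value);   // warm L1 for later requests
  return value;
}
```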

Hash-based invalidation ensures L1 caches are never stale for more than 30 seconds, while only costing one tiny KV read per request (the hash key) instead of the full data blob.
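
The hash check can be sketched like this; the key names (`hash:…`, `settings:…`) are illustrative, and a synchronous Map stands in for the async KV binding.

```javascript
// Hash-validation sketch: one tiny KV read (the hash key) decides whether
// the in-memory copy is still fresh.
const memory = new Map();   // per-isolate L1: accountId -> { hash, data }

function getSettings(accountId, kv) {
  const liveHash = kv.get(`hash:${accountId}`);   // cheap read on every request
  const cached = memory.get(accountId);
  if (cached && cached.hash === liveHash) {
    return cached.data;                           // fresh: skip the full blob read
  }
  const data = kv.get(`settings:${accountId}`);   // stale or missing: full read
  memory.set(accountId, { hash: liveHash, data });
  return data;
}
```

The blob is only fetched when the hash changes, so steady-state requests pay for one small read rather than the full settings payload.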

Lessons Learned

After processing billions of events through this pipeline, here are our key takeaways:

  1. Composability over monoliths. Each stage is a pure function that reads context and writes results. This makes testing, debugging, and extending trivial.
  2. Cache everything, invalidate smartly. The difference between 2ms and 20ms per event is whether you hit L1 or L2. Hash-based validation gives you both freshness and speed.
  3. Design for edge-generated events. Not all events come from the browser. Company reveals, person detection, and PII unlocks are generated by the edge itself and need to flow through the same pipeline.
  4. Preview is not optional. Being able to see every event, every stage, every decision in real-time saved us months of debugging time.

The edge processing approach isn't for everyone — it requires careful architecture and a willingness to work within the constraints of the edge runtime. But for analytics, where every millisecond matters and every event needs enrichment, it's the right call.

Comments (3)

DevOps_Dave March 29, 2026

Great writeup! How do you handle cold starts on the Workers? We've seen 50-100ms cold start times on complex Workers.

AnalyticsNerd March 29, 2026

The hash-based L1 invalidation is clever. We use a similar pattern with Redis but the single-KV-read approach is more elegant for edge.

PrivacyFirst March 30, 2026

Love that consent is Stage 2 — before any enrichment. Most analytics tools bolt on consent as an afterthought. This is the right architecture.