Skip to main content

Webhook Fanout

The webhook fanout system receives inbound webhooks from third parties (e.g. Juspay payment events), then durably re-delivers each event to any number of DB-configurable downstream HTTP endpoints, with per-endpoint retries.

It lives in the admin-api module under com.elivaas.pms.webhook. The transport is SNS → SQS (decoupling + buffering) and delivery is orchestrated by a Temporal Cloud workflow, so retries, backoff and durability are handled by Temporal rather than hand-rolled.

How It Works

Third party ──POST /api/webhooks/{source}  (HTTP Basic Auth)──► WebhookInboundController
│ verify Basic Auth against webhook_inbound_credential (per source)
│ idempotency check on (source, external_event_id)
│ persist webhook_event (status RECEIVED) ──► publish to SNS ──► 200 OK

SNS topic ──► SQS queue ──► WebhookSqsConsumer (@SqsListener)
│ start Temporal workflow, workflowId = eventId (dedupes redeliveries)

WebhookFanoutWorkflow (Temporal Cloud, task queue "WEBHOOKS")
├─ prepare: resolve endpoints for (source, eventType); create PENDING webhook_delivery rows
├─ deliver (one activity per endpoint, concurrent, each with its own RetryPolicy)
│ POST payload + auth (NONE | BASIC | BEARER | HMAC) + custom headers
│ 2xx → SUCCESS · 4xx → non-retryable fail · 5xx/timeout → retry
└─ finishEvent: COMPLETED | PARTIAL | FAILED

The inbound endpoint always answers quickly: it returns 200 as soon as the event is safely persisted and published, so the sender does not retry. Downstream delivery happens asynchronously.

Routing

Each downstream webhook_endpoint is keyed by source plus an optional event type:

  • A row with a specific event_type receives only that event type.
  • A row with event_type = NULL is a catch-all that receives every event for the source.

An event fans out to the union of the matching active endpoints.

Retries & Dead-Lettering

Two independent layers:

LegMechanismExhaustion
Ingest (SNS → SQS → consumer)SQS redelivery on consumer errorAfter maxReceiveCount (5) the message moves to the webhook-fanout-dlq DLQ
Delivery (per endpoint)Temporal RetryOptions (exponential backoff, max_attempts per endpoint)The webhook_delivery row is left FAILED and is replayable

A 4xx from an endpoint is treated as a configuration/contract error and is not retried; 5xx, timeouts and connection errors are retried.

Idempotency

webhook_event has a partial unique index on (source, external_event_id). A duplicate inbound delivery (same source + external id) is detected on receipt and is not re-published. The Temporal workflow id is the event id, so a redelivered SQS message cannot start a second fanout for the same event.

Data Model

Created by migration 052-add-webhook-fanout-tables.sql:

TableGrainHolds
webhook_inbound_credentialper sourceBasic Auth username + bcrypt password hash used to verify inbound calls
webhook_endpointper downstream targetURL, method, auth config, custom headers, max_attempts, timeout_seconds, routing (source + nullable event_type)
webhook_eventper received eventraw payload + headers, status, external_event_id, workflow_id
webhook_deliveryper (event, endpoint)status, attempts, last response status/body, last error

Outbound Authentication

Each endpoint chooses how its delivery requests are authenticated:

auth_typeHeader added
NONE
BASICAuthorization: Basic … from auth_username + auth_secret
BEARERAuthorization: Bearer <auth_secret>
HMACX-Webhook-Signature: <base64(HMAC-SHA256(body, signing_secret))>

Custom static headers (custom_headers, a JSON object) are added for every auth type.

See Configuration & Operations for environment setup and the admin API.