webhook: configurable max response body size to avoid oversized log messages #974

Description

@alexluong

Problem

A single destination response can be large enough that the resulting LogEntry exceeds the log queue's per-message size limit. When that happens the publish fails, the attempt never lands in the logstore, and the event retries forever ("no prior attempt found in logstore, will retry").

The webhook driver reads the response body with io.ReadAll and stores it verbatim — there is no cap:

  • internal/destregistry/providers/destwebhook/httphelper.go:172 → ParseHTTPResponse → io.ReadAll(resp.Body) → delivery.Response["body"]
  • internal/destregistry/registry.go:205,261 → attempt.ResponseData = deliveryData.Response
  • LogEntry { Event, Attempt, Destination } → json.Marshal → published to the log MQ

SQS limits are 256 KB per message and 1 MB per batch. #845 capped batch aggregation (MaxBatchByteSize in internal/mqs/queue_awssqs.go), but a single oversized LogEntry still exceeds the per-message limit and is rejected — so #845 does not cover this case. It is also an unbounded-memory read regardless of queue backend.
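
To make the failure mode concrete, here is a minimal sketch of how one large response body pushes a marshaled entry past the per-message limit. The `logEntry` and `attempt` types, their JSON field names, and `marshaledSize` are simplified, hypothetical stand-ins, not the real definitions. The 256 KB constant is the figure quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Per-message limit quoted in this issue for SQS.
const sqsMaxMessageBytes = 256 * 1024

// Hypothetical, simplified stand-ins for the real LogEntry types.
type attempt struct {
	ResponseData map[string]any `json:"response_data"`
}

type logEntry struct {
	Event   map[string]any `json:"event"`
	Attempt attempt        `json:"attempt"`
}

// marshaledSize returns the JSON size of a log entry whose stored
// response body is bodyBytes long.
func marshaledSize(bodyBytes int) (int, error) {
	entry := logEntry{
		Event: map[string]any{"id": "evt_example"},
		Attempt: attempt{
			ResponseData: map[string]any{"body": strings.Repeat("a", bodyBytes)},
		},
	}
	b, err := json.Marshal(entry)
	if err != nil {
		return 0, err
	}
	return len(b), nil
}

func main() {
	size, err := marshaledSize(300 * 1024)
	if err != nil {
		panic(err)
	}
	fmt.Printf("entry=%d bytes, limit=%d bytes, over=%v\n",
		size, sqsMaxMessageBytes, size > sqsMaxMessageBytes)
}
```

The body is stored as a JSON string, so characters that need escaping can make the marshaled entry larger than the raw response.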

Reported in #663 and PR #845 discussion.

Proposal

Add a configurable max response body size for the webhook driver.

  • Default: no cap — opt-in, fully backward-compatible.
  • When set and the response body exceeds the limit, replace the body with a placeholder instead of storing it, e.g. Response larger than <limit> not stored, and flag it (response_truncated: true plus the size we observed). Do not store partial content.
  • Enforce at read time in ParseHTTPResponse via io.LimitReader, so we never buffer the full body. Read up to limit + 1; if the extra byte is present the response is over the limit. (Note: because we stop reading early we can only report > limit, not the exact original size — acceptable trade-off for bounding memory.)
  • Apply the placeholder to both the logstore record and the customer-visible attempt response so the two stay consistent.

Config

New knob, env-configurable, unset = no cap. Naming TBD, e.g. DESTINATIONS_WEBHOOK_MAX_RESPONSE_BODY_BYTES.
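
One way the knob could be read, as a sketch only: it uses the example name above (naming is still TBD), assumes the value is a plain byte count, and maps unset or empty to 0, meaning no cap. `maxResponseBodyBytes` and its injected lookup function are hypothetical and are not how the project's config loader is known to work.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Example name from this issue; the final name is still TBD.
const envMaxBody = "DESTINATIONS_WEBHOOK_MAX_RESPONSE_BODY_BYTES"

// maxResponseBodyBytes (hypothetical) returns the configured cap in bytes.
// Unset or empty yields 0, which this sketch treats as "no cap".
// lookup is injected so the function can be exercised without touching
// the process environment; pass os.LookupEnv in real use.
func maxResponseBodyBytes(lookup func(string) (string, bool)) (int64, error) {
	raw, ok := lookup(envMaxBody)
	if !ok || raw == "" {
		return 0, nil
	}
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", envMaxBody, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("%s must not be negative, got %d", envMaxBody, n)
	}
	return n, nil
}

func main() {
	limit, err := maxResponseBodyBytes(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid config:", err)
		os.Exit(1)
	}
	fmt.Println("max response body bytes (0 = no cap):", limit)
}
```

Rejecting malformed or negative values at startup, rather than silently falling back to no cap, keeps a typo from disabling the limit.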

Scope

This exposes a knob for the webhook response body. We are deliberately not solving the general "any LogEntry can exceed the queue limit" problem here — operators set the limit to fit their backend.

Cases we're knowingly leaving to the operator for now:

  • Other destination types — every provider's response funnels through registry.go:205,261, so a central cap would cover all of them. But only the webhook driver produces large responses (others store message IDs / ack metadata), so we're not adding a central backstop.
  • Large event payloads — the event itself also counts toward the per-message limit. That's customer input accepted at ingest; capping it is a separate decision, not addressed here.

Open questions

  • Final config name and units (bytes vs KB).
  • Placeholder wording and exact extra fields on response_data.
