sequence_match

Aggregate function that checks whether a sequence of events matches a pattern. Uses a mini-regex syntax over condition references, executed by an NFA engine.

Signature

sequence_match(pattern VARCHAR, timestamp TIMESTAMP,
               cond1 BOOLEAN, cond2 BOOLEAN [, ...]) -> BOOLEAN

Parameters:

ParameterTypeDescription
patternVARCHARPattern string using the syntax described below
timestampTIMESTAMPEvent timestamp
cond1..condNBOOLEANEvent conditions (2 to 32)

Returns: BOOLEAN -- true if the event stream contains a subsequence matching the pattern, false otherwise.

Usage

-- Did the user view a product and then purchase?
SELECT user_id,
  sequence_match('(?1).*(?2)', event_time,
    event_type = 'view',
    event_type = 'purchase'
  ) as converted
FROM events
GROUP BY user_id;

Pattern Syntax

Patterns are composed of the following elements:

PatternDescription
(?N)Match an event where condition N (1-indexed) is true
.Match exactly one event (any conditions)
.*Match zero or more events (any conditions)
(?t>=N)Time constraint: at least N seconds since previous match
(?t<=N)Time constraint: at most N seconds since previous match
(?t>N)Time constraint: more than N seconds since previous match
(?t<N)Time constraint: less than N seconds since previous match
(?t==N)Time constraint: exactly N seconds since previous match
(?t!=N)Time constraint: not exactly N seconds since previous match

Pattern Examples

-- View then purchase (any events in between)
'(?1).*(?2)'

-- View then purchase with no intervening events
'(?1)(?2)'

-- View, exactly one event, then purchase
'(?1).(?2)'

-- View, then purchase within 1 hour
'(?1).*(?t<=3600)(?2)'

-- Three-step sequence with time constraints
'(?1).*(?t<=3600)(?2).*(?t<=7200)(?3)'

Behavior

  1. Events are sorted by timestamp.
  2. The pattern is compiled into an NFA (nondeterministic finite automaton).
  3. The NFA is executed against the event stream using lazy matching semantics for .* (prefer advancing the pattern over consuming additional events).
  4. Returns true if any accepting state is reached.

Time constraints are evaluated relative to the timestamp of the previously matched condition step. The time difference is computed in seconds, matching ClickHouse semantics.

Time-Constraint Semantics

(?t op N) mirrors ClickHouse (verified against its implementation): the constraint gates the next pattern step, and non-matching events in between are skipped while the gate can still hold — (?1)(?t<=10)(?2) matches (?2) at any event within 10 seconds of (?1). Trailing (?t<=N), (?t<N) and (?t>=0) match the empty remainder. N is interpreted in seconds with the elapsed time floored to whole seconds, so (?t==N) means "within [N, N+1) seconds" (the faithful generalization of ClickHouse's whole-second DateTime comparison to microsecond timestamps).

Determinism

Events sort by (timestamp, conditions) before matching, so results are deterministic regardless of thread count or physical row order.

Errors

A malformed pattern aborts the query with the parser's position-annotated message instead of silently returning NULL:

invalid sequence pattern '(?1)(?': pattern error at position 6: ...

A NULL pattern yields a NULL result (lenient), matching SQL aggregate conventions.

Adversarial patterns that exhaust the NFA exploration budget (scaled as 8 × events × steps) abort the query with a descriptive error rather than silently reporting no match. Consecutive .* wildcards are collapsed at parse time, so ordinary patterns never approach the budget.

Implementation

OperationComplexity
UpdateO(1) amortized (event append)
CombineO(m) where m = events in other state
FinalizeO(n * s) NFA execution, where n = events, s = pattern steps
SpaceO(n) -- all collected events

At benchmark scale, sequence_match processes 100 million events in 1.05 s (95 Melem/s).

See Also