Logpipe: Building a Distributed Logging Stack from Scratch with Vector and ClickHouse

⚠ TL;DR

I built Logpipe - a small Go monorepo that mimics a real distributed logging stack on one laptop. A CRUD service emits application logs through slog, NGINX writes HTTP access logs, a Vector agent tails both, a Vector aggregator parses and batches them, ClickHouse stores them, and a tiny CLI reads them back. This post is the build log: why each piece is there, why the aggregator is the whole point, the two failure modes I deliberately reproduced ("Too many parts" and an in-memory buffer lost on restart), and where this stops being a useful model for a real stack. The repo is on GitHub.

The Zerodha post that started this

I read Logging at Zerodha and could not stop thinking about one number. They were storing 30 billion log lines on 2.5 TB of disk, on four r5a.2xlarge instances, for about $1.14 an hour. The previous setup, a more conventional Elasticsearch stack, was 13 TB on 3x m5a.8xlarge and cost more than twice as much. That is not a small win. That is a different shape of system.

What got me wasn't the cost. It was the architecture. Files on disk as JSON. A small agent tails them. A central aggregator batches them. ClickHouse stores them. A UI on top for queries. Five pieces, all boring, doing one thing each. No queue, no fan-out broker, no schema registry. The whole post reads like someone removed every piece they could and the thing got faster.

I had not built anything like that before. So I built Logpipe.

Logpipe is a small Go monorepo that mimics the same shape. A CRUD service emits application logs through slog. An NGINX proxy in front of it produces HTTP access logs. A Vector agent tails both files. A Vector aggregator parses, routes, batches, and ships them into two ClickHouse tables. A small CLI reads them back. It runs on one laptop with docker compose up. It is not a product, and it is not meant to handle 30 billion lines a day. It is meant to make every load-bearing decision in that architecture concrete enough that I could not handwave any of them.

This post is the build log: why each piece exists, the two failure modes I deliberately reproduced, and where this stops being a useful model for what a real stack costs. The repo is at github.com/vakharwalad23/logpipe.

The rules I set before writing any code

The temptation with a project like this is to grow it into a half-product. Add a UI, add auth, add a SaaS-style tenant abstraction. None of that teaches you anything about logging. So I wrote the rules down first:

It runs on one machine with one command. docker compose up brings everything online. No Kubernetes, no cloud provider, no provisioning step. The whole point of a learning model is that it has to be reproducible.
Every piece plays a real role. No fakes for the components I actually wanted to understand. NGINX has to be NGINX, not a Go middleware pretending to be NGINX. Vector has to be Vector in both agent and aggregator modes, not a single hop. ClickHouse has to be a real MergeTree, not a SQLite stand-in.
Two log streams, not one. A real service has application logs (what your code is doing) and HTTP access logs (what the proxy is seeing). They have different schemas and different sources. Modelling only one stream lets you skip the routing problem, and the routing problem is half the aggregator's job.
No shortcuts on the parts that matter. Batching, durability, and safe queries had to be done the right way. Everything else (the in-memory CRUD store, the basic-auth ClickHouse credentials, the Makefile target list) could be as lazy as I wanted.

That last rule is the one that actually shaped the system. Everything below falls out of it.

The system at a glance

Six processes, one shared volume, two ClickHouse tables.

Two things are worth flagging right now, because the rest of the post turns on them.

First, the agent never talks to ClickHouse directly. The data path is agent -> aggregator -> ClickHouse, never agent -> ClickHouse. That split exists for a reason I will get to in the batching section.

Second, every log line is one JSON object with a log_type field at the top level (app or http). The aggregator routes on that single field. That is the entire routing protocol. No file naming convention, no source filter, no per-stream pipeline. One tagged stream.

Why ClickHouse

ClickHouse is a column store built for OLAP - analytics, aggregations, large scans over append-mostly tables. Most logging questions are OLAP questions in disguise. "How many 5xxs in the last hour, by service?" is a GROUP BY over a filtered range. "What did this user do between 14:00 and 14:05?" is a range scan on a time-ordered table.

Zerodha cite a 20x disk reduction vs Elasticsearch on the same data. Logpipe does not run anywhere near long enough to validate that number, and that is fine - I am not auditing Zerodha's claim, I am inheriting the architectural choice. The shape of the data (immutable, append-only, time-ordered, mostly read in ranges) matches what MergeTree is good at, and the match is the whole reason for the decision.

The relevant MergeTree rule, which the rest of this post depends on, is this: every INSERT produces an immutable "part". A background process merges small parts into bigger ones over time. Merging is what keeps reads fast. If your writers create parts faster than the merger can collapse them, you accumulate active parts in a partition, and ClickHouse has a hard limit.

That hard limit has a name and a number: parts_to_throw_insert, default 3000. Cross it and ClickHouse rejects new INSERTs with error code 252 ("Too many parts"). This isn't a corner case. It is the single most common way people break ClickHouse.

So the question shaping the rest of the stack is: who is responsible for keeping batches large?

Why Vector, and why two of them

Vector is a Rust binary that ships logs and metrics. The same binary runs in two modes, and the only thing that differs between them is the config:

Agent mode - tails files on a host, ships events to somewhere else.
Aggregator mode - accepts events from agents, parses them, routes them, batches them, and writes them to a sink (ClickHouse, in our case).

You could run one Vector. You probably should not. The split exists because the agent runs everywhere (on every host that has logs) and the aggregator is the one place that gets to decide how big a batch is. If every agent talked directly to ClickHouse, every agent would create parts. You would lose control of the part rate by design.

In logpipe the agent is dead simple: tail every *.log file under /var/log/logpipe, ship to the aggregator.

● ● ●YAML

sources:
  app_files:
    type: file
    include:
      - /var/log/logpipe/*.log
    read_from: beginning

sinks:
  to_aggregator:
    type: vector
    inputs: [app_files]
    address: vector-aggregator:6000

That is the whole agent config. No parsing, no routing, no schema. It does the cheapest thing it can.

The aggregator does the work.

● ● ●YAML

transforms:
  parse:
    type: remap
    inputs: [from_agent]
    source: |
      . = parse_json!(.message)

  drop_health:
    type: filter
    inputs: [parse]
    condition: '!(.log_type == "http" && .path == "/healthz")'

  route:
    type: route
    inputs: [drop_health]
    route:
      app:  '.log_type == "app"'
      http: '.log_type == "http"'

Read it top to bottom. Parse the raw line as JSON. Drop the /healthz noise. Split the stream into app and http by the log_type field. Two more remap transforms (shape_app and shape_http) reshape each stream to match its target table's column names. Then they go to two ClickHouse sinks, one for each table.
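
The parse, drop, and route steps can be sketched in Go to make the decision logic explicit. This is only an illustration of the logic, not how Vector implements it: routeLine is a hypothetical name, and the sketch simply discards lines it cannot parse, whereas what Vector does with them depends on the remap transform's error settings.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// routeLine applies the same three decisions as the aggregator config to one
// raw line: parse it as JSON, drop /healthz access logs, then pick a stream
// from the log_type field. It returns "app", "http", or "" when the line is
// dropped or matches neither route.
func routeLine(raw string) string {
    var event map[string]any
    if err := json.Unmarshal([]byte(raw), &event); err != nil {
        return "" // assumption: unparseable lines are discarded in this sketch
    }
    logType, _ := event["log_type"].(string)
    path, _ := event["path"].(string)
    if logType == "http" && path == "/healthz" {
        return "" // the drop_health filter
    }
    switch logType {
    case "app", "http":
        return logType
    }
    return "" // in Vector these events go to the route's _unmatched output
}

func main() {
    lines := []string{
        `{"log_type":"app","message":"created task"}`,
        `{"log_type":"http","path":"/healthz","status":200}`,
        `{"log_type":"http","path":"/api/things","status":404}`,
    }
    for _, l := range lines {
        fmt.Printf("%s -> %q\n", l, routeLine(l))
    }
}
```

Everything the router needs is in the event itself, which is why the agent can stay ignorant of streams and tables.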

The point of the split is not separation of concerns. It is the part rate. The aggregator is the only writer into ClickHouse, so whatever batching policy I set on it is the global batching policy, full stop.

The whole point: batching and "Too many parts"

This is the section the rest of the post exists to support.

Here is what happens, in order, when you insert a row into a MergeTree table:

ClickHouse writes the row to a new immutable directory on disk. That is a "part".
A background merge thread looks at the partition and tries to combine small parts into bigger ones.
Reads scan the active parts (the ones not yet merged into something bigger).

You can already see the failure mode. If you insert one row at a time, you get one tiny part per row. Merges run on a schedule and have to do real work; they cannot keep up if the part-creation rate outruns them. Active parts pile up. Once the count crosses parts_to_throw_insert (default 3000), the next insert is rejected with:

DB::Exception: Too many parts (3000). Merges are processing
significantly slower than inserts. (TOO_MANY_PARTS)

Error code 252. The fix is not raising the limit. The fix is not inserting one row at a time.
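
A toy model makes the arithmetic visible. The sketch below is not how ClickHouse schedules merges; it assumes one new part per INSERT and a fixed number of parts retired by merges each second, and every number in it is made up for illustration.

```go
package main

import "fmt"

// secondsUntilRejected steps a toy model one second at a time. Each INSERT
// adds one active part; merges retire up to mergedPerSec parts per second.
// It returns the second at which the active part count first exceeds limit,
// or -1 if that does not happen within horizon seconds.
func secondsUntilRejected(rowsPerSec, rowsPerInsert, mergedPerSec float64, limit, horizon int) int {
    active := 0.0
    for s := 1; s <= horizon; s++ {
        active += rowsPerSec / rowsPerInsert // new parts created this second
        active -= mergedPerSec               // parts retired by merges
        if active < 0 {
            active = 0
        }
        if active > float64(limit) {
            return s
        }
    }
    return -1
}

func main() {
    // Hypothetical load: 1000 rows/s, merges retiring 10 parts/s, limit 3000.
    fmt.Println("one row per INSERT:  ", secondsUntilRejected(1000, 1, 10, 3000, 3600))    // 4
    fmt.Println("5000 rows per INSERT:", secondsUntilRejected(1000, 5000, 10, 3000, 3600)) // -1
}
```

The row rate is identical in both runs. Only the number of rows per INSERT changes, and that alone decides whether the limit is ever reached.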

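That is the aggregator's whole reason for existing in this stack. Vector's ClickHouse sink flushes a batch when batch.timeout_secs elapses or when the batch hits its size limit (batch.max_events or batch.max_bytes), whichever comes first. I set timeout_secs to 5 on both sinks and left the size limits at their defaults: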

● ● ●YAML

sinks:
  app_to_clickhouse:
    type: clickhouse
    inputs: [shape_app]
    endpoint: http://clickhouse:8123
    database: logs
    table: app_logs
    auth: { strategy: basic, user: logstore, password: logstore }
    skip_unknown_fields: true
    date_time_best_effort: true
    batch: { timeout_secs: 5 }
    buffer:
      type: disk
      max_size: 1073741824
      when_full: block

To confirm the design actually held, I ran two tests.

Good path. The loadgen tool can write log lines straight into the tailed file (the "synthetic" mode), bypassing NGINX and the application. I shoved a million rows through. ClickHouse reported about six active parts at the end, with system.parts healthy and the partition well under the limit. The aggregator was flushing well-sized batches; merges had no trouble keeping up.

Bad path. I deliberately broke the setup. Disabled the aggregator's batching, paused merges with SYSTEM STOP MERGES, and let the inserts go one row per request. Within minutes, the same kind of synthetic load surfaced the real error:

DB::Exception: Too many parts (3000). (TOO_MANY_PARTS)

The interesting thing is what Vector did next. It did not drop the failed events. It retried the failed inserts according to the sink's retry policy. Once I re-enabled merges and turned batching back on, the queue drained and the table caught up. The pipeline survived the failure cleanly because the aggregator had two things on its side: a retry policy that backs off, and a buffer that holds events while the back end is unhappy.
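
The general shape of a retry policy that backs off looks roughly like the sketch below. It is not Vector's code and does not use Vector's option names; retry, noSleep and errTransient are hypothetical, and the sleep function is injected only so the example runs without real delays.

```go
package main

import (
    "errors"
    "fmt"
    "time"
)

// errTransient stands in for a rejected INSERT such as TOO_MANY_PARTS.
var errTransient = errors.New("too many parts")

// noSleep is a stand-in sleeper for runs that should not actually wait.
func noSleep(time.Duration) {}

// retry calls op until it succeeds or attempts run out, doubling the wait
// between tries up to maxWait.
func retry(op func() error, attempts int, initial, maxWait time.Duration, sleep func(time.Duration)) error {
    if attempts < 1 {
        attempts = 1
    }
    wait := initial
    var err error
    for i := 0; i < attempts; i++ {
        if err = op(); err == nil {
            return nil
        }
        if i == attempts-1 {
            break // no point waiting after the final attempt
        }
        sleep(wait)
        wait *= 2
        if wait > maxWait {
            wait = maxWait
        }
    }
    return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
    calls := 0
    insert := func() error {
        calls++
        if calls < 4 {
            return errTransient // the back end is still unhappy
        }
        return nil
    }
    err := retry(insert, 10, time.Second, 30*time.Second, func(d time.Duration) {
        fmt.Println("backing off for", d)
    })
    fmt.Println("calls:", calls, "err:", err)
}
```

Backing off matters here because hammering a ClickHouse node that is already behind on merges only adds to its workload.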

The second thing on Vector's side, the buffer, is the next section.

Durability: the disk buffer that stops you losing logs

Vector's default buffer is in memory. If the aggregator restarts (a config change, a crash, a docker compose restart), every event currently in the buffer is gone.

I learned that the hard way. During an earlier iteration, mid-experiment, I restarted the aggregator and lost about 29,000 rows that had not yet been flushed to ClickHouse. They were just gone. There was no error and no log line saying "buffer cleared on shutdown" - the events simply did not exist anywhere by the time the new process started.

The fix is a four-line buffer block on each sink:

● ● ●YAML

buffer:
  type: disk
  max_size: 1073741824
  when_full: block

Plus a named Docker volume mounted at the aggregator's data_dir, so the buffer state survives container replacement:

● ● ●YAML

vector-aggregator:
  image: timberio/vector:0.56.0-alpine
  volumes:
    - ./deploy/vector/aggregator.yaml:/etc/vector/vector.yaml:ro
    - vector_aggregator_data:/var/lib/vector

That changes the failure model in two ways.

A restart no longer drops events. Pending events sit on disk in the buffer directory. When the new process starts, it picks them up and finishes the flush. Vector handles the persistence; I do not have to think about acknowledgements.

A slow back end no longer drops events either. With when_full: block, when the buffer fills up, Vector applies backpressure to the source instead of discarding new events. The agent slows down. The TCP socket between agent and aggregator stalls. The agent's own buffer fills, and the same backpressure walks all the way back to the file reader. Nothing is silently dropped.

The trade-off is honest: with a disk buffer, your data path includes a disk write at the aggregator, not just at ClickHouse. For a real-volume deployment that disk needs to be sized for the worst hour of the worst day; logpipe's 1 GiB cap is fine for a laptop and nowhere near enough for production. The point is that "buffer fills up" is a backpressure problem, not a data-loss problem. That is the right shape.
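
The difference between blocking and dropping on a full buffer is easy to see with a bounded Go channel standing in for the buffer. This is a conceptual sketch only: offerBlocking and offerDropping are hypothetical names, and the dropping variant corresponds to a policy like Vector's drop_newest rather than anything logpipe uses.

```go
package main

import "fmt"

// offerBlocking waits for room in the buffer, so a full buffer slows the
// producer down instead of losing the event (the when_full: block idea).
func offerBlocking(buf chan<- string, ev string) {
    buf <- ev
}

// offerDropping discards the event when the buffer is full and reports
// whether it was accepted.
func offerDropping(buf chan<- string, ev string) bool {
    select {
    case buf <- ev:
        return true
    default:
        return false
    }
}

func main() {
    // Dropping policy: nobody is draining, so only the buffer's capacity survives.
    dropBuf := make(chan string, 2)
    dropped := 0
    for i := 0; i < 5; i++ {
        if !offerDropping(dropBuf, fmt.Sprintf("event-%d", i)) {
            dropped++
        }
    }
    fmt.Println("dropping policy lost", dropped, "of 5 events")

    // Blocking policy: the producer stalls until the consumer makes room.
    blockBuf := make(chan string, 2)
    go func() {
        for i := 0; i < 5; i++ {
            offerBlocking(blockBuf, fmt.Sprintf("event-%d", i))
        }
        close(blockBuf)
    }()
    received := 0
    for range blockBuf {
        received++
    }
    fmt.Println("blocking policy delivered", received, "of 5 events")
}
```

With blocking, the cost of a slow consumer is paid in producer latency. With dropping, it is paid in missing data.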

There is a more subtle property worth flagging. The two ClickHouse sinks each have their own buffer. If ClickHouse hiccups on one table, only the affected sink stalls. The agent does not care which table its events end up in - the split happens inside the aggregator - so a temporary issue with http_logs does not stop app_logs from being written.

Why NGINX writes the HTTP logs, not Go

The CRUD service is small. It would be easy to log every incoming HTTP request from inside the handler. That is also exactly wrong.

In any non-trivial deployment there is a proxy in front of your service - NGINX, Envoy, an ALB, something. That proxy is the source of truth for what was actually served: the bytes that went over the wire, the duration the client experienced, the status code that hit the user. The application sees a subset of that, because it only sees the requests it accepted. It does not see the requests that got 502'd between the LB and the app. It does not see the 408 Request Timeout that the proxy issued when the client never finished sending its body. It does not see TLS handshake failures.

So logpipe puts NGINX in front of crud-api and configures it to emit one JSON object per access log line:

● ● ●NGINX

log_format json escape=json
  '{ "log_type":"http",'
   '"timestamp":"$time_iso8601",'
   '"host":"$host",'
   '"method":"$request_method",'
   '"path":"$uri",'
   '"status":$status,'
   '"duration_ms":$request_time,'
   '"bytes":$bytes_sent,'
   '"client_ip":"$remote_addr",'
   '"user_agent":"$http_user_agent" }';

access_log /var/log/logpipe/access.log json;

That gives two independent JSON streams arriving in the shared volume.

app.log - JSON from crud-api via slog.
access.log - JSON from NGINX.

Both formats carry a log_type field at the top level. From the agent's perspective they are interchangeable: JSON, one line per event, tailed from a file. The aggregator treats them as the same input type and pulls them apart with a single routing transform.

The lesson here is structural, not specific to NGINX. One tagged stream beats two untagged streams. If every event you produce, anywhere, carries a log_type (or whatever you want to call it), every downstream component can route on a single field instead of guessing from the producer's identity. Cheap, explicit, and works the same way for an agent on a host as for a service inside a pod.

One small detail: NGINX reports $request_time in seconds (a float with millisecond resolution), and the http_logs table stores duration_ms as a UInt32. The aggregator does the conversion in the shape_http transform with to_int(round(to_float!(.duration_ms) * 1000)). It is the kind of small shape fix that you want at the edge of the pipeline, not in every consumer.
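
For comparison, the same conversion written in Go looks like this. It is a sketch of the arithmetic only, with secondsToMillis as a hypothetical name; the rounding step is there because a value such as 1.005 is not exactly representable as a float, so multiplying by 1000 can land just below the intended integer.

```go
package main

import (
    "fmt"
    "math"
)

// secondsToMillis converts a request time in seconds (as NGINX reports it)
// into whole milliseconds, rounding to the nearest integer. Negative or NaN
// inputs are clamped to 0 in this sketch.
func secondsToMillis(secs float64) uint32 {
    if secs <= 0 || math.IsNaN(secs) {
        return 0
    }
    return uint32(math.Round(secs * 1000))
}

func main() {
    for _, s := range []float64{0, 0.123, 1.005, 12.5} {
        fmt.Printf("%v s -> %d ms\n", s, secondsToMillis(s))
    }
}
```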

slog for application logs

The application logs use Go's standard library log/slog with the JSON handler. No logging framework, no init dance, no plugins. The whole logger is built in about thirty lines, most of which are housekeeping.

● ● ●GO

opts := &slog.HandlerOptions{
    Level:       slog.LevelInfo,
    ReplaceAttr: renameForSchema,
}
logger := slog.New(slog.NewJSONHandler(f, opts))

host, _ := os.Hostname()
if host == "" {
    host = "unknown"
}
logger = logger.With(
    slog.String("log_type", "app"),
    slog.String("service", "crud-api"),
    slog.String("host", host),
)

The ReplaceAttr callback makes the JSON match what ClickHouse wants. slog defaults to time and msg for the timestamp and message keys. The app_logs table uses timestamp and message. Rather than push the rename into the aggregator (where it would be a VRL transform on every event), I do it once at the producer:

● ● ●GO

func renameForSchema(groups []string, a slog.Attr) slog.Attr {
    if len(groups) == 0 {
        switch a.Key {
        case slog.TimeKey:
            a.Key = "timestamp"
            a.Value = slog.TimeValue(a.Value.Time().UTC())
        case slog.MessageKey:
            a.Key = "message"
        }
    }
    return a
}

The UTC normalisation is the same idea. Timestamp handling is the kind of thing you want to fix in one place, at the source, instead of trusting every consumer to remember to convert.
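
A quick way to check what the callback does is to point the handler at a buffer and decode the line it writes. The sketch below repeats renameForSchema so that it is self-contained; logOnce is a hypothetical helper for the demonstration and is not part of logpipe.

```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log/slog"
)

// renameForSchema renames slog's top-level time and msg keys to timestamp
// and message, and converts the timestamp to UTC.
func renameForSchema(groups []string, a slog.Attr) slog.Attr {
    if len(groups) == 0 {
        switch a.Key {
        case slog.TimeKey:
            a.Key = "timestamp"
            a.Value = slog.TimeValue(a.Value.Time().UTC())
        case slog.MessageKey:
            a.Key = "message"
        }
    }
    return a
}

// logOnce writes one record through a JSON handler that uses
// renameForSchema and returns the decoded line.
func logOnce(msg string) (map[string]any, error) {
    var buf bytes.Buffer
    opts := &slog.HandlerOptions{ReplaceAttr: renameForSchema}
    slog.New(slog.NewJSONHandler(&buf, opts)).Info(msg)

    var line map[string]any
    err := json.Unmarshal(buf.Bytes(), &line)
    return line, err
}

func main() {
    line, err := logOnce("created task")
    if err != nil {
        panic(err)
    }
    for _, key := range []string{"timestamp", "level", "message"} {
        fmt.Printf("%s = %v\n", key, line[key])
    }
}
```

The decoded line has timestamp and message keys where slog would otherwise write time and msg, and the timestamp carries a Z suffix because it was converted to UTC before formatting.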

The .With call at the bottom bakes three constant fields into every line the logger emits: log_type=app, the service name, and the host. That is all the routing metadata the aggregator needs. Any additional context (task_id, title, etc.) gets added at the call site:

● ● ●GO

logger.Info("created task",
    slog.String("task_id", id),
    slog.String("title", title),
)

The aggregator folds anything that does not map to a named column into the Map(String, String) field on the app_logs row, so adding new keys at the producer does not require a schema migration.
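
The folding step amounts to partitioning an event's keys into "has a column" and "everything else". Here is a rough Go sketch of that idea; the real work happens in the shape_app VRL transform, which is not shown in this post, so appColumns, shapeApp, and the decision to discard log_type after routing are all assumptions made for illustration.

```go
package main

import "fmt"

// appColumns lists the keys that have a dedicated column in app_logs.
var appColumns = map[string]bool{
    "timestamp": true,
    "level":     true,
    "service":   true,
    "host":      true,
    "message":   true,
}

// shapeApp splits a parsed event into named columns and a string map of
// everything else. Values in the map are stringified to fit
// Map(String, String).
func shapeApp(event map[string]any) (row map[string]any, fields map[string]string) {
    row = map[string]any{}
    fields = map[string]string{}
    for k, v := range event {
        switch {
        case k == "log_type":
            // assumption: routing metadata is not stored
        case appColumns[k]:
            row[k] = v
        default:
            fields[k] = fmt.Sprint(v)
        }
    }
    return row, fields
}

func main() {
    row, fields := shapeApp(map[string]any{
        "log_type": "app",
        "level":    "INFO",
        "message":  "created task",
        "task_id":  "t-42",
        "attempt":  2,
    })
    fmt.Println("columns:", row)
    fmt.Println("fields: ", fields)
}
```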

Schema design

Two tables, both in the logs database, both partitioned by month with a 30-day TTL.

● ● ●SQL

CREATE TABLE logs.app_logs (
    timestamp DateTime64(3),
    level     LowCardinality(String),
    service   LowCardinality(String),
    host      LowCardinality(String),
    message   String,
    fields    Map(String, String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (service, level, timestamp)
TTL toDateTime(timestamp) + INTERVAL 30 DAY;

● ● ●SQL

CREATE TABLE logs.http_logs (
    timestamp   DateTime64(3),
    host        LowCardinality(String),
    method      LowCardinality(String),
    path        String,
    status      UInt16,
    duration_ms UInt32,
    bytes       UInt32,
    client_ip   String,
    user_agent  String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (status, timestamp)
TTL toDateTime(timestamp) + INTERVAL 30 DAY;

A few specifics worth calling out.

LowCardinality(String) does most of the work that an index would do. Log columns are dominated by a handful of values: log level is one of five strings, HTTP method is one of nine, service name is one of a few dozen. LowCardinality stores them as a dictionary internally, so the actual stored payload is an integer in a small alphabet. It compresses very well, scans very fast, and you do not need a separate index for it. This is the kind of optimisation that does not appear in your code; it appears in the column definitions.
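
Dictionary encoding itself is a small idea. The sketch below shows the general technique in Go, not ClickHouse's on-disk format; dictEncode is a hypothetical name, and the uint8 codes assume no more than 256 distinct values.

```go
package main

import "fmt"

// dictEncode replaces each string with its index in a dictionary of distinct
// values, assigning indexes in order of first appearance.
func dictEncode(values []string) (dict []string, codes []uint8) {
    index := map[string]uint8{}
    for _, v := range values {
        code, ok := index[v]
        if !ok {
            code = uint8(len(dict)) // assumes at most 256 distinct values
            index[v] = code
            dict = append(dict, v)
        }
        codes = append(codes, code)
    }
    return dict, codes
}

func main() {
    levels := []string{"INFO", "WARN", "INFO", "ERROR", "INFO", "INFO"}
    dict, codes := dictEncode(levels)
    fmt.Println("dictionary:", dict) // [INFO WARN ERROR]
    fmt.Println("codes:     ", codes) // [0 1 0 2 0 0]
}
```

A filter such as level = 'WARN' then becomes a comparison against one small integer per row, which is much cheaper than comparing strings.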

The ORDER BY is the primary index. Unless you declare a separate PRIMARY KEY, ClickHouse uses the ORDER BY columns as its sparse primary key: rows inside each part are sorted by them, and the index lets a query skip the blocks of rows that cannot match. The first column should be the one you filter on most often. For app logs that is service, then level, then timestamp. For HTTP logs it is status, then timestamp. The CLI's read patterns ("show me 4xxs in the last hour", "tail WARN logs from a service") fall out of those choices for free.
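
"Sparse" means the index stores one entry per block of rows rather than one per row. The following Go sketch shows the principle with a single integer key and a tiny block size; it is a simplification of the idea, not ClickHouse's implementation, and buildMarks and granulesFor are hypothetical names.

```go
package main

import (
    "fmt"
    "sort"
)

// buildMarks records the first key of every granule of granuleSize rows.
// keys must already be sorted, as rows are within a part.
func buildMarks(keys []int, granuleSize int) []int {
    var marks []int
    for i := 0; i < len(keys); i += granuleSize {
        marks = append(marks, keys[i])
    }
    return marks
}

// granulesFor returns the half-open range [first, last) of granules that may
// contain target. Everything outside the range can be skipped unread.
func granulesFor(marks []int, target int) (first, last int) {
    // Granules whose first key is already greater than target cannot match.
    last = sort.SearchInts(marks, target+1)
    // The granule just before the first mark >= target may end with target.
    first = sort.SearchInts(marks, target)
    if first > 0 {
        first--
    }
    return first, last
}

func main() {
    // Twelve rows sorted by status, four rows per granule.
    status := []int{200, 200, 200, 200, 200, 200, 404, 404, 404, 500, 500, 500}
    marks := buildMarks(status, 4)
    fmt.Println("marks:", marks) // [200 200 404]

    first, last := granulesFor(marks, 404)
    fmt.Printf("status=404 needs granules [%d, %d)\n", first, last) // [1, 3)
}
```

The index holds three entries for twelve rows, and the lookup rules out the first granule without reading it. The range is conservative: it may include a granule that turns out to hold no matching rows, but it never excludes one that does.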

Monthly partitions, with a TTL. Daily partitions are common at very high volume because dropping an entire partition is cheap (one filesystem operation), whereas a row-level TTL has to run a merge. Monthly is fine for logpipe's scale, and the TTL handles deletion incrementally. At Zerodha-scale you would switch to daily for high-volume tables; for a learning model it is overkill.

Map(String, String) is the escape hatch. Anything the producer attaches that does not have a dedicated column (a task_id, an internal trace ID) gets folded into fields. You give up a typed column for flexibility. That is the right trade-off for a schema you do not fully own at the producer side.

Safe queries by construction

The query CLI accepts user-supplied filters like:

● ● ●BASH

go run ./cmd/query-api search --table http \
  --filter "status=404 path=/api/things" --since 1h

The temptation is to take those key=value pairs and stitch them into SQL with fmt.Sprintf. That is the easy way to get a SQL injection in a personal project. So the CLI never does that.

The defence is two-layered. First, the column name on the left of every = is checked against a whitelist per table:

● ● ●GO

var columns = map[string]map[string]bool{
    "app": {
        "level":   true,
        "service": true,
        "host":    true,
        "message": true,
    },
    "http": {
        "method":    true,
        "path":      true,
        "status":    true,
        "client_ip": true,
        "host":      true,
    },
}

An unknown filter key, or a key that is valid for the other table, is rejected before any query runs. The error message is explicit ("unknown filter key %q for %s logs") so you cannot accidentally write a filter that silently does nothing.

Second, the value on the right is always passed as a bound parameter. The CLI builds the WHERE clause as col = ? placeholders and an args []any slice in lockstep:

● ● ●GO

func where(stream string, filters []Filter) (string, []any, error) {
    allowed := columns[stream]
    var conds []string
    var args []any
    for _, f := range filters {
        if !allowed[f.Col] {
            return "", nil, fmt.Errorf(
                "unknown filter key %q for %s logs", f.Col, stream)
        }
        conds = append(conds, f.Col+" = ?")
        args = append(args, value(f.Col, f.Val))
    }
    return strings.Join(conds, " AND "), args, nil
}

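The values reach the driver as bound arguments, separate from the SQL text, and the driver is responsible for escaping them. (Depending on the driver, ? placeholders may be filled in client-side with escaping rather than server-side; ClickHouse's own server-side parameters use the {name:Type} syntax.) Either way, nothing the user types is ever concatenated into the query by my code.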

The only piece of dynamic SQL that gets interpolated is the table name, and that comes from a tables map keyed by the --table flag:

● ● ●GO

var tables = map[string]string{
    "app":  "logs.app_logs",
    "http": "logs.http_logs",
}

A flag value of anything other than app or http errors out before any query is built. The user never gets to pick the table string directly.
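
Putting the three pieces together, a self-contained version of the query builder might look like the sketch below. It follows the snippets above but is not the repo's code: parseFilters and buildQuery are hypothetical names, the SELECT * is a placeholder, and the value helper from the earlier snippet is left out, so every bound value stays a string here.

```go
package main

import (
    "fmt"
    "strings"
)

type Filter struct{ Col, Val string }

var tables = map[string]string{
    "app":  "logs.app_logs",
    "http": "logs.http_logs",
}

var columns = map[string]map[string]bool{
    "app":  {"level": true, "service": true, "host": true, "message": true},
    "http": {"method": true, "path": true, "status": true, "client_ip": true, "host": true},
}

// parseFilters splits "status=404 path=/api/things" into key/value pairs.
func parseFilters(s string) ([]Filter, error) {
    var out []Filter
    for _, pair := range strings.Fields(s) {
        col, val, ok := strings.Cut(pair, "=")
        if !ok || col == "" {
            return nil, fmt.Errorf("bad filter %q, want key=value", pair)
        }
        out = append(out, Filter{Col: col, Val: val})
    }
    return out, nil
}

// buildQuery returns SQL made only of allowlisted identifiers and ?
// placeholders, plus the user-supplied values to bind separately.
func buildQuery(stream, filter string) (string, []any, error) {
    table, ok := tables[stream]
    if !ok {
        return "", nil, fmt.Errorf("unknown table %q", stream)
    }
    filters, err := parseFilters(filter)
    if err != nil {
        return "", nil, err
    }
    var conds []string
    var args []any
    for _, f := range filters {
        if !columns[stream][f.Col] {
            return "", nil, fmt.Errorf("unknown filter key %q for %s logs", f.Col, stream)
        }
        conds = append(conds, f.Col+" = ?")
        args = append(args, f.Val)
    }
    query := "SELECT * FROM " + table
    if len(conds) > 0 {
        query += " WHERE " + strings.Join(conds, " AND ")
    }
    return query, args, nil
}

func main() {
    query, args, err := buildQuery("http", "status=404 path=/api/things")
    fmt.Println(query, args, err)

    _, _, err = buildQuery("http", "level=WARN")
    fmt.Println(err)
}
```

Every string that reaches the SQL text comes from one of the two maps. User input can only select a map key or become a bound value.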

This is one of those places where the lazy answer (string concatenation) and the safe answer (parameterised) cost almost the same code. The difference is two ? placeholders and a slice. There is no excuse for getting it wrong.

The build order, and why it mattered

I built this in a deliberately weird order. The order is the lesson.

ClickHouse first. Bring up the database and create both target tables. Now there is a real destination, with the real schema, before anything else exists.
Prove the transport before the producer exists. Vector ships with a built-in demo_logs source that emits synthetic events. Wire demo_logs to a throwaway table; confirm that agent -> aggregator -> ClickHouse works and that batches flush. Zero application code. This was the riskiest part, so it had to be the first thing that worked end to end.
Build the producer. Add crud-api with the slog handler. Two streams (app, http via NGINX), each tagged with log_type. Do not change the pipeline.
Wire the real logs. Swap the agent's demo_logs source for a file source pointed at the shared volume. The aggregator's parsing and routing are still untouched. Because only the source changed, any breakage is parser- or schema-level, not transport-level.
Load it. Add loadgen with two modes: real HTTP requests through NGINX, and synthetic writes straight to the tailed file. The first proves the whole pipe under realistic conditions; the second pushes raw volume past the application layer.
Read it back. Build the query CLI. Three modes: search for filtered reads, count for time-bucketed aggregations, tail for the most recent lines.

The principle: when you build a pipeline, prove the middle layer first, with synthetic data. Producers and consumers are easy. The transport, the batching, the buffering - those are the hard parts. If you start at the producer and work forward, by the time you discover a batching problem you have a stack of code that all needs to be true at once. If you start in the middle, every later piece you add is a one-thing-changed delta against something that already works.

You can see this pattern everywhere in well-built systems. Mock services in front of a real service mesh. Synthetic transactions in payment pipelines. Test fixtures that produce the right shape of data before the real producers exist. Same idea: prove the part you are most worried about, with the cheapest possible inputs, and add the rest on top of something that already works.

Scale: what this can tell you, and what it cannot

The honest framing for a project like this is: logpipe is a model of an architecture, not a benchmark of one.

What it teaches:

Why the agent/aggregator split exists, and what specifically breaks when you skip it.
What a Vector pipeline actually looks like (sources, transforms, routes, sinks) and how YAML expresses the topology.
How MergeTree's part lifecycle interacts with the producer's batching policy.
How to use LowCardinality and ORDER BY to make a log table fast without thinking about indexes.
Why a disk buffer is the difference between "we lost some events on restart" and "we did not".
That a tagged JSON stream plus one routing transform is enough to handle multiple log types cleanly.

What it does not teach:

Cluster behaviour. Logpipe runs a single ClickHouse node. Replication, sharding, distributed tables, the gymnastics of running a multi-node cluster - none of that is here.
Real production cardinality. A million synthetic rows on one laptop is several orders of magnitude smaller than the data shapes that surface interesting problems. You will not hit memory pressure on aggregations, you will not see what path looks like with a million distinct values, and you will not see what merges cost when the partition is genuinely big.
Multi-tenant isolation. Logpipe has one service, two tables. A real platform has hundreds of services, opinions about which teams can see which data, and an RBAC story to enforce it.
Cost. Zerodha quote $1.14/hr for a 4x r5a.2xlarge ClickHouse cluster holding 30 billion lines on 2.5 TB. That number is the one to keep in your head while sizing your own setup. Logpipe gives you the architecture; the infrastructure number is yours to estimate.

Zerodha's numbers are the bridge between the model and reality. The architecture is the same. The infrastructure is not.

Planned experiments

The stack is in place. The next things I want to actually measure against this small dataset are:

Skip indexes vs full scans. ClickHouse supports skip indexes (minmax, set, bloom_filter) as an optional data structure to skip granules during reads. For the read patterns the query CLI cares about, do they actually help, or does the ORDER BY already cover the common cases?
Daily vs monthly partitioning. At Zerodha-scale daily partitioning is standard, because dropping an entire partition is faster than per-row TTL cleanup. At logpipe-scale the trade-off looks different. What is the disk overhead and merge churn cost of daily over monthly when the data volume is small?
TTL expiry in practice. Let the 30-day TTL fire and confirm old data is actually removed without any explicit action. Watch the merge behaviour around expiry windows.
Compression ratio. Measure the on-disk size against the raw line size to get a sense of the LZ4 plus LowCardinality story end to end.

I will add the results to the README and update this post when the experiments run.

What I would do differently

A few things I would change if I were starting over.

Pick a log format more deliberately. Zerodha settled on logfmt for application logs - readable in a terminal, machine-parseable, and a closer fit to how engineers actually look at logs during incident response. JSON is fine, but it is also unreadable raw, which means you end up piping it through jq for every casual lookup. logfmt does not have that problem. For a learning model I picked JSON because the parser is one line of VRL; for a real stack I would think harder.

No metrics on the aggregator itself. Vector can expose its own metrics in Prometheus format (an internal_metrics source feeding a prometheus_exporter sink), and the aggregator should be scraped. I never wired that up. Without it, you have no visibility into queue depth, sink lag, retry rate, or buffer fill ratio - and those are exactly the signals that tell you whether the pipeline is healthy. In production this is non-negotiable.

No end-to-end latency budget. I never set a target for "how long from event-produced to event-queryable", which means I have no way to know if the system is on plan. A 5-second batch flush plus a few hundred ms for the file tail plus the ClickHouse insert puts logpipe in the 5-10 second range. Fine for offline log review. Terrible for live-debugging an incident. The fix would be a faster flush on a hot path; the lesson is that you should set the number before you build.

Lessons

The aggregator is the architecture. If you take away one thing, take that. Whichever process owns "decide how big a batch is and when to flush it" is the load-bearing piece. Everything else is wiring.
Prove the riskiest layer first, with synthetic inputs. Vector's demo_logs source let me wire up ClickHouse and confirm batches flushed before any real producer existed.
A disk buffer is not optional. In-memory buffers turn restarts into data loss. A few YAML lines and a volume mount remove the failure mode.
LowCardinality matters more than indexes for log shapes. Log columns repeat. Lean into that, and most of your "do I need an index for this" questions vanish.
Parameterised queries are the only safe queries. The lazy version (string concatenation) is the same number of lines as the right one (placeholders plus an args slice). There is no excuse.
One tagged stream beats two untagged streams. A log_type field at the source lets every downstream component route on one rule.

Conclusion

The thing I kept coming back to while building this is that none of the pieces are clever. ClickHouse is just a column store. Vector is just a Rust binary that copies bytes between sources and sinks. NGINX writes a log file. slog writes a log file. The clever part is the topology - the decision to put the batching authority in exactly one place, and to make every other process boring and replaceable around it.

That is what Zerodha's post taught me, and that is what logpipe is meant to demonstrate end to end. The repo is small enough to read in an evening, every config file fits on a screen, and the failure modes are real enough that you can break the stack on purpose and watch it recover. If you are sitting on a logging problem and trying to decide whether the answer is "more Elasticsearch" or "different architecture", I think this is the cheapest possible way to find out which one you actually want.

◆ KEY TAKEAWAY

Distributed logging is not about clever storage. It is about who owns the batches and who survives a restart without losing data. Get the aggregator right and the rest of the stack falls into place around it.

References

Logpipe on GitHub - the full source.
Logging at Zerodha - the post that started this.
ClickHouse MergeTree documentation - parts, merges, and the "too many parts" behaviour.
Vector file source - tailing logs from disk.
Vector ClickHouse sink - batching and buffer settings.
Vector Remap Language (VRL) - parsing, routing, and shaping events.