Streaming the OSC diff: a Rust pipeline for affected-bbox extraction.

OpenStreetMap is a planet-sized graph. Geofabrik takes that graph and ships you minutely diffs — .osc.gz files describing every node, way and relation added, modified or deleted in the last 60 seconds. If you're running a routing engine on top of OSM, you have to ingest those diffs and rebuild whatever they touched.

The naive solution rebuilds entire continents. The right solution finds the smallest set of bounding boxes the diff actually changed, and only rebuilds those. That sounds simple. It's not, and I want to talk about why — and what a working Rust pipeline looks like.

What's in an OSC diff (and what isn't)

An OSC file is XML, gzipped. It looks like this:

<osmChange version="0.6" generator="osmium/1.16.0">
  <modify>
    <node id="123456" version="3" lat="-33.8688" lon="151.2093"/>
    <way id="78910" version="5">
      <nd ref="123456"/>
      <nd ref="111213"/>
      <tag k="highway" v="residential"/>
    </way>
  </modify>
  <delete>
    <node id="999" version="2"/>
  </delete>
</osmChange>

The interesting wrinkle: when a way is in the diff (modified or deleted), the XML gives you its node references — <nd ref="123456"/> — but not the coordinates of those nodes. To know where a way is, you have to look up its nodes' lat/lon somewhere.

Where? Not in the diff. The diff only includes nodes that changed. Nodes the way still references but didn't change are not in the diff. You have to look them up in your existing snapshot of the planet.

That's the central problem.

The PBF node-coord index

The "existing snapshot of the planet" is a PBF file — OSM's binary format, around 70 GB for the whole planet. Loading it into memory isn't happening. Loading it into Postgres works but is slower than we want.

The trick: you don't need any of the metadata. You need a map from node_id to (lat, lon). That's it. For a 7-billion-node planet, the naive layout (u64, f64, f64) is around 56 GB. Domain knowledge buys back most of that.

Two observations: node IDs in OSM are dense (most integers in [1, N] are used), and 1e-7 degree precision is fine for lat/lon (about 1 cm at the equator — well below routing-graph resolution). Encode lat/lon as fixed-point i32s. Index by ID-block so you can binary-search to the right block and then linear-scan inside. We bench-marked it at ~120 ns per lookup cold, ~20 ns hot.

// Each block covers 2^16 consecutive node IDs.
// Within a block, store id-delta + i32 lat + i32 lon, packed.
struct CoordRecord {
    id_offset: u16,   // delta from block base
    lat_q7:    i32,   // (lat * 1e7) as i32
    lon_q7:    i32,
}

For an OSC diff of ~50,000 changed nodes, that's a few milliseconds of lookup work. The full coord index for the planet fits in ~28 GB on disk, mmap'd. The OS handles the working set.

Streaming the OSC parser

Minutely OSC files are small — typically under 1 MB compressed. You could parse them into memory and be done. We don't, for two reasons. First, the pipeline runs continuously and not allocating a fresh AST every minute saves allocator pressure. Second, emergency diffs (after a vandalism revert or large import) can be hundreds of MB; we want to handle them without rewriting the pipeline.

So we stream. The osc-stream crate is a Stream<Item = OscEvent> over the gzipped XML. OscEvent is an enum: NodeAdded, NodeModified, NodeDeleted, WayAdded, WayModified, WayDeleted, and the relation equivalents. Each event carries just enough to identify the entity and its current coordinates if coordinates are in the diff.

For a way event without coordinates, we emit WayModified { id, node_refs } and defer the lookup to the next stage.

Affected-bbox: greedy haversine merge

Now we have a stream of events. We resolve coordinates: for node events the coordinates are in the diff; for way events we look up each node_ref in the coord index and assemble a bounding box.

So now we have a stream of (event, bbox) pairs. The question: what's the combined affected area?

The dumb answer is the union of all bboxes. For ~10,000 events scattered across the planet, that's a useless answer ("all of Australia"). The better answer is a small set of bboxes that together cover the events, with gaps where no events live — if a diff touches Sydney and Perth but nothing in between, the result should be [Sydney bbox, Perth bbox], not [Australia bbox].

The bbox-merge crate is a greedy clusterer with a haversine threshold:

fn merge(events: Vec<Bbox>, threshold_km: f64) -> Vec<Bbox> {
    let mut clusters: Vec<Bbox> = Vec::new();
    for ev in events {
        let nearest = clusters.iter().enumerate()
            .map(|(i, c)| (i, c.haversine_centroid_distance(&ev)))
            .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
        match nearest {
            Some((i, d)) if d < threshold_km => clusters[i] = clusters[i].union(&ev),
            _ => clusters.push(ev),
        }
    }
    clusters
}

The threshold is tuned per-tile-size. For our 50 km routing tiles, 30 km is the sweet spot — tight enough that distant events don't merge, loose enough that adjacent events do.

Why greedy? Because we tried the optimal algorithm (k-means with auto-k, then refinement) and the result was indistinguishable from greedy, while taking ~50 ms vs ~0.5 ms. For a pipeline that runs every minute, 0.5 ms wins. The clustering geometry is forgiving.

PMTiles v3 splicing

Once we have the affected bboxes, we know which routing tiles to rebuild. The tiles live in a PMTiles v3 archive — a single-file map tileset format that's roughly "a tile directory plus the tile data, both addressable."

The naive flow: rebuild the affected tiles, then rebuild the entire archive (write the directory + all tile data to a new file). For our planet-sized archive (~140 GB), that's a 20-minute write. We run minutely. So that flow is out.

The clever flow is to splice. PMTiles v3 has a leaf-directory layout that's amenable to in-place updates if you're careful. The directory entries for affected tiles get updated to point at new offsets; the new tile data gets appended to the end of the file; the old tile data becomes garbage (we vacuum it on a weekly batch job).

let archive = PmtilesArchive::open(&path)?;
let new_offset = archive.append_tiles(&new_tiles)?;
let mut dir = archive.directory().clone();
for (z, x, y, off, len) in new_dir_entries {
    dir.update(z, x, y, off, len);
}
archive.write_directory_atomically(&dir)?;

"Write directory atomically" is doing some work. The directory is a few MB; we keep two slots in the archive header so we can write to slot B while slot A is live, then flip the header pointer with a single pwrite. A tiny shadow-page CAS.

The whole splice runs in ~200 ms for a typical diff.

Performance numbers

A typical minutely diff, end-to-end on a c7g.large ARM instance:

OSC size: ~600 KB gzipped
Events: ~8,000
Node-coord lookups: ~25,000
Affected bboxes after merge: ~30
Tiles to rebuild: ~50
Pipeline time: ~1.2 s

We run with ~15 seconds of slack against the minutely cadence, so we can absorb occasional 5× diffs without falling behind.

Lessons

Domain knowledge wins. The 5–10× compression on the coord index came from knowing that OSM node IDs are dense and lat/lon needs at most 1 cm precision. A generic key-value store would have been five times slower and five times bigger.

Greedy beats optimal where cluster geometry is forgiving. Haversine-greedy was 100× faster than k-means and gave the same answer for our workload. Try the dumb thing first; only reach for clever clustering when the dumb thing measurably fails.

Single-file formats with in-place update are worth the complexity. Rewriting a 140 GB archive every minute is not viable. Splicing it is. PMTiles v3 was designed for this; we had to implement the splicer.

The pipeline runs in production today. The architectural shape — four crates, streaming events, lazy coord lookup, greedy haversine merge, in-place tile splice — is the part I'd reach for again on any geo problem with the same diff-driven cadence.