How we completely rearchitected Mussel, our storage engine for derived data, and lessons learned from the migration from Mussel V1 to V2.
By Shravan Gaonkar, Chandramouli Rangarajan, Yanhan Zhang
Airbnb’s core key-value store, internally referred to as Mussel, bridges offline and online workloads, providing highly scalable bulk load capabilities combined with single-digit millisecond reads.
Since first writing about Mussel in a 2022 blog post, we have completely deprecated the storage backend of the original system (what we now call Mussel v1) and replaced it with a NewSQL backend that we refer to as Mussel v2. Mussel v2 has been running successfully in production for a year, and we wanted to share why we undertook this rearchitecture, what the challenges were, and what benefits we gained from it.
Why rearchitect
Mussel v1 reliably supported Airbnb for years, but new requirements, such as real-time fraud checks, instant personalization, dynamic pricing, and big data workloads, demand a platform that combines real-time streaming with bulk ingestion, all while being easy to manage.
Key Challenges with v1
Mussel v2 solves a number of issues with v1, delivering a scalable, cloud-native key-value store with predictable performance and minimal operational overhead.
- Operational complexity: Scaling or replacing nodes required multi-step Chef scripts on EC2; v2 uses Kubernetes manifests and automated rollouts, reducing hours of manual work to minutes.
- Capacity & hotspots: Static hash partitioning often overloaded nodes, leading to latency spikes. v2's dynamic range sharding and presplitting keep reads fast, even at p99.
- Consistency flexibility: v1 offered limited consistency control. v2 lets teams choose between immediate or eventual consistency based on their SLA needs (see the sketch after this list).
- Cost & Transparency: Resource usage in v1 was opaque. v2 adds namespace tenancy, quota enforcement, and dashboards, providing cost visibility and control.
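To make the consistency flexibility concrete, here is a minimal client-side sketch. The types, method names, and parameters below are illustrative assumptions, not Mussel's actual API; they simply show how a caller might trade freshness for latency on a per-request basis.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical client-side types; Mussel's real interfaces may differ.
class Consistency(Enum):
    IMMEDIATE = "immediate"   # read from the primary replica for the freshest data
    EVENTUAL = "eventual"     # allow a stale read from a nearby replica

@dataclass
class ReadRequest:
    dataname: str                                  # logical table name
    key: str
    consistency: Consistency = Consistency.EVENTUAL

def read(client, request: ReadRequest):
    """Route the read based on the caller's consistency choice."""
    if request.consistency is Consistency.IMMEDIATE:
        return client.get(request.dataname, request.key, replica="primary")
    return client.get(request.dataname, request.key, replica="nearest")

# Example: a fraud check that needs the freshest value would pick IMMEDIATE.
# result = read(mussel_client, ReadRequest("fraud_scores", "user:42", Consistency.IMMEDIATE))
```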
New architecture
Mussel v2 is a complete re-architecture addressing v1's operational and scalability challenges. It's designed to be automated, maintainable, and scalable, while ensuring feature parity and an easy migration for 100+ existing client use cases.
Dispatcher
In Mussel v2, the Dispatcher is a stateless, horizontally scalable Kubernetes service that replaces the tightly coupled, protocol-specific design of v1. It translates client API calls into backend queries/mutations, supports dual-write and shadow-read modes for migration, manages retries and rate limits, and integrates with Airbnb's service mesh for security and service discovery.
Reads are simplified: each dataname maps to a logical table, enabling optimized point lookups, range/prefix queries, and stale reads from local replicas to reduce latency. Dynamic throttling and prioritization maintain performance under changing traffic.
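As a rough illustration of that read path, the sketch below shows how a dispatcher might map a dataname to its table and choose between a point lookup and a prefix scan. The backend interface (scan, get, replica selection) is assumed for illustration and is not Mussel's actual API.

```python
from typing import Optional

def route_read(backend, dataname: str, key: str, prefix: Optional[str] = None,
               allow_stale: bool = False):
    """Map a dataname to its logical table and pick the cheapest query shape."""
    table = f"mussel_{dataname}"                       # dataname -> logical table
    replica = "local" if allow_stale else "primary"    # stale reads can stay local

    if prefix is not None:
        # Range/prefix query: scan all keys sharing the prefix.
        return backend.scan(table, start=prefix, end=prefix + "\xff", replica=replica)
    # Point lookup: a single-key read.
    return backend.get(table, key, replica=replica)
```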
Writes are persisted in Kafka for durability first, with the Replayer and Write Dispatcher applying them in order to the backend. This event-driven model absorbs bursts, ensures consistency, and removes v1's operational overhead. Kafka also underpins upgrades, bootstrapping, and migrations until CDC and snapshotting mature.
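A hedged sketch of the write path is shown below, using the kafka-python client. The topic naming, payload shape, and producer settings are assumptions chosen to illustrate the "persist to Kafka first, apply in order later" model rather than the production configuration.

```python
import json
from kafka import KafkaProducer  # assumes the kafka-python package

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # only acknowledge the client once the write is durable in Kafka
)

def write(dataname: str, key: str, value: dict) -> None:
    """Persist the mutation to Kafka; the Replayer later applies it to the backend in order."""
    # Keying the message by record key preserves per-key ordering within a partition.
    future = producer.send(
        f"mussel-writes-{dataname}",
        key=key.encode("utf-8"),
        value={"key": key, "value": value},
    )
    future.get(timeout=10)  # surface broker errors to the caller
```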
The architecture suits derived data and replay-heavy use cases today, with a long-term goal of moving ingestion and replication fully to the distributed backend database to bring down latency and simplify operations.
Bulk load
Bulk load remains essential for moving large datasets from offline warehouses into Mussel for low-latency queries. v2 preserves v1 semantics, supporting both "merge" (append to existing tables) and "replace" (swap datasets) modes.
To maintain a familiar interface, v2 retains the existing Airflow-based onboarding and transforms warehouse data into a standardized format, uploading it to S3 for ingestion. Airflow is an open-source platform for authoring, scheduling, and monitoring data pipelines. Created at Airbnb, it lets users define workflows in code as directed acyclic graphs (DAGs), enabling fast iteration and easy orchestration of tasks for data engineers and scientists worldwide.
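For readers who haven't used Airflow, the sketch below shows what such an onboarding DAG might look like. The task names and Python callables are placeholders, not the actual Mussel operators.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables standing in for the real transform/upload/trigger logic.
def transform_to_standard_format(**_):
    pass

def upload_to_s3(**_):
    pass

def trigger_mussel_ingestion(**_):
    pass

with DAG(
    dag_id="mussel_bulk_load_example",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    transform = PythonOperator(task_id="transform", python_callable=transform_to_standard_format)
    upload = PythonOperator(task_id="upload_to_s3", python_callable=upload_to_s3)
    ingest = PythonOperator(task_id="trigger_ingestion", python_callable=trigger_mussel_ingestion)

    # Warehouse data -> standardized files -> S3 -> Mussel ingestion
    transform >> upload >> ingest
```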
A stateless controller orchestrates jobs, while a distributed, stateful worker fleet (Kubernetes StatefulSets) performs parallel ingestion, loading records from S3 into tables. Optimizations such as deduplication for replace jobs, delta merges, and insert-on-duplicate-key-ignore ensure high throughput and efficient writes at Airbnb scale.
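The idempotent write path on a worker might look roughly like this; the table schema, SQL dialect, and helper objects are assumptions used only to illustrate the insert-on-duplicate-key-ignore idea.

```python
def load_shard(db, rows):
    """Apply one S3 shard idempotently: re-running it must not duplicate or clobber data."""
    stmt = (
        "INSERT INTO kv_records (dataname, k, v) VALUES (%s, %s, %s) "
        "ON DUPLICATE KEY UPDATE k = k"  # MySQL-style no-op update, i.e. insert-or-ignore
    )
    with db.cursor() as cur:  # any DB-API connection works here
        cur.executemany(stmt, [(r["dataname"], r["key"], r["value"]) for r in rows])
    db.commit()
```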
TTL
Automated data expiration (TTL) helps support data governance goals and storage efficiency. In v1, expiration relied on the storage engine's compaction cycle, which struggled at scale.
Mussel v2 introduces a topology-aware expiration service that shards data namespaces into range-based subtasks processed concurrently by multiple workers. Expired records are scanned and deleted in parallel, minimizing sweep time for large datasets. Subtasks are scheduled to limit impact on live queries, and write-heavy tables use max-version enforcement with targeted deletes to maintain performance and data hygiene.
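A simplified sketch of such a range-sharded sweep appears below. The store interface (scan, delete) and the record layout are assumptions; the point is the split-then-parallel-delete structure.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def sweep_range(store, table: str, start_key: str, end_key: str, ttl_seconds: int) -> int:
    """Delete expired records within one key range; returns how many were removed."""
    cutoff = time.time() - ttl_seconds
    expired = [
        rec["key"]
        for rec in store.scan(table, start=start_key, end=end_key)
        if rec["write_ts"] < cutoff
    ]
    for key in expired:
        store.delete(table, key)
    return len(expired)

def sweep_namespace(store, table: str, ranges: list, ttl_seconds: int, max_workers: int = 8) -> int:
    """Process range-based subtasks concurrently while keeping parallelism bounded."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = pool.map(lambda r: sweep_range(store, table, r[0], r[1], ttl_seconds), ranges)
    return sum(counts)
```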
These improvements provide the same retention functionality as v1 but with far greater efficiency, transparency, and scalability, meeting the demands of Airbnb's modern data platform and enabling future use cases.
The migration process
Challenge
Mussel stores vast amounts of data and serves thousands of tables across a wide array of Airbnb services, sustaining mission-critical read and write traffic at high scale. Given the criticality of Mussel to Airbnb's online traffic, our migration goal was simple but challenging: move all data and traffic from Mussel v1 to v2 with zero data loss and no impact on availability to our customers.
Process
We adopted a blue/green migration strategy, but with notable complexities. Mussel v1 didn't provide table-level snapshots or CDC streams, which are standard in many datastores. To bridge this gap, we developed a custom migration pipeline capable of bootstrapping tables to v2, selected by usage patterns and risk profiles. Once bootstrapped, dual writes were enabled on a per-table basis to keep v2 in sync as the migration progressed.
The migration itself followed several distinct stages:
- Blue Zone: All traffic initially flowed to v1 ("Blue"). This provided a stable baseline as we migrated data behind the scenes.
- Shadowing (Green): Once tables were bootstrapped, v2 ("Green") began shadowing v1, handling reads/writes in parallel, but only v1 responded. This allowed us to validate v2's correctness and performance without risk.
- Reverse: After building confidence, v2 took over active traffic while v1 remained on standby. We built automatic circuit breakers and fallback logic: if v2 showed elevated error rates or lagged behind v1, we could instantly return traffic to v1 or revert to shadowing (a sketch of this routing logic follows the list).
- Cutover: When v2 passed all checks, we completed the cutover on a dataname-by-dataname basis, with Kafka serving as a robust intermediary for write reliability throughout.
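These stages amount to a small per-table state machine. The sketch below is an assumption-laden illustration of how the circuit-breaker fallback might be wired, not the production dispatcher logic.

```python
from enum import Enum

class Stage(Enum):
    BLUE = "blue"        # v1 serves, v2 idle
    SHADOW = "shadow"    # v1 serves, v2 shadows
    REVERSE = "reverse"  # v2 serves, v1 shadows and stands by
    CUTOVER = "cutover"  # v2 only

def choose_backend(stage: Stage, v2_error_rate: float, v2_lag_seconds: float,
                   max_error_rate: float = 0.001, max_lag_seconds: float = 1.0) -> str:
    """Pick the serving backend for a table, falling back to v1 if v2 degrades."""
    if stage in (Stage.BLUE, Stage.SHADOW):
        return "v1"
    if stage is Stage.REVERSE:
        # Circuit breaker: elevated errors or replication lag flips traffic back to v1.
        if v2_error_rate > max_error_rate or v2_lag_seconds > max_lag_seconds:
            return "v1"
        return "v2"
    return "v2"  # CUTOVER
```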
To further de-risk the process, migration was carried out one table at a time. Every step was reversible and could be fine-tuned per table or group of tables based on their risk profile. This granular, staged approach allowed for fast iteration, safe rollbacks, and steady progress without impacting the business.
Migration pipeline
As described in our earlier blog post, the v1 architecture uses Kafka as a replication log: data is first written to Kafka, then consumed by the v1 backend. During the data migration to v2, we leveraged the same Kafka stream to maintain eventual consistency between v1 and v2.
To migrate any given table from v1 to v2, we built a custom pipeline consisting of the following steps:
- Source data sampling: We download backup data from v1, extract the relevant tables, and sample the data to understand its distribution.
- Create pre-split table on v2: Based on the sampling results, we create a corresponding v2 table with a pre-defined shard layout to minimize data reshuffling during migration.
- Bootstrap: This is the most time-consuming step, taking hours or even days depending on table size. To bootstrap efficiently, we use Kubernetes StatefulSets to persist local state and periodically checkpoint progress.
- Checksum verification: We verify that all data from the v1 backup has been correctly ingested into v2 (a sketch of this check follows the list).
- Catch-up: We apply any lagging messages that accumulated in Kafka during the bootstrap phase.
- Dual writes: At this stage, both v1 and v2 consume from the same Kafka topic. We ensure eventual consistency between the two, with replication lag typically within tens of milliseconds.
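For the checksum verification step, one simple approach is a per-range rolling digest over an ordered scan of both stores. The sketch below assumes both stores expose such a scan and is illustrative only.

```python
import hashlib

def range_checksum(scan_iter) -> str:
    """Fold an ordered stream of (key, value) pairs into a single digest."""
    digest = hashlib.sha256()
    for key, value in scan_iter:
        digest.update(key.encode("utf-8"))
        digest.update(value if isinstance(value, bytes) else str(value).encode("utf-8"))
    return digest.hexdigest()

def verify_range(v1_store, v2_store, table: str, start: str, end: str) -> bool:
    """Compare the v1 backup against what landed in v2 for one key range."""
    return (range_checksum(v1_store.scan(table, start=start, end=end))
            == range_checksum(v2_store.scan(table, start=start, end=end)))
```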
Once data migration is complete and we enter dual-write mode, we can begin the read traffic migration phase. During this phase, our dispatcher can be dynamically configured to serve read requests for specific tables from v1, while sending shadow requests to v2 for consistency checks. We then gradually shift to serving reads from v2, accompanied by reverse shadow requests to v1 for consistency checks, which also allows quick fallback to v1 responses if v2 becomes unstable. Eventually, we fully transition to serving all read traffic from v2.
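A hedged sketch of the shadow-read comparison during this phase is shown below; the client interfaces are placeholders. Flipping which store is treated as primary is what lets us move from shadowing to reverse shadowing with an instant fallback path.

```python
import logging

logger = logging.getLogger("mussel.shadow")

def shadow_read(primary, shadow, table: str, key: str, primary_name: str = "v1"):
    """Serve from the primary store, compare against the shadow, and log any mismatch."""
    result = primary.get(table, key)
    try:
        if shadow.get(table, key) != result:
            logger.warning("shadow mismatch table=%s key=%s primary=%s", table, key, primary_name)
    except Exception:
        # A failing shadow must never affect the response served to the caller.
        logger.exception("shadow read failed table=%s key=%s", table, key)
    return result
```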
Lessons learned
Several key insights emerged from this migration:
- Consistency complexity: Migrating from an eventually consistent (v1) to a strongly consistent (v2) backend introduced new challenges, particularly around write conflicts. Addressing these required solutions like write deduplication, hotkey blocking, and lazy write repair, often trading off storage cost or read performance.
- Presplitting is key: As we shifted from hash-based (v1) to range-based partitioning (v2), inserting large runs of consecutive data could cause hotspots and disrupt our v2 backend. To prevent this, we needed to accurately sample the v1 data and presplit it into multiple shards based on v2's topology, ensuring balanced ingestion traffic across backend nodes during data migration (see the sketch after this list).
- Query model adjustments: v2 doesn't push down range filters as effectively, requiring us to implement client-side pagination for prefix and range queries.
- Freshness vs. cost: Different use cases required different tradeoffs. Some prioritized data freshness and used primary replicas for the latest reads, while others leveraged secondary replicas to balance staleness with cost and performance.
- Kafka's role: Kafka's proven, stable p99 millisecond-level latency made it a valuable part of our migration process.
- Building in flexibility: Customer retries and routine bulk jobs provided a safety net for the rare inconsistencies, and our migration design allowed for per-table stage assignments and instant reversibility, which was key for managing risk at scale.
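To illustrate the presplitting lesson, here is a minimal sketch of deriving split boundaries from sampled keys by taking evenly spaced quantiles. It assumes keys sort lexicographically and stands in for, rather than reproduces, the actual sampling tooling.

```python
def split_points(sampled_keys: list, num_shards: int) -> list:
    """Pick shard boundaries so each range receives a roughly equal share of keys."""
    keys = sorted(sampled_keys)
    step = len(keys) / num_shards
    # Evenly spaced quantiles become the upper bounds of the first num_shards - 1 ranges.
    return [keys[int(step * i)] for i in range(1, num_shards)]

# Example: presplit into 4 shards using keys sampled from the v1 backup
# boundaries = split_points(sampled_keys, num_shards=4)
```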
As a result, we migrated more than a petabyte of data across thousands of tables with zero downtime or data loss, thanks to a blue/green rollout, a dual-write pipeline, and automated fallbacks, so product teams could keep shipping features while the engine beneath them evolved.
Conclusion and next steps
What sets Mussel v2 apart is the way it fuses capabilities that are usually confined to separate, specialized systems. In our deployment of Mussel v2, we observe that the system can simultaneously:
- ingest tens of terabytes in bulk data uploads,
- sustain 100k+ streaming writes per second in the same cluster, and
- keep p99 reads under 25 ms,
all while giving callers a simple dial to toggle stale reads on a per-namespace basis. By pairing a NewSQL backend with a Kubernetes-native control plane, Mussel v2 delivers the elasticity of object storage, the responsiveness of a low-latency cache, and the operability of modern service meshes, rolled into one platform. Engineers no longer need to stitch together a cache, a queue, and a datastore to hit their SLAs; Mussel provides these guarantees out of the box, letting teams focus on product innovation instead of data plumbing.
Looking ahead, we'll be sharing deeper insights into how we're evolving quality of service (QoS) management within Mussel, now orchestrated cleanly from the Dispatcher layer. We'll also describe our journey in optimizing bulk loading at scale, unlocking new performance and reliability wins for complex data pipelines. If you're passionate about building large-scale distributed systems and want to help shape the future of data infrastructure at Airbnb, check out our Careers page; we're always looking for talented engineers to join us on this mission.