Load Testing with Impulse at Airbnb | by Chenhao Yang | The Airbnb Tech Blog

Comprehensive Load Testing with Load Generator, Dependency Mocker, Traffic Collector, and More

Authors: Chenhao Yang, Haoyue Wang, Xiaoya Wei, Zay Guan, Yaolin Chen and Fei Yuan

System-level load testing is crucial for reliability and efficiency. It identifies bottlenecks, evaluates capacity for peak traffic, establishes performance baselines, and detects errors. At a company of Airbnb’s size and complexity, we’ve learned that load testing needs to be robust, flexible, and decentralized. This requires the right set of tools to enable engineering teams to run self-service load tests that integrate seamlessly with CI.

Impulse is one of our internal load-testing-as-a-service frameworks. It provides tools that can generate synthetic loads, mock dependencies, and collect traffic data from production environments. In this blog post, we’ll share how Impulse is architected to minimize manual effort, seamlessly integrate with our observability stack, and empower teams to proactively address potential issues.

Impulse is a comprehensive load testing framework that allows service owners to conduct context-aware load tests, mock dependencies, and collect traffic data to verify the system’s performance under various conditions. It includes the following components:

Load generator to generate context-aware requests on the fly, for testing different scenarios with synthetic or collected traffic.
Dependency mocker to mock the downstream responses with latency, so that load testing on the service under test (SUT) doesn’t need to involve certain dependent services. This is especially important when the dependencies are vendor services that don’t support load testing, or if the team wants to regression load test their service during day-to-day deployment without affecting downstreams.
Traffic collector to collect both the upstream and downstream traffic from the production environment, and then apply the resulting data to the test environment.
Testing API generator to wrap asynchronous workflows into synchronous API calls for load testing.

Figure 1: The Impulse framework and its four main components

Each of these four tools is independent, allowing service owners the flexibility to select one or more components for their load testing needs.

Load generator

Context aware

When load testing, requests made to the SUT often require some information from the previous response or need to be sent in a specific order. For example, if an update API needs to provide an entity_id to update, we must ensure the entity already exists in the testing environment context.

Our load generator tool allows users to write arbitrary testing logic in Java or Kotlin and launch containers to run these tests at scale against the SUT. Why write code instead of DSL/configuration logic?

Flexibility: Programming languages are more expressive than a DSL and can better support complex contextual scenarios.
Reusability: The same testing code can be used in other tests, e.g., integration tests.
Developer proficiency: Low or no learning curve to onboard; developers don’t need to learn a new way to write testing logic.
Developer experience: IDE support, testing, debugging, etc.

Here is an example of a synthetic context-aware test case:

class HelloWorldLoadGenerator : LoadGenerator {
    override suspend fun run() {
        // create an entity first so that later requests have context to work with
        val createdEntity = sutApiClient.create(CreateRequest(name="foo", ...)).data

        // request with id from previous response (context)
        val updateResponse = sutApiClient.update(UpdateRequest(id=createdEntity.id, name="bar"))

        // ... other operations

        // clean up
        sutApiClient.delete(DeleteRequest(id=createdEntity.id))
    }
}

Decentralized

The load generator is decentralized and containerized, which means each time a load test is triggered, a set of new containers is created to run the test. This design has several benefits:

Isolation: Load testing runs between different services are isolated from each other, eliminating any interference.
Scalability: The number of containers can be scaled up or down according to the traffic requirements.
Cost efficiency: The containers are short-lived, as they only exist during the load testing run.

What’s more, as our services are cloud based, a subtle point is that the Impulse framework evenly distributes the workers among all our data centers, and the load is emitted evenly from all the workers. Impulse’s load generator ensures the overall trigger per second (TPS) is as configured. Based on this, we can better leverage the locality settings in load balancers, which more closely mimics the real traffic distribution in production.
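To make the even-distribution idea concrete, here is a minimal sketch of splitting a configured total TPS across workers and spreading workers across data centers. The function names (splitTps, assignWorkers) and the data center labels are hypothetical; the post does not describe how Impulse actually schedules workers, so treat this only as one way the arithmetic could work.

```kotlin
// Hypothetical helper: split a configured total TPS across workers
// so that the per-worker rates sum exactly to the configured total.
fun splitTps(totalTps: Int, workerCount: Int): List<Int> {
    require(totalTps >= 0) { "totalTps must be non-negative" }
    require(workerCount > 0) { "workerCount must be positive" }
    val base = totalTps / workerCount
    val remainder = totalTps % workerCount
    // The first `remainder` workers each take one extra trigger per second.
    return List(workerCount) { index -> if (index < remainder) base + 1 else base }
}

// Hypothetical helper: assign workers to data centers round-robin,
// returning how many workers land in each data center.
fun assignWorkers(workerCount: Int, dataCenters: List<String>): Map<String, Int> {
    require(workerCount >= 0) { "workerCount must be non-negative" }
    require(dataCenters.isNotEmpty()) { "at least one data center is required" }
    val counts = dataCenters.associateWith { 0 }.toMutableMap()
    for (i in 0 until workerCount) {
        val dc = dataCenters[i % dataCenters.size]
        counts[dc] = counts.getValue(dc) + 1
    }
    return counts
}
```

For example, splitTps(100, 8) gives four workers 13 TPS and four workers 12 TPS, so the overall rate stays at the configured 100 even though it does not divide evenly.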

Execution

The load generator is designed to be executed in the CI/CD pipeline, which means we can trigger load testing automatically. Developers can configure the testing spec in multiple phases, e.g., a warm-up phase, a steady-state phase, a peak phase, etc. Each phase can be configured with:

Test cases to run
TPS (trigger per second) of each test case
Test duration
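As an illustration of the three per-phase settings above, a phased spec could be modeled roughly as follows. This is a sketch only: the type names (TestCaseLoad, Phase), the phase names, and the TPS and duration values are all assumptions, since the post does not show Impulse's real spec format.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.minutes
import kotlin.time.Duration.Companion.seconds

// Hypothetical types mirroring the three settings listed above:
// which test cases run, the TPS of each test case, and the phase duration.
data class TestCaseLoad(val testCase: String, val tps: Int)

data class Phase(val name: String, val loads: List<TestCaseLoad>, val duration: Duration) {
    // Approximate number of triggers this phase emits across all of its test cases.
    val expectedRequests: Long
        get() = loads.sumOf { it.tps.toLong() } * duration.inWholeSeconds
}

// Example spec with made-up numbers: warm-up, steady state, then peak.
val spec: List<Phase> = listOf(
    Phase("warm-up", listOf(TestCaseLoad("HelloWorldLoadGenerator", 10)), 30.seconds),
    Phase("steady-state", listOf(TestCaseLoad("HelloWorldLoadGenerator", 50)), 5.minutes),
    Phase("peak", listOf(TestCaseLoad("HelloWorldLoadGenerator", 200)), 1.minutes),
)
```

Keeping the spec as plain data makes it easy to sanity-check the total request volume (here, spec.sumOf { it.expectedRequests }) before a run is triggered from CI.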

Dependency mocker

Impulse is a decentralized framework where each service has its own dependency mocker. This can eliminate interference between services and reduce communication costs. Each dependency mocker is an out-of-process service, which means the SUT behaves just as it does in production. We run the mockers in separate instances to avoid any impact on the performance of the SUT. The mock servers are all short-lived: they start only before tests run and shut down afterwards, which saves costs and maintenance effort. The response latency and exceptions are configurable, and the number of mocker instances can be adjusted on demand to support large amounts of traffic.

Other noteworthy features:

You can selectively stub some of the dependencies. Currently, stubbing is supported for HTTP JSON, Airbnb Thrift, and Airbnb GraphQL dependencies.
The dependency mockers support use cases beyond load testing. For instance, integration tests often rely on other services or third-party API calls, which may not guarantee a stable testing environment or might only support ideal scenarios. Dependency mockers can address this by offering predefined responses or exceptions to fully test those flows.

Impulse supports two options for generating mock responses:

Synthetic response: The response is generated by user logic, as in integration testing; the difference is that the response comes from a remote (out-of-process) server with simulated latency.
– Similar to the load generator, the logic is written in Java/Kotlin code and contains request matching and response generation.
– Latency can be simulated using p95/p99 metrics.
Replay response: The response is replayed from the production downstream recording, supported by the traffic collector component.

Here is an example of a synthetic response with latency in Kotlin:

downstreamsMocking.every(
    // match Thrift requests whose message is "hello"
    thriftRequest().having { it.message == "hello" }
).returns { request ->
    ThriftDownstream.Response.thriftEncoded(
        HttpStatus.OK,
        FooResponse.builder.reply("${request.message} world").build()
    )
}.with {
    // simulated latency derived from a p95 target, bounded by min and max
    delay = latencyFromP95(p95=500.milliseconds, min=200.milliseconds, max=2000.milliseconds)
}
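The post does not say how latencyFromP95 turns a p95 target into individual delays, so the following is only a sketch of one plausible approach. It assumes a latency equal to the minimum plus an exponentially distributed tail, with the rate chosen so that 95% of samples fall at or below the p95 target, and caps the result at the maximum. The class name P95Latency and the choice of distribution are assumptions, not Impulse's implementation.

```kotlin
import kotlin.math.ln
import kotlin.random.Random
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds
import kotlin.time.DurationUnit

// Hypothetical stand-in for a p95-based latency generator.
// Assumption: latency = min + exponential tail, capped at max.
class P95Latency(
    private val p95: Duration,
    private val min: Duration,
    private val max: Duration,
    private val random: Random = Random.Default,
) {
    init {
        require(min < p95 && p95 <= max) { "expected min < p95 <= max" }
    }

    // Rate (per millisecond) chosen so that P(tail <= p95 - min) = 0.95,
    // i.e. 1 - exp(-rate * (p95 - min)) = 0.95.
    private val rate: Double = -ln(0.05) / (p95 - min).toDouble(DurationUnit.MILLISECONDS)

    fun sample(): Duration {
        // Inverse-CDF sampling: nextDouble() is in [0, 1), so 1 - u is in (0, 1] and ln is finite.
        val u = random.nextDouble()
        val tailMs = -ln(1.0 - u) / rate
        return (min + tailMs.milliseconds).coerceAtMost(max)
    }
}
```

With p95 = 500 ms, min = 200 ms, and max = 2000 ms, roughly 95% of samples land at or below 500 ms and none fall outside the configured bounds. A log-normal or empirical distribution taken from production metrics would be an equally reasonable choice.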

Traffic collector

The traffic collector component is designed to capture both upstream and downstream traffic, along with the relationships between them. This approach allows Impulse to accurately replay production traffic during load testing, avoiding inconsistencies in downstream data or behavior. By replicating downstream responses, including production-like latency and errors, via the dependency mocker, the system ensures high-fidelity load testing. As a result, services in the testing environment behave as they do in production, enabling more realistic and reliable performance evaluations.
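One way to picture the relationship between upstream and downstream traffic is a record that ties each upstream request to the downstream calls it triggered, plus a lookup the mocker can use at replay time. The types below (DownstreamCall, RecordedExchange, ReplayStore) are hypothetical; the post does not describe the collector's actual schema or storage.

```kotlin
// Hypothetical record of one downstream call observed in production.
data class DownstreamCall(
    val dependency: String,
    val request: String,
    val response: String,
    val latencyMs: Long,
)

// Hypothetical record linking an upstream request to the downstream calls it caused.
data class RecordedExchange(
    val upstreamRequest: String,
    val downstreamCalls: List<DownstreamCall>,
)

// Serves recorded downstream responses, keyed by (dependency, request).
class ReplayStore(recordings: List<RecordedExchange>) {
    // If the same (dependency, request) pair was recorded more than once, the last recording wins.
    private val byKey: Map<Pair<String, String>, DownstreamCall> =
        recordings.flatMap { it.downstreamCalls }
            .associateBy { it.dependency to it.request }

    // Upstream requests to send to the SUT, in recorded order.
    val upstreamRequests: List<String> = recordings.map { it.upstreamRequest }

    // Returns the recorded call, or null when no matching request was captured.
    fun lookup(dependency: String, request: String): DownstreamCall? =
        byKey[dependency to request]
}
```

In this sketch, a load generator would replay upstreamRequests against the SUT while the dependency mocker answers each downstream call from lookup, reusing the recorded response and latency.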

Testing API generator

We rely heavily on event-driven, asynchronous workflows that are critical to our business operations. These include processing events from a message queue (MQ) and executing delayed jobs. Most of the MQ events/jobs are emitted from synchronous flows (e.g., API calls), so theoretically they can be covered by API load testing. However, the real world is more complex. These asynchronous flows often involve long chains of event and job emissions originating from various sources, making it difficult to replicate and test them accurately using only API-based methods.

To address this, the testing API generator component creates HTTP APIs during the CI stage according to the event or job schema. These APIs act as wrappers around the underlying asynchronous flows and are registered only in the testing environment. This setup enables load testing tools, such as load generators, to send traffic to these synthetic APIs, allowing asynchronous flows to be exercised as if they were synchronous. As a result, it’s possible to perform targeted, realistic load testing on asynchronous logic that might otherwise be hard to simulate.

Figure 5: Testing API generator for async flows

The purpose of the testing API generator is to help developers identify performance bottlenecks and potential issues in their async flow implementations under high traffic conditions. It does this by enabling direct load testing of async flows without involving middleware components like MQs. The rationale is that developers typically aim to evaluate the behavior of their own logic, not the middleware, which is usually already well-tested. By bypassing these components, this approach simplifies the load testing process and empowers developers to independently manage and execute their own tests.
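The wrapping idea can be sketched without any HTTP or MQ machinery: the handler that a queue consumer would normally invoke is called directly, and the caller receives the outcome synchronously. All names here (BookingEvent, EventHandler, SyncResult, TestingApi) are hypothetical, and the real generator produces HTTP APIs from the event or job schema rather than hand-written classes.

```kotlin
// Hypothetical event type standing in for a message consumed from an MQ.
data class BookingEvent(val bookingId: String, val action: String)

// Hypothetical handler interface: the async logic a developer wants to load test.
fun interface EventHandler<E> {
    fun handle(event: E)
}

// Hypothetical outcome returned to the load generator.
data class SyncResult(val success: Boolean, val elapsedMs: Long, val error: String? = null)

// Wraps a handler so it can be driven synchronously, bypassing the queue entirely.
class TestingApi<E>(private val handler: EventHandler<E>) {
    fun call(event: E): SyncResult {
        val start = System.nanoTime()
        return try {
            handler.handle(event)
            SyncResult(success = true, elapsedMs = elapsedMs(start))
        } catch (e: Exception) {
            // Failures are reported to the caller instead of being retried by middleware.
            SyncResult(success = false, elapsedMs = elapsedMs(start), error = e.message)
        }
    }

    private fun elapsedMs(start: Long): Long = (System.nanoTime() - start) / 1_000_000
}
```

Because the wrapper returns success, timing, and any error directly, a load generator can treat the async flow like an ordinary API call and measure the handler's own logic rather than the queue in front of it.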

Integration with different testing frameworks

Airbnb emphasizes product quality, utilizing versatile testing frameworks that cover integration and API tests across development, staging, and production environments, and integrate smoothly into CI/CD pipelines. The modular design of Impulse facilitates its integration with these frameworks, offering systematic service testing.

Figure 6: How Impulse interfaces with other internal testing frameworks

In this blog post, we shared how Impulse and its four core components help developers perform self-service load testing at Airbnb. As of this writing, Impulse has been implemented in several customer support backend services and is currently under review with different teams across the company who are planning to leverage Impulse to conduct load testing.

We’ve received a lot of good feedback in the process. For example: “Impulse helps us to identify and address potential issues in our service. During testing, it detected an ApiClientThreadToolExhaustionException caused by thread pool pressure. Additionally, it alerted us about occasional timeout errors in client API calls during service deployments. Impulse helped us identify high memory usage in the main service container, enabling us to fine-tune the memory allocation and optimize our service’s resource usage. Highly recommend utilizing Impulse as an integral part of the development and testing processes.”

Thanks to Jeremy Werner, Yashar Mehdad, Raj Rajagopal, Claire Cheng, Tim L., Wei Ji, Jay Wu, and Brian Wallace for support on the Impulse project.

Does this type of work interest you? Check out our open roles here.