Replaying massive data in a non-production environment using Pekko Streams and Kubernetes
Briefly

"A non-production environment is necessary for testing and integration purposes in any production setting. As our traffic system at DoubleVerify is designed to process billions of requests per day, creating such non-production environments is a significant challenge. An integral part of this type of environment is replaying real traffic from the production environment to a testing and integration environment. In this post, I will discuss the motivations, challenges, and solutions for replaying real traffic from a production to a non-production environment."
"A typical pattern to achieve this functionality is shown in the diagram below: In the above example, we can see that we sent a configurable percentage of the incoming event stream to a Kafka topic and replicated it via the "mirroring" tool. We used Mirror Maker, but you can use any mirroring tool you choose and even implement a consumer-producer-based tool of your own."
"As mentioned above, we needed to support both functional and non-functional tests. To support both use cases, there are two main challenges to creating this mechanism: Functional tests, such as business logic or data evaluation tests, require maintaining stability with the replayed traffic rate. For example, if a non-prod environment can process 800 RPS, the traffic replay mechanism should on"
Replaying real production traffic into non-production environments enables realistic functional and non-functional testing of complex, high-throughput systems. Production traffic is sampled, routed to a Kafka topic, and mirrored to the test environment using a mirroring tool such as Mirror Maker or a custom consumer-producer solution. Functional testing requires stable, reduced replay rates that match non-production capacity, while non-functional testing requires preserving the original scale and burstiness for load evaluation. Implementations must also address data sanitization, identity rewriting, side-effect mitigation, endpoint adaptation, and isolation of downstream systems. Rate controllers, masking pipelines, and mock services keep replays safe and accurate.
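The rate-control requirement for functional tests can be expressed directly in Pekko Streams with a throttle stage on the replay consumer. The following is a hedged sketch, not the article's code: the 800 RPS ceiling, topic name, group id, broker address, and the `sendToNonProd` call are all assumptions for illustration.

```scala
import scala.concurrent.duration._

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.kafka.{ConsumerSettings, Subscriptions}
import org.apache.pekko.kafka.scaladsl.Consumer
import org.apache.pekko.stream.scaladsl.Sink

object ThrottledReplayer extends App {
  implicit val system: ActorSystem = ActorSystem("throttled-replayer")

  // Assumed ceiling: the request rate the non-production environment can absorb.
  val maxReplayRps = 800

  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("kafka-nonprod:9092")
      .withGroupId("traffic-replayer")

  Consumer
    .plainSource(consumerSettings, Subscriptions.topics("replay-events"))
    // Back-pressure the mirrored stream so the test environment never sees
    // more than maxReplayRps events per second, whatever production is doing.
    .throttle(maxReplayRps, 1.second)
    .map(record => sendToNonProd(record.value()))
    .runWith(Sink.ignore)

  // Placeholder for the call that replays the event against the non-production endpoint.
  def sendToNonProd(payload: String): Unit =
    println(s"replaying: $payload")
}
```

For the non-functional case, the same pipeline would run without the throttle stage (or with a much higher ceiling) so that the original scale and burstiness reach the system under test; masking or identity-rewriting stages would slot in as additional `map` steps before the replay call.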
Read at Medium