TLDR;
In this article we will have a look at what happens when a reactive application stops working the way we want to, what type of issues we can encounter and what are the main causes for that. Furthermore, we will explore different ways of adding traceability to the Reactive Pipeline, see what is happening in it and then move on to techniques we can deploy to tackle and solve these issues. In the second part we will discuss what happens when we manually deal with threads and schedulers and what is the right course of action for that kind of issues.
Anatomy of the Reactive Pipeline
Before diving into what kind of issues can happen in the pipeline and what we can do to solve them, it’s important to understand what are the main components of a Reactive Pipeline and how subscribers interact with it.
Once we understand the normal working flow of a reactive application, we can explore its different modes of failure and start evaluating possible solutions.
Broadly speaking the Reactive Pipeline can be chronologically divided into three phases:
- Assembly phase
- Subscription phase
- Runtime phase
Each one of them is specific to a different lifecycle of the pipeline, and it is as follows:
- The Assembly phase is when the Project Reactor framework actually builds the pipeline, and the end result of this is the creation of the Publisher object (Mono/Flux).
This phase is called declarative because the developer declares the chain of operators that transform the given Publisher, but nothing is executed at this point. Source
- The Subscription phase is when the Project Reactor framework creates the Subscriber object and the end result is a Subscription object that will then receive the results of the Reactive Pipeline in which the data is fed from the Publisher object.
This phase is the trigger that enables the Reactive Pipeline to start the data flow and actually produce some results that can be consumed.
- The Runtime phase or Execution phase represents the moment when the pipeline is already publishing data from one end (the upstream) to another end (the downstream/subscriber) and it’s where the errors can manifest.
This is the phase where we can mitigate/resolve most of the issues we will encounter when working with Reactive Applications.
Now that we have the concept of the different phases clear, let’s have a look at the concrete issues that can happen and how we can mitigate them in a Reactive Application.
Types of Reactive Issues and Their Mitigations
In general, there are many types of errors that an application can encounter during its lifecycle, but what we are interested in are the errors that are directly related to the reactive nature of the application, and as such are linked with the Spring Boot 2 framework and more specifically with the way Project Reactor works.
As soon as you introduce asynchronous code, you create a boundary in the program between the code that schedules and the code that asynchronously executes, and this “misalignment” of code and times of execution is what makes the evergreen stacktrace much less useful, unfortunately for us.
Given that we cannot actually rely on the stacktrace to convey meaningful information and give us a clue on what can be the source of the issue, due to the asynchronous nature of the application, we need to take advantage of different operators and techniques to triage, debug and mitigate these issues.
On top of the less usefulness of the stacktrace in the reactive world, there’s also the compound issue of the different phases (Assembly vs Subscription) that makes it even harder to understand where the error occurs, as we will just see Project Reactor code in the stacktrace and no code of our own.
So, what techniques can we deploy to trace, debug and then mitigate the errors that can occur in the Reactive Application we just built?
The list below is a good summary of them:
- The “log()” method
- The “doOnError()” method
- The global, stop-the-world and performance heavy “onOperatorDebug()”
- The less invasive “checkpoint()”
These four techniques will allow us to gain a better insight on what’s happening in our pipeline, what kind of data is flowing through it and which operators are called for each subscriber, both in a global and in a “pipeline local” way.
The “log()” method
The “log()” method is a quick way to get an easy bird eye’s view of what is going on at one step of your sequence. Use it liberally during development, with the possibility of specifying a “name” to each log call to differentiate them, making it easier to identify which part of the pipeline emitted that data that’s been logged.
The types of operations that are logged by the “log()” method are:
- onNext
- onComplete
- onError
- onSubscribe
- cancel
- request
Each of these operators will be logged every time they occur, and also, it’s possible to specify a custom logger as well as whitelist just some of these signals if we are not interested in all of them.
This is the simplest and most straightforward operator that can be used when debugging issues in the Reactive Pipeline, but nonetheless is the cheapest one in terms of performance impact (compared to the global hook installer) and it can always be switched off if needed.
The purpose of this operator is answering the questions “How do I know what is happening when we subscribe to the Publisher?” and “I want to see the event sequence of this stream.”
With it, we can clearly see what is happening when we subscribe (in fact, we can also see when we are subscribing thanks to the logging of the “onSubscribe” event) and we can also observe the data that’s flowing through the pipeline via the logging of the “onNext” and “request” events.
All in all, this is a very easy to use, yet moderately powerful operator that will make it easier for the developer to precisely understand what is going on in the Reactive Pipeline.
The “doOnError()” method
This operator is intended to be used when the pipeline throws an error and we want to consume that error, most commonly for logging purposes.
The way the operator is used is very straightforward and we can pass a consumer function to it which will receive the Throwable/Exception object that represents the error and we will have a chance to act on it.
While it’s useful to log information about the error that occurred, this is not the clearest way to obtain more insights on what actually failed, as the Throwable/Exception object will not contain a clear stacktrace for the reasons we mentioned before.
What we will achieve will be adding more context to an error that was going to be logged anyway (unless explicitly disabled) with the hope that it could be useful when looking for the root cause.
This is pretty much all that needs to be said about this operator, it’s useful as an additional tool when debugging reactive issues, but it’s not that useful when used on its own.
The “onOperatorDebug()” method
This operator is probably the most powerful one we have at our disposition to deploy and use when debugging reactive issues.
What it does is that every single operator in the whole application will log a detailed, in-order stacktrace of what led to a given failure so that we can understand precisely where the issue occurred, where the failure was and what operators and in which order they have been called before the error happened.
This operator enables a recorder that is capable of capturing assembly stack trace information when the Publisher is created and also some very important information like the operator description, the chain order and so on.
If an “onError” event reaches one operator, it will attach that assembly stacktrace to the onError ’s Throwable (as a suppressed Exception). As a result, when you see the stacktrace you’ll get a more complete picture of both the runtime AND the assembly, which is something that greatly helps understand the issue. Issue.
This operator is so powerful also because it is a cross-cutting tool that can show the developer what was planned (Assembly phase) and what actually happened (Runtime phase), and any difference between the two is where our bug/issue will be found.
The power of this operator lies in the fact that all the other operators in our entire application will be instrumented to log stacktraces that have been re-ordered so we can clearly see what led to a failure and where exactly our code stopped working.
The downside of this operator is surely the heavy performance impact it has on the application, that’s why it’s discouraged from being used in production.
As long as we are developing, we can enable it to verify that our assumptions about errors are correct and that we truly mitigated the root cause of a given issue.
The “checkpoint()” method
This operator is intended to provide the same functionality of the “onOperatorDebug()” method but only for a given pipeline and not for the whole application.
What this means is that the performance impact has been greatly reduced if not almost eliminated because now the operator stacktraces are wrapped and instrumented just for a given pipeline, which it’s much better in terms of performance hit to the whole application.
There are different variants of the “checkpoint()” operator, but most of the time, what we would use is the one which takes a description argument, that will be printed in case error occurred in upstream of the checkpoint.
By using the checkpoint() operator, it is possible to activate the assembly trace capture only at that specific point in the pipeline, allowing for a much less impactful performance hit.
Conclusions
While it is true that the shit from imperative/blocking to reactive programming brings in a lot of performance enhancements and gains, it’s also true that the debugging experience changes a lot and we need to be ready for those changes and new techniques.
We explored the tools available for debugging reactive applications in the context of Spring WebFlux, examining each tool and possible scenarios where it was appropriate to use them.
In the next article, the second part, we will have a look at what happens when we introduce manual threads and schedulers into our own reactive application.
Stay tuned for the second part next week!
Request a Free Opportunity Assesment
RELATED CONTENT
16 Powerful B2B SaaS Growth Channels
B2B Growth Channels are essential to expand your reach and get your target audience to pay attention to you. Some of these are direct channels such as and some are indirect. Integrations and APIs with other market-dominant players can help you with your own growth. Even if your core product would not require integrations to…
B2B App Marketplaces
B2B app marketplaces are rapidly gaining popularity as a customer acquisition channel for logistics providers. These marketplaces offer a range of benefits, including increased visibility, improved customer acquisition, and streamlined sales processes. Listing software on B2B app marketplaces provides SaaS providers with increased visibility, as these platforms are frequented by businesses searching for software solutions.…
A Deep Dive into SaaS Distribution Models
There are a few commonly used SaaS Marketing Channels including inbound and outbound sales forces, eCommerce self-service, and third-party app stores. There is no distribution channel that’s ideal for all SaaS companies under all conditions. Instead of thinking about a best distribution channel, it’s useful to think of marketing channel selection as a “best fit”…
Promote your SaaS to Prospective Customers via 20 Top SaaS App Marketplaces
SaaS App Marketplaces present huge opportunities as an acquisition channel. This is because your prospective customers are already using these established SaaS platforms making it easier to get found by your ideal customer. You can also tap into getting more business by association. Since some processes are built around certain products (eg. Shopify), certain businesses…
Nurturing and Building an Ecosystem around your Magento App Store Integration.
Introduction In today’s rapidly evolving eCommerce landscape, businesses are always looking for ways to differentiate themselves from the competition and deliver a superior customer experience. For many businesses, this means leveraging Magento, one of the leading eCommerce platforms on the market today. However, simply having a Magento App Store integration is not enough to ensure…
An App Store for Super Apps
The pandemic saw ten years of merchant technology adoption in three months. That has cooled a bit but there is really no going back. If you work in eCommerce, whether it be B2B or B2C, you know this statement is true. A key driver of BigCommerce’s scaling initiative is the “super app” trend, where payment…
A Successful eCommerce Integration Journey in 6 Phases
After watching 100s of companies launch eCommerce Integrations and App, a pattern started to emerge. Most companies went through the same phases and many did the same mistakes or faced the same problems throughout their journey. Understand your Target Market Identify your target market and conduct market research to understand their needs and preferences. This…
The Ever-Growing Importance of Multi-Channel Sales via eCommerce Connectivity
Expanding to multi-channel sales is critical in eCommerce. Selling across different platforms results in as much as 3X the engagement as selling on a single channel. Multi-channel sales rely on synchronized data so it requires each eCommerce platform to communicate with each other in as close to real time as possible. Check out this Anchor…
Essential & Actionable B2B SaaS KPIs to Follow
We thought we’d share this blog article on 3 essential B2B SaaS KPIs to follow. For most SaaS products and services, their most significant expense is the cost of acquiring a new customer. An excellent way to reduce this expense is by channeling resources and efforts into acquisition channels with the highest return. At Itembase,…
5 Customer Acquisition Channels for B2B Apps
Check out this concise article on five principal customer acquisition channels for B2B apps. As we mentioned previously, theWinning in the B2B SaaS Market B2B SaaS market is getting more and more sophisticated and it’s not getting any easier out there. More and more apps are competing every day to attract the most traffic. If…