HTTP Status Codes and Business Errors

In a three-tier architecture involving a frontend system (FE), a middleware layer (MW) and a backend system (BE), the FE sends requests via the MW to the BE. However, the ML is not a pass-through component, rather it acts to decouple the FE and BE. The MW also acts as a secondary security layer, inspecting and where appropriate rejecting improper or malicious requests.

This is a common arrangement, used wherever there is an ESB, legacy integrations and to varying degrees with microservices. In normal operations (happy path scenarios), the arrangement works perfectly; the request is received from the FE, processed by the MW, and all being well, forwarded to the BE. The arrangement also works well if/when technical errors are encountered.

It is useful to define what is meant by a technical error, to wit, any failure that occurs on account of the network, infrastructure, application code, or the basic structure of the message (headers, security or payload). There are two scenarios that are pertinent to this discussion. In the first, the FE sends an invalid request and the MW rejects it immediately, returning a 4xx error to the FE.

In the second, the FE sends a request, the MW accepts it and forwards same to the BE. The BE rejects the request with either a 4xx or 5xx. However, since the problem occurred in the hop between the MW and the BE, the MW remaps the error to a 5xx, which it returns to the FE. The remapping aligns correctly with the semantics of HTTP status code groups 4xx and 5xx, given that the MW is not a pass-through. See the diagram below for an illustration.

This may sound incorrect or misleading, however, in practical terms, any technical errors that occur after step 2 are most unlikely to have anything to do with the FE, given the functions performed by the MW. A 4xx would imply that the FE was at fault, and as such, the error details would be required for it to rectify the problem, whereas in this case those details would only make sense to the FE. There may be exceptions of course, but the approach is good for most cases.

But things start to creak a bit when business errors are encountered. Once again, it is pertinent to pause and define what constitutes a business error. In this context we refer to all errors that occur after the HTTP dialogue has completed successfully. This implies that the FE has sent a request to the MW, the MW has examined the request and passed it to the BE. The BE has in turn accepted the request without faulting any of the technical aspects: network, infrastructure, application code or message.

Having done some further processing of the message though, the BE identifies a problem with the request and reports this back to the MW. The problem here is that the HTTP status codes did not have business errors in mind during compilation. The status code groups (1xx, 2xx, 3xx, 4xx and 5xx) map very cleanly to genres of technical failures that could occur in the HTTP dialogue, at a technical level.

However, when it comes to business errors, there is no such clear mapping. Indeed, the only status codes that could, with some imagination, be leveraged for business errors are: 400, 409 or 422. The 400 code is a generic bad request; the 409 is an inconsistency of state, and the 422, introduced for WebDAV, indicates a semantic error. Neither is perfect, but even with all these, there would still be a huge shortfall. It is clear that there are far more than three kinds of business errors in the real world.

To further complicate matters, there are those that will assert that business errors are not errors in the sense implied by the HTTP status codes and should use the 2xx group, because, technically, there was no error. An alternative viewpoint emphasises the need to be aligned with the semantics of the HTTP status codes, and the fact that the MW is an active participant in the dialogue, as such, all errors, 4xx or 5xx ought to be mapped to a 5xx.

In a recent article, I spoke about ‘the fog of experience’, and how opinion and decisions become more nuanced as one travels down the paths and routes of our dynamic and ever evolving technological landscape. This is one of those where the answer is not without contention and some subjectivity. It is likely that various architects and organisations may differ on approach, but in each context, the opinions and final decision will be informed by knowledge and a pragmatic search and compromise on progress and value.

In my current context, the consensus is to accept the limitations of the HTTP Status codes and the fact that they are not about to change. However, standards and consistency are important to quality and velocity in an Agile and product-centric service model. For this reason, the decision was to recommend that all BEs capture business errors using the 422 HTTP status code. All such errors are to be accompanied by a payload that describes the business error, and optionally, any details that could help the FE to rectify it.