Using your Database as a Queue?

Do you need a message broker, or can you use your database as a queue? Well, as in most cases, it depends. Let me explain how it's possible and cover some of the trade-offs and things to consider, such as volume, processing failures, and more.


Database as a Queue

There was a blog post about ditching RabbitMQ for Postgres as a queue. TL;DR: they claimed prefetching in RabbitMQ was causing them issues because they have long-running jobs; processing a single message can take hours to complete. Another note is that they do manual acknowledgments once a message is processed.

Invisibility Timeout

Acknowledgments are important because of the invisibility timeout of a message. When working with a typical queue-based broker, you need to send an acknowledgment back to the broker that you’ve finished processing a message.

The reason for this is that there's a length of time (a timeout) after which the broker deems the message processing incomplete and allows that message to be processed again. Imagine your process fails or an exception occurs; you'd want the broker to allow the message to be re-processed. If your processing of the message was successful, you acknowledge it so the broker can remove the message from the queue.

This time period is called the invisibility timeout. The message is still on the queue but it’s invisible to any other consumers until it’s either acknowledged or the timeout occurs.

Now you may hear people say you shouldn't use a message broker for long-running jobs because of this. You'd have to make your invisibility timeout longer than the total processing time. Otherwise, you'd potentially re-process messages. This is often the reason why those claims are made. There are other reasons as well, such as throughput considerations, but generally, long-running jobs have implications you need to be aware of.

Competing Consumers

I mentioned throughput because one way to increase it, when required, is the competing consumers pattern. This means having multiple instances of the same application processing messages from the same queue. It's basically scaling out horizontally.

So if you have two instances running, they can both be consuming messages from the same queue. This allows you to process messages concurrently, increasing throughput.

The issue they described in the blog post was that they were prefetching multiple messages together from RabbitMQ. This means that if there were two messages available, a single consumer would pre-fetch 2 messages, not one.

And since consuming a single message could take hours, this reduced throughput because the other available consumers never saw the prefetched message. On top of that, the invisibility timeout would kick in: the second prefetched message would often exceed the timeout because the first message took so long to process. The consumer that prefetched it would still process it, but since the timeout had expired, the message would get processed again by another consumer as well.
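For reference, prefetch is something you control on the consumer side. Here's a minimal sketch with the Python pika client, where prefetch_count=1 limits the consumer to one unacknowledged message at a time; the queue name and process function are hypothetical:

```python
# Minimal sketch using the pika RabbitMQ client.
# prefetch_count=1 tells RabbitMQ to deliver at most one unacknowledged
# message to this consumer, so a long-running job doesn't hold a second
# message that another consumer could have processed.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_qos(prefetch_count=1)

def on_message(ch, method, properties, body):
    process(body)                                   # hypothetical long-running work
    ch.basic_ack(delivery_tag=method.delivery_tag)  # manual acknowledgment

channel.basic_consume(queue="jobs", on_message_callback=on_message)
channel.start_consuming()
```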

Because of all these troubles, rightfully or wrongfully, they decided to ditch RabbitMQ and use Postgres, which was already in their system, as their queue.

Table

So how would you implement a queue using Postgres or a similar relational database? First, it simply starts with a table where you’d be persisting your messages.

The table would have some Primary Key and the message’s data, which is likely serialized.

Any new message would be inserted into the table. As for consumers, they need to select and lock an available record for processing so that other consumers skip it. To do this in Postgres, you could use the FOR UPDATE SKIP LOCKED clause along with LIMIT 1. This allows you to select a single record that hasn't already been locked by another consumer.

Then once the message has been processed, you’d delete the message from the queue and commit the transaction.
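Here's a minimal sketch of that flow in Python with psycopg2, assuming a hypothetical message_queue table; the table, columns, and process function are illustrative, not a full implementation:

```python
# Minimal sketch of a Postgres-backed queue consumer using psycopg2.
# Assumed (hypothetical) table:
#   CREATE TABLE message_queue (id BIGSERIAL PRIMARY KEY, payload JSONB NOT NULL);
import psycopg2

def consume_one(conn):
    with conn:                       # commits on success, rolls back on exception
        with conn.cursor() as cur:
            # Lock a single unclaimed row; concurrent consumers skip locked rows.
            cur.execute(
                """
                SELECT id, payload
                FROM message_queue
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED
                """
            )
            row = cur.fetchone()
            if row is None:
                return False         # nothing to process right now
            message_id, payload = row
            process(payload)         # hypothetical, potentially long-running work
            # Only remove the message once processing has succeeded.
            cur.execute("DELETE FROM message_queue WHERE id = %s", (message_id,))
    return True

# Usage sketch: conn = psycopg2.connect("dbname=app"); consume_one(conn)
```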

While you don't have an "invisibility timeout", you do have the same type of mechanism with a database in the form of transaction and connection timeouts (MySQL's wait_timeout or Postgres's idle-in-transaction timeout, for example). You can only have a transaction or connection open for so long before the timeout occurs and the database rolls back the transaction or closes the connection. You're still in the same situation: you might still be processing the message, and it will end up being re-processed.
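As a rough sketch of what that can look like for the Postgres consumer above, assuming the idle-in-transaction timeout is the relevant one (the 30-minute value is arbitrary):

```python
# Hypothetical: cap how long a consumer session can sit inside an open
# transaction while process() runs. If processing exceeds this, Postgres
# terminates the session, the transaction rolls back, and the locked row
# becomes visible to other consumers again, meaning it may be re-processed.
with conn.cursor() as cur:
    cur.execute("SET idle_in_transaction_session_timeout = '30min'")
```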

Trade-offs

So what's the issue with using your database as a queue rather than a queue-based message broker? As you can see, it's feasible to use a database. In some situations, you may simplify your infrastructure by using a database you already have. However, as always, context is king!

You need to know what you're trying to accomplish and the limitations. Once you get beyond the simplest scenario that I showed above, you will end up implementing a lot of functionality on top of your database that queue-based brokers support out of the box (dead letter queues, for example).

You also need to understand your use case. If you’re going to have a lot of consumers, they need to be polling (querying) the table periodically. This can add a lot of load but also can decrease total throughput and increase latency because you’re polling your database.

When you have failures, how are you handling them when your queue is in your database? Are you implementing retries? Creating another table as a dead letter queue? Do you need to implement different queue tables for different message priorities? The list of things a typical queue-based broker gives you out of the box goes on and on.
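As a rough illustration of how that functionality starts to accumulate, here's one hypothetical shape for retries and a dead letter table layered on the queue table from earlier; the table names, columns, and retry limit are all made up:

```python
# Hypothetical failure handling: message_queue gains an attempts counter,
# and messages that fail too many times move to a dead_letter table.
# Run this in its own transaction, after the processing transaction
# has already rolled back.
MAX_ATTEMPTS = 5

def handle_failure(cur, message_id):
    # Record the failed attempt.
    cur.execute(
        "UPDATE message_queue SET attempts = attempts + 1 WHERE id = %s RETURNING attempts",
        (message_id,),
    )
    attempts = cur.fetchone()[0]
    if attempts >= MAX_ATTEMPTS:
        # Move the message aside so it stops being retried.
        cur.execute(
            """
            WITH dead AS (
                DELETE FROM message_queue WHERE id = %s RETURNING id, payload
            )
            INSERT INTO dead_letter (message_id, payload)
            SELECT id, payload FROM dead
            """,
            (message_id,),
        )
```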

So what’s the issue with using your database as a queue? None until you actually need a message broker.


Speeding up Queries with Materialized Views

Many applications are more read-intensive than write-intensive, meaning your application performs more reads than writes. However, we often persist data optimized for writes, making querying and composing data intensive at runtime. What's a solution? I will review different ways of creating materialized views and some of the trade-offs.


Materialized Views

If you have a large system decomposed into many different boundaries, you’ll have data spread across many different databases. When you need to perform any query, often for a UI, you need to do some type of UI/View Composition by composing all the data from various sources.

This is one of the most common challenges when working in a large decomposed system. Check out my post on The Challenge of Microservices: UI Composition for various ways of handling this.

One issue with this composition is often performance, because even if you're querying each database concurrently, there is often a lot of computation that has to occur to transform the data into the final result returned to the client.

A solution to this is to do that computation ahead of time so that when a client makes a request, you already have all the data composed exactly as you need to query or return it.

This sounds like a cache, right? It is, but we're not caching exactly the same structures we have in our database; rather, we're caching the end result that has already had all the composition and transformation done, in the final form we want to return to the client.

Meaning, we’re storing a data structure that’s optimized for specific queries or use cases.

As an example, let’s use the typical Sales Order in a relational database.

If we want to display a list of Sales Orders and the order total, we can query the order table, joining with the order line items and doing a Sum() of Quantity multiplied by UnitPrice. No problem.

When an order is completed, its state is set and won’t change. Meaning we can do some pre-computation when an order is placed and summarize (denormalize) into a separate table that’s specific to this use case.

Rather than having to query and join the tables, we just query this individual table directly. This example is rather trivial, but you can see how doing pre-computation can be helpful. Another example is calculating the tax on the order when it's placed. Instead of having to do that at run-time or via a query, we would just calculate it once and persist it.
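As a rough sketch with hypothetical table and column names, the difference is between aggregating at query time and simply reading a pre-computed summary:

```python
# Hypothetical tables: orders, order_line_items, and a denormalized
# order_summary table populated when an order is placed.

# Computed at query time by joining and aggregating line items:
ORDER_TOTALS_ON_THE_FLY = """
    SELECT o.id, SUM(li.quantity * li.unit_price) AS order_total
    FROM orders o
    JOIN order_line_items li ON li.order_id = o.id
    GROUP BY o.id
"""

# Pre-computed version: just read the summary table directly.
ORDER_TOTALS_PRECOMPUTED = "SELECT order_id, order_total FROM order_summary"
```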

You also don't have to persist this data in a separate table. You can often leverage the existing structures you have, persisting some columns as calculated columns that exist specifically for queries.
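For example, in Postgres 12+ that could be a stored generated column on the existing line items table (the column names are hypothetical):

```python
# Hypothetical: a stored generated column computed at write time,
# so queries can read line_total directly instead of multiplying at runtime.
ADD_LINE_TOTAL = """
    ALTER TABLE order_line_items
    ADD COLUMN line_total numeric
    GENERATED ALWAYS AS (quantity * unit_price) STORED
"""
```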

CQRS

CQRS fits naturally because the point is to separate commands and queries as distinct operations. Queries can be optimized separately from commands.

When a command is executed that makes some type of state change, we can perform the write operation, do that pre-computation, and persist the data that exists specifically for queries.

That way, when we execute a query, it can use the specific tables/columns that exist explicitly for queries and are optimized for them.
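Sketched out, a command handler might perform the state change and refresh the query-optimized table in the same transaction; the handler, tables, and columns here are hypothetical:

```python
# Hypothetical command handler: persist the state change and refresh the
# query-optimized summary row in one transaction (order_summary.order_id
# is assumed to be unique).
def complete_order(conn, order_id):
    with conn:
        with conn.cursor() as cur:
            # Write side: the state change itself.
            cur.execute(
                "UPDATE orders SET status = 'Completed' WHERE id = %s", (order_id,)
            )
            # Read side: pre-compute the total this use case will query.
            cur.execute(
                """
                INSERT INTO order_summary (order_id, order_total)
                SELECT o.id, SUM(li.quantity * li.unit_price)
                FROM orders o
                JOIN order_line_items li ON li.order_id = o.id
                WHERE o.id = %s
                GROUP BY o.id
                ON CONFLICT (order_id) DO UPDATE SET order_total = EXCLUDED.order_total
                """,
                (order_id,),
            )
```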

One caveat to this is if you are doing this pre-computation at the time of the write, you could be adding latency to your writes. However, in a lot of systems, you're often more read-heavy than write-heavy. But it's worth thinking about if you need low-latency writes.

Another option is to move that pre-computation out of the write path and do it asynchronously.

When you perform the state change from a command against your database, you then publish an event that the query side consumes. It then handles updating your query-optimized read side.
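A rough sketch of that read-side consumer, assuming a hypothetical OrderPlaced event that already carries the computed total (the event shape and table names are made up):

```python
# Hypothetical event handler on the query side: it consumes an event
# published by the command side and updates the read-optimized table.
# event is assumed to look like: {"order_id": 42, "order_total": 99.90}
def on_order_placed(conn, event):
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                INSERT INTO order_summary (order_id, order_total)
                VALUES (%s, %s)
                ON CONFLICT (order_id) DO UPDATE SET order_total = EXCLUDED.order_total
                """,
                (event["order_id"], event["order_total"]),
            )
```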

Because this is asynchronous, your read database is eventually consistent. This has implications: you can't immediately query the read database after completing your command and expect to see the change. Check out my post Eventual Consistency is a UX Nightmare for ways of handling this.

It’s worth noting, just like above, these don’t have to be different physical databases. They could be the same underlying database instance but separate tables and/or columns as I explained above.

If you're familiar with Event Sourcing, then you actually have a leg up, as the events in your Event Store are the source of truth from which you can derive all other views of that data.

This is called a Projection. It allows you to read the events from your event stream, do all the same types of pre-computation, and persist the result however you want for query purposes. You'll also notice there is no message broker in the mix because you don't need one. Your event store often supports different subscriptions, or, at worst, you can poll it to get the events you need to process to build your read model.
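As a sketch, a projection is just a handler that folds events from the stream into whatever read model you want; the event types, fields, and read model here are hypothetical:

```python
# Hypothetical projection: replay events from a stream and maintain a
# read model (here, a running order total per order).
def project(events, read_model=None):
    read_model = read_model if read_model is not None else {}
    for event in events:
        if event["type"] == "OrderPlaced":
            read_model[event["order_id"]] = 0
        elif event["type"] == "ItemAdded":
            read_model[event["order_id"]] += event["quantity"] * event["unit_price"]
    return read_model

# Rebuilding the view is just replaying the stream from the beginning, e.g.:
# totals = project(event_store.read_stream("orders"))   # hypothetical event store API
```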

Lastly, a common scenario is reporting, where you need to compose data from various sources. This is where I find value in event streams, where various logical boundaries publish events that are persisted.

I’m not talking about a queue-based message broker, as you want these events to be re-consumed at any time. They don’t have a lifetime and can be persisted in the event stream log forever. This allows you to rebuild your materialized view at any time.

Materialized Views

You have a lot of options for creating materialized views that are optimized for queries. It’s interesting that many applications are more read-intensive than write-intensive, yet we typically persist data in a structure that’s optimized for writes. You can find the balance and optimize both where needed, especially in hot paths within your application that are query heavy.


Change Data Capture + Event-Driven Architecture

Change Data Capture (CDC) is a way to monitor and capture changes in data so other systems can react to those changes or stay up-to-date with the data. It isn’t new, but it’s been gaining popularity around event-driven architecture. Let me explain so you don’t shoot yourself in the foot.


Change Data Capture

Change Data Capture (CDC) is a way to monitor and capture changes in the data in your database, often in real-time. Lately, combining CDC tools with an event-driven architecture has been a way to distribute these data changes. While this may sound good, it has some pitfalls, specifically around coupling.

Without CDC, you’d often be polling (pull-based) your database looking for changes on some interval using some scheduled batch job. Some common use cases would be for data warehousing and ETL purposes.

The common problem with this approach is each different process would be polling the database, which would increase the load on your database (or read replica). Another disadvantage of polling is you’re not getting the data changes in near real-time.

With CDC tools this turns into a more real-time push-based model. CDC tools monitor in real-time and publish data changes to consumers.

Tooling Example

The database you're using will determine which CDC tool you can use. For example, Debezium is a CDC tool that has become fairly popular and has a variety of connectors that capture changes and then produce events.

A toolchain might look like Debezium getting the data changes from the binlog of MySQL in real-time and then publishing those events to a Solace topic. Many consumers could then consume that topic.

Inside vs. Outside

This is all good, but where are you publishing these data change events, and for whom? I mentioned data warehousing, which can be a good use case. However, because you can publish events to a message broker or an event log, this is typically used to integrate with other logical boundaries. This is where you can shoot yourself in the foot.

If you're publishing data change events consumed by another logical boundary (service), they are coupled to an internal implementation detail: your database schema. Typically, events derived from a CDC tool mirror your database schema. If you publish these events and other services consume them, those services now know about your database schema.

We don't want this coupling, or for other services to know about our internal database implementation. Leaking your schema via events is no different from another service reaching out and querying your database directly. Not a good thing.

Translation

To avoid this, you want to have some translation to create an event to be exposed to other logical boundaries. You want to create an integration event, or as I often call them, Outside Events. Outside events are contracts you can independently version from any of your internals. This allows you to change your underlying database schema without breaking consumers.

One important aspect of events is their purpose. If you’re using CDC to generate events, you’re likely thinking about data, not behavior. If you want to expose behavior-driven events, you must infer what happened. It can be very difficult to understand why data changed how it did. For example, you’ll likely end up with a ProductChanged event rather than an InventoryAdjusted event.
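A rough sketch of that translation step, assuming a simplified Debezium-style row-change payload with before and after row states; the outside event contract here is entirely made up:

```python
# Hypothetical translation from an internal CDC row-change event to an
# outside (integration) event with its own versioned contract.
def to_outside_event(cdc_event):
    before = cdc_event["before"]   # row state before the change (None on insert)
    after = cdc_event["after"]     # row state after the change
    # We have to infer behavior from the data diff, which is the hard part:
    # a quantity change might be an inventory adjustment, a correction, etc.
    if before is not None and after["quantity_on_hand"] != before["quantity_on_hand"]:
        return {
            "type": "InventoryAdjusted",   # behavior-oriented, independently versioned
            "version": 1,
            "product_id": after["product_id"],
            "adjustment": after["quantity_on_hand"] - before["quantity_on_hand"],
        }
    # Fall back to a generic, data-oriented event.
    return {"type": "ProductChanged", "version": 1, "product_id": after["product_id"]}
```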

Another question you need to ask yourself is: why am I distributing data? Often it's for querying or UI composition purposes. Check out my post The Challenge of Microservices: UI Composition for ways of handling that.
