What is Software Architecture?

Software architecture is about making critical decisions that will impact how you can make decisions in the future. It’s about giving yourself options at a relatively low cost early on so your system can evolve without a high cost. Software architecture is about options.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Cost of Options

Software should be malleable. It shouldn't be so rigid that you can't change it because of new requirements or insights about the domain or the model you've built. You want to be able to evolve your system over time as all of these emerge. You don't want to pay a high price (a complete rewrite) because your system is hard to change. Giving yourself low-cost options means making decisions that allow you to evolve your system over time without a high cost (time, effort, etc.).

This means you have to pay a price initially, usually early on in a project/product, to give yourself these options. The critical decisions are in choosing which options are low-cost and high-value.

I’m not talking about “what if” scenarios. Developers tend to make assumptions, often related to technical and business requirements. I’m not referring to all kinds of edge cases or technical concerns like scaling, which developers love to focus on.

The options I'm talking about are fundamental to your architecture. How you develop the system determines how you can evolve it over time.

Coupling & Cohesion

A lot of software design comes down to understanding and making decisions based on coupling & cohesion.

To me, coupling & cohesion are the yin and yang of software design. They push and pull against each other. You're trying to increase functional cohesion and lower coupling.

There are different forms of coupling, but to give a definition:

“degree of interdependence between software modules”

ISO/IEC/IEEE 24765:2010 Systems and software engineering — Vocabulary

And for cohesion:

“degree to which the elements inside a module belong together”

Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design

Why do coupling & cohesion matter? Because ultimately, a lot of the decisions made are rooted in one or both of them, even if you don’t realize it.

If you understand coupling and cohesion, you can make better decisions that provide options.

The 3 concrete examples I will provide in this post are all rooted in coupling & cohesion.

For more on coupling & cohesion, check out my post: SOLID? Nope, just Coupling and Cohesion

Logical Boundaries

The first way to give yourself options within your architecture is to define logical boundaries: grouping the related behaviors and functionality (capabilities) your system provides, so that each grouping of capabilities is functionally cohesive.

Not a system that is a free-for-all of functionality without any boundaries.

Software Architecture: Turd Pile

A piece of functionality shouldn’t be intertwined with other unrelated functionality. In other words, the dependencies for one piece of functionality shouldn’t affect another. An example of this is a database. A set/grouping of features should own and be responsible for the underlying data for that feature set.

Define logical boundaries where you’re grouping functionality that works together on a set of underlying data. Focus on the capabilities and behaviors of your system. Group those capabilities into logical boundaries.

Software Architecture: Logical Boundaries

Within a logical boundary, you can make decisions that are isolated within it. How should you perform data access? Which type of database is best for the data set of that logical boundary? How should we model within the given context? Different boundaries will have different models. By defining logical boundaries (cohesion), you can make all kinds of decisions that are best for the feature set within that boundary. This gives you options.
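To make this concrete, here's a minimal sketch (the boundary names and storage choices are hypothetical) of two boundaries as modules, each fully owning its data; nothing outside a boundary touches its storage directly:

    # A minimal sketch: each logical boundary owns its data and exposes
    # only capabilities. Boundary names and storage are hypothetical.

    class Catalog:
        """Owns product data; free to pick its own storage and model."""
        def __init__(self):
            self._products = {}  # could be a document store in practice

        def add_product(self, sku, name):
            self._products[sku] = {"sku": sku, "name": name}

        def product_name(self, sku):
            return self._products[sku]["name"]

    class Ordering:
        """Owns order data; never reaches into Catalog's storage."""
        def __init__(self, catalog):
            self._orders = []  # could be a relational database in practice
            self._catalog = catalog

        def place_order(self, sku):
            # Crosses the boundary through Catalog's public capability,
            # not by querying its underlying data directly.
            self._orders.append({"sku": sku, "name": self._catalog.product_name(sku)})

    catalog = Catalog()
    catalog.add_product("ABC", "Widget")
    Ordering(catalog).place_order("ABC")

Because Ordering only depends on Catalog's capability, Catalog could swap its storage or model without Ordering ever knowing.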

Boundaries are one of the hardest things to define correctly, yet one of the most important to get right. Check out a whole series and talk: Context is King: Finding Service Boundaries

Loose Coupling

If you’re defining boundaries, how do you communicate between them?

In a free-for-all, there is coupling everywhere. Different parts of the system are directly coupled to other parts. This could be coupling between classes/modules or, generally at its worst, via the database.

I often refer to this as a turd pile. But it’s just a system that’s lost control of coupling.

Software Architecture: Coupling

If you’ve defined logical boundaries, as explained earlier, you’ll likely need to communicate between them. Any system will have long-running business processes and workflows that span many logical boundaries.

To remove tight coupling, we can leverage asynchronous messaging. Removing direct communication between boundaries means we are also removing temporal coupling. In other words, you’re not bound by time.

Message and Event Driven

This means that one boundary can send a message to a queue for another boundary, which can process it independently.

Because you have logical boundaries (cohesion), this works best with long-running business processes or workflows. Too often, we model our workflows as synchronous request/response when, in reality, we could build a much more resilient system by making them asynchronous.
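Here's a minimal sketch of that idea, using an in-memory queue as a stand-in for a real broker (the queue name and message shape are hypothetical):

    import queue
    import threading

    # Stand-in for a broker queue consumed by the Shipping boundary.
    shipping_queue = queue.Queue()

    def ordering_boundary():
        # Ordering sends a message and moves on; it is not blocked
        # waiting for Shipping to be available or to finish.
        shipping_queue.put({"type": "OrderPlaced", "order_id": 42})

    def shipping_boundary():
        # Shipping processes messages independently, on its own schedule.
        message = shipping_queue.get()
        print(f"Shipping handling {message['type']} for order {message['order_id']}")

    ordering_boundary()
    threading.Thread(target=shipping_boundary).start()

Ordering and Shipping never call each other directly; if Shipping were down, the message would simply wait on the queue.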

Asynchronous messaging and event-driven architectures give you options through loose coupling! Check out my Real-World Event Driven Architecture! 4 Practical Examples

CQRS

Unfortunately, CQRS is a buzzword (acronym) that is widely misunderstood.

Command Query Responsibility Segregation is often conflated with Event Sourcing, Asynchronous Communication, Domain Driven Design, Multiple Databases, and more. If you search and read enough posts, you’re bound to find a similar diagram.

CQRS Confusion

Sadly, while this diagram is CQRS, as mentioned, it's also conflating a bunch of other patterns and concepts. CQRS is nothing more than separating reads and writes, even if only at the service layer.

CQRS

Yes, really. It’s that simple. Still don’t believe me? Check out my post CQRS Myths: 3 Most Common Misconceptions, where I reference many of the early blog posts from Greg Young.

So why is it important? Because it gives you options. Defining separate paths for reads and writes allows you to make decisions for each path independently.
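As a minimal sketch (the names are illustrative), the separation is just two distinct types for the two paths, even against the same database:

    # Minimal CQRS sketch: separate paths for writes and reads.
    # Both happen to use the same store here; they don't have to.

    database = {}  # stand-in for the underlying database

    class PlaceOrderHandler:
        """Command side: enforces invariants and changes state."""
        def handle(self, order_id, total):
            if total <= 0:
                raise ValueError("Order total must be positive")
            database[order_id] = {"id": order_id, "total": total}

    class OrderSummaryQuery:
        """Query side: read-only, shaped for what the consumer needs."""
        def execute(self, order_id):
            order = database[order_id]
            return {"id": order["id"], "display_total": f"${order['total']:.2f}"}

    PlaceOrderHandler().handle(1, 99.5)
    print(OrderSummaryQuery().execute(1))

Because the two paths are separate, you could later move the query side to a projection or a different database without touching the command side. That's the option it buys you.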

If you look at the first diagram, it illustrates a Command Bus, a Domain Model, Event Sourcing, and a projection (multiple databases). All of that is facilitated by the decision to separate commands and queries.

CQRS is a gateway to other patterns and concepts, but at its core it's pretty trivial, and it gives you options!

What is Software Architecture?

To me, software architecture is about critical decisions, usually made early on within a product/project, that give you future options: options that allow you to evolve your system over time. The cost is usually relatively low when you make those decisions and give yourself options at the very beginning. Defining logical boundaries, loosely coupling those boundaries with a message/event-driven architecture, and CQRS are all examples.

As always, it’s rooted in coupling and cohesion.


Where should you use gRPC? And where NOT!

I've recently read a few blog posts and watched videos that compare gRPC with REST and GraphQL. The majority seemed to claim that gRPC is the standard for communication between services without giving any real reason. I think it's more useful to explain the situations where gRPC can be valuable and where I'd avoid using it.


Query Composition

I can see gRPC being useful where requests are naturally request-response. Queries and UI/ViewModel Composition are naturally request-response and would be a good fit for gRPC.

When you have multiple services that own various pieces of data that the UI/client requires, you need to perform this composition somewhere. One option is to do it with a Backend-for-Frontend (BFF). The BFF makes all the relevant query calls to the services to get the data, composes that data, and returns it to the client.

gRPC Query Composition

Using gRPC from the BFF to each service can be synchronous request-response. Typically you'd see an HTTP API (what most would call REST) in this case; however, gRPC is another option here.
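Here's a minimal sketch of the composition itself (the service calls are stubbed out; in practice each would be a gRPC or HTTP call to the owning service):

    from concurrent.futures import ThreadPoolExecutor

    # Stubs standing in for gRPC/HTTP calls to the services that own the data.
    def fetch_product(product_id):
        return {"name": "Widget"}

    def fetch_price(product_id):
        return {"price": 19.99}

    def fetch_inventory(product_id):
        return {"in_stock": 3}

    def bff_product_page(product_id):
        # The BFF fans the queries out concurrently, then composes a
        # single view model to return to the client.
        with ThreadPoolExecutor() as pool:
            product = pool.submit(fetch_product, product_id)
            price = pool.submit(fetch_price, product_id)
            inventory = pool.submit(fetch_inventory, product_id)
            return {**product.result(), **price.result(), **inventory.result()}

    print(bff_product_page(42))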

Infrastructure & 3rd Parties

Another place I see gRPC being useful is when needing to make calls to infrastructure (as a service) or 3rd parties.

For example, a database is a good example of infrastructure that is also naturally request-response. Yes, it's a core part of your system, but it isn't a service. EventStoreDB provides gRPC clients for interacting with it. This makes sense, as it lets clients interact with the database over gRPC rather than requiring a native SDK for every language/platform.

Another good fit for gRPC is 3rd-party services, meaning services that you don't own, such as a currency exchange or map routing. You could also be the 3rd party providing the service; gRPC could be a good fit there as well.

Service to Service Nightmare

I do not see gRPC being a good fit for service-to-service communication. As mentioned, I've read enough blogs/articles/videos that state that gRPC is or should be the standard for service-to-service communication. I disagree.

I've mentioned a few times that I think gRPC fits naturally request-response situations. That's what gRPC is: remote procedure calls that are often blocking. Service-to-service communication using blocking RPC calls can lead to a nightmare of coupling and terrible reliability.

Let's say a client makes a request to ServiceA. ServiceA then makes a blocking synchronous call to ServiceB, which in turn makes a blocking synchronous call to some external service or 3rd party.

gRPC Service to Service

Once the call to ServiceB completes, ServiceA then makes a call to ServiceC.

gRPC Service to Service

Little do we know that, since it's a free-for-all of services calling other services, ServiceC then makes a call to ServiceD. But guess what? That call fails!

gRPC Service to Service Failures

Since we aren't in a single process, we don't have an easy way to catch an exception, nor will we get any kind of useful stack trace. ServiceA ultimately has to handle the failure, but it isn't even ServiceC causing the issue; the failure is further downstream.

If state changes were happening in any of the services called, you do not have a distributed transaction. You must handle these failures and decide how to roll back or perform compensating actions.
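Here's a minimal sketch of what that handling looks like (the actions are entirely hypothetical): every step that succeeded before the failure needs an explicit compensating action, because there's no transaction to roll everything back.

    # Hypothetical sketch: without a distributed transaction, each completed
    # step must be undone explicitly with a compensating action.

    def reserve_stock(order): print("stock reserved")
    def release_stock(order): print("stock released (compensation)")
    def charge_payment(order): raise RuntimeError("payment call failed")

    def place_order(order):
        compensations = []
        try:
            reserve_stock(order)
            compensations.append(release_stock)
            charge_payment(order)  # fails downstream
        except Exception:
            # Undo completed steps in reverse order.
            for undo in reversed(compensations):
                undo(order)
            raise

    try:
        place_order({"id": 1})
    except RuntimeError as error:
        print(f"order failed: {error}")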

Total nightmare. For more headaches that can come with this, check out my post REST APIs for Microservices? Beware! The same would apply to gRPC.

Messaging

What was likely happening is that a workflow was being modeled as blocking request-response between services. To add reliability to your system and business processes, model them as asynchronous workflows using messaging.

By using a message broker, you eliminate direct communication between services. Services are no longer temporally coupled. If one service isn’t available, that does not stop or break the workflow.

Messaging can also be done in a request-reply style but still be asynchronous. This allows one service to call another and get a reply, but asynchronously.

When a client asks a service to start some type of workflow, the service can send a command/message to the broker (queue) for some other service to perform work that is part of the workflow.

Async Request-Reply

Another service will consume this command and perform whatever actions it needs to take part in the workflow.

Async Request-Reply

Once it’s consumed and finished processing the message, it can provide a reply message for the originating service.

Async Request-Reply

Finally, the originating service can consume the reply message and handle it however it needs to. This could mean it might send another command to the broker for a different service to perform its part of the workflow.

Async Request-Reply
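Here's a minimal sketch of the whole exchange using in-memory queues as stand-ins for broker queues (the names, message shapes, and correlation ID are hypothetical):

    import queue

    work_queue = queue.Queue()   # commands for the downstream service
    reply_queue = queue.Queue()  # replies back to the originating service

    # The originating service sends a command, tagged with a correlation ID
    # so it can match the eventual reply to the workflow it belongs to.
    work_queue.put({"command": "ProcessPayment", "correlation_id": "abc-123"})

    # The downstream service consumes the command, does its work, and
    # sends a reply message rather than returning a blocking response.
    command = work_queue.get()
    reply_queue.put({"reply": "PaymentProcessed",
                     "correlation_id": command["correlation_id"]})

    # Later, the originating service consumes the reply and continues
    # the workflow, perhaps by sending the next command.
    reply = reply_queue.get()
    print(f"got {reply['reply']} for {reply['correlation_id']}")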

If the client needs to know when the entire workflow is done, you can leverage WebSockets or push notifications to add real-time capabilities so the client can be notified of completed work.

If there is a failure processing a message, a service can retry, back off and retry again, or ultimately put the message on a dead letter queue. You have many more options for handling processing failures.
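A minimal sketch of that retry-then-dead-letter handling (the attempt count and delays are made up):

    import time

    dead_letter_queue = []

    def handle_with_retries(message, process, max_attempts=3, base_delay=0.1):
        for attempt in range(1, max_attempts + 1):
            try:
                process(message)
                return
            except Exception:
                if attempt == max_attempts:
                    # Give up: park the message for inspection or redelivery.
                    dead_letter_queue.append(message)
                    return
                # Back off exponentially before retrying.
                time.sleep(base_delay * 2 ** (attempt - 1))

    def flaky_processor(message):
        raise RuntimeError("downstream unavailable")

    handle_with_retries({"id": 7}, flaky_processor)
    print(dead_letter_queue)  # [{'id': 7}]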

Because each service works independently, they do not all need to be online and available. If one service is down and unavailable, that does not make the entire workflow stop or fail. They aren’t temporally coupled.

For more, check out my post Workflow Orchestration for Resilient Systems

gRPC

You might have noticed that this post isn’t about gRPC but rather the places where synchronous request-response is appropriate. Should you use gRPC over an HTTP API in those situations? I’ll save that for another post!

Context

Context is king. The context and perspective I'm referring to within this post/video is line-of-business and enterprise-type systems, where business processes and workflows are naturally asynchronous. The world is asynchronous, and yet we still try to model these business processes primarily in a synchronous way. Should all workflows be asynchronous? Absolutely not. Context is king.


Shared Database between Services? Maybe!

Is a shared database a good or bad idea when working in a large system that's decomposed into many different services? Or should every service have its own database? My answer is yes and no, and it all comes down to data ownership.


Monolith

When working within a monolith, generally with a single database, you can wrap all database calls within a transaction and not think too much about failures. If something fails, you roll back. You can get consistency by using the right isolation level within your transaction to prevent dirty reads (if needed).

Monolith Shared Database
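As a minimal sketch of that safety net (the table and data are made up), everything inside the transaction commits or rolls back together:

    import sqlite3

    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")

    try:
        with connection:  # one transaction: commit on success, rollback on error
            connection.execute("INSERT INTO orders VALUES (1, 50.0)")
            raise RuntimeError("something failed mid-transaction")
    except RuntimeError:
        pass

    # The insert was rolled back along with everything else in the transaction.
    print(connection.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0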

One challenge with a monolith is that, with a single database, data access is often a free-for-all. Reads and writes are performed from anywhere within the monolith. Most often, a monolith for a large system will have a pretty large overall schema of hundreds of tables/collections/streams.

Because this leads to so much coupling, people tend to go down the route of defining services that own certain parts of the system's functionality. However, they don't separate the underlying data; they keep a shared database.

Distributed Turd Pile

This often leads to what I call a distributed turd pile, also known as a distributed monolith or a distributed big ball of mud.

The application has been split up into multiple services, but there is still a shared database where both services perform reads and writes. There is no schema or data ownership.

Shared Database

If you need to change a table/document and add a new column/property, which service does that? If different services are owned by different teams, which team is responsible? If you make a change, every other service now needs to be aware of it so the change doesn't break them.

Ultimately, in a distributed turd pile, you've siloed the functionality but still have a massive schema that's a free-for-all with no ownership.

Physical vs. Logical

When I originally asked whether you can share a database between services, you'd guess from the above that my answer is no, which is correct when talking about it from a logical perspective. Services should logically own their schema and data.

However, physical boundaries aren't logical boundaries. This means a service can own its schema and data yet still keep them within the same physical database instance as another service.

Shared Database Instance

This means that a single shared database instance can hold the schema and data for different services. It isn't a free-for-all of data access: only the service that owns a schema can access it and perform reads and writes.

It’s about logical separation, not physical separation.

You don't have to have a physically separate database instance for each service. This can be helpful in various scenarios, from local development to staging to even production if you have limited resources. Should you share the physical instance? Maybe not, if one service could consume a lot of the instance's resources/capacity (noisy neighbors). Context matters; your situation matters. The point is, don't confuse physical and logical boundaries as being the same.
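As a minimal illustration of the distinction (pure sketch, not a real database driver), one physical instance can hold multiple schemas while each service only ever gets a handle scoped to its own:

    # One shared physical instance, logically separated by ownership.
    instance = {"ordering": {}, "payment": {}}

    class SchemaHandle:
        """A connection scoped to a single service's schema."""
        def __init__(self, instance, schema):
            self._data = instance[schema]  # can only see its own schema

        def write(self, key, value):
            self._data[key] = value

        def read(self, key):
            return self._data[key]

    ordering_db = SchemaHandle(instance, "ordering")
    payment_db = SchemaHandle(instance, "payment")

    ordering_db.write("order-1", {"total": 50})
    # payment_db has no path to "order-1"; it only sees the payment schema.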

I touch on this more in a blog post Microservices gets it WRONG defining Service Boundaries.

Query & UI Composition

If services have schema and data that can’t be accessed directly by other services, how do you do any type of UI or Query Composition?

One option is to create an API Gateway or BFF (Backend for Frontend) responsible for making the relevant calls to the required services and doing the composition to return to the client.

BFF

Another option is to use Event Carried State Transfer to give services a local cache copy of reference data that can be used for UI composition.

When a service makes some type of state change, it will publish an event containing the state of the entity that changed.

Event Carried State Transfer

Other services can consume that event, update their local database, and use it as a cache.

Event Carried State Transfer

I do not recommend doing this for workflow or any business process. This isn't for transactional data; it's for reference data that often plays a supporting role in your system. That data isn't volatile, which is why it fits well being cached.
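Here's a minimal sketch of the mechanics (the event name, fields, and in-memory bus are hypothetical):

    # Hypothetical sketch: the owning service publishes the changed entity's
    # state; a consuming service keeps a local cache copy of that reference data.

    subscribers = []

    def publish(event):
        for handler in subscribers:
            handler(event)

    local_product_cache = {}  # the consuming service's local copy

    def on_product_changed(event):
        if event["type"] == "ProductChanged":
            local_product_cache[event["sku"]] = event["state"]

    subscribers.append(on_product_changed)

    # The owning service makes a state change and publishes the new state.
    publish({"type": "ProductChanged", "sku": "ABC",
             "state": {"sku": "ABC", "name": "Widget", "price": 19.99}})

    print(local_product_cache["ABC"]["name"])  # served from the local cache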

Lastly, another option for UI composition is to do it on the client itself. Each service can own a piece of the UI.

Consistency

It's important to touch on consistency. Using a local cache when executing a command means using stale data. There is no difference between making a service-to-service API call to get data to perform a command and getting it from a local cache: both will be inconsistent when executing a command. Why? Because the moment you get back a result from a service-to-service call or a local cache, the data is stale.

If you need consistency, you need data owned by the boundary that requires consistency.

I often find that some ownership confusion comes from how workflows are thought of. I often see a workflow attributed to a single boundary when it may involve many different boundaries.

As an example, let's say you have an order checkout process. The first call from the client would be to the Ordering service. This would likely return a CheckoutID if one wasn't already defined by the client.

Workflow

Next, the client would send the credit card information to the Payment service with the same CheckoutID.

Workflow

Finally, the client reviews their order and sends a request to the Ordering service to place the order. The Ordering service would make the relevant state changes to its database and then publish an OrderPlaced event or perhaps send a ProcessPayment command.

Workflow

The Payment service would consume this message and then use the credit card data it already has in its database to charge the customer via a payment gateway.

Workflow
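A minimal sketch of how the CheckoutID ties the two boundaries together (the names, shapes, and data are hypothetical):

    import queue

    payment_commands = queue.Queue()  # broker queue for the Payment service

    # The Payment service already stored the card data from the client's
    # earlier call, keyed by CheckoutID.
    payment_db = {"checkout-1": {"card_token": "tok-xyz"}}

    # Ordering handles Place Order: it changes its own state, then sends a
    # command for Payment to do its part of the workflow.
    orders_db = {"checkout-1": {"status": "placed", "total": 50.0}}
    payment_commands.put({"type": "ProcessPayment",
                          "checkout_id": "checkout-1", "amount": 50.0})

    # Payment consumes the command and uses data it already owns.
    command = payment_commands.get()
    card = payment_db[command["checkout_id"]]["card_token"]
    print(f"charging {card} for {command['amount']}")  # then hit the gateway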

Shared Database

Can you share a database between services? Yes and no. You can share the physical database instance but not the schema and data. A service should own its schema and data and only expose them via the explicit contracts of an API or messages. Don't have a data-access free-for-all.
