CodeOpinion: Software Architecture & Design

  • How I became a software architect… (or not)

    How did I get involved in software architecture & design? Do I think of myself as a software architect? Here are some insights about my career, which have helped me and ultimately led me to produce videos on YouTube about software architecture and design. Here’s a bit about my journey in software.

    YouTube

    Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

    https://www.youtube.com/watch?v=6j-PyJ1tFn8

    Operations

    I’ve always been involved in IT infrastructure and operations. Before working professionally as a software developer, I was interested in how software runs. Setting up my own servers at home and running Linux was a hobby. When I started working professionally as a software developer in the late ’90s, I was responsible for writing e-commerce software and for managing the web servers it ran on, along with mail servers, DNS, etc.

    That foundational knowledge of operations carried throughout my career, where I was more and more involved as the industry evolved from physical servers to on-prem VMs and then to the cloud.

    You could say that before the “devops” movement, that was already what I was doing: I was heavily involved in both writing software systems and managing the infrastructure that ran them. DNS, database replication, web servers, load balancing, the list goes on.

    This has been invaluable, especially when working in small teams and startups.

    Business

    I’ve always written software for line-of-business types of large systems: E-Commerce, Distribution, Manufacturing, Accounting, and Transportation.

    While they are all very different, they’re all lines of business with some overlap. I’ve always said the best developers I know working in these systems have good developer skills but even better business knowledge.

    Understanding how businesses work and how money flows. How do they make money? What are the core parts of each domain where they generate revenue? How do they incur costs? There are business processes that are the main workflows of any business.

    It sounds simple, but it’s understanding the business.

    If you’re working in these types of systems, understanding the fundamentals of your domain will serve you well.

    Ah-ha!

    That segues well into a big ah-ha! in my career: finding Domain Driven Design. Because I lived in various domains, this book was a game-changer in multiple ways. The first was that it was a gateway. DDD led me to CQRS. CQRS led me to Event Sourcing. Event Sourcing segued me into Event-Driven Architecture.

    People like Greg Young and Udi Dahan, the blogs they wrote, and the conference talks they presented were eye-opening and shaped many of the ways I think today.

    DDD also validated some thoughts I had and forced me to think deeply about design. A lot of people, still today, are hung up on the “tactical patterns” of Domain-driven Design. Aggregates, Entities, Value Objects, Repositories, etc. While they have value, they are a means to an end. The real value in DDD for me was thinking about boundaries on many different levels.

    A lot of how I think now is rooted in coupling and cohesion. If you boil things down, they generally return to those two and the trade-offs you make for them.

    Working with large systems means thinking about cohesion and defining boundaries. Coupling is inevitable, but how you manage it is vital.

    Software Architecture and Design

    Being involved in greenfield projects working in small companies and startups has forced me to think about software architecture and design. How to decompose large systems. However, I’ve also been thrown into the fire several times to work on a large existing (brownfield) project without any support. Zero. A large system that is running a company, and you’re the one that’s got to deal with it. Fix bugs, add features, and deal with the infrastructure of how it’s running. There is no developer documentation. Nobody to ask. Nothing. It sounds stressful, and it was. But coming out of those situations gave me experience in navigating large codebases and seeing a lot of issues with these types of systems.

    When you’re working on a greenfield project, especially for a startup, you don’t have time for bikeshedding. Make the best decision you can, given the information you have, and move forward. Too often, developers can get caught up in over-analyzing and bikeshedding over irrelevant things.

    Software architecture, to me, is about making decisions that give you options in the future but with little cost. I talk about this in my post: What is Software Architecture?

    Systems need to evolve. Technology changes, businesses change, industry changes, your system needs to change.

    Do I think of myself as a software architect? No, not really. It’s always been a part of my role. Thinking about decomposing systems, how a system should function, how the business works, and how to model that has always been a part of my role. I understand there are folks with the title of “Software Architect,” and I think that’s a role that can exist in certain situations, but I do think everyone can have a role in architecture and design on a team. Even if that’s understanding the reasons why certain decisions were made.

    YouTube

    So, how did I end up creating videos on YouTube about software architecture and design? The primary reason, and why I still do it today, is that I think there is a lack of good content on the topic.

    There’s too much emphasis on “how” and not enough on “why”.

    I create videos that I would want to watch. They aren’t for everyone, and that’s fine. They aren’t for people that want the “show me the codez!” videos.

    I absolutely show code in some videos, and it’s for illustration purposes to explain a concept. Not surprisingly, although I show code in C#, there’s a large number of viewers who do not use C# at all.

    I said there’s a lack of good content because there’s also a lot of content that is, to me, misleading, bad, clickbait, and incorrect. Does that mean that everything I post is correct? No. Because of that, I only post videos/blogs that I’m confident in based on my experience. My lane is software architecture and design. I post content around it based on my 20+ years of experience.

    Hopefully, what I post will resonate, and like the YouTube comment above, hopefully, you’ll get an “aha” moment, just like I have from other folks I mentioned above.

    Good luck on your journey!

    Join!

    Developer-level members of my Patreon or YouTube channel get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.

    Follow @CodeOpinion on Twitter

    Software Architecture & Design

    Get all my latest YouTube Videos and Blog Posts on Software Architecture & Design

  • Have you replaced your DB because of the Repository Pattern?

    Have you been able to replace your database implementation transparently because of the use of the repository pattern? While this is a controversial question and topic, I will explain why it doesn’t need to be. Sure, you’re creating an abstraction, but an abstraction around what?

    YouTube

    Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

    https://www.youtube.com/watch?v=EwKhyp2kHME

    Repository Pattern

    First, we need to agree on the definition of the Repository Pattern. I will use the definition from Patterns of Enterprise Application Architecture.

    Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them, providing a more object-oriented view of the persistence layer.

    I think everyone can agree on that definition, but I think some of the confusion is around what a “set of objects” means. What type of objects are we talking about?

    I’ve found two ways people think about objects in a repository. The first is thinking of them as a data model. It represents how you persist data in your data store and the mapping that goes along.

    The second group of people comes at it from a Domain Driven Design perspective and thinks about a domain model of aggregates composed of entities and value objects.

    Generally, these two sides view a repository differently, which is why the use of a repository can become a hot topic.

    Differences

    On the surface, you might think of an aggregate or a data model with some hierarchy as being the same. But there’s a difference.

    In the example above, I have a repository that fetches a basket and its collection of basket items. This object model hierarchy sure looks like an aggregate.

    However, the difference between an aggregate and a data model is behaviors.

    Behavior

    With an aggregate, you’re exposing behaviors to make state changes. An aggregate represents a consistency boundary. The root entity is the entry point to invoke behaviors that will make changes to the state of the aggregate as a whole. Because of this, it’s also the place where business rules are enforced.

    Here’s an example.

    https://gist.github.com/dcomartin/d7f88e69dc0b6318079aa8b89edb14f5

    While this is a trivial example, AddItem ensures that only one item in our _items collection exists for a catalogItemId. Because items themselves don’t know about other items, this is why our Basket is the aggregate root.
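    As a rough illustration of that rule, here is a minimal sketch in Python. It is not the code from the gist above: the class shape, the `add_item` signature, and the choice to bump the quantity when the item already exists are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BasketItem:
    catalog_item_id: int
    unit_price: float
    quantity: int


class Basket:
    """Aggregate root (hypothetical): every change to the items goes through it."""

    def __init__(self, buyer_id: str) -> None:
        self.buyer_id = buyer_id
        self._items: list[BasketItem] = []

    @property
    def items(self) -> tuple[BasketItem, ...]:
        # Return a tuple so callers cannot append to the internal list directly.
        return tuple(self._items)

    def add_item(self, catalog_item_id: int, unit_price: float, quantity: int = 1) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        for item in self._items:
            if item.catalog_item_id == catalog_item_id:
                # Assumed rule: one line per catalog item, so bump the quantity.
                item.quantity += quantity
                return
        self._items.append(BasketItem(catalog_item_id, unit_price, quantity))


basket = Basket("buyer-1")
basket.add_item(catalog_item_id=42, unit_price=9.99)
basket.add_item(catalog_item_id=42, unit_price=9.99, quantity=2)
basket.add_item(catalog_item_id=7, unit_price=1.50)
```

    After these three calls the basket holds two items, because the second call found the existing item for catalog item 42 and updated it instead of adding a duplicate.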

    If we worked with a data model, we could have this logic in a transaction script or somewhere higher up the call stack, but the point of our aggregate is to hold all the logic for making state changes.

    Data

    So, if you’re making state changes and need to encapsulate those behaviors within an aggregate, in what situation would you want a data model?

    Typically, you want to query data for UI purposes. In that case, we’ve established a difference between needing to make state changes and querying data. Guess where we’ve landed? You guessed it, CQRS.

    So, when you start thinking about the difference between the two, why would I want to use a repository that’s returning an aggregate when I don’t need to make state changes?

    Sure, you can use an aggregate for queries if you expose the data within them, but do you need all that data? Your aggregate is backed by all the data used for state changes to ensure consistency. Queries are often specialized for their use case. They don’t always need all the data. Ultimately, you’d often be over-fetching data if you returned an aggregate for query purposes.

    You’ll likely conclude, then, that you only need a repository for commands. If a repository returns an aggregate, and an aggregate is used for state changes, that only happens when executing a command.

    Queries are a use case. As mentioned, they are often very specialized for a specific need to fetch specific data. Using a repository to over-fetch data and then having to transform it for your use case can add a lot of indirection and complexity.

    Let your queries define how they retrieve data. Does this mean not abstracting your data access? It can. Or it might not. You make that decision. Now, you’re talking about coupling between queries and your database. Do you need an abstraction if you have ten queries coupled to your database? Do you need an abstraction if you have 1000 queries coupled to your database? Your answers may differ.
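    To make the idea of a query defining its own data access concrete, here is a small sketch using Python's built-in `sqlite3` module. The `basket_items` table, its columns, and the "item count badge" use case are assumptions invented for the example. The query goes straight to the database and returns only the one number its use case needs, rather than loading a whole aggregate.

```python
import sqlite3

# In-memory database with a hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE basket_items (
        basket_id INTEGER,
        catalog_item_id INTEGER,
        unit_price REAL,
        quantity INTEGER
    );
    INSERT INTO basket_items VALUES (1, 42, 9.99, 3);
    INSERT INTO basket_items VALUES (1, 7, 1.50, 1);
    INSERT INTO basket_items VALUES (2, 42, 9.99, 1);
    """
)


def get_basket_item_count(conn: sqlite3.Connection, basket_id: int) -> int:
    """Query for a UI badge: fetch the total quantity and nothing else."""
    row = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM basket_items WHERE basket_id = ?",
        (basket_id,),
    ).fetchone()
    return row[0]


count = get_basket_item_count(conn, 1)
```

    Whether a query like this sits behind an abstraction is the coupling decision described above; the sketch only shows that the query itself stays narrow.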

    Repository Pattern

    Thanks to David Fowler for tweeting this, as it inspired this video/blog. The answers to this question are going to be all over the place. Some don’t abstract or use the repository pattern because they have never replaced an underlying database, or think they never will. Another group of people always abstract and use the repository pattern because they have replaced their underlying database.

    I think the key thing to be thinking about, however, is your use case. If you’re using a repository for commands and returning aggregates, that repository/abstraction will look very different than an abstraction that’s for querying data in a very specific way and/or doing transformations while querying the database and not in memory once data is returned.

    A repository for an aggregate might have only a few methods. GetById, Save. There may be a few more, but you know exactly the aggregate you need to call. You’re not querying and filtering by other data elements, it’s generally by the key.
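    A repository of that shape might look something like the following sketch. The names `BasketRepository`, `InMemoryBasketRepository`, and `add_item_to_basket` are hypothetical, and the `Basket` here is a reduced stand-in for an aggregate. The in-memory implementation exists only to keep the example runnable.

```python
from typing import Optional, Protocol


class Basket:
    """Stand-in aggregate, reduced to an id and its item quantities."""

    def __init__(self, basket_id: int) -> None:
        self.id = basket_id
        self.quantities: dict[int, int] = {}

    def add_item(self, catalog_item_id: int, quantity: int = 1) -> None:
        current = self.quantities.get(catalog_item_id, 0)
        self.quantities[catalog_item_id] = current + quantity


class BasketRepository(Protocol):
    """The whole abstraction: load by key, save."""

    def get_by_id(self, basket_id: int) -> Optional[Basket]: ...

    def save(self, basket: Basket) -> None: ...


class InMemoryBasketRepository:
    def __init__(self) -> None:
        self._baskets: dict[int, Basket] = {}

    def get_by_id(self, basket_id: int) -> Optional[Basket]:
        return self._baskets.get(basket_id)

    def save(self, basket: Basket) -> None:
        self._baskets[basket.id] = basket


def add_item_to_basket(repo: BasketRepository, basket_id: int, catalog_item_id: int) -> None:
    """Command handler: load the aggregate by key, invoke a behavior, save."""
    basket = repo.get_by_id(basket_id) or Basket(basket_id)
    basket.add_item(catalog_item_id)
    repo.save(basket)


repo = InMemoryBasketRepository()
add_item_to_basket(repo, 1, 42)
add_item_to_basket(repo, 1, 42)
```

    Note that nothing in the interface filters or searches. The command handler already knows which basket it needs.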

  • Distributed isn’t Microservices, In-Process isn’t a Monolith

    Amazon Prime Video moved one of its monitoring services from “microservices” to a “monolith”. I’m using quotes because that’s how they termed it in their post, and they did themselves a disservice by framing it that way. Almost every blog post or video covering this has missed the mark. All they did was refactor. This has nothing to do with microservices or a monolith.

    YouTube

    Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

    https://www.youtube.com/watch?v=qndSXhknxRc

    Original Architecture

    You can check out the original blog post from Amazon Prime Video, but here’s a quick summary that sets the stage for what they originally had as an architecture and what they moved to. There are a lot of hot takes about what they did, but most are way off base. So first, what was their original architecture?

    Amazon Prime Video Distributed

    There’s an audio/video stream that goes to a Media Conversion service, which extracts frames from the video and puts them in an S3 bucket. Then, there’s a workflow with step functions that pull those frames (data) from S3 to analyze. They call the components that do the analyzing detectors.

    They stated that this was fine initially, but as the workload increased, this architecture was no longer viable. They stated that they never intended nor designed it to run at a high scale.

    A major issue wasn’t anything to do with execution but cost. Because everything is distributed, each step function has to pull the data from S3 to analyze it. This data transfer does add latency to their processing times, but the bigger issue was the monetary cost.

    New Architecture

    The architecture they moved to seems logical to me. Instead of using lambdas for analyzing the frames, they instead moved them to be within the same process within an ECS task. The audio/video stream data is going directly to this ECS task. This means S3 is no longer used at all. Since everything is in memory, there’s no pulling data from S3, it’s already in memory.

    Amazon Prime Video Monolith

    Nothing is distributed anymore. The media converter and the detectors (to analyze the frames) are all within the same ECS task (container).
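    The difference between the two layouts can be sketched in a few lines of Python. Everything here is a stand-in: `FrameStore` plays the role of the S3 bucket, and the two detector functions are made up for the example, not the detectors Prime Video actually runs. The point is that the same detector code can be wired either way; only where the frame data comes from changes.

```python
from typing import Callable

Frame = bytes
Detector = Callable[[Frame], bool]


def detect_black_frame(frame: Frame) -> bool:
    # Hypothetical detector: a non-empty frame made up entirely of zero bytes.
    return len(frame) > 0 and all(b == 0 for b in frame)


def detect_empty_frame(frame: Frame) -> bool:
    # Hypothetical detector: a frame with no data at all.
    return len(frame) == 0


DETECTORS: dict[str, Detector] = {
    "black_frame": detect_black_frame,
    "empty_frame": detect_empty_frame,
}


class FrameStore:
    """Stand-in for a remote bucket; counts reads to make the transfers visible."""

    def __init__(self) -> None:
        self._frames: dict[str, Frame] = {}
        self.reads = 0

    def put(self, key: str, frame: Frame) -> None:
        self._frames[key] = frame

    def get(self, key: str) -> Frame:
        self.reads += 1
        return self._frames[key]


def analyze_distributed(store: FrameStore, key: str) -> dict[str, bool]:
    # Each detector runs as its own step and fetches the frame itself.
    return {name: detect(store.get(key)) for name, detect in DETECTORS.items()}


def analyze_in_process(frame: Frame) -> dict[str, bool]:
    # Same detectors, but the frame is already in memory.
    return {name: detect(frame) for name, detect in DETECTORS.items()}


frame = bytes(16)  # sixteen zero bytes
store = FrameStore()
store.put("frame-1", frame)
distributed = analyze_distributed(store, "frame-1")
in_process = analyze_in_process(frame)
```

    Both wirings produce the same results. The distributed one reads the frame from the store once per detector, which is the kind of repeated data transfer described above.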

    Hot Takes

    Unfortunately, this blog post stated that they moved from microservices to a monolith. There were all kinds of hot takes that seem to have missed the point or came from people who didn’t even read the original post.

    Clickbait

    I’m not sure if people purely do this for clicks/views or if they just don’t really understand what the architecture change was. Or possibly they just didn’t even read the original post. I’m not sure.

    What Amazon Prime Video did was change the physical aspect of their deployment. That’s it. It’s not that serverless sucks or they don’t understand what it is. It’s that given their specific use case, at a higher volume/scale, it wasn’t cost-effective.

    They moved their code to be composed into a single process, which alleviated the cost of distributing the workload and data. That’s it. As stated in the post, a lot of the code was reused, and it allowed them to quickly do this refactor. It wasn’t a rewrite.

    Microservices to Monolith?

    Amazon Prime Video said they moved from microservices to a monolith. But that’s not what they did. They moved from a service… to a service.

    What’s a service? It’s the authority of a set of business capabilities. That didn’t change. What changed was the physical boundaries.

    Physical boundaries aren’t logical boundaries.

    A logical boundary defines what the capabilities are. Their logical boundary is still the same, but their physical boundaries changed.

    I think the confusion lies in thinking that the media conversion is its own “service” and the step function detectors that do the analyzing are their own “service”, but that’s not the case. The service is everything.

    Using lambdas or having different components distributed doesn’t suddenly make it microservices. And just because you have all components executing in the same process/container doesn’t make it a monolith.

    If you have a logical boundary and you have some source repo, that could turn into one container/process that is an HTTP API. You could also have another container/process that executes as a background worker service. You can also scale out and deploy multiple instances of both of these.

    One more time for those in the back. Physical boundaries aren’t logical boundaries. Check out my post The Pendulum swings! Microservices to Monoliths for more on this.

    Refactored

    So what did they really do? Did Amazon Prime Video move from microservices to a monolith? No.

    They refactored. That’s it.

    They evolved their architecture based on changing load and cost. They realized at the scale they needed that it wasn’t cost-effective to distribute the workload with step functions which then required distributing the data via S3.

    They evolved and refactored to move it all within an ECS task so that everything was in memory.

    Makes sense.
