Should you Soft Delete?

Should you delete records from your database or instead use a soft delete? I was recently asked my view on this question by a follower on Twitter. So what’s my answer? Well, I’m not usually thinking about “deleting” anything. Instead, I’m thinking about adding.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Hard Delete

So we’re all on the same page, let me take a step back and discuss deleting data in general. Hard deletes refer to removing the data from the database.

For example, take an Employee table in a relational database that contains two records.

Table of Rows

Doing a hard delete would mean executing a delete statement that physically removes the record from the table. Only one record would remain.

Hard Delete

Give yourself a high five if you got the movie reference to the remaining employee!

Soft Delete

Typically, if you’re using a relational database, you might have foreign key constraints that enforce data integrity by preventing you from deleting rows that are still referenced. You don’t want another table to hold a reference to EmployeeID 123 when that employee no longer exists.

Soft deletes solve this issue, often by adding a column that indicates whether the record/row is “deleted” or inactive.

For example, the table has an IsDeleted column that defaults to False.

Soft Delete Table

Instead of hard deleting, records are updated to have IsDeleted set to true.
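As a rough sketch of that update, here is a soft delete using Python's built-in sqlite3 module with an in-memory database. The Employee table and IsDeleted column come from the example above; the EmployeeId and Name columns, the sample rows, and the helper function names are assumptions made up for illustration.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    " EmployeeId INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " IsDeleted INTEGER NOT NULL DEFAULT 0)"  # 0 = False, the default
)
conn.executemany(
    "INSERT INTO Employee (EmployeeId, Name) VALUES (?, ?)",
    [(123, "First Employee"), (456, "Second Employee")],  # hypothetical rows
)


def soft_delete(employee_id: int) -> None:
    # The row stays in the table; only the flag changes.
    conn.execute(
        "UPDATE Employee SET IsDeleted = 1 WHERE EmployeeId = ?",
        (employee_id,),
    )


def active_employee_ids() -> list[int]:
    # Every read now has to remember to filter out soft-deleted rows.
    rows = conn.execute(
        "SELECT EmployeeId FROM Employee WHERE IsDeleted = 0 ORDER BY EmployeeId"
    )
    return [row[0] for row in rows]


soft_delete(123)
```

Note that the row for employee 123 is still physically present, so any foreign keys pointing at it remain valid. The trade-off is that every query has to filter on the flag.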

Similarly, if you were working with a document database, a property on the document/object would indicate whether it is deleted.

Soft Delete Document

Business Concepts

In line-of-business and enterprise-style systems, when I’m working in the core part of the system, I’m often not thinking about deleting or soft deleting anything. This is because there are business concepts and business processes that don’t involve deleting data.

People within the domain don’t generally think about “deleting” data, nor would they use the term “delete” when talking about a business process or workflow.

Jumping back to my earlier example of employees, let’s say we were working within an HR system. An employee is a key aspect of the system, and there are different workflows around them. One of those concepts would be their employment history.

When an employee was terminated or quit, we wouldn’t just do a soft delete and mark them as “IsDeleted”. That doesn’t make sense. Employees would have a lifecycle of being hired and their employment terminated.

Events

When an employee’s employment is terminated, other data is likely relevant to that event. When was it terminated, when was it effective, and what was the reason?

The key is that this is a business event that occurred, and we want to capture all the relevant data with that event.

Document Events

Maybe we represent this in our document/object as a collection of hiring and terminations with the relevant data. It’s possible that an employee is re-hired at a later date and has multiple periods of employment.
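One possible shape for such a document, sketched with Python dataclasses. All of the class and field names here (EmploymentPeriod, EmployeeDocument, hired_on, effective_on, and so on) are hypothetical; the only things taken from the post are the ideas of recording when a termination happened, when it was effective, and why, and of allowing multiple periods of employment.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class EmploymentPeriod:
    hired_on: date
    # The termination fields stay empty while the period is still open.
    terminated_on: Optional[date] = None
    effective_on: Optional[date] = None
    termination_reason: Optional[str] = None


@dataclass
class EmployeeDocument:
    employee_id: int
    name: str
    employment_history: list[EmploymentPeriod] = field(default_factory=list)

    def is_currently_employed(self) -> bool:
        # Employed if the most recent period has not been terminated.
        return (
            bool(self.employment_history)
            and self.employment_history[-1].terminated_on is None
        )


# Hypothetical employee who resigned and was later re-hired.
employee = EmployeeDocument(employee_id=123, name="Example Employee")
employee.employment_history.append(
    EmploymentPeriod(
        hired_on=date(2020, 1, 6),
        terminated_on=date(2021, 3, 1),
        effective_on=date(2021, 3, 15),
        termination_reason="Resigned",
    )
)
employee.employment_history.append(EmploymentPeriod(hired_on=date(2022, 5, 2)))
```

Compared with an IsDeleted flag, the document keeps the reason and dates for each termination, and a re-hire is just another entry in the history.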

Again, the key is focusing on the business events that occur. If you think about events as a primary driver of your system, you’ll likely end up thinking about persisting the events themselves as your state. This is Event Sourcing. If you’re unfamiliar with the concept of Event Sourcing, check out my post Event Sourcing Example & Explained in plain English.
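To make the idea of events as state a little more concrete, here is a minimal sketch in Python. It is not taken from any event sourcing library; the event classes (EmployeeHired, EmploymentTerminated) and the is_employed function are hypothetical names. The stored data is the list of events, and the current state is derived by replaying them in order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union


@dataclass(frozen=True)
class EmployeeHired:
    employee_id: int
    hired_on: date


@dataclass(frozen=True)
class EmploymentTerminated:
    employee_id: int
    effective_on: date
    reason: str


EmployeeEvent = Union[EmployeeHired, EmploymentTerminated]


def is_employed(events: list[EmployeeEvent]) -> bool:
    """Replay the events in order to derive the current state."""
    employed = False
    for event in events:
        if isinstance(event, EmployeeHired):
            employed = True
        elif isinstance(event, EmploymentTerminated):
            employed = False
    return employed


# Hypothetical event stream: hired, terminated, then re-hired.
stream: list[EmployeeEvent] = [
    EmployeeHired(123, date(2020, 1, 6)),
    EmploymentTerminated(123, date(2021, 3, 15), "Resigned"),
    EmployeeHired(123, date(2022, 5, 2)),
]
```

Nothing is ever updated or deleted here. A termination is simply one more event appended to the stream, and it carries its own reason and effective date.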

CRUD

Now I can hear people screaming already: it’s just CRUD! This can be true on the outer edge of a system, in parts that play more of a supporting role. In a large system, you’ll have various boundaries that act purely in supporting roles and are mostly reference data. And yes, these can be CRUD and would likely just soft delete.

However, at the core of your domain, as mentioned, you won’t hear people who understand the domain talk about “deleting.” Almost all events have some type of compensating action that ends the life cycle of a process, or that undoes or voids a previous action. Capturing those business events and concepts is key to building workflow within your system. If you’re focusing on CRUD, all business processes, workflows, and understanding live entirely in the end user’s head. Your system, at that point, is nothing more than a UI to a database with no real capabilities.

GDPR

I can also hear people screaming, but… GDPR! I have to delete the data!

You’re missing the point.

If you need to delete data, delete it. The point isn’t about not deleting data; the point is that if you’re “soft deleting” data, you’re losing information about business concepts/events that have likely occurred as part of a business process or workflow. The events that occurred, and why they occurred, can be incredibly valuable in building a robust system that can evolve as requirements change.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

You also might like

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Videos and Blog Posts on Software Architecture & Design

Data Access Layer makes it easier to change your Database?

One primary reason given for a data access layer or abstraction is that it makes changing the underlying database easier. Have you ever replaced the underlying database of a large system or service? For example, moved from one relational database to another, such as PostgreSQL to MySQL? Or perhaps gone from a relational database to a document or event store? There seem to be two groups of people. Those who have will say that abstracting the underlying database is crucial.

In contrast, the other group has never moved a database and questions abstracting or creating a data access layer because you likely won’t replace the database. Like many things in software architecture, it’s about coupling. If you limit coupling, this isn’t much of a hot topic.

Free for All

Let’s assume you have a large system or service with a lot of functionality that interacts with a single large database schema.

Large System

A block represents a piece of functionality. They all have direct access to the entire database schema. It’s a free for all. Any piece of functionality can access anything in the database. If the database was a relational database, this means all the tables. If you were using a document store, this could be all the collections. If you were using an Event Store, this is all the different event streams. Regardless of the database type, it’s free for all data access without data abstraction.

Does this sound like a nightmare? It would be. Because if you did need to change your underlying database technology, you’d have to go to every place that is doing data access and potentially rewrite it completely or modify it to work with the new database. This is much more involved if you change the database from relational to non-relational.

What’s most people’s answer to this problem? Abstraction. Create some layer of abstraction for your database to provide all the database access logic.

This could be a data access layer, using the repository pattern, or maybe just an ORM. Your application code and features then rely on this abstraction rather than coupling with the database directly.

Data Access Layer

So now, if we need to change the database, we’re rewriting the data access abstraction that we created, and all of our application code, in theory, should stay the same.
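A minimal sketch of what such an abstraction might look like in Python, using a repository-style interface. The names EmployeeRepository, InMemoryEmployeeRepository, and rename_employee are hypothetical, and the in-memory dictionary stands in for whatever database sits behind the abstraction.

```python
from typing import Optional, Protocol


class EmployeeRepository(Protocol):
    """The abstraction that application code depends on."""

    def get_name(self, employee_id: int) -> Optional[str]: ...

    def save(self, employee_id: int, name: str) -> None: ...


class InMemoryEmployeeRepository:
    """One implementation; a SQL or document-store version could replace it."""

    def __init__(self) -> None:
        self._rows: dict[int, str] = {}

    def get_name(self, employee_id: int) -> Optional[str]:
        return self._rows.get(employee_id)

    def save(self, employee_id: int, name: str) -> None:
        self._rows[employee_id] = name


def rename_employee(repo: EmployeeRepository, employee_id: int, new_name: str) -> bool:
    # Application code only knows about the abstraction, not the database.
    if repo.get_name(employee_id) is None:
        return False
    repo.save(employee_id, new_name)
    return True
```

Swapping the database would mean writing another class with the same two methods, while rename_employee stays untouched. That is the "in theory" part: it only holds if all data access really goes through the abstraction.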

Coupling & Cohesion

While adding the abstraction can be helpful, I don’t think that’s the right way to look at the problem. The root of the problem is coupling. A large amount of application code relies directly on the database.

Large System

While this is true, not all features within the application require access to the same data within the database.

If we’re talking about a relational database, not everything in our system needs access to all the tables. Certain pieces of functionality require access to a certain set of tables.

To reduce coupling, we need to look at cohesion. What functionality in our system requires what data? We should then group this functionality.

Organized by Features

Start segregating functionality by related features working on the same subset of data. Again, not all features need to access the same large schema/data. Define boundaries over data ownership.

Once you start defining boundaries and grouping functionality, you’ll soon realize that each grouping might be able to define how they handle data access. This means that for some sets of features, you decide to use a specific ORM, while for others, you do direct data access without much abstraction.

Data Access Layer by Feature Sets

You’ve reduced coupling by increasing cohesion. This allows you to decide how you want to perform data access per set of features.

There is no more free-for-all of data access. Each set of features only accesses the parts of the database it owns. One set of features should not access the data owned by another set of features.

You might start to see now that each feature set can also decide how its data is persisted and in which type of database. Maybe some feature sets use a relational database, while others use an event store. You can make these localized decisions per feature set (boundary).

Data Access Layers by Feature Sets

Since the entire application/system is now split and grouped into boundaries defined by functionality and data ownership, you don’t have free-for-all of data access. Often you do need to query other boundaries to get reference data. In the free-for-all scenario, you simply query the database to get that data. However, now you must expose an API that other boundaries can consume, like a contract.

This contract can be an interface, delegate, function, etc., and it is where the coupling between boundaries lives. However, the data returned from this API isn’t mutable. It’s purely used as reference data. All state changes to any data must occur within the boundary that owns the data.
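Here is one way that kind of contract could be sketched in Python. Everything named here is hypothetical (CustomerSummary, CustomerDirectory, SalesBoundary, invoice_heading); the point is only the shape: the owning boundary exposes a read-only query, and what it returns is an immutable value rather than a handle on its own data.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class CustomerSummary:
    """Immutable reference data handed to other boundaries."""

    customer_id: int
    display_name: str


class CustomerDirectory(Protocol):
    """The contract other boundaries are allowed to depend on."""

    def find_customer(self, customer_id: int) -> Optional[CustomerSummary]: ...


class SalesBoundary:
    """Owns customer data; all state changes go through this class."""

    def __init__(self) -> None:
        self._customers: dict[int, str] = {}

    def register_customer(self, customer_id: int, display_name: str) -> None:
        self._customers[customer_id] = display_name

    def find_customer(self, customer_id: int) -> Optional[CustomerSummary]:
        name = self._customers.get(customer_id)
        return None if name is None else CustomerSummary(customer_id, name)


def invoice_heading(directory: CustomerDirectory, customer_id: int) -> str:
    # Code in another boundary: it can read the summary but cannot change it.
    customer = directory.find_customer(customer_id)
    if customer is None:
        return "Invoice for unknown customer"
    return f"Invoice for {customer.display_name}"
```

Because CustomerSummary is a frozen dataclass, an attempt to assign to one of its fields raises an error, and the consuming code never sees how or where the sales boundary stores its data.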

Data Access Layer

This has nothing to do with creating a data access layer or abstracting data access. It has everything to do with coupling and cohesion. If you limit coupling and prevent integration at the database (as in data access free-for-all), then needing to change the underlying database is a matter of changing a narrow set of features within a boundary.

Will you change your underlying database? It doesn’t matter. A high degree of coupling will make any change difficult in a large system. Defining boundaries by functional cohesion and limiting coupling will allow a system to evolve.

Fintech Mindset to Software Design

If you’re creating a line of business or enterprise-type software, I think one of the most valuable skills you can have isn’t technical. Rather it’s understanding how the business domain you are in works. One way is following how money flows through a system by having a fintech mindset.

Revenue & Cost

I was on the Azure DevOps Podcast, where I mentioned that a big influence on my career was working with an Accountant. No surprise, this has affected how I look at line-of-business and enterprise-type systems, how they can be decomposed, and how money flows through the system.

At a high level, I’m thinking about Revenue and Cost.

For context, I’m going to talk about project-based work to illustrate.

How does the company generate revenue? How does the software system we are building fit into how the company generates revenue?

How does the company incur costs to generate that revenue? How do we keep track of those costs in the system, just as we keep track of revenue?

We generally have customers, sales, and invoicing on the revenue side.

Revenue Side

There is some type of sales process, often involving a CRM, that ultimately leads to a sale, after which we invoice the customer to get paid.

On the cost side, we may have employees who are salaried or paid by the hour. We pay them at some interval (weekly, bi-weekly, monthly) based on their salary or hours worked (timesheet).

Employee Cost Side

Other costs can occur, such as materials or services from outside vendors. We will create purchase orders and get receipts.

Vendor Cost Side

Follow the Money

When thinking about decomposing a system and understanding a domain, it helps to follow the money. This can lead to understanding the workflows and processes that generate revenue and cost.

In the case of project-based work, the heart is in the execution: the actual project. There are many ways to “sell” a project, such as Time & Materials, Fixed Price, etc. The same goes for costs: some employees might be on salary (fixed), while others may be hourly.

Regardless, the point is the execution of the project is where the complexity lies. This is likely to be the heart of the system you’re building.
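As a small illustration of following the money through a single project, here is a sketch in Python using Decimal for the amounts. The function names and all of the figures are made up for illustration, and a real system would have far more nuance (fixed-price contracts, salaried staff, and so on). It only shows revenue on one side and costs on the other meeting at the project.

```python
from decimal import Decimal


def time_and_materials_revenue(
    hours_billed: Decimal, bill_rate: Decimal, materials_billed: Decimal
) -> Decimal:
    # Revenue side: what the customer is invoiced for.
    return hours_billed * bill_rate + materials_billed


def hourly_labour_cost(hours_worked: Decimal, pay_rate: Decimal) -> Decimal:
    # Cost side: what timesheets turn into on payroll.
    return hours_worked * pay_rate


def project_margin(revenue: Decimal, labour_cost: Decimal, vendor_cost: Decimal) -> Decimal:
    return revenue - labour_cost - vendor_cost


# Hypothetical project: 100 hours billed at 150/hour plus 2000 of materials,
# staffed by an hourly employee at 60/hour, with 1500 of vendor purchases.
revenue = time_and_materials_revenue(Decimal("100"), Decimal("150"), Decimal("2000"))
labour = hourly_labour_cost(Decimal("100"), Decimal("60"))
margin = project_margin(revenue, labour, Decimal("1500"))
```

Each input traces back to a workflow mentioned above: invoicing produces the revenue, timesheets produce the labour cost, and purchase orders produce the vendor cost.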

Execution

For example, if you’ve worked on a software project before or done any project management, you’ll recognize that the execution is where the complexity is.

So if you were writing a system to manage the full lifecycle of project-based work, the execution of the project would be the core. Many boundaries would act purely in a supporting role, such as CRM, Employee, Payroll, Vendors, and Invoicing. You don’t necessarily need to write your own; these are usually pretty generic, and you can buy them off the shelf or use software as a service.

And this makes sense because you’re not about to write a CRM or Accounting system. You can integrate with those to push/pull the relevant data required.

On the other hand, Timesheets, Sales, and Purchasing may be something you choose to write because there isn’t anything specific enough for the context/niche of what you’re building.

Core and Supporting Boundaries

Fintech Mindset

I’ve written software for Distribution, Accounting, Manufacturing, and various forms of Transportation, and they all have this in common. Have a fintech mindset by understanding revenue and costs and how each is generated.

Here’s a screenshot of QuickBooks, an accounting system typically used for small businesses. It gives a good example of how everything is related and how money flows.

Quickbooks

In writing line-of-business and enterprise-type systems, I’ve always found the best developers are the ones whose domain knowledge is as strong as their technical ability. Having a fintech mindset and understanding how money flows helps.
