Estimating The Monetary Value of Data Modeling in your Organization

(Data Modeling is Dead! Long Live Data Modeling!)

Marcin Kulakowski
6 min read · Mar 5, 2024

Introduction

In the ever-evolving landscape of information technology, data has become the lifeblood of organizations. Efficiently managing and leveraging data is critical for making informed decisions and keeping every system running smoothly, and it even shapes the development and effectiveness of generative AI models. One powerful technique that plays a pivotal role in this process is data modeling. In this blog post, we will explore the significance of data modeling, its benefits, and the consequences of neglecting it. We will also delve into the monetary value of data models and how much money they can save your organization.

Why is Data Modeling So Important?

Data modeling serves as the foundation for effective database design for any application or system. It involves defining and organizing business concepts, rules, entities, and data elements to understand how they relate to each other. A well-crafted data model provides a clear blueprint that ensures data is accurate, consistent, and easily accessible.

Benefits of Data Modeling

From improved decision-making to enhanced communication between stakeholders, a well-designed data model can bring about a positive transformation in various aspects of business operations.

  • It enables business requirements to be accurately and completely captured, and ensures that the delivered database meets them.
  • Facilitates reuse on subsequent projects, which accelerates speed to market.
  • Decreases application and system maintenance time and cost by preventing the errors that come from missing or misunderstood requirements.
  • Protects data integrity: the physical data model implements the necessary referential integrity constraints.
  • Increases the value of data assets: assets that are understood are more effectively utilized.

Consequences of Neglecting Data Modeling

Neglecting data modeling can lead to a host of issues, including data inconsistencies, poor system performance, and increased development and maintenance costs.

  • Missed data requirements: tables and files are designed from screen and report layouts or workflows, resulting in missing requirements that must be fixed after implementation.
  • Unstable database design: the schema is constantly revised as requirements are discovered during development.
  • Misunderstood data assets: no definitions, no examples, no cardinality information, and so on.
  • Compromised data integrity: without a normalized logical data model it is impossible to determine whether data integrity is being protected.
  • High maintenance: incorrect or inaccurate capture of initial requirements results in significant error correction and a design that is inflexible when adapting to new requirements and enhancements.

Many organizations see no value in building data models. They assume they just need a database that will work out of the box, and they say they cannot spare the time and money for a nice picture and some documentation. Yet if they skip data modeling, their applications and systems become more expensive to maintain. That is a value "perception gap" every organization needs to examine closely, and I think the best way to close it is to make a convincing case that data modeling delivers significant monetary value.

What is the Real Monetary Value of Data Models?

Consider the tangible and intangible value that data models bring to organizations. From saving development time to reducing errors, understanding the monetary impact of data models helps in making informed decisions about resource allocation and project prioritization.

One measure of monetary value is the number of times portions of a data model are referenced, or the number of entities reused on subsequent projects; the amount of reuse is also a measure of the quality of the original analysis. The cost side can be expressed with a "days per entity" figure: total savings equal "days per entity" multiplied by the number of entities reused. Likewise, if an existing database can be reused for a second data application, the time saved is simply "days per entity" multiplied by the number of tables in the existing database. A rough calculation is sketched below.
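Here is a minimal sketch of that reuse estimate. The "days per entity" figure, day rate, and counts are hypothetical placeholders; substitute the metrics from your own projects.

```python
# Minimal sketch of the reuse-based savings estimate described above.
# All figures are hypothetical placeholders; substitute your own metrics.

days_per_entity = 2.5   # average analysis/design effort per entity
loaded_day_rate = 800   # fully loaded cost of one person-day, in dollars

# Scenario 1: entities from an earlier model reused on a new project
entities_reused = 40
reuse_savings_days = days_per_entity * entities_reused
reuse_savings_dollars = reuse_savings_days * loaded_day_rate

# Scenario 2: an existing database reused wholesale for a second application
tables_in_existing_db = 65
db_reuse_savings_days = days_per_entity * tables_in_existing_db
db_reuse_savings_dollars = db_reuse_savings_days * loaded_day_rate

print(f"Entity reuse:   {reuse_savings_days:.0f} days, about ${reuse_savings_dollars:,.0f}")
print(f"Database reuse: {db_reuse_savings_days:.0f} days, about ${db_reuse_savings_dollars:,.0f}")
```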

The really big money is here: reducing the cost of system maintenance.

How?

By accurately capturing data requirements during the design phase, thus reducing time and cost associated with error correction and enhancements during the maintenance phase of the system life cycle.

Reducing the cost of System Maintenance

Hypothesis: The expense of data modeling incurred during the development phase of the system life cycle is more than offset by the savings realized during the maintenance phase, as compared to an identical project in which data modeling is omitted.

Corollary: The magnitude of savings over expense constitutes the measure of the data model's monetary value. The larger the difference, the greater the monetary value.
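Expressed as a minimal sketch (with purely hypothetical dollar figures), the corollary is just a difference:

```python
# Hypothetical figures to illustrate the corollary; substitute your own project numbers.
modeling_expense = 40_000      # data modeling effort during the development phase
maintenance_savings = 180_000  # error-correction and enhancement cost avoided during maintenance

model_value = maintenance_savings - modeling_expense  # positive => the model pays for itself
print(f"Monetary value of the data model: ${model_value:,}")
```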

Validating our Hypothesis

Ideally, you would conduct a controlled experiment in which a specific system or application is developed twice by the same people, once with a data model and once without, with cost metrics captured as these duplicate systems are used identically through the entire maintenance phase until system obsolescence. There are some difficulties with this. First, it is costly: the cost of the system is doubled. Second, it is time-consuming: since the average enterprise system lives about 8 years, you would have to wait a total of 16 years to obtain comparable results.

Realistically, the hypothesis can be reasonably validated by utilizing metrics from decades of software engineering studies and projecting benefits based on those metrics.

Data Modeling Impact on Maintenance

Enhancements: Easier to implement because the model's metadata facilitates understanding and normalization eliminates data redundancy, so the design is flexible and readily extended. Enhancements are expected; the key is to make them efficient. Data modeling has a moderate potential for cost reduction.

Adaptation: The model's metadata facilitates the understanding necessary to map data element data types from one database to another. Adaptation is inevitable as technology changes. Data modeling has a low potential for cost reduction.

Correction: The number of errors and the consequent rework are significantly reduced with a data model, since business requirements are accurately captured at the beginning of the project (prior to coding). Correction is a direct reflection of quality; this cost must be eliminated. Data modeling has a HIGH potential for cost reduction.

Error Correction

Studies of error occurrences on software projects have shown that up to 28% of errors were related to incomplete or erroneous specifications, directly attributable to inadequate requirements identification.

Software engineering studies indicate that the costliest errors are failures in design, not in coding. The later a defect is found, the more expensive it is to correct. Most errors can be traced back to errors in the initial requirements: one study revealed that 56% of bugs and 82% of correction effort resulted from improper requirements.

Coding errors are relatively inexpensive to correct. Design errors are much more expensive, but requirements errors are the most expensive to repair because of the redesign that is usually involved. It has been estimated that the cost of fixing a requirements error can be up to 10 times the cost of fixing a simple programming error.
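To see how those multipliers compound, here is a back-of-the-envelope sketch. The 10x multiplier and the 56% share come from the studies cited above; the baseline defect cost, the total defect count, and the share of requirements defects a model would eliminate are hypothetical assumptions.

```python
# Rough illustration of the cost-escalation figures cited above.
# The 10x multiplier and the 56% share are from the studies mentioned in the text;
# the baseline cost, defect count, and elimination rate are hypothetical.

cost_per_coding_error = 500        # hypothetical cost to fix a simple coding defect
requirements_multiplier = 10       # requirements errors cost up to ~10x to repair
total_defects = 200                # hypothetical defect count over the maintenance phase
share_requirements_related = 0.56  # share of bugs traced to improper requirements

requirements_defects = total_defects * share_requirements_related
coding_defects = total_defects - requirements_defects

cost_without_modeling = (coding_defects * cost_per_coding_error
                         + requirements_defects * cost_per_coding_error * requirements_multiplier)

# Assume up-front data modeling eliminates, say, 80% of the requirements-related defects.
cost_with_modeling = (coding_defects * cost_per_coding_error
                      + requirements_defects * 0.2 * cost_per_coding_error * requirements_multiplier)

print(f"Correction cost without modeling: ${cost_without_modeling:,.0f}")
print(f"Correction cost with modeling:    ${cost_with_modeling:,.0f}")
```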

So what percentage reduction in system maintenance cost is data modeling capable of achieving? 1%? 5%? 10%?

I challenge you to run the numbers on 3–5 actual funded projects at your organization. Determine the net present value of the savings, the payback period in years, and the return on investment percentage for each scenario. Assume the system life cycle is about 8 years, split 25% development and 75% maintenance, that data modeling costs are incurred at a constant rate, that maintenance-reduction savings are realized at a constant rate, and that savings are discounted at 5%, roughly the yield on a 10-year Treasury. A sketch of that calculation follows.
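Here is a minimal sketch of that evaluation. The life-cycle split and the 5% discount rate come from the text; the project cost, the modeling share of development cost, and the assumed maintenance-cost reduction are hypothetical placeholders to replace with your own project data.

```python
# Sketch of the back-of-the-envelope evaluation suggested above.
# Assumptions mirror the text: 8-year life cycle, 25% development / 75% maintenance,
# constant-rate modeling costs and maintenance savings, 5% discount rate.
# Project cost, modeling share, and maintenance reduction are hypothetical.

def evaluate(project_cost, modeling_share=0.02, maintenance_reduction=0.05,
             life_years=8, dev_fraction=0.25, discount_rate=0.05):
    dev_years = life_years * dev_fraction      # e.g. 2 years of development
    maint_years = life_years - dev_years       # e.g. 6 years of maintenance
    dev_cost = project_cost * dev_fraction
    maint_cost = project_cost * (1 - dev_fraction)

    modeling_cost_per_year = dev_cost * modeling_share / dev_years
    savings_per_year = maint_cost * maintenance_reduction / maint_years

    # Net present value: modeling costs fall in the development years,
    # savings in the maintenance years, all discounted at the given rate.
    npv = 0.0
    for year in range(1, int(life_years) + 1):
        cash = -modeling_cost_per_year if year <= dev_years else savings_per_year
        npv += cash / (1 + discount_rate) ** year

    total_modeling_cost = modeling_cost_per_year * dev_years
    total_savings = savings_per_year * maint_years
    payback_years = dev_years + total_modeling_cost / savings_per_year
    roi_pct = (total_savings - total_modeling_cost) / total_modeling_cost * 100
    return npv, payback_years, roi_pct

npv, payback, roi = evaluate(project_cost=5_000_000)
print(f"NPV of savings: ${npv:,.0f} | Payback: {payback:.1f} years | ROI: {roi:.0f}%")
```

Running this with the placeholder inputs (a $5M project, modeling at 2% of development cost, and a 5% maintenance-cost reduction) already shows a positive NPV and a payback of under three years; the point of the exercise is to see what the numbers look like with your organization's actual figures.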

Conclusions

Most organizations don't realize it, but the monetary value of data modeling is much greater than most people think. If you are doing data modeling the right way, it not only provides a structured representation of your data, guides feature engineering, and enables developers to create more accurate, reliable, and interpretable AI models that better understand and generate data in specific domains; most importantly, it is reducing your system maintenance costs right now. Plug the numbers above into your own projects and see how much you would save. Even if the reduction were only 1%, you could be saving millions of dollars. Let that sink in.

