
What has Chaotic Data Quality got in common with Entropy?

Martin Doyle January 31st, 2012 Data Quality

Firstly, I promise you won’t need to be a scientist or engineer to understand this. And yes, it is relevant to how data decays from order (high quality) to chaos (low quality).

There is a ubiquitous phenomenon we all instinctively accept: data has an unerring ability to go from high quality to low quality.

For energy, this phenomenon is described by the Second Law of Thermodynamics, which loosely states that energy has an absolute and unfailing tendency to go from “more concentrated” to “less concentrated”. It kind of “spreads out” and gets “diluted”. Some examples are:

  • Energy flows from a higher temperature to a lower temperature (heat exchange).
  • Energy flows from a higher pressure to a lower pressure (expansion).
  • Energy flows from a higher voltage potential to a lower voltage potential (electric current).
  • Energy flows from a higher gravitational potential to a lower gravitational potential (falling objects).
  • Water flows and falls from a higher elevation to a lower elevation (downhill).

Basically, energy always goes from high concentrations to low concentrations. When the transfer stops, a state of equilibrium has been reached, and the system is said to be at its maximum entropy.

In science, “entropy” can loosely be described as a measure of unusable energy, that is, energy no longer available to do useful work. As usable energy decreases and unusable energy increases, entropy increases. So, as usable energy is irretrievably lost, disorganization, randomness and chaos increase.

In the context of this article, it helps explain why our once-orderly databases, if left to their own devices, rapidly decay into a disorderly, untrusted, fragmented and duplicated mess.

Entropy may therefore be thought of as a measure of how much of its usefulness data or information has lost. Eventually, all of the data in our organizations just gets less useful until, finally, it becomes mostly useless. It has reached a point of equilibrium, or its maximum entropy, where it has no further potential to be actively used, say for marketing or for informed decision making.

Doesn’t that sound like what happens to any database that is neglected and left to decay naturally?

Unlike energy, unfortunately, we cannot yet scientifically measure the degree of data entropy, as I don’t believe there are any universally accepted units of data chaos or disorder. It would make a good legal term for disciplinary action, though: “You are guilty of generating 3.5 units of disorder in my CRM and 4.2 units in my ERP system; you are sentenced to x years of data entry.”

So what can we learn from this?

Well, if we borrow from science and again stretch the energy metaphor to data and information, it seems pretty obvious that if we wish to reverse data chaos and overcome data decay, we need to apply some effort and actually do some work!

In science, work is defined as force × distance moved: for example, the work or effort required to lift a weight, compress a gas or pump water uphill. In the case of data, we might consider it the work or effort required to change its state from “A RIGHT STATE” to “THE RIGHT STATE”.

Basically, if we are to change data within business applications into a state that is fit for use, there is hard work to be done! There can be no more excuses or corporate slacking, because when it comes to refreshing, standardizing, formatting, validating, suppressing, deduping and enhancing your data (all of which, incidentally, are verbs), action is the key.

Data does not clean itself

At equilibrium, or maximum entropy, things simply stop happening, or never start, unless you take action. Putting data back into a fit-for-use state requires work: hard work.

It will be worth the investment of physical and emotional energy, though, as businesses will be rewarded with high-value data yielding high-value returns. Basically, things happen when high-energy, high-value data is allowed to move from high potential to low potential through its use.

Action is always the key.

In the case of corporate data, it requires effort from everyone:

  • Business Leaders need to lead a culture of Corporate Data Responsibility (CDR), where trusted data is the norm and accurate information a corporate imperative.
  • Management need to implement CDR through a data governance culture where data is skilfully curated to deliver business information and organisational insight.
  • I.T. need to ensure CDR wherever data migrations, data integrations and data processing take place, guaranteeing that they are co-ordinated, repeatable and correct all of the time.
  • Data workers need to ensure CDR through data that is captured correctly, first time and every time, so that it is fit for use by all downstream consumers in the data and information demand chain.

All of this combined effort means better business: reduced entropic waste, reduced operational friction, and less data scrap and rework. It leads to actionable information, which in turn drives better decisions, which create greater shareholder value, greater sustainability, happier employees and much, much higher profits!


Written by Martin Doyle

Martin is CEO and founder of DQ Global, a Data Quality Software company based in the UK. With an engineering background, Martin previously ran a CRM Software business. He has gained a wealth of knowledge and experience over the years and has established himself as a Data Quality Improvement Evangelist and an industry expert.