The Golden Record: Explained

While ‘big data’ may appear to be the skeleton key that will unlock everything you want to know about your business, there’s more to understanding your data than meets the eye. Yes, clean data will unlock incredible value for your enterprise; inaccurate records, on the other hand, are a significant burden on your productivity.

This is why we all seek the “Golden Record”.

The Golden Record is the ultimate prize in the data world. It is a fundamental concept within Master Data Management (MDM), defined as the single source of truth: one record that captures all the necessary information we need to know about a member, a resource, or an item in our catalogue – assumed to be 100% accurate.

Its power is undeniable. However, when our data is spread across multiple databases, achieving such perfection is far from straightforward. So, we must first understand the benefits of a golden record.

So, let’s step back to your childhood and consider how imperfect information can cause havoc in any system.

The Power of the Golden Record

To explain, let’s go back to a very simple example. You’re 13 years old, sitting in your classroom. Everyone has arrived, and the teacher is about to run through the register.

They have drawn the names from various local databases and put them on paper without properly checking who is meant to be where.

After a few minutes, something isn’t right.

The teacher is repeating themselves. No one is sure why. The process is taking much longer than it should, and the kids are becoming restless. When we take a closer look at the register, everything becomes clear.

Seemingly duplicated records – the bane of the Master Data Management world.

| Name | Age | Home Phone | Post Code | Gender |
| --- | --- | --- | --- | --- |
| Fred S | 13 | 374999 | TR0 0RT | F |
| Frederick Smith | 13 | 01274 374999 | | F |

When building databases from disparate sources, we often run into the issue of duplication. Whether resulting from incomplete entries, changes that occur over time or some other reason, this is a significant issue for any enterprise that relies on vast volumes of information.

As you may imagine, if we were to expand the rollcall example to include hundreds of thousands of names, the overhead of duplication becomes dramatically worse, with every process draining an ever-increasing volume of resources. If we manage to compile a single entry, however – the “Golden Record” – every process becomes far more efficient, and we can begin to leverage the data at our fingertips.
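To see how duplicates surface in practice, here is a minimal sketch (in Python, using made-up records and field names) that buckets records by a normalised phone number; any bucket holding more than one entry is a candidate duplicate that some downstream process will have to resolve:

```python
from collections import defaultdict

# Hypothetical register rows: the same pupil captured twice by different systems.
records = [
    {"name": "Fred S", "phone": "374999"},
    {"name": "Frederick Smith", "phone": "01274 374999"},
    {"name": "Alice Jones", "phone": "01274 555123"},
]

def phone_key(phone: str) -> str:
    """Normalise a phone number to its last six digits so that
    '374999' and '01274 374999' land in the same bucket."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[-6:]

# Group records that share a normalised phone number: each bucket with
# more than one name in it is a candidate duplicate needing match/merge.
buckets = defaultdict(list)
for record in records:
    buckets[phone_key(record["phone"])].append(record["name"])

duplicates = {key: names for key, names in buckets.items() if len(names) > 1}
print(duplicates)  # {'374999': ['Fred S', 'Frederick Smith']}
```

The normalisation step is the important part: without it, the two phone formats would never collide, and the duplicate would go unnoticed.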

Building the Golden Record

The complexity of implementing a Master Data Management solution stems from defining the workflow that will connect our disparate data sets.

First, we have to identify every data source that feeds into the dataset. Then, we must consider which fields we find to be the most reliable depending on their source. Finally, we must define the criteria that will determine when the data from one source should overwrite conflicting data from a secondary source in our MDM system.
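The three steps above can be sketched as a simple set of survivorship rules. The source names (`local_registry`, `school_mis`) and the field priorities below are purely illustrative assumptions, not a reference to any particular MDM product:

```python
# Hypothetical field-level survivorship rules: for each field, the sources
# we trust most, in priority order. Names here are illustrative only.
FIELD_PRIORITY = {
    "name":      ["school_mis", "local_registry"],
    "phone":     ["school_mis", "local_registry"],
    "post_code": ["local_registry", "school_mis"],
}

def build_golden_record(candidates: dict) -> dict:
    """candidates maps source name -> partial record from that source.
    For each field, take the value from the highest-priority source
    that actually supplies it."""
    golden = {}
    for field, sources in FIELD_PRIORITY.items():
        for source in sources:
            value = candidates.get(source, {}).get(field)
            if value:  # a higher-priority source only wins if it has data
                golden[field] = value
                break
    return golden

candidates = {
    "local_registry": {"name": "Fred S", "phone": "374999", "post_code": "TR0 0RT"},
    "school_mis":     {"name": "Frederick Smith", "phone": "01274 374999"},
}
print(build_golden_record(candidates))
# {'name': 'Frederick Smith', 'phone': '01274 374999', 'post_code': 'TR0 0RT'}
```

Note that the post code survives from the lower-priority source simply because the preferred one has nothing to offer for that field.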

Back to the rollcall example.

We can see that Frederick’s name has two different entries, so which field do we choose to prioritise (bearing in mind this will apply to every entry within the system, not just to Frederick’s)? To answer this, we must determine which system has the correct name most often, or review if any other system captures that field more effectively.

In this instance, it appears to be the second row, in which case we would apply a rule that states we take the ‘Name’ from this source to build our golden record.

Merge and Match Records

The critical question you will face in any MDM solution is how to merge and match apparently duplicate records.

With Frederick, there are two seemingly similar entries, so what is our process to create the single golden record? There is crossover; however, specific differences mean this is not an automatic case for a match and merge.

| Name | D.O.B | Home Phone | Post Code | Gender | Child # |
| --- | --- | --- | --- | --- | --- |
| Fred S | 01.05 | 374999 | TR0 0RT | F | 3 |
| Frederick Smith | 13.02.05 | 01274 374999 | | F | 4 |

In such instances, the system must review the source of each field. If the first source is deemed more reliable for the postcode, whereas the second source is more reliable for the name and phone number, then we define rules specifying that the system follow this approach.

Most MDM solutions offer effective merge functionality. So, you could define the above criteria for the system to review records and, where necessary, carry out the appropriate merge process.
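As a rough illustration of what such match-and-merge criteria might look like, the sketch below scores a candidate pair on name similarity and phone agreement, then decides between an automatic merge, manual review, or no match. The thresholds and helper functions are assumptions for illustration, not the behaviour of any specific MDM tool:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Rough string similarity between two names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def last_digits(phone: str, n: int = 6) -> str:
    """Keep only the trailing digits so differently formatted numbers compare."""
    return "".join(ch for ch in phone if ch.isdigit())[-n:]

def match_decision(rec_a: dict, rec_b: dict) -> str:
    """Classify a candidate pair. The 0.8 threshold is illustrative,
    not a recommendation."""
    phones_agree = last_digits(rec_a["phone"]) == last_digits(rec_b["phone"])
    similarity = name_similarity(rec_a["name"], rec_b["name"])
    if phones_agree and similarity >= 0.8:
        return "auto-merge"
    if phones_agree or similarity >= 0.8:
        return "manual review"
    return "no match"

a = {"name": "Fred S", "phone": "374999"}
b = {"name": "Frederick Smith", "phone": "01274 374999"}
print(match_decision(a, b))
# 'manual review': the phones agree, but the names differ too much to auto-merge
```

This mirrors Frederick’s case: enough crossover to flag the pair, but not enough certainty to merge without a rule or a human making the call.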

Manual Intervention

Inevitably, problems still arise with data quality, particularly when we lack a reliable system of record for specific fields; date of birth, for example. In these cases, a workflow manager toolkit can help.

The toolkit will assign inconsistent records to a data steward for human review, so they can either follow up on discrepancies or use past experience to inform their decision.

Further rules can be put in place to manage the final merge of the revised information, meaning we preserve the overall integrity of our Golden Records.
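A minimal sketch of that routing logic might look like the following, assuming a hypothetical `dob` field for which no source is trusted as the system of record:

```python
def route_record(candidates: dict, review_fields=("dob",)) -> str:
    """candidates maps source name -> partial record from that source.
    Returns 'auto' when every review field agrees across the sources
    that supply it, otherwise 'steward' to queue the record for a human."""
    for field in review_fields:
        values = {rec[field] for rec in candidates.values() if field in rec}
        if len(values) > 1:  # conflicting values: a human must decide
            return "steward"
    return "auto"

# The two sources disagree on date of birth, so the record is escalated.
conflicting = {
    "local_registry": {"name": "Fred S", "dob": "01.05"},
    "school_mis":     {"name": "Frederick Smith", "dob": "13.02.05"},
}
print(route_record(conflicting))  # steward
```

Once the steward resolves the conflict, the corrected value can flow back through the normal merge rules, preserving the integrity of the golden record.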

For more information on managing your data, call us on 023 9298 8303 or email us at info@dqglobal.com.

Written by

Martin is CEO and founder of DQ Global, a Data Quality Software company based in the UK. With an engineering background, Martin previously ran a CRM Software business. He has gained a wealth of knowledge and experience over the years and has established himself as a Data Quality Improvement Evangelist and an industry expert.
Connect with Martin Doyle
