The Ethics and Benefits of Record Matching
Everything we do generates data. When you found this blog, your search query was recorded; your click from social media saved to a log file. Your location may be saved alongside it.
Each one of us is connected to the internet via a machine with a unique MAC address and a non-unique, but traceable, IP address. We collaborate using specific credentials like email addresses, and the times and dates of communications are relatively easy to compile.
Businesses are increasingly looking towards big data for answers to their big questions. By drawing on massive amounts of data, leaders hope to unlock its secrets, turning it into information about consumer behavior and revealing more opportunities to profit.
Of course, this is not a foolproof process. But the rewards are potentially massive. McKinsey says that the US healthcare industry could save $300bn a year, roughly $1,000 per American, by making better use of big data. Companies like Google have tracked the spread of disease by collating search queries for cures.
According to the UK Authority website, data matching is going to be used to tackle fraud.
What is Record Matching?
Record matching is a way to find duplicate records in the same database, or cross matches in other databases. This allows us to join databases together, or to let databases connect to each other, increasing the amount of automation we can use at work. It also lets us see the bigger picture.
Using record matching, you can do some very useful things with business databases:
- Deterministic record matching helps us identify a record using a unique string of characters. For example, your account number relates to one account at the bank, and HMRC often finds your information using your UTR or national insurance number. This is a simple data match – a one-way search.
- Probabilistic record matching helps us to find non-identical duplicate records by matching common pieces of data in both, even though the values may not be exactly the same. Any piece of information can be used as long as it is present in both records, but we may need to check secondary information to ensure that we have truly found a duplicate and not a false positive match.
- Record matching can be used to cleanse redundant duplicates. When we match multiple records, we can merge them to improve data quality and create a single record view. This is the main function of deduplication software, such as that produced by DQ Global.
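To make the difference between the first two approaches concrete, here is a minimal Python sketch. The records, the field names and the 0.7 threshold are all invented for illustration, and the similarity score comes from the standard library's difflib rather than from any particular matching product.

```python
from difflib import SequenceMatcher

# Hypothetical customer records; the field names are invented for illustration.
records = [
    {"account": "A100", "name": "Jonathan Smith", "postcode": "SO15 2AB"},
    {"account": "A200", "name": "Jon Smith", "postcode": "SO15 2AB"},
    {"account": "A300", "name": "Maria Garcia", "postcode": "M1 4BT"},
]


def deterministic_match(records, account):
    """Return the one record whose unique key equals `account`, or None."""
    for record in records:
        if record["account"] == account:
            return record
    return None


def similarity(a, b):
    """Score two strings from 0.0 (nothing in common) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def probabilistic_matches(records, threshold=0.7):
    """Return pairs with similar names, confirmed by an identical postcode."""
    pairs = []
    for i, left in enumerate(records):
        for right in records[i + 1:]:
            name_score = similarity(left["name"], right["name"])
            # The postcode acts as the secondary check against false positives.
            same_postcode = left["postcode"] == right["postcode"]
            if name_score >= threshold and same_postcode:
                pairs.append((left["account"], right["account"], round(name_score, 2)))
    return pairs


print(deterministic_match(records, "A300"))
print(probabilistic_matches(records))
```

The deterministic lookup either finds the key or it does not. The probabilistic version has to choose a threshold, and that choice is a trade-off: set it too low and false positives creep in, set it too high and genuine duplicates are missed.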
Record matching has various names, depending on who is talking about it. Some people call it record linkage or data linkage.
Why is Record Matching Being Used?
In the cases mentioned in the introduction, record matching is controversial because the data used relates to private individuals. Naturally, we are nervous about our information being used for purposes we have not agreed to. In the examples above, however, the data is either anonymized or used to fight crime, so many people consider the outcomes to be worth the risk.
However, this controversy doesn’t detract from record matching as a tool. When it comes to business use, it’s incredibly efficient and useful:
- It reduces waste.
- It connects systems together.
- It makes databases purer and leaner.
- It frees staff from routine administration.
- It allows limited budgets to be spent more effectively.
Record matching and linking is also one of the key techniques used in creating the single customer view. If data quality is high, the business can reliably match customers across multiple databases and create a single, all-encompassing dataset that links every aspect of that customer’s data across the lifetime of their business relationship.
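As a rough illustration of the idea, the Python sketch below links records from two hypothetical systems on a normalised email address and folds them into one record per customer. The source names, fields and the "first non-empty value wins" merge rule are assumptions made for the example; a real single customer view would need a more careful survivorship policy.

```python
# Hypothetical extracts from two systems; all field names are assumptions.
crm = [
    {"email": "Ann.Lee@example.com", "name": "Ann Lee", "phone": "01632 960000"},
]
billing = [
    {"email": "ann.lee@example.com ", "invoices": 12},
    {"email": "bob@example.com", "invoices": 3},
]


def match_key(record):
    """Normalise the email so trivial differences do not block a match."""
    return record["email"].strip().lower()


def single_customer_view(*sources):
    """Merge records that share a match key into one dictionary per customer."""
    view = {}
    for source in sources:
        for record in source:
            merged = view.setdefault(match_key(record), {})
            for field, value in record.items():
                # Keep the first non-empty value seen for each field.
                if field not in merged and value not in (None, ""):
                    merged[field] = value
    return view


customers = single_customer_view(crm, billing)
for key, customer in customers.items():
    print(key, customer)
```

Note how much depends on the key: if the email addresses had not been trimmed and lower-cased first, Ann's two records would have stayed apart.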
How Data Quality Relates to Matching
Record matching is a process that needs high levels of confidence to be effective. While a match can take place with low quality data, the software involved in making the match needs to be highly sophisticated to overcome low data quality challenges.
Clearly, if two records use non-standard, inconsistent data or have a large number of errors, those records cannot be effectively matched without some level of intervention. Interventions may include:
- Using algorithms to intelligently identify matches from data that does not exactly match. For example, the software may find a phonetic match rather than a literal match.
- Fuzzy (probabilistic) matching, which accepts a match that is not exact but is highly probable.
- Elegantly dealing with missing data and finding alternative ways to resolve the match, which may include finding secondary matches in the record.
- Involving a human reviewer, and then suppressing records which have been previously rejected so they are not continually presented as potential matches.
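Several of these interventions can be sketched together in a few lines of Python. The example below uses a simplified Soundex code as the phonetic match, falls back to whichever secondary field both records actually contain, and skips pairs that a reviewer has already rejected. The record layout, the choice of fallback fields and the helper names are hypothetical, and Soundex is only one of many phonetic algorithms.

```python
# Simplified Soundex letter codes; vowels, h, w and y carry no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return a four-character phonetic code, e.g. the same code for Smith and Smyth."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        if ch not in "hw":  # h and w do not separate repeated codes
            previous = code
    return (result + "000")[:4]


def is_candidate(a, b):
    """Phonetic surname match, confirmed by a secondary field both records have."""
    if soundex(a["surname"]) != soundex(b["surname"]):
        return False
    # Missing data: fall back to the next field that is present in both.
    for field in ("postcode", "phone"):
        if a.get(field) and b.get(field):
            return a[field] == b[field]
    return False  # nothing available to confirm the match


def review_queue(records, rejected):
    """List candidate pairs, skipping any pair a reviewer already rejected."""
    queue = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            pair = frozenset((a["id"], b["id"]))
            if pair not in rejected and is_candidate(a, b):
                queue.append((a["id"], b["id"]))
    return queue


# Hypothetical records with gaps in the secondary fields.
people = [
    {"id": 1, "surname": "Smith", "postcode": "SO15 2AB", "phone": "01632 960001"},
    {"id": 2, "surname": "Smyth", "postcode": None, "phone": "01632 960001"},
    {"id": 3, "surname": "Smithe", "postcode": "SO15 2AB", "phone": None},
]
print(review_queue(people, rejected=set()))
print(review_queue(people, rejected={frozenset((1, 3))}))
```

Storing rejected pairs as unordered sets means a pair stays suppressed whichever way round the two records are compared on the next run.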
Businesses can also process records using data quality software before attempting record matching. This pre-processing makes the data fit for purpose by eliminating many of the challenges mentioned above.
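A small example of what such pre-processing might look like, using only Python's standard library: the two functions below standardise names and postcodes so that cosmetic differences no longer get in the way of a match. The specific rules (strip accents, drop punctuation, remove spacing) are illustrative assumptions, not a description of any particular data quality product.

```python
import re
import unicodedata


def normalise_name(value):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", value)
    # Drop the combining marks that NFKD splits off from accented letters.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def normalise_postcode(value):
    """Upper-case a postcode and remove all spacing. This is not a validity check."""
    return re.sub(r"\s+", "", value).upper()


print(normalise_name("  Jos\u00e9  O'Brien-SMITH "))
print(normalise_postcode("so15  2ab"))
```

Once both sides of a comparison have been through the same clean-up, even a plain equality test will catch many duplicates that would otherwise need fuzzy matching.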
In practice, major record matching projects use some degree of intelligent ‘near-matching’ and some degree of data quality improvement as part of a holistic commitment to improved data quality.
DQ Global’s Matching Technology
DQ Global has developed DQ Match™, record deduplication and matching software. It is designed to plug into your CRM system or database directly, so there is no need to import and export. Once installed, it scans the database across an unlimited number of fields to identify matches and deduplicate records.
DQ Match™ will optionally delete and/or merge records it has identified as matches. Where a match is not entirely reliable, a human must process it. That person works through a single touchpoint, using a drag-and-drop interface to choose which records can be safely deleted or merged.
While big data is a relatively new concept, DQ Match™ is the result of 20 years’ experience in data processing techniques. In an interconnected world where data links us all, it is more relevant than ever.