
The DQ Glossary

The following glossary provides definitions for a range of terms related to data quality, processing, and management. These concepts are important in the field of data management and are essential for anyone working with data professionally.

Term  Definition 
Accuracy  The degree to which data correctly describes the “real world” object or event being described. 

In order to be accurate, data must be free from error and conform to standards, i.e., values must be: 

  • Valid 
  • The right value, in the correct representation 

In the context of data quality: 

  • Is the data verified against trusted sources? 
  • Is the data authenticated by automated or physical checking processes? 
API  An application programming interface (API) is a set of defined rules and protocols that explain how applications talk to one another. 
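
By way of illustration, here is a minimal Python sketch of one application talking to another over a hypothetical HTTP/JSON API; the endpoint URL and response fields are invented for the example:

```python
# A minimal sketch of calling a hypothetical REST API.
# The endpoint URL and the response fields are invented for illustration.
import json
import urllib.request

def get_company(company_id: int) -> dict:
    """Fetch a company record from a hypothetical HTTP/JSON endpoint."""
    url = f"https://api.example.com/companies/{company_id}"
    with urllib.request.urlopen(url) as response:
        # The "defined rules" here: HTTP GET, a JSON body, agreed field names.
        return json.load(response)
```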

Batch or Bulk Processing  The execution of high-volume, repetitive data processing jobs that can run without manual intervention and are typically scheduled to run as resources permit. 

A batch process has a beginning and an end. 

Completeness  The proportion of stored data against the potential of “100% complete”. 

In the context of data quality defects:

  • Are all values present? 
  • Are Null and Empty values present? 
  • Are all mandatory fields populated? 
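
As a rough sketch, completeness can be measured as the proportion of mandatory values that are actually populated; the field names and the notion of "empty" below are assumptions for the example:

```python
# Completeness as the share of mandatory values that are populated.
def completeness(records: list[dict], mandatory_fields: list[str]) -> float:
    """Return the proportion of mandatory values that are neither None nor blank."""
    total = populated = 0
    for record in records:
        for field in mandatory_fields:
            total += 1
            value = record.get(field)
            if value is not None and str(value).strip() != "":
                populated += 1
    return populated / total if total else 0.0

contacts = [{"name": "Ann", "email": ""}, {"name": None, "email": "a@b.com"}]
print(completeness(contacts, ["name", "email"]))  # 0.5, i.e., 50% complete
```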
Congruence  The absence of difference when comparing two or more representations of a thing against a definition. 

In the context of data quality defects: 

  • Are the values consistent with external lookups or picklist values? 
  • Are the values consistent across business applications or systems? 
  • Are the values of the same data type (i.e., Text, Date, Integer, etc.) for ease of comparison and analysis? 
Consistent  The absence of difference when comparing two or more representations of a thing against a definition.

In the context of data quality:

  • Are the values consistent with external lookups or picklist values? 
  • Are the values consistent across business applications or systems? 
  • Are the values of the same data type (i.e., Text, Date, Integer, etc.) for ease of comparison and analysis? 
Data Authentication  Authentication is the process of determining whether someone or something is, in fact, who or what it says it is. 

In the context of data quality, automated checks are performed in real time or in batch to ensure the thing being checked is real at the time of checking. 

Common use cases include silent phone line testing to see if a number will dial, or a check to see if an email will be delivered. 

Data Assessment, Data Audit or Data Profile  An assessment of data quality is like an MRI scan for data: it will often uncover hidden facts and insights about the values stored in every data field: 

  • The most to least frequent occurrence of data field values, including missing values 
  • The most to least frequent occurrence of data field patterns 
  • The unique counts of data field values, words and characters, including: non-printing, ASCII, Unicode, scripts, etc. 
  • The longest to shortest data field values, their ranges and averages 
  • The types of data stored, i.e., text, numeric, date, memo, Boolean, etc. 
  • The precision of numeric values 
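
A minimal profiling sketch over a single field, assuming the values arrive as a list of strings; the coarse pattern scheme below (A for letters, 9 for digits) is one common convention:

```python
# Profile one data field: value and pattern frequencies, uniqueness, lengths.
from collections import Counter
import re

def profile(values: list[str]) -> dict:
    """Summarise a field's values for a lightweight data profile."""
    def pattern(v: str) -> str:
        # Coarse pattern: letters become 'A', digits become '9'.
        return re.sub(r"\d", "9", re.sub(r"[A-Za-z]", "A", v))

    lengths = [len(v) for v in values]
    return {
        "value_frequency": Counter(values).most_common(),
        "pattern_frequency": Counter(pattern(v) for v in values).most_common(),
        "unique_values": len(set(values)),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": sum(lengths) / len(lengths),
    }

print(profile(["PO1 1AA", "PO2 2BB", "unknown"]))
```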
Data Cleansing  The process of preparing data for analysis by amending or removing incorrect, corrupted, improperly formatted, duplicated, irrelevant, or incomplete data within a dataset. 
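
A minimal cleansing sketch; the completeness and validity rules here (a non-blank name and an "@" in the email) are deliberately simple assumptions for the example:

```python
# Cleansing sketch: drop rows that are incomplete or clearly malformed.
def cleanse(rows: list[dict]) -> list[dict]:
    """Keep only rows with a name and a minimally plausible email."""
    return [
        row for row in rows
        if row.get("name") and "@" in (row.get("email") or "")
    ]

raw = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "", "email": "broken"},  # incomplete and malformed: removed
]
print(cleanse(raw))  # [{'name': 'Ann', 'email': 'ann@example.com'}]
```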

Data Congruence  The process of comparing elements of a record to assess its congruence. 

In the context of data quality, this might involve analysing two or more data field values to ensure they are congruent, e.g., that a dialling code relates to the stated country, or that a first or last name appears in an email's local part. 
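
A sketch of one such cross-field check, comparing a phone prefix against the record's country; the dialling-code table is a tiny illustrative subset:

```python
# Congruence check between two fields of the same record.
DIALLING_CODES = {"+44": "GB", "+1": "US", "+33": "FR"}  # illustrative subset

def phone_matches_country(phone: str, country_iso2: str) -> bool:
    """Return True if the phone's dialling code is congruent with the country."""
    for prefix, country in DIALLING_CODES.items():
        if phone.startswith(prefix):
            return country == country_iso2
    return False  # unknown prefix: congruence cannot be confirmed

print(phone_matches_country("+44 2392 123456", "GB"))  # True
print(phone_matches_country("+44 2392 123456", "FR"))  # False
```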

Data Deduplication  The process of eliminating duplicate copies of repeating data. 

This can be achieved by implementing deterministic or probabilistic matching techniques to accurately identify duplicates. 

In the context of data quality, this often relates to deduplication of Organisations (Accounts or Companies), People (Employees or Contacts) or Addresses (Locations). 
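
A deterministic matching sketch: records that share a normalised match key are treated as duplicates (a probabilistic approach would score fuzzy similarities instead). The key fields used here are assumptions for the example:

```python
# Deterministic deduplication: records with the same normalised key collapse.
import re

def match_key(record: dict) -> str:
    """Build a deterministic key from a normalised name and postcode."""
    name = re.sub(r"[^a-z0-9]", "", record.get("name", "").lower())
    postcode = re.sub(r"\s", "", record.get("postcode", "").upper())
    return f"{name}|{postcode}"

def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each match key."""
    seen: dict[str, dict] = {}
    for record in records:
        seen.setdefault(match_key(record), record)
    return list(seen.values())

rows = [
    {"name": "Acme Trading Ltd", "postcode": "PO1 1AA"},
    {"name": "ACME TRADING LTD.", "postcode": "po1 1aa"},
]
print(len(deduplicate(rows)))  # 1 -- both rows are the same organisation
```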

Data Derivation  The process of obtaining a new piece of data from another by analysing its structure, pattern, or values. 

In the context of data quality, a country's ISO code might be derived from a telephone number prefix, or the country might be derived from an email's domain suffix. 
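
A sketch of both derivations just mentioned; the lookup tables are tiny illustrative subsets:

```python
# Derive a country from other values already on the record.
PREFIX_TO_ISO = {"+44": "GBR", "+33": "FRA", "+49": "DEU"}  # illustrative
TLD_TO_ISO = {".uk": "GBR", ".fr": "FRA", ".de": "DEU"}     # illustrative

def country_from_phone(phone: str) -> str | None:
    """Derive an ISO country code from the telephone prefix."""
    for prefix, iso in PREFIX_TO_ISO.items():
        if phone.startswith(prefix):
            return iso
    return None

def country_from_email(email: str) -> str | None:
    """Derive an ISO country code from the email domain suffix."""
    domain = email.rsplit("@", 1)[-1].lower()
    for suffix, iso in TLD_TO_ISO.items():
        if domain.endswith(suffix):
            return iso
    return None

print(country_from_phone("+44 2392 123456"))     # GBR
print(country_from_email("jane@example.co.uk"))  # GBR
```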

Data Enhancement  An increase or improvement in quality, value, or extent. 

In the context of data quality, enhancement might be as simple as changing and correcting the casing of a data field value, such as:

  • Lower casing an email e.g., Support@DQGlobal.com to support@dqglobal.com, or… 
  • Title casing a family name, e.g., o’gorman to O’Gorman. 
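
Both enhancements can be sketched in a few lines of Python; note that str.title() happens to capitalise after apostrophes, which suits this particular example, though it is not a general-purpose name-casing rule:

```python
# The two casing enhancements from the bullets above.
def enhance_email(email: str) -> str:
    """Lower-case an email address."""
    return email.lower()

def enhance_family_name(name: str) -> str:
    """Title-case a family name; capitalises after apostrophes too."""
    return name.title()

print(enhance_email("Support@DQGlobal.com"))  # support@dqglobal.com
print(enhance_family_name("o’gorman"))        # O’Gorman
```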
Data Enrichment  Enrichment is the process of adding or correcting data field values to supply something that was missing or wrong, thereby improving the quality of the data. 

This can include the process of enhancing, appending, refining, and improving collected data with relevant third-party data. 

In the context of data quality, the appending of: 

  • B2B attributes including: SIC, turnover, employees, revenues, etc. 
  • B2C demographics including: age, sex, education, nationality, ethnicity, or religion etc. 
Data Formatting  The process of transforming data to be in a correct/specific format. 

This can include transforming phone numbers, email addresses, URLs, etc. to an agreed format. 

In the context of data quality: 

  • Casing of values – Upper, Lower, Title & Proper 
  • Standardisation of values, e.g., Limited to Ltd 
  • Formatting of values, e.g., United Kingdom or U.K. to ISO code GBR 
  • Removal of spaces or characters 
  • Replacement of characters 
  • Correct formatting of phone numbers and emails 
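
A sketch covering three of the operations above; the mapping tables are small illustrative examples, not complete references:

```python
# Formatting sketches: country codes, legal suffixes, and phone characters.
import re

COUNTRY_TO_ISO3 = {"united kingdom": "GBR", "u.k.": "GBR", "uk": "GBR"}
SUFFIX_MAP = {"limited": "Ltd"}

def format_country(value: str) -> str:
    """Map a country name or abbreviation to an ISO 3166-1 alpha-3 code."""
    return COUNTRY_TO_ISO3.get(value.strip().lower(), value)

def standardise_company(value: str) -> str:
    """Standardise legal suffixes, e.g., 'Limited' to 'Ltd'."""
    return " ".join(SUFFIX_MAP.get(w.lower(), w) for w in value.split())

def format_phone(value: str) -> str:
    """Remove every character except digits and a leading '+'."""
    return re.sub(r"(?!^\+)\D", "", value)

print(format_country("United Kingdom"))             # GBR
print(standardise_company("Acme Trading Limited"))  # Acme Trading Ltd
print(format_phone("+44 (0)2392 123456"))           # +4402392123456
```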
Data Migration  Involves moving data from one system (the source) to another (the target), i.e., in one direction. 

Migration is often a one-time process. Once data has been migrated, it is not moved back, and the migration is not repeated. 

Data Integration  The ‘meshing’ of two systems that do not already talk to each other. Unlike migration, integration is repeatable.

Often, it means creating a two-way link so that users see a more complete picture of a record or contact. Integration is commonly used in cloud applications: for example, your Customer Relationship Management (CRM) system and your accounting tool may be linked so that you can see invoices, contact details and payment history in both. 

Data Parsing  This involves manipulating data by splitting it into its constituent parts.

This can include phone numbers and email addresses. 
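
For instance, a parsing sketch that splits an email address and an international phone number into their constituent parts; the phone layout assumed here is "prefix, space, subscriber number":

```python
# Parse an email and a phone number into constituent parts.
def parse_email(email: str) -> dict:
    """Split an email into local part and domain."""
    local, _, domain = email.partition("@")
    return {"local_part": local, "domain": domain}

def parse_phone(phone: str) -> dict:
    """Assumes an international format like '+44 2392 123456'."""
    country, _, rest = phone.partition(" ")
    return {"country_code": country, "subscriber": rest.replace(" ", "")}

print(parse_email("myname@mydomain.com"))
# {'local_part': 'myname', 'domain': 'mydomain.com'}
print(parse_phone("+44 2392 123456"))
# {'country_code': '+44', 'subscriber': '2392123456'}
```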

Data Quality  A measure of how reliable a data set is to serve the specific needs of an organisation, based on factors such as accuracy, completeness, consistency and reliability. 

Data Standardisation  The process of creating standards and transforming data taken from different sources into a consistent format that adheres to those standards. 

Data Suppression  The process of identifying whether people are deceased, gone away, or have expressed a preference not to be contacted. 
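
A minimal suppression sketch; the flag names are assumptions for the example:

```python
# Suppression: remove records that must not be contacted.
def suppress(records: list[dict]) -> list[dict]:
    """Drop records flagged as deceased, gone away, or opted out."""
    return [
        r for r in records
        if not (r.get("deceased") or r.get("gone_away") or r.get("opted_out"))
    ]

people = [{"name": "Ann"}, {"name": "Bob", "opted_out": True}]
print(suppress(people))  # [{'name': 'Ann'}]
```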

Data Transformation  The process of editing data by abbreviating, elaborating, normalising etc.

This can include examples such as abbreviating United Kingdom to UK, elaborating Rd to Road, or normalising ‘Johnny’ and ‘Jonathan’ to ‘John’. 
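
All three transformations reduce to simple value mappings, as in this sketch:

```python
# Transformation as value mappings: abbreviate, elaborate, normalise.
ABBREVIATE = {"United Kingdom": "UK"}
ELABORATE = {"Rd": "Road"}
NORMALISE = {"Johnny": "John", "Jonathan": "John"}

def transform(value: str, mapping: dict[str, str]) -> str:
    """Return the mapped value, or the original if no rule applies."""
    return mapping.get(value, value)

print(transform("United Kingdom", ABBREVIATE))  # UK
print(transform("Rd", ELABORATE))               # Road
print(transform("Jonathan", NORMALISE))         # John
```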

Data Validation  Refers to syntactically validating values as being of the correct format.

This means ensuring they look correct, for example, an address following the correct format of having a house number and a postcode. 
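
Syntactic validation is typically implemented with regular expressions, as in this sketch; both patterns are deliberately simplified, and production rules would be stricter:

```python
# Syntactic validation with simplified regular expressions.
import re

UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$", re.I)
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_valid_postcode(value: str) -> bool:
    return bool(UK_POSTCODE.match(value.strip()))

def is_valid_email(value: str) -> bool:
    return bool(EMAIL.match(value.strip()))

print(is_valid_postcode("PO1 1AA"))           # True
print(is_valid_email("myname@mydomain.com"))  # True
print(is_valid_email("not-an-email"))         # False
```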

Data Verification  Checking to see whether the values or record match a trusted proxy for the real-world entity they are supposed to represent.

This would be the equivalent of checking address details against the Yellow Pages to confirm the address matches the name given. 

Database  An organised collection of structured information, or data, typically stored electronically in a computer system, that can be easily accessed, managed and updated. 

First-Party Data  The data collected directly from your own sources, commonly concerning your audience or customers. 

Golden Record  Also known as the “Single Customer View (SCV)”, the golden record refers to the consistent and comprehensive view of all the data an organisation has about its customers that is stored and consolidated in one record in a business application.

Organisations may hold multiple records for the same contact in various business applications – these records need to be duplicate-free, complete and accurate, which then creates a Golden Record. 

Master Data Management   Master Data Management is the technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, consistency and accountability of official shared master data assets. 

Single Customer View (SCV)  Also known as the “Golden Record”, this is a consistent and comprehensive view of all the data an organisation has about its customers, stored and consolidated in one record in a business application.

Organisations may hold multiple records for the same contact in various business applications – these records need to be duplicate-free, complete and accurate, which then creates a Single Customer View. 

System of record  A system of record is the term for an information storage system that is the authoritative data source for a given data element or piece of information. 

Third-Party Data  Data collected by a party that does not have a direct relationship with the user the data is being collected on. 

Timeliness  The degree to which: (a) data represent reality from the required point in time, and (b) consumers have the data they need at the right time. 

In the context of data quality defects: 

  • Are closed opportunities dated in the future? 
  • Are dates of birth in the past? 
  • Are maturity dates in the future? 
  • Are date value distributions appropriate? 
  • Are time series events correctly sequenced? 
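
A sketch of two of the checks above; the field names are assumptions for the example:

```python
# Flag timeliness defects on a record's date fields.
from datetime import date

def timeliness_defects(record: dict) -> list[str]:
    """Return human-readable defects found in the record's dates."""
    today = date.today()
    defects = []
    dob = record.get("date_of_birth")
    if dob is not None and dob > today:
        defects.append("date of birth is in the future")
    closed = record.get("opportunity_closed")
    if closed is not None and closed > today:
        defects.append("closed opportunity is dated in the future")
    return defects

print(timeliness_defects({"date_of_birth": date(2090, 1, 1)}))
# ['date of birth is in the future']
```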
Uniqueness  “No thing will be recorded more than once based upon how that thing is identified.” 

In the context of data quality defects: 

  • Social Security Numbers, or National Insurance Numbers, are never shared 
  • “Entities” like businesses, contacts and locations are not duplicated 
Validity  Data is valid if it conforms to the syntax (format, type, range) of its definition. 

In the context of data quality defects: 

  • Do all expected values have the correct syntax or value ranges? 
  • Are the values valid according to current lookups or picklists? 

 

Data Types

A data type is a particular kind of data item, as defined by the values it can take, the programming language used, or the operations that can be performed on it. 

Type  Definition  Example 
Integer (int)  Numeric data type for whole numbers without fractions  -909, 0, 909 
Floating Point (float)  Numeric data type for numbers with fractions  909.09, 0.9, 909.00 
Character (char)  Single letter, digit, punctuation mark, symbol, or blank space  a, A, 9, !, ? 
String (str or text)  Sequence of characters, digits, or symbols—always treated as text  Hello World, 0044-(0)2392-988303 Ext. 123, Straße, Zoë, Soufflé, myname@mydomain.com 
Boolean (bool)  True or False values  0 (false), 1 (true) 
Enumerated type (enum)   Small set of predefined unique values (elements or enumerators) that can be text-based or numerical  blue (0), black (1), red (2), green (3) 
Array   List with a number of elements in a specific order—typically of the same type  blue (0), black (1), red (2), green (3) 
Date  Date in the YYYY-MM-DD format (ISO 8601 syntax)  2022-09-15 
Time  Time in the hh:mm:ss format for the time of day, time since an event, or time interval between events  10:00:29 
Datetime  Date and time together in the YYYY-MM-DD hh:mm:ss format  2022-09-15 10:00:29 
Timestamp  Number of seconds that have elapsed since midnight (00:00:00 UTC), 1st January 1970 (Unix time)  1561956700 
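
These types map directly onto most programming languages. As a small illustration, Python's standard library can convert the Unix timestamp from the last row into a human-readable UTC datetime:

```python
# Convert the table's Unix timestamp example into an ISO 8601 UTC datetime.
from datetime import datetime, timezone

ts = 1561956700  # seconds since 1970-01-01 00:00:00 UTC
print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
# 2019-07-01T04:51:40+00:00
```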

 

The DIKW Model  

The commonly known DIKW model refers to the pyramid structure of data, information, knowledge and wisdom, where data acts as the key foundation. From raw data we obtain information, from information we gain knowledge, and from knowledge we achieve wisdom. Below is an example of how an aeroplane pilot might interpret Data, Information, Knowledge and Wisdom. 

Data  The number 10,000 flashes on your display. No label, no description, no units. It is data, but it means nothing to you. 
Information  If the display reads ‘10,000 feet above sea level’, it is information. 
Knowledge  If we are aware of mountains soaring to 12,000 feet, that’s knowledge. 
Wisdom  Wisdom is to climb another 2,000 feet to be safe. 

As data is the very foundation of our pyramid, it needs to be high quality and clean. The more we enrich our data with meaning and context, the more knowledge and insight we get out of it, enabling us to make better-informed, data-based business decisions.