Have you ever had two reports of “the same” data land on your desk and wondered why the bottom line is different? If you don’t trust the data in your company, whether in a report, a database, or an email, you are not alone. In fact, the problem is so pervasive that companies waste millions (insert exorbitant dollar figure here) building redundant islands of data just to access and control data they trust.
As a society, we are comfortable finding and relying on information provided by others without knowing exactly where it came from or controlling it; think how many times you Google something and rely on the answer. So why don’t we trust the data within our own companies? Myriad reasons come to mind, but first and foremost are differences in how seemingly identical pieces of data are defined, especially in large organizations with multiple, disparate systems that generate and process data.
Don’t despair. There is hope. And you don’t have to be technically savvy to save your budget dollars, help contain the problem, maintain control over the data, and (bottom line) trust readily available data.
Before embarking on a path to build your own data island, first understand the definition of the data that is already available. While a lucky few have an easily accessible repository of data definitions they can quickly reference (like a Wikipedia for your company’s data), the rest of us need to ask a few key questions, because simply asking for “the definition” won’t get you the complete answer.
Here are the important points to review with your technical partners to get the whole definition and its business-relevant context, and to maintain control of, and trust in, the data.
Data Definition and Context (also known as metadata):
- Understand whether you are getting data directly from the transactional source system or from a database that takes and stores snapshots of the transactional source system’s data.
- Find the source of the data. Know the system name, the screen name, and the exact field on the screen that produces this data.
- Understand the source of the source. For example: human data entry (free text, multiple choice), automated entry (date/time stamp), calculated system entry (sales tax), or system reference entry (zip code = city and state).
- Know the possible, valid, or system-allowed values. For example: Yes/No, Married/Single/Other, or alphanumeric, 16 characters.
- Ask about any transformation rules. If the data moved from your source, was it changed or filtered in any way? Sometimes these changes are rule based; for example, customers with a California zip code in their primary mailing address are assigned to the Pacific Sales Region. Other times values are converted to make database storage more manageable. For example: Yes/No was converted to Y/N, a check box was converted to Y/N, or First Name, Middle Name, and Last Name were combined into Name.
- If there are transformation rules, find the administrator of each rule. You’d be surprised (or maybe you wouldn’t) how often these rules are created by the programmers or analysts who move or manipulate the data, with no formal change management or notification. To control the data, you need to control these rules, so find the person or team who implements them and make sure they follow your change-management and change-notification criteria.
- Understand the update frequency of the data (how often a snapshot of the data is taken and stored).
- Persistence (part 1). Understand the historical transaction record. Does more recent data overwrite older data, or is every snapshot stored?
- Persistence (part 2). How long is the data stored? What happens when it reaches “end of life”? Is it deleted permanently or stored offsite? For how long?
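For technical partners who want to capture these answers somewhere more durable than an email thread, here is a minimal sketch in Python of a data-dictionary entry that records one answer per checklist question, plus three of the transformation rules mentioned above written as small functions. The class and function names, and every system, screen, field, team, and retention value, are hypothetical examples rather than any particular product’s format. The sales-region function also assumes the zip code has already been resolved to a state, as in the “zip code = city and state” reference-entry example, and “Unassigned” is a made-up default.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDefinition:
    # Hypothetical data-dictionary entry: one attribute per checklist question.
    name: str
    source_system: str            # system that produces the data
    source_screen: str            # screen where the value is captured
    source_field: str             # exact field on that screen
    source_of_source: str         # e.g. "human entry (multiple choice)"
    allowed_values: set[str]      # possible, valid, or system-allowed values
    transformation_rules: list[str] = field(default_factory=list)
    rule_owner: str = "unknown"        # who administers the transformation rules
    update_frequency: str = "unknown"  # how often a snapshot is taken and stored
    keeps_history: bool = False        # True if every snapshot is stored
    retention: str = "unknown"         # how long data is kept, and what happens at end of life

    def is_valid(self, value: str) -> bool:
        """Check a value against the allowed values recorded at the source."""
        return value in self.allowed_values


# Example transformation rules from the checklist, written as functions.
def yes_no_to_flag(value: str) -> str:
    """Yes/No converted to Y/N to make storage more manageable."""
    return {"Yes": "Y", "No": "N"}[value]


def combine_name(first: str, middle: str, last: str) -> str:
    """First, Middle, and Last Name combined into a single Name field."""
    return " ".join(part for part in (first, middle, last) if part)


def assign_sales_region(mailing_state: str) -> str:
    """California primary mailing address -> Pacific Sales Region.
    Assumes the zip code was already resolved to a state code;
    "Unassigned" is a hypothetical default for every other state."""
    return "Pacific" if mailing_state == "CA" else "Unassigned"


# Hypothetical entry filled in with answers from the checklist.
opt_in = FieldDefinition(
    name="email_opt_in",
    source_system="CRM",
    source_screen="Customer Profile",
    source_field="Email Opt-In",
    source_of_source="human entry (multiple choice)",
    allowed_values={"Yes", "No"},
    transformation_rules=["Yes/No converted to Y/N"],
    rule_owner="Data integration team",
    update_frequency="nightly snapshot",
    keeps_history=False,
    retention="7 years, then deleted",
)

stored = yes_no_to_flag("Yes")
print(stored)                   # Y
print(opt_in.is_valid("Yes"))   # True
print(opt_in.is_valid(stored))  # False: the stored value no longer matches the source definition
```

The last line shows why transformation rules matter: the stored value `Y` is not one of the values allowed at the source, so anyone who checks the stored data against the source definition alone would wrongly conclude it is bad data.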
It is critical to understand the context and definition used to describe data. Unfortunately, terms like metadata are thrown around without a business-based explanation of how they are relevant to sales, marketing, or operations. Asking these questions about your data can help you take control of your data destiny while maintaining shareable data that the wider organization can leverage as well. Resist the temptation to build a data island, and remember that the grass is not always greener on the other side (or on your own island).