#data #governance #tools
Data Governance and Quality: Data Reuse vs. Data Repurposing
I have been assembling a slide deck for an upcoming TDWI web seminar on Strategic Planning and the World of Big Data, and I am finding that I might sometimes use two different terms (data reuse and data repurposing, in case you ignored the title of this post) interchangeably when in fact those two terms could have slightly different meanings or intents. So should I be cavalier and use them as synonyms?
When I thought about it, I did see some clarity in differentiating the definitions:
- data reuse means taking a data asset and using it more than once for the same purpose.
- data repurposing means taking a data asset previously used for one (or more) specific purpose(s) and using that data set for a completely different purpose.
For example, if we have an application that uses the customer database to generate address labels for a marketing campaign mailing this morning, and then later in the day we use the same customer database to generate address labels for a second marketing campaign for the afternoon mail pickup, I would call that reuse. On the other hand, taking that same customer data set and combining it with sales transactions from the last month to classify customers by transaction and sales volume as part of an overall profiling algorithm would be an example of repurposing: taking the same data but using it for a different purpose.
The question boils down to the governance aspects of assessing data quality requirements. For multiple instances of reuse, are all the quality expectations going to be identical? Alternatively, when a data set is repurposed, whose responsibility is it to document data quality rules and acceptability thresholds as well as integrate validation of the data into upstream processes?
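To make the idea of purpose-specific quality rules and acceptability thresholds a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `QualityRule` class, the field names (`address`, `postal_code`, `customer_since`), the sample records, and the threshold values do not come from any particular tool or methodology. It only shows how the same customer data set might pass the rules for the mailing (reuse) scenario and still fail the rules a profiling (repurposing) consumer would care about.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


@dataclass
class QualityRule:
    """A hypothetical data quality rule with an acceptability threshold."""
    name: str
    check: Callable[[Record], bool]  # returns True when a record passes
    threshold: float                 # minimum acceptable pass rate (0.0 to 1.0)


def pass_rate(records: List[Record], rule: QualityRule) -> float:
    """Fraction of records that satisfy the rule (0.0 for an empty data set)."""
    if not records:
        return 0.0
    return sum(1 for r in records if rule.check(r)) / len(records)


def assess(records: List[Record], rules: List[QualityRule]) -> Dict[str, Tuple[float, bool]]:
    """Map each rule name to (observed pass rate, whether the threshold is met)."""
    results = {}
    for rule in rules:
        rate = pass_rate(records, rule)
        results[rule.name] = (rate, rate >= rule.threshold)
    return results


# Illustrative customer records (invented for this sketch).
customers = [
    {"id": "1", "address": "12 Elm St", "postal_code": "20901", "customer_since": "2011-03-02"},
    {"id": "2", "address": "9 Oak Ave", "postal_code": "20902", "customer_since": ""},
    {"id": "3", "address": "",          "postal_code": "20903", "customer_since": "2012-07-15"},
    {"id": "4", "address": "4 Pine Rd", "postal_code": "20904", "customer_since": ""},
]

# Rules for the original purpose: printing address labels.
MAILING_RULES = [
    QualityRule("has_address", lambda r: bool(r["address"].strip()), 0.75),
    QualityRule("has_postal_code", lambda r: bool(r["postal_code"].strip()), 0.75),
]

# Rules for the repurposed use: customer profiling (assumed to need tenure data).
PROFILING_RULES = [
    QualityRule("has_customer_since", lambda r: bool(r["customer_since"].strip()), 0.9),
]

mailing_report = assess(customers, MAILING_RULES)
profiling_report = assess(customers, PROFILING_RULES)

print("mailing:", mailing_report)
print("profiling:", profiling_report)
```

In this made-up sample, the data set is acceptable for both mailings but falls short of the profiling threshold, which is exactly the situation where someone has to decide who documents the new rules and where the validation should live.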
And even more of an issue: what does one do if the repurposing is very far from origination? If we grab a data set from a public web site that has been through a number of transformations, the information in the data set may be subject to very different interpretations than when the data instances in the sources were originally created. That makes the problem even more difficult: are we allowed to modify (AKA "correct") data values that don't meet our needs? Or are we constrained to use the data set as is because corrections alter the data, potentially affecting its repurposability (I think that is a new word I just invented)?
In either case, providing a definition for both terms distinguishes the usage scenarios, and at the very least allows me to use both terms in the same blog entry or presentation slide.
3 Comments on Data Governance and Quality: Data Reuse vs. Data Repurposing
Good musings David. Makes me pose the question: Is data of high quality if they are “fit for purpose of use” or “fit for repurposing”?
[…] by a blog post by David Loshin called Data Governance and Quality: Data Reuse vs. Data Repurposing I was, perhaps a bit off topic, inspired to pose the question about, if data are of high quality if […]
Great topic, David, especially in regards to how you are defining data re-purposing. I have often used the Fit-For-Use analysis method from the Data Governance Institute. It really helps unravel the complex mix of challenges raised when multiple downstream consumers re-purpose data from a common source. Too often this occurs as a sort of secret second life of data that only comes to light when things go wrong. What works for reuse may be entirely different from what works for re-purposing. And the greatest challenge of all is that no one perspective is more correct than another. Keeps things VERY interesting. But one way to stay in front is to understand the need to be prepared to support Fit-For-Use early on.