According to Gartner, “Within the next year, the number of data and analytics experts in business units will grow at three times the rate of experts in IT departments, which will force companies to rethink their organizational models and skill sets.” In short, we believe the demand for usable enterprise data is outstripping supply. To deliver clean, unified and business-ready data at scale, data leaders will need to change the way they operate. Clearly, something has to give.
And it is.
With advances in machine learning, cloud computing and storage, enterprises are finally breaking the data-management logjam. At stake are breakout improvements in business efficiency, revenue realization, product innovation and competitive differentiation. The results could be transformational.
Data Management Meets Business Asset
As a veteran data-management product leader, I’ve been along for the ride since the beginning. I was there when enterprises began to realize that data was a reusable business asset (like software) versus a consumable IT asset (like compute power) — and that IT must treat it as such.
The initial solutions to this problem included enterprise data warehousing, master data management (MDM) and extract-transform-load (ETL). In these centralized mechanisms, a small number of skilled IT people created pipelines that sucked raw data out of various siloed systems and put it where it could be most useful to businesspeople. Centralization was dictated by the compute and storage limitations of the time and by the technical skill requirements of various tool sets, not by some universal truth that centralized data management infrastructure was the optimal setup.
Enterprise IT departments were the high priests of making this work. And they did the best that they could. They interviewed business users who deeply understood how to leverage data to extract business value but who lacked a technical understanding of how the data was structured and managed. IT struggled to codify this business knowledge as software assets, building complex rules-based systems to accommodate a variety of data sources and a variety of data consumption scenarios. But both sides were always changing and growing, creating a continual game of catch-up. Producing each new set of useful data was disproportionately expensive and time-consuming.
Islands Of Information: The Return
Over time, IT departments became better equipped for this. New, self-service data preparation tools simplified rules-based systems, and compute power got cheaper and more accessible to end users. Using Tableau and self-service data preparation tools like Alteryx or Trifacta, businesspeople could directly access core enterprise data and codify their knowledge at the spokes, not the hub. They no longer had to wait in line behind big-iron IT to add data to master data stores. They didn’t have to apply third normal form, or even know what it was, in order to work with the data at the end points. It was not a perfect setup, but it was a great workaround.
But just like in a bad horror film, a new monster arose — and it came from inside the house.
Instead of one centralized master data source and set of fragile rules to manage it, enterprises now had dozens (often many more) of them. This came with very little governance — three people could be working with the same data source in three different ways, with only one of them doing it correctly.
The spokes at the edge of the business were productive, but they were not sharing or learning from each other. Meanwhile, the hub had to keep writing common rules, which often broke when new, nonconforming data was entered. Academic research showed that rules-based systems tended to max out at 50 data sources. The result was a potentially lethal combination of data variety, volume and velocity.
Data Management In The Machine Learning Era
With continual advancements in hardware, both in the cloud and on-premises, compute power and storage today are far more economical. This has enabled a new environment in which computationally intensive algorithmic workloads can be deployed broadly, beyond the core, expensive infrastructure previously dedicated to these tasks. By coupling these hardware advancements with machine learning and increasingly abundant related skills, IT can create curated, broadly consumable data that’s continuously fine-tuned with the creativity and short-cycle input of businesspeople.
Business experts’ acumen and deep data understanding become data training input for machine learning models, instead of going up in virtual smoke. The models can work on all the data, while continually learning directly from the people most affected by it. Models are resilient in the face of a lot of conflicting data and contradictory feedback. Because they are constantly looking at common patterns across variegated data sets, models can accommodate data variety at scale. When data disrupts the models, those models can quickly be trained to improve their accuracy, as opposed to the laborious process of retrofitting a complicated set of rules without breaking something.
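To make the idea concrete, here is a minimal sketch of how expert feedback could become training input for a record-matching model. It is an illustration under stated assumptions, not a description of any particular product: the record fields (`name`, `city`), the sample companies, the function names and the tiny logistic regression are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
from difflib import SequenceMatcher


def features(a, b):
    """Turn a pair of records into similarity features (hypothetical fields)."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    city_sim = SequenceMatcher(None, a["city"].lower(), b["city"].lower()).ratio()
    return [1.0, name_sim, city_sim]  # the leading 1.0 acts as the bias term


def predict(weights, x):
    """Probability that the two records describe the same entity."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(labeled_pairs, epochs=2000, lr=0.5):
    """Fit a small logistic regression with stochastic gradient descent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for a, b, label in labeled_pairs:
            x = features(a, b)
            error = predict(weights, x) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Assumed expert feedback: 1 means "same company", 0 means "different".
labeled_pairs = [
    ({"name": "Acme Corp", "city": "Boston"},
     {"name": "ACME Corporation", "city": "Boston"}, 1),
    ({"name": "Globex Inc", "city": "Chicago"},
     {"name": "Globex Incorporated", "city": "Chicago"}, 1),
    ({"name": "Acme Corp", "city": "Boston"},
     {"name": "Globex Inc", "city": "Chicago"}, 0),
    ({"name": "Initech", "city": "Austin"},
     {"name": "Umbrella Ltd", "city": "Denver"}, 0),
]

weights = train(labeled_pairs)

# Score pairs the model has not seen before.
p_match = predict(weights, features(
    {"name": "Initech LLC", "city": "Austin"},
    {"name": "Initech", "city": "Austin"}))
p_nonmatch = predict(weights, features(
    {"name": "Initech", "city": "Austin"},
    {"name": "Globex Inc", "city": "Chicago"}))
print(f"likely match: {p_match:.2f}, likely non-match: {p_nonmatch:.2f}")
```

When a new source disrupts the model, the remedy in this sketch is to append a few more expert-labeled pairs and call `train` again, rather than to rewrite a hand-built rule and hope nothing else breaks.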
Unlike MDM and other technology shifts, machine-driven enterprise data mastering doesn’t require a forklift upgrade. It works with what you have (including people) and fits in with modernization initiatives like digital transformation. Those nonscalable enterprise data warehouses, data lakes or master data management systems don’t go away. They remain part of the engine behind a new, machine learning-driven, human-guided corpus of trustworthy data for everyone.
Machine learning has already helped us get to where we can manage 10 times as much data, with one-tenth the people and in one-tenth the time.
In the next article, I’ll provide some practical guidance on putting this new technology to work. Only through a combination of new tech, new skills and new organizational approaches can the full potential of modern data management be realized. The good news is you’re probably further along on that journey than you may think.
We’ve seen the democratization of enterprise data analytics and the influx of data scientists with their ever-thirsty models. It’s time to give the machine-driven, human-guided approach to enterprise data management a serious look.