Data quantity and availability have been prioritized over quality, understanding, and usability.
The whole self-perception of analytics teams tends to sound something like, “Our data lake has 29 gazillion million petabytes of data from more than eleventy thousand source systems!!” Demand from business users tends to sound something like, “Just get the data into the lakehouse today and I’ll figure it out from there.”
Information management folks are left chasing the train as it speeds away, having conversations that tend to sound something like: “Do you know what it means or what it’s supposed to contain?” “Some.” “Some? Like what percentage?” “Well, like maybe one percent, if you round up.”
You stare into the deadlights of a seemingly overwhelming reality that very, very few of your feeds, streams, or data sets satisfy the requirements for a data product.
It amazes me that we’ve managed to be reasonably successful. Analytics users at nearly every company with whom I’ve discussed this topic have referred derisively to their data lake as a data swamp, cesspool, or quagmire. The enterprise analytics team is often viewed with similar esteem, usually as the bottleneck on every project’s critical path. After all, the implementation of hundreds, maybe thousands, of data feeds depends on this single team, and the responsibility for all those feeds falls on the back of that same team. Would that be tolerated in any other domain? Of course not. This approach is not scalable, it’s not sustainable, and it’s not the best use of resources.