You probably already know that leading analyst firms have been quoting data lake failure rates of 85% for some time now.
 
You may not be aware that one of those same leading analyst firms is now also forecasting that, by 2020, 30% of data lakes will be built on standard relational DBMS (database management system) technology “at equal or lower cost than Hadoop” because – and I quote – “application performance is superior” and “most data going into data lakes is relational.”
 
Put those two things together and you start to understand why MapR has recently gone to the wall. And why Cloudera is under so much financial stress.
 
With many organisations having invested tens and even hundreds of millions of dollars in data lakes that deliver little or no business value, it’s way past time for some brutal self-assessment in the technology industry.
 
Many data lakes have failed because they were IT-led vanity projects, with no clear linkage to business objectives and operational processes. If the strategy for your failing data lake is to lift-and-shift it, lock, stock and barrel, from Hadoop to an object store, then you are about to flush more millions down the pan – to say nothing of the opportunity cost associated with several more wasted years. Unfortunately, I know from personal experience that this is absolutely the plan in several large organisations that really ought to know better.
 
Failed data lakes often represent a toxic combination of both poor technology choices and an inadequate approach to data management and integration. If you think that data management begins and ends with ACID (Atomicity, Consistency, Isolation, Durability) compliance – as at least one of the cool kid vendors that e-mails me regularly seems to – then pick any technology platform you like, so long as you do it quickly. If you are going to fail anyway, you may as well fail fast.
 
Better yet, develop a data strategy that includes a layered data architecture, a minimum viable product approach to data integration (we call that “Light Integration”) – and an agile, incremental approach to the more robust integration of the data that matter most. That gives you a fighting chance of optimising end-to-end business processes and delivering real business value.
 
Much of the complex, multi-structured data that today sit unloved and unqueried in Hadoop-based data lakes will ultimately reside in object storage. At Teradata, we recognise this – hence our focus on enabling robust access to object stores. But much of your structured and semi-structured interaction data belong in your existing data and analytics platform, where they can be seamlessly integrated with the transaction data you already manage there. Don’t just take my word for it; ask the analysts.
Not every data lake is a data swamp – and like all technologies, the Hadoop stack has a sweet spot. But the tide of history is now running against data silos masquerading as integrated data stores, just because they are co-located on the same hardware cluster. And that same tide is running against a distributed file system and lowest-common-denominator SQL engine masquerading as a fully-fledged analytic DBMS.
 
If you are doubling down on your investment in Hadoop, you are swimming against that tide. And if you are betting on a fashionable-but-unproven technology to get you out of a data management hole, then you aren’t learning from recent history – you are condemning yourself to repeat it. But if you are ready to move on and look forward, talk to us about the industry’s leading integrated data and analytic platform, Teradata Vantage.
Martin Wilcox

Martin is a Senior Director in Teradata’s Go-To Market organisation, charged with articulating to prospective customers, analysts and media organisations Teradata’s strategy and the nature, value and differentiation of Teradata technology and solution offerings.

Martin has 21 years of experience in the IT industry and is listed in dataIQ’s “Big Data 100” as one of the most influential people in UK data-driven business. He has worked for 5 organisations and was formerly the Data Warehouse Manager at Co-operative Retail in the UK and later the Senior Data Architect at Co‑operative Group.

Since joining Teradata, Martin has worked in Solution Architecture, Enterprise Architecture, Demand Generation, Technology Marketing and Management roles. Prior to taking up his current appointment, Martin led Teradata’s International Big Data CoE – a team of Data Scientists, Technology and Architecture Consultants tasked with assisting Teradata customers throughout Europe, the Middle East, Africa and Asia to realise value from their Big Data assets.

Martin is a former Teradata customer who understands the Analytics landscape and marketplace from the twin perspectives of an end-user organisation and a technology vendor. His Strata (UK) 2016 keynote can be found at: https://www.oreilly.com/ideas/the-internet-of-things-its-the-sensor-data-stupid and a selection of his Teradata Voice Forbes blogs can be found online, including this piece on the importance – and the limitations – of visualisation.

Martin holds a BSc (Hons) in Physics and Astronomy from the University of Sheffield and a Postgraduate Certificate in Computing for Commerce and Industry from the Open University. He is married with three children and is a lapsed supporter of Sheffield Wednesday Football Club. In his spare time, Martin enjoys playing with technology, flying gliders, photography and listening to guitar music.
