Conquering the data silo sprawl
Too much data today remains stuck in a complex sprawl of silos, writes James Petter, VP EMEA, Pure Storage
According to IDC, spending on data-intensive AI systems in the Middle East & Africa (MEA) region will grow at a CAGR of 32% between 2016 and 2021, reaching US$114.22 million in 2021. Projects range from automated customer service agents and shopping and product recommendations to health and safety use cases such as automated cyber threat detection and AI-powered medical research, diagnosis and treatment.
Data’s role in the future of business cannot be overstated. According to a survey conducted by MIT Technology Review and commissioned by Pure Storage, an overwhelming 87% of leaders across MEA say data is the foundation for making business decisions, and 80% believe that it is key to delivering results for customers. But acknowledging the importance of data and putting data to work are two separate things. To put the latter in perspective, a recent study conducted by Baidu showed its dataset needed to increase by a factor of 10 million to lower its language model’s error rate from 4.5% to 3.4%. That’s 10,000,000x more data for roughly one percentage point of progress.
All this research points to one thing: to innovate and survive in an increasingly data-driven business environment, organisations must design their IT infrastructure with data in mind and have complete, real-time access to that data.
Unfortunately, mainstream storage solutions were designed for the world of disk and have historically helped create silos of data. There are four classes of silos in the world of modern analytics: data warehouses, data lakes, streaming analytics and AI clusters. A data warehouse requires massive throughput. Data lakes deliver scale-out architecture for storage. Streaming analytics goes beyond batched jobs in a data lake, requiring storage to provide multi-dimensional performance regardless of data size (small or large) or I/O type (random or sequential). Finally, AI clusters, powered by tens of thousands of GPU cores, require storage that is also massively parallel, servicing thousands of clients and billions of objects without data bottlenecks.
As a consequence, too much data today remains stuck in a complex sprawl of silos. Each is useful for its original task, but in a data-first world, silos are counter-productive. Silos mean organisational data can’t do work for the business unless it is being actively managed.
Modern intelligence requires a data hub—an architecture designed not only to store data, but to unify, share and deliver data. Unifying and sharing data means that the same data can be accessed by multiple applications at the same time with full data integrity. Delivering data means each application has the full performance of data access that it requires, at the speed of today’s business.
A data hub is a data-centric storage architecture that powers data analytics and AI. It is built on four foundational elements, all of which are essential to unifying data: high throughput for both file and object storage; true scale-out design; multi-dimensional performance; and massive parallelism. A data hub may have other features, like snapshots and replication, but if any of these four is missing from a storage platform, it isn’t built for today’s challenges and tomorrow’s possibilities. For example, if a storage system delivers high-throughput file storage and is natively scale-out, but needs another system with S3 object support for cloud-native workloads, then the unification of data is broken, and the velocity of data is crippled. It is not a data hub.
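The all-or-nothing nature of this test can be made concrete with a minimal Python sketch. It is an illustration only, not a vendor tool: the class, field and function names are hypothetical, and the example platform is assumed to meet every quality except object storage, mirroring the file-only case described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePlatform:
    """Hypothetical capability record; field names are illustrative only."""
    name: str
    high_throughput_file: bool
    high_throughput_object: bool
    scale_out: bool
    multi_dimensional_performance: bool
    massively_parallel: bool


# The four qualities named in the article, with throughput split into
# its file and object halves (an assumption made for this sketch).
REQUIRED_QUALITIES = (
    "high_throughput_file",
    "high_throughput_object",
    "scale_out",
    "multi_dimensional_performance",
    "massively_parallel",
)


def missing_qualities(platform: StoragePlatform) -> list[str]:
    """Return the required qualities that the platform lacks."""
    return [q for q in REQUIRED_QUALITIES if not getattr(platform, q)]


def is_data_hub(platform: StoragePlatform) -> bool:
    """A platform qualifies only if no required quality is missing."""
    return not missing_qualities(platform)


# Assumed example: fast, scale-out file storage without native object support.
file_only = StoragePlatform(
    name="file-only system",
    high_throughput_file=True,
    high_throughput_object=False,
    scale_out=True,
    multi_dimensional_performance=True,
    massively_parallel=True,
)

print(file_only.name, "is a data hub:", is_data_hub(file_only))
print("missing:", missing_qualities(file_only))
```

Optional extras such as snapshots and replication are deliberately left out of the check, since the argument here is that they cannot compensate for a missing core quality.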
For organisations that simply want to keep data stored, a data hub does not replace data warehouses or data lakes. For those looking to unify and share their data across teams and applications, a data hub identifies the key strengths of each silo, integrates their unique features and provides a single unified platform for the business.
Think of storage like a bank or an investment. We put our money in banks or in the stock market because we want our money to work for us. Modern organisations need to do the same with data, and they should speak to their preferred vendors to see how they can help.