AI has disrupted this approach. Training AI models depends on systems that can read from massive, unstructured datasets (such as text, images, video and sensor logs) that are distributed and accessed in random, parallel bursts. Instead of a handful of applications queuing in sequence, a business might be running tens of thousands of GPU threads, all of which need storage that can deliver extremely high throughput, sustain low latency under pressure and handle concurrent access without hitting performance bottlenecks.
The problem is that if storage cannot feed that data at the required speed, the GPUs sit idle, burning through compute budgets and delaying the development and deployment of mission-critical AI projects.
Lessons from HPC
These challenges are not entirely new. High-performance computing environments have long grappled with similar issues. In the life sciences sector, for example, research organizations need uninterrupted access to genomic datasets measured in petabytes. A prime example is the UK Biobank, which claims to be the world’s most comprehensive dataset of biological, health and lifestyle information; it currently holds about 30 petabytes of biological and medical data on half a million people. In government, mission-critical applications such as intelligence analysis and defense simulations demand 99.999% uptime (roughly five minutes of downtime per year), and even brief interruptions in availability can compromise security or operational readiness.
