The volume of data we generate is expected to grow by an impressive 27 per cent a year, according to analyst firm IDC. To create value from this data, we must be able to process it into meaningful insights, and clearly data processing can no longer happen only in the cloud. The focus is now on generating insights where the data resides: in storage devices. This is what is driving the rapid development of computational storage.
Computational storage brings processing closer to the data
Computational storage is all about making storage devices smarter to process the data directly where it is stored, on the drive. This approach reduces the movement of large amounts of data to external processing and delivers important benefits including reduced latency and bandwidth usage, increased security, and energy savings.
With computational storage, compute and storage are coupled so that applications run locally on the data, reducing both the processing required on the remote server and the amount of data moved. How does this work? Processors in the drive controller are dedicated to processing the data directly on that drive, freeing the remote host processor for other tasks. In a traditional compute system, the host wants to process the data and therefore requests it from storage. In a computational storage system, the host does not request the data itself but an operation, which the drive carries out on the data locally. Because the data never needs to leave the drive, computational storage is a smart, secure, and energy-efficient solution for next-generation storage applications.
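The difference between the two models can be sketched in a few lines of Python. This is a minimal illustration, not any real drive API: the class and method names are invented for the example.

```python
# Minimal sketch contrasting the two models. All class and method
# names here are hypothetical, for illustration only.

class TraditionalDrive:
    """The host pulls raw blocks and does the work itself."""
    def __init__(self, blocks):
        self.blocks = blocks

    def read(self):
        # Every block crosses the bus to the host.
        return list(self.blocks)

class ComputationalDrive:
    """The host sends an operation; the data never leaves the drive."""
    def __init__(self, blocks):
        self.blocks = blocks

    def execute(self, operation):
        # The on-drive controller applies the operation locally
        # and returns only the (small) result.
        return operation(self.blocks)

data = [3, 1, 4, 1, 5, 9]

# Traditional: transfer all blocks to the host, then compute there.
host_result = max(TraditionalDrive(data).read())

# Computational: ship the operation to the drive instead of the data.
drive_result = ComputationalDrive(data).execute(max)

assert host_result == drive_result == 9
```

Both paths produce the same answer; the difference is that in the second, only the operation and its small result cross the link between host and drive.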
About the author
Neil Werdmuller is Director of Storage Solutions at Arm
Linux is the key to the rapid adoption of computational storage
The amount of data being stored and the number of workloads – software that works on the data to generate insight and value – are exploding. This growth is a huge challenge, and computational storage will enable workloads to be moved to where the data is physically stored.
Computational Storage Drives (CSDs) are already available, for example from NGD Systems and ScaleFlux, and many other vendors have proof-of-concept products. Some use FPGAs to add compute, but others have application processors capable of running full Linux distributions.
Arm views on-drive Linux as key to the rapid adoption of CSDs. On a standard SSD, NVMe protocols send blocks of data to be stored on the drive and retrieve blocks of data from the drive on reads. However, the drive does not know that, for example, the ten blocks of data written actually make up a JPEG image. With Linux, the drive can ‘mount’ the file system that is stored on the drive and understand what the blocks of data on the drive are. A CSD needs this information to work autonomously on the data stored on the drive – for example classifying the images stored on the drive using ML.
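The point about mounting the file system can be made concrete with a short sketch of an on-drive agent. This is a hypothetical illustration: once Linux on the drive has mounted the file system, the raw blocks resolve into named files the drive can reason about, and the `classify()` function below is just a placeholder for a real on-drive ML model.

```python
# Hypothetical sketch of an on-drive agent. A raw block device sees
# only anonymous blocks; with the file system mounted, those blocks
# become files that can be found, opened, and classified locally.
import os

def find_jpegs(mount_point):
    """Walk the mounted file system and yield paths to JPEG images."""
    for root, _dirs, files in os.walk(mount_point):
        for name in files:
            if name.lower().endswith((".jpg", ".jpeg")):
                yield os.path.join(root, name)

def classify(path):
    # Placeholder for a real on-drive ML image classifier.
    return "unclassified"

def tag_images(mount_point):
    """Build a meta-tag for every image found on the drive."""
    return {path: classify(path) for path in find_jpegs(mount_point)}
```

Everything here runs on the drive itself; the host would only ever see the resulting tags, never the image blocks.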
Linux lets storage players leverage the huge Linux open-source ecosystem, including security and containerization, without having to reinvent these solutions. It also enables workloads to be easily migrated from the server onto the storage drive. By contrast, CSDs that require deeply embedded code to be written or FPGAs to be programmed can be complex and time-consuming to deploy, which slows adoption.
Standardization will drive the future of computational storage
There are 44 companies involved in the SNIA technical working group that is defining the computational storage standards, and CSDs will really take off once those standards are in place. As more CSDs become available, the market will grow rapidly, because local generation of valuable insights from the data stored on the drive opens the door to innovation. The key issue is generating insight and value from that data, and this is best done where the data resides, reducing latencies and bandwidth requirements.
An Ethernet-connected CSD running Linux is really just a mini server. It has internet connectivity, compute, memory and storage – and can be deployed anywhere to store data and generate insight and value from it. This brings huge potential for many markets and enables dramatic innovation.
Computational storage creates value across a range of applications
With computational storage, data workloads are processed directly on the storage controller. This is critical to address the processing requirements of many ML or analytics applications, and opens huge opportunities across applications including IoT, ML and edge computing.
To name a few, computational storage can have a significant impact in:
- Database acceleration, where operations are performed directly on the data
- Content delivery networks (CDNs), easily enabling very local content delivery
- AI/ML, generating insights directly from the vast amounts of data
- Edge computing, where a CSD running Linux is a self-contained small server
- Image classification, enabling meta-tagging directly on the data where it is stored
- Transportation, with direct processing of stored telemetry data in a vehicle
Take the transportation example: modern airplanes generate terabytes of data a day, and this data is usually offloaded for analysis. With computational storage, airlines can perform real-time data analysis directly on the drive, on the plane. When a plane lands, this technology can help ensure it is safe for the next flight in 30 minutes or less, resulting in faster turnaround and better safety for passengers.
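A toy version of such an on-drive check might look like the following. The field names and threshold are invented purely for illustration; the point is that the loop runs on the drive controller and only a compact summary, not terabytes of raw telemetry, ever leaves the drive.

```python
# Illustrative sketch only: an on-drive scan over stored flight
# telemetry. Field names and the vibration limit are invented.

def analyze_telemetry(samples, vibration_limit=0.8):
    """Scan stored telemetry samples and flag any exceeding a limit.

    On a CSD this would run on the drive controller; only this
    small summary dictionary would be sent off the drive.
    """
    flagged = [s for s in samples if s["vibration"] > vibration_limit]
    return {
        "samples_scanned": len(samples),
        "flagged": len(flagged),
        "safe_for_next_flight": not flagged,
    }

telemetry = [
    {"t": 0, "vibration": 0.21},
    {"t": 1, "vibration": 0.35},
    {"t": 2, "vibration": 0.30},
]
summary = analyze_telemetry(telemetry)
assert summary["safe_for_next_flight"] is True
```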
In a world of billions of connected devices, data processing can no longer only happen in the cloud
Computational storage enables us to maximize the benefits of data to organizations and to society. It puts processing power where it is needed and gives us quick and easy access to insights and value from data. Computational storage is evolving rapidly, and we can expect wider adoption and innovative applications in the coming years.