As the volume and complexity of data increase, whether to use the cloud is no longer a question. But many other questions are emerging as enterprises look to optimize their cloud investments and build analytics infrastructures that respond with agility to volatile business, market, and customer demands.
One of the key decisions the enterprise has to make when evaluating cloud solutions is whether to separate cloud compute and storage. Organizations everywhere are looking for the most cost-effective and performant ways to manage massive amounts of data as it streams in from multiple sources and is leveraged by many applications. Without decoupling cloud compute from storage, the enterprise must scale both at the same time. Without knowing the value of the data streaming into storage or being analyzed by compute nodes, it can be difficult to know whether increasing compute or storage capacity is worth the additional cost. And when the enterprise deploys in a hybrid environment, scaling both cloud and on-premises storage and compute can get complex quickly, costing the business valuable people-hours and resources.
By scaling cloud compute and storage independently, the enterprise can make more focused, cost-effective strategic decisions. However, separation brings its own challenges. Cloud compute and storage still need to work together, even if they’re independent. Breaking the communication between data stores and object stores can result in unnecessary data movement and duplication, as well as higher latency.
To understand more about the complexity of decoupling these functions and how to overcome those challenges, let’s first dive into the history of the dilemma itself.
A brief history of compute and storage
While the cloud has added new dimensions to the question about whether separating cloud compute from cloud storage is ideal for data management, this debate has actually existed for decades.
As software system developer Adam Storm explains, compute and storage were historically tightly coupled in database systems, which were originally designed for transaction processing in industries like air travel and banking. Low latency mattered most in such systems to ensure the persistence and integrity of transactional data, and keeping storage as close as possible to compute helped hold latency down, particularly given the network speeds of the early days of computing.
In the decades that followed, organizations leveraged databases for more than just transaction processing. As business and government leaders realized the value of data to drive insights, the data warehouse emerged to store massive amounts of information across systems. The enterprise also began to run frequent, complex analytical queries to uncover insights, making both low latency and the cost of storage and compute crucial priorities.
Considering that network and storage bandwidths are now rising faster than memory bandwidth, more enterprises are recognizing the challenge of scaling storage and compute together when the two functions have such distinct requirements.
Striking the ideal balance
In order to keep up with the dynamic and always-evolving demands of both data and users, companies need an architecture that separates compute from storage while still allowing for seamless communication and compatibility. To do this effectively, your platform should have a high-speed fabric that connects engines, data stores, and object stores. This creates a unifying layer that allows data and compute nodes to work together but also independently.
Together, data and compute can operate as one system, minimizing data movement and duplication and making it easier to collaborate and share data across the enterprise. At the same time, keeping these functions independent provides the foundation for exponential data growth and allows for dynamic scaling. Scaling a data cluster with storage separated from compute takes seconds, compared with the hours or days it can take in systems where these functions are tightly coupled.
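To make the idea concrete, here is a minimal sketch, in plain Python, of what decoupling buys you. Everything in it is illustrative (no vendor API is implied): a single shared object store holds the data, while a pool of stateless compute workers can be resized without moving or copying any of it.

```python
# Toy model of a decoupled architecture: one shared object store and a
# compute pool that scales independently of it. All names here are
# illustrative; no specific vendor's API is implied.

class ObjectStore:
    """Shared storage layer; capacity changes never touch compute."""
    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class ComputePool:
    """Stateless workers that read from the shared store on demand."""
    def __init__(self, store, nodes=2):
        self.store = store
        self.nodes = nodes

    def scale_to(self, nodes):
        # Adding or removing workers is a metadata change: no data is
        # copied or rebalanced, which is why it can complete in seconds.
        self.nodes = nodes

    def query(self, keys):
        # Each worker pulls only the partitions it needs from the store.
        chunks = [keys[i::self.nodes] for i in range(self.nodes)]
        return [sum(self.store.get(k) for k in chunk) for chunk in chunks]


store = ObjectStore()
for i in range(10):
    store.put(f"partition-{i}", i)

pool = ComputePool(store, nodes=2)
print(pool.query([f"partition-{i}" for i in range(10)]))

pool.scale_to(5)  # burst compute for a heavy workload; storage untouched
print(pool.query([f"partition-{i}" for i in range(10)]))
```

The point of the sketch is the scale_to call: because the workers hold no data of their own, resizing the pool is a metadata change rather than a data migration.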
The business impact of separation
There are times when the enterprise needs to grow compute capacity while keeping storage relatively constant, such as during high-traffic periods like Black Friday or the end of a quarter. Scaling these functions independently frees the enterprise to pay only for what it uses, as it uses it.
According to Jeff Healey, Micro Focus’s Senior Director of Vertica Product Marketing, a retailer could save approximately 66 percent in compute costs by separating compute from storage instead of provisioning for year-round peak workloads.
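To see how a figure in that range can arise, consider a rough back-of-the-envelope comparison. The workload shape and unit cost below are hypothetical, not drawn from Healey’s analysis; they simply show the arithmetic of provisioning for peak year-round versus bursting on demand.

```python
# Hypothetical back-of-the-envelope comparison; the workload shape and
# unit cost are invented for illustration, not taken from the cited analysis.

unit_cost = 1.0       # cost of one compute node for one month
baseline_nodes = 10   # capacity needed most of the year
peak_nodes = 40       # capacity needed during the seasonal peak
peak_months = 1       # e.g., the busiest month of the holiday quarter

# Coupled sizing: provision for the peak all year long.
coupled = peak_nodes * 12 * unit_cost

# Decoupled sizing: pay for baseline, bursting only when needed.
decoupled = (baseline_nodes * (12 - peak_months)
             + peak_nodes * peak_months) * unit_cost

savings = 1 - decoupled / coupled
print(f"coupled: {coupled}, decoupled: {decoupled}, savings: {savings:.0%}")
# -> coupled: 480.0, decoupled: 150.0, savings: 69%
```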
Imagine the cost savings for an enterprise that runs short-term projects it can “hibernate” by shutting down compute nodes during periods when they’re not needed. A medical equipment manufacturer, for example, can do this before going to market with a new device: by running a small trial among key customer accounts, the manufacturer can ensure maximum reliability when the equipment launches widely.
Or think of a telecommunications company responding to a snowstorm affecting hundreds of thousands of its customers. With storage and compute scaled separately, the company can quickly add compute nodes and, with them, analytical capacity. It can then quickly and precisely diagnose service issues, anticipate which customers may encounter outages and contact support representatives, and analyze the storm’s impact on customer churn.
For any enterprise today, agility remains a key step on the road to digital transformation. Whether deploying in a hybrid or public cloud environment, separating cloud storage from cloud compute gives organizations the flexibility to both accommodate consistent data requirements and meet variable or unforeseen needs.
See how Teradata Vantage can help you optimally balance storage and compute.
Learn more about Vantage