It’s becoming a hybrid cloud world, as more enterprises deploy workloads across multiple public cloud providers as well as private clouds. Of course, there are many good reasons for moving to the cloud, including elastic compute capacity and the need for a modern application environment.
But one area that hasn’t received enough attention in most organizations is the need to pair hybrid computing with a compatible data management capability.
In sum, to take full advantage of a hybrid computing model, enterprises must also adopt a data model that lets users work with data, with minimal friction, wherever it makes sense to do so: in a public or private cloud, in an on-prem data center, or even across a multi-organization data exchange.
Start with the Data
The first step in any hybrid cloud strategy is to make sure the data itself is portable and fully managed. The ultimate goal is a uniform data repository for your analytics and AI software platform, available to all public and private cloud implementations, that evolves toward a truly “agnostic” platform able to transparently support any workload on any cloud.
Any such solution must implement a number of features that make the data available to a wide variety of workloads. It must exist in a secure, scalable, and elastic environment that includes a common control plane to manage all the data, supporting on-prem data replication and global data search, as well as the data governance needed to meet regulatory requirements.
The solution must offer broad template coverage and runtime compatibility across platforms, operate with all the major public clouds (AWS, Azure, GCP), and run with AI accelerators (e.g., Nvidia). Relying on data warehouses and/or proprietary storage solutions from individual cloud providers limits the value that can be extracted from data assets and hinders moving them between environments.
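As a low-tech illustration of why open formats matter for portability (using CSV from the Python standard library as a stand-in; in practice columnar formats such as Parquet play this role), data written once in an open, self-describing format can be read back by any platform without vendor SDKs:

```python
import csv
import io
import json

# Hypothetical sample data; in a proprietary warehouse this would
# live in the vendor's internal storage format.
rows = [
    {"region": "west", "revenue": 100},
    {"region": "east", "revenue": 120},
]

# Write to an open format once...
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["region", "revenue"])
writer.writeheader()
writer.writerows(rows)
portable_csv = buf.getvalue()

# ...and any cloud's tooling can parse it back, no vendor SDK needed.
recovered = list(csv.DictReader(io.StringIO(portable_csv)))
print(json.dumps(recovered))
```

The same round trip against a provider’s proprietary storage would require that provider’s client libraries at both ends, which is exactly the transport friction described above.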
Data Quality is More Important than Quantity
Another key consideration for a distributed, hybrid environment: as more enterprises move to AI-enabled systems and build models on their data sets, confirming the quality of that data becomes far more critical. Machine learning and AI models trained on corrupted data produce bad models and unreliable insights.
To maximize the value of enterprise data, and to directly drive quality insights from that data, companies need a data platform that provides clean, up-to-date, fully managed, and secure data. That also requires the ability to see who has “touched” the data: not just individuals, but also AI models and workload processes.
To accomplish this, tracking the lineage of the data is crucial, and not all data warehouses readily provide such functionality.
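At its core, the lineage tracking described above amounts to recording every accessor, human, model, or process, against each data set. The sketch below is purely illustrative (the class and field names are hypothetical, not any product’s API):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageEvent:
    accessor: str   # a user, an AI model, or a workload process
    operation: str  # e.g. "read", "transform", "write"
    timestamp: datetime

@dataclass
class DataSet:
    name: str
    lineage: list = field(default_factory=list)

    def record(self, accessor: str, operation: str) -> None:
        """Append an audit entry each time the data is 'touched'."""
        self.lineage.append(
            LineageEvent(accessor, operation, datetime.now(timezone.utc))
        )

    def touched_by(self) -> set:
        """Everyone (human or model) that has accessed this data set."""
        return {event.accessor for event in self.lineage}

# Example: a person and an ML model both touch the same data set,
# and both show up in the audit trail.
sales = DataSet("quarterly_sales")
sales.record("analyst_jane", "read")
sales.record("churn_model_v3", "transform")
print(sorted(sales.touched_by()))
```

A real platform would persist these events durably and attach far more context (job IDs, source tables, policies), but the principle, an append-only record of who did what to the data, is the same.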
Data Progression from On-Prem to Cloud
Organizations often begin the transition to cloud by adopting modern application strategies such as containers and microservices, using them on-prem first, and then moving them to the public cloud. This approach enables safe experimentation and helps confirm which workloads can safely and effectively move to the public cloud and which are best left on-prem.
This practice need not be a one-way street: workloads running in the cloud may make sense to move back on-prem, or to another public cloud provider, as desired. That requires a way to create cloud instances that are not locked down by the proprietary APIs and system components that so often prevent moving both apps and data from one cloud to another. Lock-in is a real problem, even with vendors that claim fully open cloud environments. The point: you can’t manage data properly in a locked-in environment.
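One common way to limit that lock-in at the application layer is to keep provider-specific SDK calls behind a thin storage interface, so pipelines depend only on the interface and moving clouds means swapping an adapter, not rewriting code. The sketch below assumes hypothetical class names (this is an architectural pattern, not a vendor API):

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Minimal storage interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryStore(ObjectStore):
    """On-prem / test implementation; each cloud provider's SDK
    would sit behind an equivalent adapter class."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

def archive_report(store: ObjectStore, name: str, body: bytes) -> None:
    # Application logic sees only ObjectStore, so relocating the
    # workload changes the adapter passed in, not this function.
    store.put(f"reports/{name}", body)

store = InMemoryStore()
archive_report(store, "q3.csv", b"region,revenue\nwest,100\n")
print(store.get("reports/q3.csv"))
```

The same `archive_report` call then runs unchanged whether the adapter wraps on-prem storage or a public cloud’s object store, which is the portability the paragraph above is asking for.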
Choose the Right Data Platform
The use of an enterprise hybrid-enabled data platform makes it easier to move data to the required workloads running on various processors, on-prem or in the cloud. Further, it assures data quality by managing a single instance rather than leaving diverse data sets dispersed across multiple platforms.
Finally, in regulated industries and/or geographies with data-residency restrictions, maximizing governance of the data is critical to preventing regulatory fines and data breaches. Ultimately, we expect top data platforms to enable a broader write-once, run-on-any-cloud capability, allowing true mix-and-match workload processing. Enterprises must establish a true hybrid-enabled data cloud capability if they are to derive the maximum benefit from a hybrid cloud environment.