Spurred by global trends such as digitalization and the cloud, more data is being created by more people, more organizations and more devices. According to a recent global study by Veritas, the annual data growth rate has skyrocketed to 48.7 percent, and more than 50 percent of files are of an “unknown” nature. Managing these soaring volumes of data while supporting innovation and mitigating the associated risks is a rising challenge.
And in a multi-cloud era, who holds the ultimate responsibility for data management – the customer or the cloud provider?
The latest Veritas Truth in Cloud study revealed that there are misconceptions about data management in the public cloud. Adding to that complexity, the Veritas 2017 GDPR report showed that 32 percent of businesses fear that their current technology stack is unable to manage data effectively – a shortcoming that could hinder their ability to search, discover and review data, all essential criteria for GDPR compliance.
If businesses are unable to manage data effectively, the natural question is: how can they better manage and understand the real value of their business data to stay in the race?
David Noy, Vice President for Product Management at Veritas, believes that data management is the core foundation for all technology companies today – much like managing a clutter-free and secure home environment.
In an email interview with Networks Asia’s Ken Wong, Noy talks more about data management and how businesses can ensure data sits as a positive asset in their balance sheet.
What has changed from the traditional ways we used to manage and tier data? What is causing these changes?
Traditionally, we managed and stored data on tier 1 arrays, alongside mission-critical, server- and Unix-based applications. Virtualized and containerized applications then came into play, before we moved on to Mode 2-type workloads – where scale-out functionality made it very difficult to predict capacity and performance requirements ahead of time, as they became much more fluid.
What we needed to do was adapt the traditional ways of protecting data and of putting data onto media, where it is kept for a long time or until it actually needs to be restored for modern use. This is usually done through quick, snapshot-based data protection, API-based data protection or even virtual machine-based data protection.
From there, we need to be able to move data into either high-performance disk-based systems or media-based systems for fast restores, though these can involve high expenses. In a way, these choices are a result of the complexity of a data centre that has undergone constant evolution.
Old mission-critical tier 1 applications continue to expand alongside next-generation applications, such as DevOps-type environments and cloud-native applications. What businesses require is a data management and protection solution that can look at all of these environments holistically, as a single tool.
Today, the good news is that we now have low-cost options – utility- and cloud-based – where we can move protected data to the cloud at a very low price, turning data management into more of an operational expenditure than an additional capital expenditure.
What we begin to understand is that there is more than one type of data that is critical to an enterprise. It could also be combinations of data residing in various types of environments, coupled with unstructured data and data that goes into analytics repositories – like Splunk or Hadoop – or even next-generation databases like MongoDB.
A more robust solution which provides data management across all of those solutions will allow businesses to extract value out of that tiered data by classifying it. Solutions that provide such visibility can enable businesses to add even more value to their assets and Veritas provides such a solution.
Why has there been an increase in unknown files within organizations? Is the cloud or shadow IT to blame?
Shadow IT does have a part to play here. For example, if we look at Hadoop deployments in a typical organization, data is collected and used for a variety of projects, including analytics. If not dealt with well, the accumulated data becomes too large and difficult to manage, and it eventually becomes a concern for IT. Unfortunately, shadow IT projects can be quite prevalent in some large enterprises.
At the same time, employees historically do not tag data, or know how to classify it, at the time of creation. It is alarming how much data has been created this way, especially in regulated industries such as financial services and healthcare, where such information has to be kept for a very long period.
We understand from customers that after years of storing this information, nobody remembers the original purpose of that data. The most common problem cited is that the data cannot be deleted, because no one knows whether the information is still as important as it was all those years ago.
New product innovation has also fuelled the growth of unknown files. Corporations seeking to enhance the customer experience are using emerging technologies to build new products and services, resulting in a surge in new and diverse types of files, typically unknown in nature.
Veritas is currently trying to change the game by building data classification capabilities into our products to ensure that important data can be used accurately. These capabilities help to provide information and metadata around new data immediately so that businesses can later come back and understand what that data was all about and what needs to be done with it.
By doing so, Veritas aims to assist businesses with classifying this information as soon as it is created, especially since we are generating more and more data each year. According to the Veritas Data Genomics Index 2017, the annual data growth rate skyrocketed to 48.7 percent. With the vast amount of files being generated each year, going back to classify the information later may prove to be an arduous task.
Are we looking at data creation in the wrong way?
Yes, we have been looking at data creation the wrong way. We create a lot of data because we know that it is valuable – almost like we know that if we throw a thousand seeds on the ground, we are going to get a couple of important trees. The problem is that we have no way of coming back to figure out which of these seeds are important.
At Veritas, we look at a very different kind of “creation first” approach where we do classification on the data when it is actually being created or ingested into our backup products or software-defined offerings.
By doing this, businesses will get a lot more information around that data upfront and are able to visualize and keep track of it. This will ultimately give businesses more holistic suggestions on how to manage it well.
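The “creation first” idea can be illustrated with a small sketch. This is not the Veritas classification engine – it is a minimal Python example, with made-up keyword rules and field names, of attaching metadata (owner, timestamp, content fingerprint, classification tag) to data at the moment it is ingested:

```python
import hashlib
import time

# Illustrative keyword rules; a real engine would inspect content, not names.
SENSITIVE_KEYWORDS = {"invoice": "financial", "patient": "healthcare-phi"}

def classify_at_ingestion(filename: str, content: bytes, owner: str) -> dict:
    """Build a metadata record for a piece of data as it is created."""
    lowered = filename.lower()
    # Tag the data at ingestion time using simple filename keywords.
    classification = next(
        (tag for keyword, tag in SENSITIVE_KEYWORDS.items() if keyword in lowered),
        "unclassified",
    )
    return {
        "filename": filename,
        "owner": owner,                                 # who created it
        "created": time.time(),                         # when it was created
        "sha256": hashlib.sha256(content).hexdigest(),  # content fingerprint
        "classification": classification,               # drives retention policy
    }

record = classify_at_ingestion("patient_notes.txt", b"...", owner="dr.tan")
print(record["classification"])  # healthcare-phi
```

Capturing such a record up front means that, years later, questions like “who owns this?” and “can it be deleted?” can be answered from metadata rather than guesswork.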
How important has data generated from M2M communication or machine generated data become and how should we be dealing with it?
Machine-generated data is huge. We could be talking about log-based data used for security analysis, IoT or sensor networks. Industrial automation has definitely taken off, and IoT in general is going to continue to drive large amounts of machine-generated data. That data will all have some purpose, but it comes as no surprise that some of it will be more useful than the rest.
Regardless of the amount, we have to find a way to be able to make sense out of it. The products that are able to store the data efficiently while understanding it and determining what can be kept and what can be thrown out are going to be the ones that provide value in this industry.
What about generating information from data? Are we making more or better sense from what data we have or are we fumbling in the dark?
We are making progress when it comes to understanding our data, and that's actually the whole point of why Veritas is creating these data classification engines. Today, it's mostly around compliance and governance, not only for regular businesses, but also for regulated industries such as finance and healthcare. For instance, pharmaceutical firms generate petabytes of data, and it is critical to classify that data immediately. Otherwise, it will be a huge challenge to do so after the fact.
In future, we can increasingly expect additional data classification policies around different verticals that can help to define metadata and enable us to understand why that data was created in the first place (including the context, why it is important, and what the data represents).
This will allow us to realize whether that data is an asset or a liability. If it is a liability, businesses can consider removing that data over time, provided doing so does not violate regulations from various governing bodies.
In this cloud era, with multiple endpoints, where does the onus of data management and security ultimately lie?
According to our recent Truth in Cloud study, 83 percent of global respondents believe that their organization’s cloud service provider (CSP) is responsible for ensuring that their workloads and data in the cloud are protected against outages. While we may think that the CSPs should be responsible for managing and securing data, the reality is that they are not.
CSPs tend to be huge targets for anyone perpetrating any sort of data hack or breach as their data centres contain large amounts of data from a large variety of customers, due to the scale they are operating at.
Given the circumstances of the CSPs, it comes as no surprise that they have made it the responsibility of the enterprise or the organization who owns that data to protect and secure it. The fact that an organization’s data is kept on a public cloud infrastructure does not mean that it is the CSP’s responsibility to keep it safe.
It is absolutely vital for businesses to hold the responsibility of protecting that data, along with the personal information of their customers. If that cannot be handled at an organizational level, re-evaluating IT strategies and determining what data needs to move off-premises, and what needs to stay on-premises, should become a priority.
As governments push for greater regulatory compliance, should data management best be handled by a third party or service provider?
Regulatory compliance requires us to be able to protect sensitive information about people, finances and entities. Enforcement actions and penalties should apply if businesses cannot meet those requirements. The situation becomes even more complex when a third party is involved.
This is almost akin to the case where we are unable to get a third party to assume responsibility for medical malpractice by a doctor. Hence, it is ultimately the responsibility of the organization to be accountable for protecting and managing its data.
While avoiding stringent regulatory penalties is clearly a driving force behind improving an organization’s digital compliance, many companies also see business benefits that go well beyond the sanctions.
According to the Veritas 2017 GDPR Report, almost all businesses (95 percent) globally see substantial business benefits to achieving GDPR compliance, including better data management across the entire organization.
It is encouraging to note that 88 percent of those surveyed globally plan to drive employee GDPR behavioral changes through training, rewards, penalties and contracts. In fact, almost half (47 percent) of businesses will go so far as to add mandatory GDPR policy adherences into employment agreements to ensure that they remain on the right side of the regulation.
We’ve always spoken on the importance of data for a business’s success, so why is data management lacking or falling behind the curve?
There are many different types of data and, with them, many different systems for managing that data. Today, data can be aggregated from a variety of sources, be it an enterprise database, a CRM database, unstructured data feeds or video feeds, among others. A holistic, 360-degree data management view offers businesses not just data protection but also business continuity and visibility into what they have, including the ability to classify data and migrate it from one location to another across a myriad of different workloads.
However, this is a huge challenge, and it is not something businesses can stand up overnight. In fact, most start-ups begin this journey by focusing on just one use case, and that is how they gain momentum. The reality, though, is that this approach does not get a business very far when it comes to good data management.
The advantage that Veritas has is that we've been in this business for a very long time. We have a strong legacy base protecting mission-critical and tier 1 applications and we are expanding very quickly into tier 2 applications. We partner very closely with cloud service providers for off-premises applications and protecting applications that were born in the cloud. This is our space to own.
Why does data management play a vital role in digital transformation and where are we headed?
Digital transformation means a lot of different things to different people. Does it mean that we are moving away from analog and paper-based solutions, or simply putting things in bits and bytes? To me, digital transformation is more about how we take all of the silos of data that have been accumulated in a business and make them valuable.
This can be done by pipelining and aggregating data to produce higher-order value, rather than just holding a set of social security numbers or a bunch of purchasing patterns.
Digital transformation is really the ability to consider what should be done with digital assets and extracting the maximum value while protecting those assets. The data could be residing across increasingly complex heterogeneous environments – be it on-premises, off-premises or in the cloud.
Digital transformation plays such a critical role, and companies that practice good data management will have a competitive advantage over others that basically have information silos. This competitive advantage could translate into more targeted marketing or better financial decisions.
How can businesses ensure data sits as a positive asset in their balance sheet? What costs can impact a business if they have a loose data management framework compared to a tight one? What other intangible impacts can it have?
The notion of how much data is worth is increasingly relevant, even though there is no formula for placing a price tag on it. When we think of some of the largest organizations globally, their balance sheets do not reflect data as an asset, yet it is clear that data is at the core of their businesses.
To ensure that data sits as a positive asset in their balance sheets, businesses must be able to effectively manage their data. Firstly, it starts with a fundamental process of knowing what the data is, where it is located, how to manage it and who owns it. The next step is to look at how the data can be protected and the resources required.
Next, businesses should put the data to better use, such as determining who should have access to it and developing actionable insights from it. This way, we work towards using data as an asset, rather than letting it sit in the organization simply occupying storage space.
Given the nature of data creation and storage, there is a tendency for organizations to create multiple copies of the same file, across different locations. To mitigate this, it is critical for businesses to have visibility of their data and appropriate data management policies in place, to help them stay on top of things.
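As a concrete illustration of the multiple-copies problem, the following Python sketch (an assumed example, not a Veritas tool) groups files by a content hash so that identical copies scattered across directories become visible:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(roots):
    """Map each content hash to all paths holding byte-identical files."""
    by_hash = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                # Hash file contents, so renamed copies still match.
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                by_hash[digest].append(path)
    # Keep only content that exists in more than one place.
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

For a sketch, reading whole files into memory is fine; at real scale the hashing would be chunked and the results fed into a data management policy rather than printed.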
For businesses, having a loose data management framework will not serve them well, as it could result in financial losses. For instance, an alarming majority (60 percent) of organizations globally have not fully evaluated the cost of a cloud outage, even though most of them operate with a cloud-first mentality.
Also, while more than one in three respondents (36 percent) expect less than 15 minutes of downtime per month, the reality is that almost a third (31 percent) have experienced more than double that amount of downtime per month (31 minutes or more).
Depending on the complexity of application inter-dependencies during restart and the amount of data lost during downtime, or even worse, an outage, businesses could possibly lose hours or even days before they can finally recover and get their applications back online.
On the digital compliance front, it is equally important for businesses to stay on the right side of the law or risk paying huge financial penalties (in the case of GDPR, up to €20 million or 4 percent of the company’s global annual turnover, whichever is greater). The severity of non-compliance does not end with these penalties.
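As a back-of-envelope illustration of that penalty rule – the GDPR cap is the greater of €20 million or 4 percent of global annual turnover – with an invented turnover figure:

```python
def gdpr_max_fine(annual_turnover_eur: float) -> float:
    """Upper bound of a GDPR fine: the greater of EUR 20M or 4% of turnover."""
    return max(20_000_000.0, 0.04 * annual_turnover_eur)

# A firm with EUR 2 billion turnover: 4% is EUR 80 million, exceeding EUR 20M.
print(gdpr_max_fine(2_000_000_000))  # 80000000.0
```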
Being non-compliant with data regulations could have a devastating impact on an organization’s brand image, especially if and when a compliance failure is made public, potentially as a result of the new obligations to notify those affected of data breaches. Other adverse consequences include the devaluation of the brand and the loss of customer loyalty – something most companies fear.
In summary, it is imperative for businesses to have a strong data management strategy, with the appropriate measures and policies in place to make intelligent decisions and set them apart from their competitors.