
Manage company data as you would your own home

Spurred by global trends such as digitalization and the cloud, more data is being created by more people, more organizations and more devices. According to a recent global study by Veritas, the annual data growth rate has skyrocketed to 48.7 percent. In fact, more than 50 percent of files by count were of an “unknown” nature. Managing these soaring volumes of data while supporting innovation and mitigating the associated risks is a growing challenge.

And in a multi-cloud era, who holds the ultimate responsibility for data management – the customer or the cloud provider?

The latest Veritas Truth in Cloud study revealed that there are misconceptions about data management in the public cloud. Adding to that complexity, the Veritas 2017 GDPR report showed that 32% of businesses also fear that their current technology stack is unable to manage data effectively. That could hinder their ability to search, discover and review data, all of which are essential criteria for GDPR compliance.

If businesses are unable to manage data effectively, the natural question is: how can they better manage and understand the real value of their business data to stay in the race?

David Noy, Vice President for Product Management at Veritas, believes that data management is the core foundation for all technology companies today – much like managing a clutter-free and secure home environment.

In an email interview with Networks Asia’s Ken Wong, Noy talks more about data management and how businesses can ensure data sits as a positive asset in their balance sheet.

What has changed from the traditional ways we used to manage and tier data? What is causing these changes?

Traditionally, we have managed and stored data by looking at tier 1 arrays, along with mission-critical, server and Unix-based applications. Thereafter, virtualized and containerized applications came into play, before we moved on to Mode-2 type workloads – where scale-out functionality made it very difficult to predict capacity and performance requirements ahead of time, as they became much more fluid.

What we needed to do was adapt the traditional ways of protecting data and of putting data onto media, where it is kept for a long time or until it actually needs to be restored. This is usually done through quick, snapshot-based data protection, API-based data protection or even virtual machine-based data protection.

From here, we need to be able to move data, either into high-performance disk-based systems or media-based systems for fast restore, though these can involve high expenses. In a way, these choices result from the complexity of the data centre, which has undergone constant evolution.

Old mission-critical tier 1 applications continue to expand alongside next-generation applications, such as DevOps-type environments and cloud-native applications. What businesses require is a data management and protection solution that can look at all of these environments holistically, as a single tool.

Today, the good news is that we have low-cost options – utility and cloud-based – available to us, where we can move protected data to the cloud at a very low price, turning data management into more of an operational expenditure than an additional capital expenditure.

What we are beginning to understand is that there is more than one type of data that is critical to an enterprise. It could also be combinations of data that reside in various types of environments, coupled with unstructured data and data that goes into analytics repositories – like Splunk or Hadoop – or even next-generation databases like MongoDB.

A more robust solution which provides data management across all of those solutions will allow businesses to extract value out of that tiered data by classifying it. Solutions that provide such visibility can enable businesses to add even more value to their assets and Veritas provides such a solution.

Why has there been an increase in unknown files within organizations? Is the cloud or shadow IT to blame?

Shadow IT does have a part to play here. For example, if we look at Hadoop deployments in a typical organization, data is collected and used for a varying number of projects, including analytics. If it is not dealt with well, the accumulated data becomes too large and difficult to manage, and it eventually becomes a concern for IT. Unfortunately, shadow IT projects can be quite prevalent in some large enterprises.

At the same time, you will also notice that, historically, employees do not tag or know how to classify data at the time of creation. It is alarming to know how much data has been created this way, especially in regulated industries such as financial services and healthcare, where such information has to be kept for a very long period.

We understand from customers that after years of storing this information, nobody remembers the original purpose of that data. The most common problem cited was that their data cannot be deleted because they do not know whether that information is still as important as it was all those years ago.

New product innovation has also fuelled the growth of unknown files. Corporations seeking to enhance the customer experience are using emerging technologies to build new products and services, resulting in a surge in new and diverse types of files, typically unknown in nature.

Veritas is currently trying to change the game by building data classification capabilities into our products to ensure that important data can be used accurately. These capabilities help to provide information and metadata around new data immediately so that businesses can later come back and understand what that data was all about and what needs to be done with it.

By doing so, Veritas aims to assist businesses with classifying this information as soon as it is created, especially since we are generating more and more data each year. According to the Veritas Data Genomics Index 2017, the annual data growth rate skyrocketed to 48.7 percent. With the vast amount of files being generated each year, going back to classify the information later may prove to be an arduous task.

Are we looking at data creation in the wrong way?

Yes, we have been looking at data creation the wrong way. We create a lot of data because we know that it is valuable – almost like we know that if we throw a thousand seeds on the ground, we are going to get a couple of important trees. The problem is that we have no way of coming back to figure out which of these seeds are important.

At Veritas, we look at a very different kind of “creation first” approach where we do classification on the data when it is actually being created or ingested into our backup products or software-defined offerings.

By doing this, businesses will get a lot more information around that data upfront and are able to visualize and keep track of it. This will ultimately give businesses more holistic suggestions on how to manage it well.
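To make the “creation first” idea concrete, here is a minimal sketch of classifying an item at the moment it is ingested, so that the tags and metadata travel with it from day one. It is not how any Veritas product works: the rule names, patterns, `IngestedItem` type and `classify_on_ingest` function are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical classification rules: tag name -> pattern that triggers it.
RULES = {
    "pii.email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "finance.card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "health.record": re.compile(r"\b(?:diagnosis|patient id)\b", re.IGNORECASE),
}


@dataclass
class IngestedItem:
    """An item plus the metadata recorded at ingestion time."""
    name: str
    content: str
    owner: str
    tags: list = field(default_factory=list)
    ingested_at: str = ""


def classify_on_ingest(name, content, owner):
    """Tag the item as it arrives instead of revisiting it years later."""
    tags = sorted(tag for tag, pattern in RULES.items() if pattern.search(content))
    return IngestedItem(
        name=name,
        content=content,
        owner=owner,
        # Items that match no rule are marked explicitly rather than left blank.
        tags=tags or ["unclassified"],
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )


item = classify_on_ingest(
    "note.txt", "Contact jane@example.com about patient ID 42", "hr"
)
print(item.tags)
```

Recording the owner and ingestion time alongside the tags is the part that addresses the "nobody remembers the original purpose" problem: the context is captured while someone still knows it.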

How important has data generated from M2M communication or machine generated data become and how should we be dealing with it?

Machine-generated data is huge. We can be talking about log-based data used for security analysis, or data from IoT and sensor networks. Industrial automation has definitely taken off, but IoT in general is going to continue to drive large amounts of machine-generated data. That data will all have some purpose, but it comes as no surprise that some of it will be more useful than the rest.

Regardless of the amount, we have to find a way to be able to make sense out of it. The products that are able to store the data efficiently while understanding it and determining what can be kept and what can be thrown out are going to be the ones that provide value in this industry.

What about generating information from data? Are we making more or better sense from what data we have or are we fumbling in the dark?

We are making progress when it comes to understanding our data and that's actually the whole point of why Veritas is creating these data classification engines. Today, it's mostly around compliance and governance, not only for regular businesses, but also for regulated industries such as finance and healthcare. For instance, pharmaceutical firms generate petabytes of data and it is critical to classify that data immediately. Otherwise, it will be a huge challenge to do so after the fact.

In future, we can increasingly expect additional data classification policies around different verticals that can help to define metadata and enable us to understand why that data was created in the first place (including the context, why it is important, and what the data represents).

This will allow us to determine whether that data is an asset or a liability. If it is a liability, businesses can consider removing that data over time, provided that doing so does not violate regulations from the various governing bodies.
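The asset-or-liability decision can be sketched as a simple rule over classification metadata. This is only an illustration under stated assumptions: the retention periods in `RETENTION_YEARS`, the one-year staleness threshold and the `disposition` function are hypothetical and do not reflect any specific regulation or product.

```python
from datetime import date

# Hypothetical retention periods, in years, per classification tag.
RETENTION_YEARS = {"finance": 7, "health": 10, "logs": 1}


def add_years(d, years):
    """Return the same calendar day `years` later, handling 29 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return d.replace(year=d.year + years, day=28)


def disposition(tag, created, last_accessed, today, stale_after_days=365):
    """Suggest what to do with an item based on its tag and age."""
    years = RETENTION_YEARS.get(tag)
    if years is None:
        # Unknown data cannot be judged; send it for classification first.
        return "review"
    if today < add_years(created, years):
        # Still inside the mandated retention window: must be kept.
        return "retain"
    if (today - last_accessed).days <= stale_after_days:
        # Past retention but still in use: treat as an asset.
        return "asset"
    # Past retention and unused: a liability that may be removed.
    return "delete-candidate"


today = date(2024, 1, 1)
print(disposition("finance", date(2015, 1, 1), date(2016, 1, 1), today))
print(disposition("health", date(2020, 1, 1), date(2023, 6, 1), today))
print(disposition("unknown", date(2010, 1, 1), date(2010, 1, 1), today))
```

Note that the retention check comes before the usage check, which mirrors the point in the answer above: removal is only considered once it no longer conflicts with regulation, and unclassified data never reaches that decision at all.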

In this cloud era, with multiple endpoints, where does the onus of data management and security ultimately lie?