Find out why enterprise architects are exploring new strategies that distribute analytics closer to IoT devices.
One of the biggest challenges with the internet of things lies in finding a balance between gathering high-resolution data from sensors and aggregating the most relevant exceptions or trends in the cloud. One emerging trend lies in doing some processing of this data closer to the edge, and then only sending exceptions or summaries into the cloud for storage and further processing.
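As a rough illustration of that pattern, the sketch below reduces one window of raw sensor readings to a compact summary plus any out-of-range values, which is all a device or gateway would then forward to the cloud. The `EdgeReport` type, the `summarize_window` function, the thresholds and the sample readings are all hypothetical names and values chosen for this example; none come from a specific product.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class EdgeReport:
    """What the edge forwards instead of every raw reading (hypothetical shape)."""
    count: int
    mean: float
    minimum: float
    maximum: float
    exceptions: List[float]


def summarize_window(readings: List[float], low: float, high: float) -> EdgeReport:
    """Reduce one window of raw readings to a summary plus out-of-range values."""
    if not readings:
        raise ValueError("window must contain at least one reading")
    # Readings outside the assumed [low, high] band are kept in full as exceptions.
    exceptions = [r for r in readings if r < low or r > high]
    return EdgeReport(
        count=len(readings),
        mean=mean(readings),
        minimum=min(readings),
        maximum=max(readings),
        exceptions=exceptions,
    )


# Hypothetical temperature window with one outlier.
window = [21.0, 21.5, 22.0, 35.0, 21.5]
report = summarize_window(window, low=10.0, high=30.0)
print(report)
```

Here five raw readings collapse into a single record, and only the 35.0 outlier travels to the cloud at full resolution.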
"Traditionally, IoT edge analytics meant pushing IoT data to the cloud and dumping it into a data lake for big data analytics," said John Crupi, VP and engineering system architect at Greenwave Systems Inc. "As much as we wished this would be sufficient for real-time analytics, it fell very short. It didn't matter if we had the fastest computers running Spark. Architecturally, it wasn't efficient."
Combining cloud and edge analytics is particularly relevant to embedded analytics, real-time analytics and time-critical insight, Crupi said. Embedded analytics allows gateways and devices to make their own decisions independent of the cloud.
Real-time analytics makes it possible to make decisions at about the same time that events happen. Time-critical insights are important for responding to real-time events across a large number of devices, such as cyberattacks.
"Distributed analytics are useful when data is too big to be transferred to the cloud or when bandwidth between sensors and analytics servers is limited," said Srinath Perera, vice president of research at WSO2 Inc.
This is important when the sensors are geographically distributed, the analysis is too complicated or the data is too big to be analyzed on a single machine. Additionally, when transferring the data takes too long, some time-critical data loses its value before it can be analyzed.
Distributed analytics can be done on the sensor itself or via gateways placed near the sensors. Some of the tools for building out these distributed analytics architectures include WSO2 Siddhi, Apache Edgent and Apache MiNiFi. IoT edge analytics algorithms running on or near devices are a subclass of stream processing analytics.
Stream processing systems running in the cloud include Apache Kafka, Apache Flink, Apache Storm, Apache Spark Streaming and WSO2 Data Analytics Server.
Bringing stream processing to the edge
Stream and batch processing enable different speeds of analytic processing, said James Kirkland, chief architect of IoT at Red Hat Inc. Stream processing is better at responding to events.
Batch processing allows deeper insights and can also use machine learning techniques that improve analytics algorithms. These can then be pushed out to streaming analytics applications running on gateways or devices in the field.
Streaming analytics technologies look at known sets of data and watch for conditions that they have been trained to find via machine learning algorithms. This is useful, for example, for spotting a machine that is likely to fail in the next 24 hours. Streaming analytics can feed predictive maintenance algorithms running on or near the field devices, or on an IoT gateway.
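A minimal sketch of this kind of condition watching might look like the following. It assumes the "trained" condition boils down to a single threshold on a rolling mean, which a batch job in the cloud could recompute and push out to the edge. The `DriftDetector` class, its methods and the sample values are hypothetical; a real deployment would more likely use one of the stream processing engines named earlier.

```python
from collections import deque
from typing import Deque


class DriftDetector:
    """Flags a stream whose rolling mean exceeds a threshold (illustrative only)."""

    def __init__(self, window_size: int, threshold: float) -> None:
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window_size = window_size
        self.threshold = threshold
        # deque with maxlen drops the oldest reading automatically.
        self.window: Deque[float] = deque(maxlen=window_size)

    def update_threshold(self, threshold: float) -> None:
        """Accept a new threshold, e.g. one recomputed by a batch job (assumption)."""
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Add one reading; return True once the full window's mean is too high."""
        self.window.append(value)
        if len(self.window) < self.window_size:
            return False  # not enough history yet
        return sum(self.window) / len(self.window) > self.threshold


# Hypothetical vibration readings arriving one at a time.
detector = DriftDetector(window_size=3, threshold=5.0)
alerts = [detector.observe(v) for v in [4.0, 4.5, 5.0, 6.0, 7.0]]
print(alerts)  # [False, False, False, True, True]
```

Keeping the threshold as plain data, separate from the detection loop, is one way to let the cloud refine it without redeploying code to the device.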
Batch analytics takes massive pools of data that may or may not be related and does broad analysis, looking for patterns or correlations in that data.
Kirkland said, "It is a huge computational task that takes a long time. You need to prepare these massive data sets, preen them, test them and then run the analysis. You will come out of this with new ways to optimize your business in ways that would potentially be otherwise unseen."
Enterprise architects also need to think through data transportation and retention costs, as well as how to make use of the massive data sets a company will collect. A recent Cisco survey indicated that up to 60% of IoT projects either stalled or failed because businesses were not well enough prepared to reap the benefits of their data, and because the data collected was of poor quality.
Red Hat's Kirkland said, "This means that the quantity and quality of that data is a big problem. It is orders of magnitude larger than most companies have ever dealt with. The danger is not too little data; it is too much. A large amount of junk data is worse than a smaller amount of quality and insightful data. Because it is measured and collected doesn't mean it is worth using."
Orchestrate data feedback loops
Understanding how to architect a symbiotic relationship between cloud and edge analytics is another big challenge to setting up a distributed analytics infrastructure, said Greenwave's Crupi. It is symbiotic because there isn't a one-way relationship where analytics are done on the edge and pushed to the cloud.
Crupi explained, "When we talk about IoT edge analytics, we are really talking about extreme distributed analytics. It's about pushing intelligence to the edge and being intelligent about where the analytics are done. When talking about the potential of billions of connected devices, we are also talking about billions of compute devices available to be a part of a bigger solution."
For example, a smart city might use smart cameras to analyze real-time video independently, but share analytics with the cloud and other cameras to get a bigger real-time picture. If there is a specific issue, all available cameras could be directed to look at specific areas of interest.
In the aftermath of a disaster, these cameras could look for the best evacuation and traffic flow routes.
Crupi said, "This may sound like a science fiction idea, but it is exactly where distributed analytics needs to go."
Architect for streaming data
There are a variety of steps enterprise architects can take to help manage these challenges. One good practice is to version and manage data streams, since they will change over time, WSO2's Perera said.
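One way to read that advice, sketched here under assumptions, is to wrap every message in an envelope that carries a schema version, and to upgrade older payloads step by step on the receiving side. The envelope fields (`schema_version`, `payload`), the `decode` function and the example change from a Fahrenheit `temp` field to a Celsius `temp_c` field are all hypothetical.

```python
import json
from typing import Any, Callable, Dict

CURRENT_VERSION = 2


def _v1_to_v2(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical change: v1 sent Fahrenheit as 'temp', v2 sends Celsius as 'temp_c'."""
    upgraded = {k: v for k, v in payload.items() if k != "temp"}
    upgraded["temp_c"] = round((payload["temp"] - 32) * 5 / 9, 2)
    return upgraded


# Maps a version number to the function that upgrades it to the next version.
UPGRADERS: Dict[int, Callable[[Dict[str, Any]], Dict[str, Any]]] = {1: _v1_to_v2}


def decode(message: str) -> Dict[str, Any]:
    """Parse an envelope and bring its payload up to CURRENT_VERSION."""
    envelope = json.loads(message)
    version = envelope["schema_version"]
    payload = envelope["payload"]
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported schema version {version}")
    while version < CURRENT_VERSION:
        payload = UPGRADERS[version](payload)
        version += 1
    return payload


old = '{"schema_version": 1, "payload": {"device": "a1", "temp": 212}}'
new = '{"schema_version": 2, "payload": {"device": "a1", "temp_c": 100.0}}'
print(decode(old))
print(decode(new))
```

With this arrangement, devices still running older firmware can keep sending the old format while downstream analytics see a single shape.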
It's also important to consider having separate plans for what data is going to be gathered at a field level versus what data is going to be transmitted to the enterprise. A few hundred bytes per transmission, sent six times an hour by each of several million devices, can add up to millions or hundreds of millions of dollars in telecommunications costs.
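The arithmetic behind that warning can be sketched as follows. Every input is an assumption made for illustration: 300 bytes per message stands in for "a few hundred bytes," 5 million devices for "several million," and the price of $1.00 per megabyte is a placeholder, not a real carrier tariff.

```python
from typing import Tuple


def annual_transfer(
    bytes_per_message: int,
    messages_per_hour: int,
    devices: int,
    price_per_mb: float,
) -> Tuple[float, float]:
    """Return (megabytes per year, cost per year) for a device fleet."""
    hours_per_year = 24 * 365
    total_bytes = bytes_per_message * messages_per_hour * hours_per_year * devices
    megabytes = total_bytes / 1_000_000  # decimal megabytes
    return megabytes, megabytes * price_per_mb


# All four inputs are illustrative assumptions, not measured figures.
megabytes, cost = annual_transfer(
    bytes_per_message=300,
    messages_per_hour=6,
    devices=5_000_000,
    price_per_mb=1.00,
)
print(f"{megabytes:,.0f} MB per year, ${cost:,.0f} per year")
```

Under these assumptions the fleet sends about 78.8 million megabytes a year, so the result scales directly with whatever per-megabyte price actually applies. Halving the message size or the reporting frequency halves the bill.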
Red Hat's Kirkland said, "I would also suggest using sample data to figure out how this can be related to the business problems that are trying to be solved. Manually analyze this data using tools like Python and R, which will then give you a better idea as to which software tools would be the best to use. Don't fall in love with a particular software technology; just pick the right tool for your problem."
It's also important to think through the kinds of IoT edge analytics capabilities available versus those in the cloud.
Greenwave's Crupi explained, "Distributed IoT computing encompasses two completely different engine architectures coming together as one. The edge needs to be small and the cloud needs to be big. Although 'small' is relative, we are really talking about computing for small devices which have a microprocessor."