As mentioned in a previous article, over the last couple of months here at SOUTHWORKS we have been involved in several Big Data projects. These have ranged from building complete ETLs and processing pipelines and implementing both batch and stream ingestion mechanisms to, most recently, analyzing data and taking specific actions in near-real time.
As a small sample of what we have been doing, in this article we explore how to build a near-real-time monitoring system for IoT devices. We simulate and monitor a printed circuit board manufacturing system, leveraging the Azure stack to bring it to life. To exercise the platform, we created a console application that emulates the behavior of the sensors and controllers of the manufacturing system’s IoT devices.
This article describes the printed circuit board manufacturing scenario and the cloud infrastructure we put in place to monitor and control it.
Before we get into it, I would like to thank the team that created this reference implementation: Facundo Hernán Costa, Pablo Costantini, Roy Crivolotti, Tomas Ignacio Escobar, Franco Bruno Lavayen & Abel Ricardo Lozano.
We all hope you enjoy it 😃
To show you how to leverage the set of Azure technologies that can receive and process information from IoT devices, we came up with a simple scenario based on a printed circuit board (PCB) factory. The factory assembly line is composed of several stages, including:
Chemical etching: Etching away part of the copper foil layer of the board using a mask that indicates where traces should be left.
SMD Pick and Place: Putting the different components of the board (integrated circuits, resistors, capacitors, etc.) in their correct positions on the board.
Soldering: Applying solder, a conductive alloy, to join the components’ terminals to the PCB.
X-Ray Testing: Screening the PCB with X-Rays to validate the quality of the final product.
We will focus on the SMD Pick and Place stage of this process. For the sake of the sample, our factory has two areas dedicated to the SMD Pick and Place stage of different assembly lines: Areas A and B. Each of these areas requires multiple engines to operate, including:
The engines used for the pick and place machines.
The engines used to move the conveyor belt of the PCBs.
To operate correctly, these engines must stay below 80 degrees Celsius to prevent overheating, and their power consumption must stay within a specified range (between 40 and 80 Watts). To monitor these parameters, the factory has installed smart sensors in the engines that measure their temperature and power consumption in real time. This information is continuously submitted to the cloud.
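The two safety rules above can be expressed as a small check. The sketch below is in TypeScript (the language of our emulator); the field names and the choice of inclusive or exclusive bounds are assumptions for illustration, not the project’s actual schema.

```typescript
type Measurement = "temperature" | "power";

// Hypothetical sample shape; the real telemetry schema may differ.
interface EngineSample {
  temperatureCelsius: number;
  powerWatts: number;
}

// Safety thresholds taken from the scenario description.
const MAX_TEMPERATURE_C = 80;
const MIN_POWER_W = 40;
const MAX_POWER_W = 80;

// Returns which measurements (if any) fall outside the safety range.
function outOfRange(sample: EngineSample): Measurement[] {
  const violations: Measurement[] = [];
  // "Below 80 degrees Celsius": 80 itself is treated as out of range (assumption).
  if (sample.temperatureCelsius >= MAX_TEMPERATURE_C) {
    violations.push("temperature");
  }
  // "Between 40 and 80 Watts": both bounds treated as inclusive (assumption).
  if (sample.powerWatts < MIN_POWER_W || sample.powerWatts > MAX_POWER_W) {
    violations.push("power");
  }
  return violations;
}

console.log(outOfRange({ temperatureCelsius: 65, powerWatts: 55 })); // []
console.log(outOfRange({ temperatureCelsius: 92, powerWatts: 85 })); // [ 'temperature', 'power' ]
```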
What are we going to show you?
Our goals are pretty simple:
To send an email to the customer in real time if the power consumed or temperature of an engine goes out of the safety range for a predefined amount of time.
To call an HTTP endpoint predefined by the customer (a web-hook), so that a process can be run in response to the alarm.
To provide the customer the capability of monitoring alarm statistics in near real time in a dashboard.
To provide the customer the capability of calculating analytics over the historical record of alarms.
End to end architecture
As already mentioned, we implemented the solution on Azure, combining different resources to build the complete workflow. All these resources can be deployed to an Azure subscription.
The Sensor emulator is a locally run application that emulates multiple engine sensors simultaneously. Each emulated sensor generates random temperature and power consumption samples and sends them to the IoT Hub every few seconds.
An Azure Stream Analytics Job is used to process the events received by the IoT Hub in real time. If the measurements of an engine are outside the safety range during a predefined time window, an Alarm Event is generated and sent to the Alarm Event Hub.
The alarms sent to the Alarm Event Hub will be processed by different systems:
a) An Azure Function that forwards the received alarm and its parameters to an external HTTP callback. In a real scenario, this web-hook would send a request to an HTTP endpoint specified by an external subscriber interested in monitoring the factory’s devices in real time. In this reference implementation, for the sake of the sample, we used a second Azure Function that simply logs the alarm event to the console.
b) An Azure Logic App will send the alarm information to two different resources:
Azure Monitor, which presents the customer with a dashboard of alarm statistics in near real time.
An email sent to the customer with detailed information regarding the alarm event.
c) The Alarm Event Hub uses Data Capture to save all the alarm event information to Azure Blob Storage. This information is later processed by a Databricks notebook that generates CSV reports with analytics on historical alarm events.
The sensor emulator console
To emulate the engine sensor measurements, the Sensor emulator application was written in NodeJS using TypeScript and the MostJS library. Each factory engine is registered as a separate device in the IoT Hub and generates independent temperature and power consumption measurements every few seconds, which are sent to the IoT Hub using the MQTT protocol.
To generate the random measurements in a realistic way, each sensor alternates between two different states:
Stable state: The sensor produces measurements within the safety range.
Alarm state: The sensor produces measurements outside the safety range.
The time a sensor spends in each state before switching is a random value selected within a pre-specified range. While in a given state, the sensor generates measurements around a pre-specified value using a random-walk process whose samples tend to drift toward the desired mean value.
The mean sample values and mean durations of each state can be defined on a per-sensor basis.
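The state-switching random walk described above can be sketched without MostJS as a plain loop. This is a hypothetical TypeScript reconstruction, not the emulator’s code: it assumes state durations are counted in samples, that each step moves a fixed fraction (`pull`) of the way toward the current state’s mean plus some noise, and it uses a small seeded PRNG (mulberry32) so runs are reproducible.

```typescript
type SensorState = "stable" | "alarm";

// Hypothetical per-sensor configuration (names are illustrative).
interface SensorConfig {
  stableMean: number;      // value the walk drifts toward in the stable state
  alarmMean: number;       // value the walk drifts toward in the alarm state
  minStateSamples: number; // lower bound of the random time spent in a state
  maxStateSamples: number; // upper bound of the random time spent in a state
  pull: number;            // 0..1, strength of the reversion toward the mean
  noise: number;           // amplitude of the uniform random step
}

interface Sample {
  state: SensorState;
  value: number;
}

// Seeded PRNG (mulberry32) returning uniform numbers in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function simulateSensor(config: SensorConfig, start: number, count: number, rand: () => number): Sample[] {
  const samples: Sample[] = [];
  let state: SensorState = "stable";
  let value = start;
  // Random duration (in samples) for the state we are entering.
  const drawDuration = (): number =>
    config.minStateSamples +
    Math.floor(rand() * (config.maxStateSamples - config.minStateSamples + 1));
  let remaining = drawDuration();

  for (let i = 0; i < count; i++) {
    if (remaining === 0) {
      state = state === "stable" ? "alarm" : "stable";
      remaining = drawDuration();
    }
    remaining--;
    const target = state === "stable" ? config.stableMean : config.alarmMean;
    // Mean-reverting random walk: move part of the way toward the target, plus noise.
    value += config.pull * (target - value) + config.noise * (rand() * 2 - 1);
    samples.push({ state, value });
  }
  return samples;
}

// Usage: a temperature sensor that hovers around 60 °C and occasionally drifts to 90 °C.
const temperature = simulateSensor(
  { stableMean: 60, alarmMean: 90, minStateSamples: 10, maxStateSamples: 20, pull: 0.2, noise: 3 },
  60,
  40,
  mulberry32(42)
);
temperature.forEach((s, i) => console.log(i, s.state, s.value.toFixed(1)));
```

A mean-reverting step was chosen because it produces the gradual rise and fall between stable and alarm values visible in the example plot, rather than an abrupt jump.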
Below you can find examples of measurements generated by our algorithm:
As you can see, the sensor starts in stable mode with measurements in the range of 40–80 degrees Celsius and enters alarm mode at sample #17, when values start increasing until they stabilize around a mean value of 90. Afterwards, the sensor goes back to stable mode and the sample values decrease.
The Azure stack comes to help
Let’s now look in detail at how we leveraged each of the components shown in the architecture diagram above.
How do we submit the sensors’ data to the cloud?
Once the sensors are ready, we have to connect them to the rest of the architecture, that is, make them able to submit their data to the cloud. To solve that, we used Azure IoT Hub, which establishes publisher/subscriber communication between the cloud and IoT devices.
This bidirectional communication supports:
Telemetry messages from the IoT device to the cloud.
Command messages from the cloud to the IoT device.
Reporting the current device status (e.g. current battery level).
Changing the device’s current configuration from the cloud.
It supports the most common IoT protocols, including:
MQTT and MQTT over WebSocket.
AMQP and AMQP over WebSocket.
For a device to communicate with IoT Hub, it first needs to be registered in the IoT Hub, and then it needs to authenticate itself with its device ID and key, as specified here.
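Device authentication is usually handled by the Azure IoT device SDK from a connection string, but it helps to see what it amounts to over MQTT. The sketch below follows the SAS token format and the MQTT connection values described in the IoT Hub documentation; the hub name, device ID and key are placeholders, and the `api-version` in the username should be checked against the current docs.

```typescript
import { createHmac } from "crypto";

// Builds a device-scoped IoT Hub SAS token in the documented format:
// SharedAccessSignature sr={URL-encoded resource URI}&sig={URL-encoded signature}&se={expiry}
function generateDeviceSasToken(
  hubHostName: string,     // e.g. "my-factory-hub.azure-devices.net" (hypothetical)
  deviceId: string,
  deviceKeyBase64: string, // the device's symmetric key from the IoT Hub registry
  expiryEpochSeconds: number
): string {
  const resourceUri = encodeURIComponent(`${hubHostName}/devices/${deviceId}`);
  // The string to sign is the encoded resource URI, a newline, and the expiry time.
  const stringToSign = `${resourceUri}\n${expiryEpochSeconds}`;
  // HMAC-SHA256 keyed with the base64-decoded device key, encoded back to base64.
  const signature = createHmac("sha256", Buffer.from(deviceKeyBase64, "base64"))
    .update(stringToSign)
    .digest("base64");
  return `SharedAccessSignature sr=${resourceUri}&sig=${encodeURIComponent(signature)}&se=${expiryEpochSeconds}`;
}

// Over MQTT, the client ID is the device ID and the password is the SAS token.
function mqttUsername(hubHostName: string, deviceId: string): string {
  // api-version as shown in the IoT Hub MQTT docs at the time of writing; verify before use.
  return `${hubHostName}/${deviceId}/?api-version=2021-04-12`;
}

// Device-to-cloud telemetry is published to this topic.
function telemetryTopic(deviceId: string): string {
  return `devices/${deviceId}/messages/events/`;
}

// Usage with placeholder values (the key is base64 for "secret-key", not a real key).
const host = "my-factory-hub.azure-devices.net";
const device = "area-a-conveyor-1";
const oneHourFromNow = Math.floor(Date.now() / 1000) + 3600;
console.log(mqttUsername(host, device));
console.log(telemetryTopic(device));
console.log(generateDeviceSasToken(host, device, "c2VjcmV0LWtleQ==", oneHourFromNow));
```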
For our setup we send messages to the IoT Hub from the Sensor emulator via MQTT as we mentioned above.
How do we analyze streaming data in near-real time?
To solve this, an Azure Stream Analytics Job subscribes to the events that arrive through the IoT Hub’s built-in, Event Hub-compatible endpoint. Azure Stream Analytics processes streams of input events with SQL queries at very low latency. It can be configured to ingest data from a wide array of input types, process it with pre-specified queries, and push the results to the next stage in the pipeline.
In our setup, this job reads the measurements received by the IoT Hub and generates alarms when it detects that multiple consecutive samples are outside the pre-specified safety range. Finally, it sends these alarms to the Alarm Event Hub.
The Azure Stream Analytics Job receives the measurements and generates two different types of events:
“Alarm on” events, when a device’s alarm is inactive and three consecutive samples fall outside the pre-specified safety range.
“Alarm off” events, when a device’s alarm is active and three consecutive samples fall inside the pre-specified safety range.
These events are sent to the Alarm Event Hub and are later consumed by the rest of the processing pipeline.
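The job itself is written in the Stream Analytics SQL dialect; the TypeScript state machine below only illustrates the same rule (three consecutive contradicting samples flip the alarm state) so that it is easy to reason about. Tracking state per device and measurement type, and the event shape, are assumptions made for this sketch.

```typescript
type AlarmEventType = "AlarmOn" | "AlarmOff";

// Hypothetical event shape emitted by the sketch.
interface AlarmEvent {
  deviceId: string;
  measurement: string;
  type: AlarmEventType;
  sampleIndex: number;
}

const CONSECUTIVE_SAMPLES = 3;

// Keeps, per device and measurement, whether an alarm is active and how many
// consecutive samples have contradicted that state.
class AlarmDetector {
  private active = new Map<string, boolean>();
  private streak = new Map<string, number>();

  process(deviceId: string, measurement: string, inRange: boolean, sampleIndex: number): AlarmEvent | null {
    const key = `${deviceId}/${measurement}`;
    const alarmActive = this.active.get(key) ?? false;
    // A sample counts toward a transition only when it disagrees with the current state.
    const contradicts = alarmActive ? inRange : !inRange;
    const streak = contradicts ? (this.streak.get(key) ?? 0) + 1 : 0;

    if (streak >= CONSECUTIVE_SAMPLES) {
      this.active.set(key, !alarmActive);
      this.streak.set(key, 0);
      return { deviceId, measurement, type: alarmActive ? "AlarmOff" : "AlarmOn", sampleIndex };
    }
    this.streak.set(key, streak);
    return null;
  }
}

// Usage: temperature readings checked against the "below 80 °C" rule.
const detector = new AlarmDetector();
const readings = [70, 85, 86, 87, 88, 70, 72, 90, 71, 73, 74];
const events = readings
  .map((celsius, i) => detector.process("area-a-pick-1", "temperature", celsius < 80, i))
  .filter((e): e is AlarmEvent => e !== null);
console.log(events.map((e) => `${e.type}@${e.sampleIndex}`)); // [ 'AlarmOn@3', 'AlarmOff@10' ]
```

Requiring a streak in both directions gives the alarm hysteresis: a single noisy sample (like the 90 at index 7) neither raises nor clears an alarm.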
Below you can see the query flow:
Below you can see an example of alarms being sent by the Azure Stream Analytics Job:
How do we dispatch alarms to all interested parties?
Now, we need multiple parties (an Azure Function, an Azure Logic App and an Azure Blob storage container) to receive the same events: the alarm events generated by the Stream Analytics Job. Therefore, instead of sending the events to each receiver separately, the job sends them to the Alarm Event Hub, from which every interested party can read directly by subscribing to it.
Azure Event Hub can be used for establishing communication between different Azure resources in the cloud. The advantages of using Azure Event Hub over other communication methods include:
Its compatibility with a wide array of different types of Azure resources.
Its pub/sub model, which allows multiple resources to subscribe to the same event type without configuring the publisher to send the information to each recipient individually.
To store the events that arrive at the Event Hub in an Azure Storage container (necessary for calculating analytics on historical alarm events), we enabled the Event Hub’s Data Capture functionality.
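To make the fan-out idea concrete, here is a toy in-memory stand-in for an Event Hub partition; it is not the Event Hubs SDK. The publisher appends each event once and every reader keeps its own offset, which is roughly what giving each receiving application its own consumer group achieves in Event Hubs. The group names are hypothetical.

```typescript
// Append-only log where each consumer group tracks its own read position.
class InMemoryEventLog<T> {
  private events: T[] = [];
  private offsets = new Map<string, number>();

  // The publisher writes once, regardless of how many readers exist.
  publish(event: T): void {
    this.events.push(event);
  }

  // Returns the events this consumer group has not read yet and advances its offset.
  read(consumerGroup: string): T[] {
    const from = this.offsets.get(consumerGroup) ?? 0;
    const batch = this.events.slice(from);
    this.offsets.set(consumerGroup, this.events.length);
    return batch;
  }
}

// Usage: one publish, two independent readers.
const alarmHub = new InMemoryEventLog<string>();
alarmHub.publish("AlarmOn area-a-pick-1 temperature");
console.log(alarmHub.read("webhook-function")); // [ 'AlarmOn area-a-pick-1 temperature' ]
console.log(alarmHub.read("logic-app"));        // [ 'AlarmOn area-a-pick-1 temperature' ]
console.log(alarmHub.read("webhook-function")); // []
```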
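Capture writes Avro files to the container using a configurable archive name format whose default is {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}. Assuming that default, a consumer such as the Databricks notebook can narrow a time-range query to a set of hourly folder prefixes, which this TypeScript sketch computes. The namespace, hub and partition names are placeholders.

```typescript
// Hourly blob-name prefixes for a UTC time range, assuming Capture's default
// archive name format {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/...
function captureHourPrefixes(
  namespace: string,
  eventHub: string,
  partitionIds: string[],
  fromUtc: Date,
  toUtc: Date
): string[] {
  const pad = (n: number): string => (n < 10 ? `0${n}` : `${n}`);
  const prefixes: string[] = [];
  // Start at the top of the hour that contains fromUtc.
  const cursor = new Date(
    Date.UTC(fromUtc.getUTCFullYear(), fromUtc.getUTCMonth(), fromUtc.getUTCDate(), fromUtc.getUTCHours())
  );
  while (cursor.getTime() <= toUtc.getTime()) {
    const datePath = `${cursor.getUTCFullYear()}/${pad(cursor.getUTCMonth() + 1)}/${pad(cursor.getUTCDate())}/${pad(cursor.getUTCHours())}`;
    for (const partition of partitionIds) {
      prefixes.push(`${namespace}/${eventHub}/${partition}/${datePath}/`);
    }
    cursor.setUTCHours(cursor.getUTCHours() + 1);
  }
  return prefixes;
}

// Usage: folders to scan for alarms captured between 10:30 and 11:05 UTC.
const prefixes = captureHourPrefixes(
  "factory-ns", "alarms", ["0", "1"],
  new Date("2020-11-27T10:30:00Z"), new Date("2020-11-27T11:05:00Z")
);
prefixes.forEach((p) => console.log(p));
```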
I want an external party to be immediately aware of alarms
To tackle this need, we deployed an Azure Function that is triggered by the alarm events received by the Event Hub. It extracts the alarm information and sends an HTTP request to a pre-specified HTTP endpoint.
Azure Functions is a serverless compute service designed to execute event-triggered code without having to manage the application infrastructure. Azure Functions can be triggered by a wide variety of events, such as HTTP requests, a schedule, and events from other Azure services such as Event Hubs.
To test that the alarm event is sent correctly, we implemented a second Azure Function that acts as the user’s HTTP callback endpoint. This second function simply logs the received events, validating that the request was sent and received correctly and completing the flow.
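The forwarding logic can be sketched as plain TypeScript, independent of the Functions runtime. This is a hypothetical version, not the project’s code: the payload fields are assumed, and the HTTP client is injected (matching the subset of `fetch` it uses) so the logic can be exercised with a fake that plays the role of the logging callback function.

```typescript
// Hypothetical alarm payload; the real schema is whatever the Stream Analytics job emits.
interface AlarmPayload {
  deviceId: string;
  factoryArea: string;
  measurement: "temperature" | "power";
  type: "AlarmOn" | "AlarmOff";
  timestamp: string;
}

// The subset of fetch() the forwarder needs, so a fake can be injected.
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number }>;

// Sends each alarm to the subscriber's callback URL as a JSON POST and returns
// how many were accepted (2xx). Failures are only logged in this sketch.
async function forwardAlarms(alarms: AlarmPayload[], callbackUrl: string, post: PostFn): Promise<number> {
  let delivered = 0;
  for (const alarm of alarms) {
    const response = await post(callbackUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(alarm),
    });
    if (response.ok) {
      delivered++;
    } else {
      console.error(`Callback rejected alarm for ${alarm.deviceId}: HTTP ${response.status}`);
    }
  }
  return delivered;
}

// Usage: a fake callback standing in for the second, logging-only function.
const loggingCallback: PostFn = async (url, init) => {
  console.log(`callback at ${url} received: ${init.body}`);
  return { ok: true, status: 200 };
};

forwardAlarms(
  [{ deviceId: "area-b-pick-2", factoryArea: "B", measurement: "power", type: "AlarmOn", timestamp: "2020-11-27T10:15:00Z" }],
  "https://example.com/alarm-callback",
  loggingCallback
).then((count) => console.log(`delivered ${count} alarm(s)`));
```

In a deployed function with an Event Hub trigger configured to receive batches (cardinality "many"), you could pass the received events, a callback URL read from an app setting, and the global `fetch` available in Node 18+. Retries and dead-lettering are deliberately left out.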
In the image below we can see the telemetry message that arrives at the Event Hub and the alarm event logged by the ‘callback’ Azure Function.
I want to send an asynchronous heads up to external parties
Azure Logic Apps can be used to automate business processes and workflows easily and quickly. It is intended to integrate applications, systems, and services, either in the cloud or on-premises. Logic Apps provides many predefined, ready-to-use connectors for building workflows that listen for events occurring in other resources and trigger actions to process, store and/or send the event information.
Within our platform, we decided to send emails and provide a near real time dashboard with the alarms received through the Event Hub. Therefore, we created a Logic App that uses three different connectors:
A connector that is subscribed to the Event Hub alarm events sent by Azure Stream Analytics.
An Office 365 connector that sends the alarm events via email.
A Log Analytics connector that sends the alarm events to Log Analytics so they can be processed for later use in the near real time dashboard implemented with Azure Monitor.
The Logic App can be scheduled to read incoming events periodically. In our case, we configured it to retrieve alarm events once per minute.
Below you can see the Logic App application architecture:
And the email sent by it:
I want to have a dashboard where people can see alarms being raised
Since we wanted to monitor the solution and quickly check how many alarms were received in the last few minutes, we chose these two services:
Azure Monitor can be used to monitor Azure resources and collect data in near real time. Azure Monitor integrates with other monitoring and analysis tools such as Log Analytics and Application Insights, providing the flexibility to use data from both cloud and on-premises environments. It also lets you chart metrics and pin log query results to Azure Dashboards.
Log Analytics is a tool for running queries, written in the Kusto Query Language (KQL), to process and analyze the collected log data.
We combined both services doing the following:
Implementing a Log Analytics query to process the received alarm events.
Creating a dashboard that uses the resulting information to generate two different charts.
a) One showing the number of alarms that occurred within a predefined time span, grouped by factory area.
b) Another showing the number of alarms per hour.
Below you can see the Azure Monitor dashboard with the count of alarms received in the past 24 hours, grouped by factory area and also by time of day.
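Both charts boil down to two group-and-count aggregations. In Log Analytics they would be KQL queries along the lines of `summarize count() by FactoryArea` and `summarize count() by bin(TimeGenerated, 1h)`, where the `FactoryArea` column name is an assumption. The TypeScript sketch below computes the same counts over an in-memory list, only to show the bucketing logic.

```typescript
// Hypothetical shape of an alarm record as stored in Log Analytics.
interface LoggedAlarm {
  factoryArea: string; // "A" or "B" in our scenario
  timestamp: string;   // ISO-8601, UTC
}

const HOUR_MS = 60 * 60 * 1000;

// Counts alarms within [from, to), grouped by the key returned by keyOf.
function countInRange(
  alarms: LoggedAlarm[],
  from: Date,
  to: Date,
  keyOf: (alarm: LoggedAlarm, timeMs: number) => string
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const alarm of alarms) {
    const t = Date.parse(alarm.timestamp);
    if (t < from.getTime() || t >= to.getTime()) continue;
    const key = keyOf(alarm, t);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Chart (a): alarms per factory area.
const countByArea = (alarms: LoggedAlarm[], from: Date, to: Date) =>
  countInRange(alarms, from, to, (a) => a.factoryArea);

// Chart (b): alarms per one-hour bucket, like bin(TimeGenerated, 1h).
const countByHour = (alarms: LoggedAlarm[], from: Date, to: Date) =>
  countInRange(alarms, from, to, (_a, t) => new Date(Math.floor(t / HOUR_MS) * HOUR_MS).toISOString());

// Usage with a handful of made-up alarms.
const demo: LoggedAlarm[] = [
  { factoryArea: "A", timestamp: "2020-11-27T10:05:00Z" },
  { factoryArea: "B", timestamp: "2020-11-27T11:10:00Z" },
];
const window24h = [new Date("2020-11-26T12:00:00Z"), new Date("2020-11-27T12:00:00Z")] as const;
console.log(countByArea(demo, window24h[0], window24h[1]));
console.log(countByHour(demo, window24h[0], window24h[1]));
```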
I want to have long-term statistics and insights over the historical alarm events
Databricks is built on Apache Spark and lets you process large quantities of data organized in data frames using a computing cluster. This can be done transparently, abstracting the user from the architecture of the computing cluster and the storage.
It is possible to operate a Databricks cluster in two different modes:
Streaming processing: By connecting to a streaming data source (such as an Event Hub), it can receive large quantities of input information and process it in real time.
Batch processing: It can also read and process a finite amount of information from different types of storage, including an Azure Blob container.
Within this platform, we used Databricks batch processing to retrieve the alarm events stored by the Alarm Event Hub in an Azure Blob Storage container (via the Data Capture functionality mentioned before).
Our Databricks notebook connects to the Azure container and retrieves the alarm events stored by the Alarm Event Hub during a specified time range. For each pair of alarm on/alarm off events that occurred for the same measurement type on the same device, it calculates the alarm duration by subtracting the events’ timestamps.
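The pairing step can be expressed independently of Spark. The TypeScript sketch below is a hypothetical equivalent of what the notebook computes, not its actual code: it sorts the events by time, remembers the open “alarm on” event per device and measurement type, and emits a duration when the matching “alarm off” arrives. The record fields are assumptions.

```typescript
type MeasurementType = "temperature" | "power";

// Hypothetical shape of a captured alarm event.
interface AlarmRecord {
  deviceId: string;
  measurement: MeasurementType;
  type: "AlarmOn" | "AlarmOff";
  timestamp: string; // ISO-8601
}

interface AlarmDuration {
  deviceId: string;
  measurement: MeasurementType;
  start: string;
  durationSeconds: number;
}

// Pairs each AlarmOn with the next AlarmOff for the same device and measurement type.
function pairAlarms(records: AlarmRecord[]): AlarmDuration[] {
  const sorted = records.slice().sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  const open = new Map<string, AlarmRecord>();
  const durations: AlarmDuration[] = [];
  for (const r of sorted) {
    const key = `${r.deviceId}/${r.measurement}`;
    if (r.type === "AlarmOn") {
      open.set(key, r);
      continue;
    }
    const on = open.get(key);
    // An AlarmOff with no preceding AlarmOn (the alarm started before the queried
    // range) is skipped, as are alarms still open at the end of the range.
    if (on) {
      durations.push({
        deviceId: r.deviceId,
        measurement: r.measurement,
        start: on.timestamp,
        durationSeconds: (Date.parse(r.timestamp) - Date.parse(on.timestamp)) / 1000,
      });
      open.delete(key);
    }
  }
  return durations;
}

// Usage: one temperature alarm lasting two minutes.
const sampleRecords: AlarmRecord[] = [
  { deviceId: "area-a-pick-1", measurement: "temperature", type: "AlarmOn", timestamp: "2020-11-27T10:00:00Z" },
  { deviceId: "area-a-pick-1", measurement: "temperature", type: "AlarmOff", timestamp: "2020-11-27T10:02:00Z" },
];
console.log(pairAlarms(sampleRecords));
```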
Afterwards it generates two different reports:
A report of the number of alarms grouped by type and factory area, with their average duration.
A report of the number of alarms grouped by type and machine type, with their average duration.
Using these reports, the customer can see the frequency and average duration of the alarms, measure whether a specific machine type or factory area has more frequent alarms than the rest, and assess how the average time it takes factory operators to solve an issue affects productivity. Each report is saved as a CSV file in an Azure Blob container for preservation.
Additionally, the Databricks notebook generates a plot showing the correlation between the number of alarms and their duration for each device, depending on the alarm type (temperature or power).
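The two reports differ only in their second grouping key, so a single helper can produce both. This TypeScript sketch uses hypothetical field names and does no CSV quoting (it assumes values contain no commas); it aggregates the count and average duration per group and renders CSV text. In the notebook, the equivalent would be a data frame group-by written to the blob container.

```typescript
// Hypothetical record: one completed alarm with its context and duration.
interface AlarmWithContext {
  measurement: "temperature" | "power";
  factoryArea: string;
  machineType: string; // e.g. "pick-and-place" or "conveyor" (hypothetical labels)
  durationSeconds: number;
}

// Groups alarms by measurement type plus one extra dimension and returns CSV text
// with the alarm count and average duration per group (insertion order preserved).
function buildReportCsv(alarms: AlarmWithContext[], dimension: "factoryArea" | "machineType"): string {
  const groups = new Map<string, { measurement: string; group: string; count: number; total: number }>();
  for (const a of alarms) {
    const key = `${a.measurement}|${a[dimension]}`;
    const g = groups.get(key) ?? { measurement: a.measurement, group: a[dimension], count: 0, total: 0 };
    g.count++;
    g.total += a.durationSeconds;
    groups.set(key, g);
  }
  const lines = [`alarm_type,${dimension},alarm_count,avg_duration_seconds`];
  groups.forEach((g) => {
    lines.push(`${g.measurement},${g.group},${g.count},${(g.total / g.count).toFixed(1)}`);
  });
  return lines.join("\n");
}

// Usage: both reports from the same list of alarms.
const alarms: AlarmWithContext[] = [
  { measurement: "temperature", factoryArea: "A", machineType: "pick-and-place", durationSeconds: 120 },
  { measurement: "power", factoryArea: "B", machineType: "conveyor", durationSeconds: 45 },
];
console.log(buildReportCsv(alarms, "factoryArea"));
console.log(buildReportCsv(alarms, "machineType"));
```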
Below you can find a simple example of the data frames that are used to generate the CSV reports for four devices and the corresponding correlation plot.
In this article, we showed you how to build an IoT device monitoring system using the Azure Data platform. Our solution ingests the event stream coming from the IoT devices, raises alerts when out-of-bounds conditions are detected, and saves the alarm events into a data lake for later analysis. We built it by combining different Azure technologies to ingest the IoT events and send out alerts. Our approach could also serve as a base for other IoT monitoring systems, and it could be extended with different kinds of sensors to control other elements of our scenario.
Although it goes beyond this reference implementation, in an additional stage you could bring machine learning models into the picture to discover hidden patterns behind the failure of some components in the chain. Of course, for this you would need to introduce other sensors into the ecosystem and correlate their information to discover insights that cannot be uncovered at first sight. Factors such as the quality of the power supply in the area, ambient temperature, humidity and other variables might be affecting the factory’s devices without being noticed.
Lastly, as you can see by going into the details of the reference implementation shared in this repository, nowadays you can build a near-real time monitoring solution for physical-world devices quite straightforwardly, just by integrating several managed cloud services (Azure, in this case).
Originally published by Mauro Krikorian for SOUTHWORKS on Medium 27 November 2020