- Practical Industrial Internet of Things Security
- Sravani Bhattacharjee
Industrial big data pipeline and architectures
Data is the prime asset in the IIoT value chain. Industrial devices such as sensors, actuators, and controllers generate state and operational data. The information inherent in this industrial big data enables a variety of descriptive, prescriptive, and predictive applications and business insights. This end-to-end flow of data, from ingestion, through information processing with various extract, transform, and load (ETL) functions and the application of AI and machine learning, up to data visualization and business applications, is collectively referred to as the industrial big data pipeline (shown in Figure 2.7):
The preceding diagram is explained as follows (a minimal code sketch of these stages appears after this list):
- On-premise data sources: On-premise data includes usage and activity data, both real-time streaming data (data in motion) and historical/batch data from various data sources. Sensors and controllers embedded at remote sites or on plant floors generate big data reflecting sensed parameters, controller actions, and feedback signals, from which we can gain granular visibility into the real systems. This raw data can be structured or unstructured, and can be stored in data lakes for future processing or streamed for (near) real-time analytics. Data at rest is stored in transient or persistent data stores and includes historical sensor data, fault and maintenance data reflecting device health, and event logs. This data is sent upstream to canonical data stores in platforms, either on-premise or in the cloud, for batch processing.
- Data ingestion: Event processing hubs are designed to ingest data at high rates and forward it for real-time analytics. For batch data, canonical data stores and computing clusters such as Hadoop/HDFS, Hive, and SQL-based systems perform ETL functions and may direct the data to machine learning applications.
- Data preparation and analytics: In this stage, feature engineering and ETL can be performed on the data to prepare it for analytics.
- Stream analytics: Stream analytics provides real-time insights based on the sensor data, for example, the device health of a steam turbine. At this stage, the data can also be placed in long-term storage for more complex, compute-intensive batch analytics, or transformed for consumption by machine learning applications that can predict, for example, the remaining useful life of the steam turbine.
- Data visualization: Enterprise-tier applications such as customer relationship management (CRM) and enterprise resource planning (ERP) systems consume the data. Business intelligence (BI) analytics software such as Tableau and Pentaho can be used to develop data visualization applications that deliver a variety of BI insights (for example, performance or remaining useful life) or create alerts and notifications based on anomalies.
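To make these stages concrete, the following is a minimal Python sketch of the pipeline just described. The names (SensorReading, ingest(), stream_analytics()) and the 80 degC alert threshold are illustrative assumptions, not part of any product; a production pipeline would use a message broker, a stream-processing engine, and a data lake in place of the in-memory queue and list used here.

```python
import queue
import statistics
import time
from dataclasses import dataclass


@dataclass
class SensorReading:
    """Raw state/operational data produced by an on-premise sensor."""
    asset_id: str
    timestamp: float
    temperature_c: float


# Data ingestion: an in-memory queue stands in for a high-rate event hub.
event_hub = queue.Queue()
# Long-term/batch storage: a list stands in for a data lake or canonical store.
data_lake = []


def ingest(reading: SensorReading) -> None:
    """Accept a raw reading for (near) real-time processing."""
    event_hub.put(reading)


def prepare(reading: SensorReading) -> dict:
    """Data preparation: a trivial ETL/feature-engineering step."""
    return {
        "asset_id": reading.asset_id,
        "timestamp": reading.timestamp,
        "temperature_c": round(reading.temperature_c, 2),
    }


def stream_analytics(window: list, alert_threshold_c: float = 80.0) -> None:
    """Stream analytics: raise an alert on anomalies, then archive the window."""
    mean_temp = statistics.mean(r["temperature_c"] for r in window)
    if mean_temp > alert_threshold_c:
        # The visualization/BI tier would surface this as a notification.
        print(f"ALERT: {window[0]['asset_id']} mean temperature {mean_temp:.1f} degC")
    data_lake.extend(window)  # retained for ML and batch analytics later


# Usage: simulate a short burst of streaming data from one asset.
for i in range(5):
    ingest(SensorReading("turbine-01", time.time() + i, 79.0 + i))

window = [prepare(event_hub.get()) for _ in range(5)]
stream_analytics(window)
print(f"{len(data_lake)} records archived for batch processing")
```

Conceptually, the same structure scales up: the event hub becomes a broker topic, the in-memory list becomes a canonical data store, and the alert feeds a BI dashboard or notification service.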
The exact implementation of the big data pipeline and data flows can vary based on specific data governance and data ownership models. The end-to-end pipeline can be fully owned by the industrial organization (for example, a smart windmill) or can use private or public cloud infrastructures to leverage application and business domain efficiencies.
In cases where the assets are dispersed and remote, for example, turbines in a wind farm or rigs in an oil field, data processing and computational capability may be needed at or near the assets for local analytics and control. This approach is elaborated further in the subsequent sections of this chapter.
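As a rough illustration of this edge pattern, the Python sketch below reduces raw samples to a compact summary at or near the asset, so that only the summary crosses the constrained uplink. The summary fields and the send_upstream() stub are assumptions made for illustration; an actual deployment would run this on an edge gateway and forward results over the plant's approved communication channel.

```python
import statistics


def summarize_at_edge(asset_id: str, readings: list) -> dict:
    """Reduce raw sensor samples to a compact summary near the asset."""
    return {
        "asset_id": asset_id,
        "count": len(readings),
        "mean": statistics.mean(readings),
        "max": max(readings),
    }


def send_upstream(summary: dict) -> None:
    """Placeholder for transmitting the summary to the central platform."""
    print(f"uplink -> {summary}")


# Usage: only the four-field summary, not every raw sample, leaves the site.
send_upstream(summarize_at_edge("wind-turbine-07", [12.1, 12.4, 11.9, 12.7]))
```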
From an IIoT system trustworthiness perspective, each element of the big data pipeline needs to be designed with integrated data privacy, reliability, and confidentiality controls, while keeping safety, availability, and resilience implications in view.
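As one small, illustrative example of such a control, the sketch below encrypts a telemetry payload with the Fernet recipe from Python's cryptography package before it is sent upstream. The inline key generation and the sample payload are assumptions for illustration only; real deployments rely on managed key stores, transport security, and the broader controls discussed later.

```python
from cryptography.fernet import Fernet

# Illustration only: in practice the key would come from a hardware-backed
# or managed key store, not be generated inline.
key = Fernet.generate_key()
cipher = Fernet(key)

payload = b'{"asset_id": "turbine-01", "temperature_c": 81.3}'
token = cipher.encrypt(payload)            # confidential in transit and at rest
assert cipher.decrypt(token) == payload    # only key holders recover the data
```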
Practical mechanisms to integrate security controls, such as secure transport, storage, and updates, as well as security monitoring, across this industrial data pipeline and its data flows are discussed in the subsequent chapters.