"It’s a case of the cart going before the horse. Before you start implementing digital transformation in the form of artificial intelligence and machine learning – you first need to have the right data available in the right format."
- Trevor Bloch

What to do with all your data?

Most industrial businesses produce large volumes of time-series data. Operational data comes in many different forms and can require different treatments to be useful. Organisations also have many other sources of data, such as financial, SCADA or ERP data.

As businesses become more attuned to the value of their data, there is a push to use machine learning and advanced analytics to extract business insights from it. While this is a fantastic step towards digital transformation, a few issues can limit the efficiency and benefits of data science and machine learning.

Many organisations encounter similar issues:

• It can be difficult to locate all the data sets needed for analysis, because they are stored in multiple systems throughout the organisation

• There are often gaps in the data, where it hasn’t been collected and stored accurately

• Multiple teams copy data sets and use them independently, resulting in duplicated data throughout the organisation 

• Organisations can’t easily identify data quality issues when the data is widely dispersed; the issues often surface only once the data is being analysed
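The gap and duplication issues listed above can be checked for before any modelling starts. The following is a minimal sketch rather than a definitive implementation: it assumes readings arrive as dictionaries with hypothetical `tag`, `timestamp` and `value` fields, and that the sensor is expected to report once a minute.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval):
    """Return (start, end) pairs where consecutive readings are further apart than expected."""
    ordered = sorted(set(timestamps))
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if curr - prev > expected_interval
    ]


def find_duplicates(readings):
    """Return readings whose (tag, timestamp) key has already been seen."""
    seen = set()
    duplicates = []
    for reading in readings:
        key = (reading["tag"], reading["timestamp"])
        if key in seen:
            duplicates.append(reading)
        else:
            seen.add(key)
    return duplicates


# Hypothetical one-minute sensor readings with a missing stretch and a repeated row.
start = datetime(2024, 1, 1, 0, 0)
readings = [
    {"tag": "PUMP_01.TEMP", "timestamp": start + timedelta(minutes=m), "value": v}
    for m, v in [(0, 71.2), (1, 71.4), (2, 71.3), (2, 71.3), (6, 72.0)]
]

gaps = find_gaps([r["timestamp"] for r in readings], timedelta(minutes=1))
duplicates = find_duplicates(readings)
print(f"{len(gaps)} gap(s), {len(duplicates)} duplicate row(s)")
```

Running checks like these across every source system is much easier once the data sits in one place, which is the argument the rest of this article makes.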

If organisations uncover these issues when they commence machine learning, it is often the case of the cart going before the horse. Before you start implementing digital transformation in the form of artificial intelligence and machine learning – you first need to have the right data available in the right format.

What’s actually needed is a big data storage solution.


Data Historian, Data Warehouse or Data Lake?

What is the difference between these common terms and what are the critical elements to consider when selecting a big data storage solution? 

Traditional Data Historian

A traditional data historian stores historical information about a process or manufacturing system. This data usually comes directly from PLCs, DCS or other process control systems, and some data can be entered manually. It can be used for condition monitoring, reporting, data visualisation and analytics. Traditional data historians tend to have a complex infrastructure, which can make getting data out of the historian difficult and involve multiple stakeholders. Due to licensing and volume restrictions, many data historians aren’t set up to store all sensor data, and because the traditional historian does not store other organisational data, it is not a complete repository of an organisation’s data.

Data Warehouse

A data warehouse has the capability to store big data, combining data from multiple, varied sources (including historians, or directly from PLCs and DCS, along with ERP, CMMS, APM, ICSS, SCADA, financial and transactional systems) into one easily manipulated, comprehensive database, which becomes a complete repository of an organisation’s information. The data can be organised and analysed by visualisation software to determine trends, and end users can easily report on it.

Data Lake

A data lake stores structured, unstructured and semi-structured data in an unorganised, unclassified repository. The data is often not cleansed, deduplicated or corrected, and it can be hard to uncover business insights without extensive time spent data wrangling. Data lakes have acquired the nickname ‘data swamp’ due to the unorganised nature of the data storage. Data lakes are typically used by data scientists, rather than cross-organisational users.

To fully utilise an organisation’s data, it makes sense to consolidate all data sources into one scalable location. This increases the efficiency with which a company can access its complete data history, conduct analysis, and make business decisions. As such, we recommend a data warehouse as a complete repository of a company’s data.


Difference between data historians, data warehouse, data lake and DataHUB4.0

How does an organisation benefit from a Data Warehouse?

• Data is gathered from multiple sources and preserved in a single database, preventing silos

• Organisations can have a centralised view of their data, despite having multiple systems in place for different departments

• The pre-processing of data sorts it and reduces duplication

• The transformation of semi-structured and unstructured data makes this data easy to use for stakeholders and improves speed of analysis

• If data is deleted from the original source, it is still maintained in the data warehouse, meaning that the data warehouse becomes the company’s single source of data truth  

• Data warehouses support advanced analytics, machine learning and artificial intelligence

• Data warehouses typically live in the cloud, which can be less expensive than managing on-site servers


DataHUB4.0 - Data Historian or Data Warehouse?

VROC’s platform DataHUB4.0 combines the benefits of both a Data Historian (with its real-time condition monitoring) and a Data Warehouse (single source of truth) in one. DataHUB4.0 accepts big data from any source, which can be structured, unstructured or semi-structured (like a data lake). Rather than storing this data in an unorganised way, DataHUB4.0 transforms the unstructured data into structured data for warehousing, becoming your single source of truth. Its inbuilt analytics and visualisation tools mean no other software is required to perform analysis and build visualisations from your big data, so data insights are just a couple of clicks away!

Visualising your data?

Unlike other Data Warehouses, the tools for processing data are co-located with the data in DataHUB4.0, so there’s no requirement to repeatedly copy data to work with it. This makes it possible to work with much larger data sets, without waiting for them to download to local machines or being limited by local computing power.

As we can’t presume from the outset to know all the potential ways that the data can be used, the data is available for use with exploration and visualisation tools which can query across all the data. With this approach, it’s even possible to join finance, production, and OH&S data to gain new and intriguing insights, possibly for ESG reporting or the optimisation of complex processes.
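To make the idea of querying across departments concrete, here is a minimal sketch of joining production and finance data once both live in the same database. It does not use DataHUB4.0’s own interface: Python’s built-in `sqlite3` module stands in for the warehouse, and the table names, column names and figures are all hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE production (day TEXT, site TEXT, tonnes REAL);
    CREATE TABLE finance (day TEXT, site TEXT, energy_cost REAL);
    INSERT INTO production VALUES
        ('2024-01-01', 'Plant A', 120.0),
        ('2024-01-02', 'Plant A', 100.0);
    INSERT INTO finance VALUES
        ('2024-01-01', 'Plant A', 2400.0),
        ('2024-01-02', 'Plant A', 2500.0);
""")

# Join the two departments' data on day and site to derive energy cost per tonne.
rows = conn.execute("""
    SELECT p.day, p.site, f.energy_cost / p.tonnes AS cost_per_tonne
    FROM production AS p
    JOIN finance AS f ON f.day = p.day AND f.site = p.site
    ORDER BY p.day
""").fetchall()

for day, site, cost_per_tonne in rows:
    print(day, site, round(cost_per_tonne, 2))
```

A metric like cost per tonne only becomes a single query when both data sets share keys (here, day and site) and sit in one store; with siloed systems the same question means exporting and reconciling files by hand.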

Condition-based monitoring for Operations?

Just like a traditional process or time-series data historian, DataHUB4.0 allows operational teams to monitor their processes and equipment in real time. Teams can set up dashboards to monitor status and plot trends and performance, which are refreshed as new data is ingested. Teams can factor in known thresholds and create band limits for operating envelopes on sensors, which can trigger alerts and predictive maintenance activities. The added benefit over the traditional historian is the additional data that exists in the warehouse, which can be used in analysis and can save teams from logging into multiple platforms for information.
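The band-limit idea can be sketched in a few lines. This is an illustration of the general technique only, not DataHUB4.0’s implementation: the `Band` class, the `check_reading` function, the sensor tags and the limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """Operating envelope for one sensor (hypothetical limits)."""
    low: float
    high: float


def check_reading(tag, value, bands):
    """Return an alert message if value is outside the tag's band, else None."""
    band = bands.get(tag)
    if band is None:
        return None  # no envelope configured for this sensor
    if value < band.low:
        return f"{tag}: {value} below lower limit {band.low}"
    if value > band.high:
        return f"{tag}: {value} above upper limit {band.high}"
    return None


bands = {"PUMP_01.TEMP": Band(low=40.0, high=85.0)}
incoming = [("PUMP_01.TEMP", 71.2), ("PUMP_01.TEMP", 88.5), ("PUMP_01.FLOW", 12.0)]

# Collect an alert for every newly ingested reading that breaches its envelope.
alerts = [msg for tag, value in incoming if (msg := check_reading(tag, value, bands))]
print(alerts)
```

In practice the check would run as each new reading is ingested, and an alert would feed a notification or maintenance workflow rather than a list.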

Data Storage done. Next up Machine Learning…

Right, the ultimate objective: advancing digital transformation and improving business outcomes with machine learning and artificial intelligence. After collating data from systems across the organisation into a scalable storage solution, data experts can use machine learning and AI tools to build models. Most data warehouses will integrate with data science tools.

DataHUB4.0 is the sister product of OPUS, which allows organisations to develop models and deploy them into production, operationalising them for real-time insights. Models are produced without any coding and in a single interface, meaning a wider cohort can model their own problem statements, processes and business queries, so this work is no longer limited to data science personnel.

The benefits of AI can be numerous and diverse once there is a scalable and robust data storage solution in place. Getting this solution in place is the critical first step in a company’s digital transformation journey. 


