Building a Unified Analytics Framework to Unlock Analytics at Enterprise Scale

Most medium- to large-scale manufacturers are in crisis, and you may be among them: they cannot scale their analytics efforts to embrace today’s technologies, let alone tomorrow’s. To compete in the new manufacturing landscape, you must unleash the full potential of your data.

This in-depth exploration presents a compelling case for centralized integration, offering transformative benefits for your organization. From simplifying analytics to enabling data accessibility across the board, the journey to achieving scalable data analytics starts here.

Explore how Flow Software can elevate your business through enhanced governance, contextualized information, vendor independence, and simplified operations, all designed to add value quickly and effectively.

The Roadblocks To Scaling Analytics Across The Enterprise

The transformation of historical and real-time data into calculated tags, KPIs, and events at an enterprise scale has been successfully executed by very few companies, and currently, most are still struggling to create a strategy that will fix the problem at hand. The crux of the matter is how to execute the analytics process, not sporadically, but consistently and at full capacity across the entire enterprise.

To begin, it helps to ask a fundamental question: what is the most important data source for analytics? Is it real-time or current values? Historical records? Manual data? While all three play pivotal roles, analytics are fundamentally built on historical records. Despite this importance, no advancement has been made to simplify and centralize connectivity to these vital historical databases.

The conundrum lies in the prospect of making a leap from foundational analytical methods that have been in place for the past four decades to an Industry 4.0 style of analytics. Traditional Industry 3.0 approaches to analytics simply cannot be repackaged or repurposed to fit the new demands. This is akin to merely enhancing the exterior of an outdated framework and labeling it as something revolutionary. The issue at hand prompts a critical question: How to break away from the established modus operandi? It necessitates first identifying the current operational pattern. A closer look at how analytics efforts have been compartmentalized offers insight.

Essentially, with every new business use case, an application is identified based on the Industry 3.0 approach. This selection often stems from a subject matter expert's (SME's) preference, be it Power BI, OSI software, AWS, or even Excel. Within the chosen application, customized integration work is undertaken step by step: connecting the necessary data sources, ingesting and standardizing data, cleansing it, adding minimal context, performing calculations and aggregations, validating results, and eventually extracting value. This has been the standard procedure for four decades.

Stovepiped Integration and Architectures

While it can be operationally rewarding, transitioning from raw data to valuable information is an expensive step involving data integration, often accounting for up to 80% of the project cost. Although value is derived, it creates an isolated vertical system or "stovepipe", where all integration work is confined to a single application. Nevertheless, since there is a tangible return, the cycle repeats with every new business use case, often spreading the work across multiple applications within an enterprise.

Creating these stovepipes has significant drawbacks. They inhibit the creation of a single source of truth for both data access and integration work, stranding prior integration efforts within individual applications. Subject matter experts become segmented, limiting collaboration, and data governance becomes increasingly difficult due to diverse business rules and definitions across different applications. The biggest challenge, however, is scalability. While this approach can function for one-off solutions, it falters when applied globally across numerous sites.

An examination of a typical plant illustrates the point further. The meeting room walls of such a plant often carry a variety of reports and KPIs printed daily or handwritten on whiteboards. Each is emblematic of numerous stovepipes, or vertical integrations within applications, causing valuable project work to be lost over time. The situation becomes even more convoluted when trying to extend this form of reporting or analytics to an enterprise level. It typically leads to an attempt to link plant data ecosystem stovepipes and results in new stovepipes within enterprise data ecosystems. This method is fundamentally unscalable. Scaling analytics will remain an elusive goal as long as stovepipes and data silos continue to be created within organizations.

Hitting The Analytic Ceiling

Analytical approaches so far have been largely descriptive and diagnostic, focusing on understanding past occurrences and their causative factors. This has been possible through trend analysis and expert examination of multiple data points, supplemented by event records, offering clarity about operational happenings.

However, a shift is currently underway towards predictive analytics, which forecasts future possibilities. For example, a machine learning algorithm analyzing data from a piece of equipment may predict with a 92 percent certainty that it will fail within the next 48 hours. Though this methodology is prevalent in a few small-scale proofs of concept, its widespread application across manufacturing sectors remains limited.

The zenith of this analytical progression is prescriptive analytics, which combines predictive forecasts with an organization's objectives to suggest the most beneficial course of action. For instance, if a predictive model determines a 92 percent likelihood of a machine's failure within three days, prescriptive analytics will consider multiple variables such as the product being manufactured, customer relationships, inventory of spare parts, availability of service personnel, and the plant's schedule. Based on these parameters, it suggests an action plan that aligns with the organization's priorities.

Let's take the case where the product currently in production is for a top customer, previous commitments have been missed, the spare part and the expert to replace it are unavailable, and scheduled downtime is seven days away. The prescriptive model might suggest reducing the run rate by 14% to meet production goals and maintain customer satisfaction. This reduction will lower the likelihood of failure to 34%, allowing for part replacement during the scheduled downtime.
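The reasoning in this scenario can be sketched as a small selection problem. The sketch below is illustrative only: the `Action` type, the list of candidate actions, their feasibility flags, and the 5% figure for an immediate part replacement are hypothetical. Only the 92%, 14%, and 34% figures come from the scenario above, and a real prescriptive model would derive all of these values from data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    failure_probability: float   # predicted chance of failure before scheduled downtime
    feasible: bool               # can the action be carried out right now?
    meets_production_goal: bool  # does it still satisfy the customer commitment?


def recommend(actions: List[Action]) -> Optional[Action]:
    """Pick the feasible, goal-meeting action with the lowest failure risk."""
    candidates = [a for a in actions if a.feasible and a.meets_production_goal]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.failure_probability)


actions = [
    Action("keep running at full rate", 0.92, True, True),
    # Hypothetical: the spare part and the expert are unavailable, so not feasible.
    Action("replace the part now", 0.05, False, False),
    Action("reduce run rate by 14%", 0.34, True, True),
]

best = recommend(actions)
print(best.name if best else "no acceptable action")
```

The point of the sketch is the separation of concerns: the predictive model supplies the probabilities, while the organization's constraints and priorities decide which of the remaining options is recommended.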

Such advanced analytics require the amalgamation of a significant amount of data from dozens of sources. To harness its value, it's crucial to cleanse, normalize, and find the inherent value within this data.

The Library Approach

Consider an approach akin to the construction of a library, such as the Trinity College Library. This revered institution has spent centuries collating works from across the UK. Notably, they didn't fragment their efforts by creating separate libraries or teams. Instead, they centralized their endeavors, fostered collaboration, and adopted a unified approach to amassing works across the UK.

Applying this philosophy to the world of data integration, one can perceive the redundancy in duplicating work and the lost opportunities in not centralizing the efforts. Instead of isolating processes into individual "stovepipes" and repeating the same tasks, there's value in consolidating these operations. This involves aligning various data producers and consumers with a common, centralized library for integration work. This Unified Analytics Framework approach can effectively scale analytics, and this is precisely the solution offered by Flow Software.

The scale benefits are manifold: every new project benefits from previous ones, and the work done today lays the foundation for subsequent projects. This approach supports robust data and engineering governance, with business rules located in a single place, eliminating redundancy and enhancing enforcement. With each interaction with the data, the added context does not get lost in an application, but rather is centralized, fostering collaboration among subject matter experts.

Furthermore, it frees organizations from having to rely on a single historian or standardize a single SCADA application. It decouples analytics from the historian, SCADA, or even the application layer for data visualization. This means organizations can integrate more applications, increasing their flexibility and adaptability to future changes.

Building a Unified Analytics Framework with Flow Software

Delving into the solution for enterprise scalability, it's crucial to first understand Flow Software's mission and capabilities. Flow Software is dedicated to empowering teams with enhanced decision-making capabilities, focusing on scalable analytics infrastructure. Flow isn't merely an analytics application; it's a path to robust analytics foundations. Its primary objective is to aid organizations in constructing Industry 4.0 analytics frameworks.

Flow Software thrives at the intersection of Operational Technology (OT) and Information Technology (IT), the convergence point that is imperative for building solid analytics. With over a decade of product development and six major releases, Flow Software's maturity and experience are unparalleled.

The company has had the privilege of collaborating with some of the world's most data-savvy organizations, including Coca Cola, AB InBev, PPC, Unilever, Heineken, Vulcan Materials, Kellogg's, and Philip Morris, among others. This experience has provided invaluable learning opportunities, and Flow Software is eager to share these insights and guide more organizations towards developing scalable analytic frameworks.

Exploring the functionality of Flow Software in uniting analytics through a structured framework, it's important to comprehend the core components that make this feasible. These components can be divided into three groups, which correspond to the three-step process implemented by Flow Software to enhance your analytics.

Step One: Implementing Data and Engineering Governance

Asserting corporate control over both data and engineering governance is pivotal for businesses operating in a data-driven landscape. This dual control not only ensures data integrity and provides a structured framework for decision-making, but also enforces a single set of business rules that everyone follows, eliminating the need for repetitive replication. Such uniformity simplifies processes, reduces the risk of errors, enhances transparency, and drives operational efficiency, thereby paving the way for sustainable business growth.

Flow Software employs a Common Information Model that establishes robust governance for engineering tasks and internal data handling. It is complemented by a set of tools designed to generate highly contextualized information, and another to distribute this information across the enterprise to individuals and other applications. The Common Information Model is the foundation of data and engineering governance within Flow.

Historically, efforts were made to standardize tag naming across an entire enterprise. Although feasible for organizations with one or two sites, this approach quickly becomes expensive and challenging to implement as the number of sites increases. This challenge is further compounded by growing organizations that frequently acquire new assets. Consequently, the traditional practice of limiting application variety, such as restricting to a single brand of historian or SCADA, is no longer viable.

Flow Software addresses these issues by enabling the creation of corporately managed templates. These templates serve as a centralized library to define your process, machinery, and equipment, allowing the creation of attributes, calculations, KPIs, and event definitions that are deployed from the central library to actual data projects and sites.

When a new business use case is established and a data project created, Flow can be used to redirect the necessary integration work from silos and stovepipes to a more centralized approach. An introductory use case, such as production counts or material usage, would start with defining a line in a production process and associating it with components like filters, tanks, and valves. Both fixed and dynamic properties are defined at this stage. Fixed properties may include the associated area, equipment numbers, line name, operator, manager, or site. Dynamic properties often resemble tags in a SCADA system or an HMI, capturing variables like inventory produced, water usage, electricity usage, raw materials, and product codes.

Based on these dynamic properties, calculated properties can be created to measure efficiency and performance metrics like water used per bottle produced or electricity consumed per unit. Flow allows monitoring of events such as product changes and downtime, and collects attributes around the duration of these events. This capability allows for easy expansion to OEE calculations at the same time.
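To make the template idea concrete, here is a minimal sketch of a centrally defined line template deployed to two sites with different tag names. It is not Flow's data model or API: `LineTemplate`, `LineInstance`, `ratio`, and every property and tag name are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Values = Dict[str, float]
Calc = Callable[[Values], Optional[float]]


def ratio(numerator: str, denominator: str) -> Calc:
    """Build a calculated property: numerator / denominator (None if undefined)."""
    def calc(values: Values) -> Optional[float]:
        bottom = values[denominator]
        return values[numerator] / bottom if bottom else None
    return calc


@dataclass(frozen=True)
class LineTemplate:
    """Centrally managed definition of a production line (hypothetical)."""
    fixed: Tuple[str, ...]
    dynamic: Tuple[str, ...]
    calculated: Dict[str, Calc]


@dataclass
class LineInstance:
    """A template deployed to one site, mapped to that site's own tag names."""
    template: LineTemplate
    fixed_values: Dict[str, str]
    tag_map: Dict[str, str]  # dynamic property name -> site-specific tag

    def evaluate(self, readings: Values) -> Dict[str, Optional[float]]:
        # Translate site tags into template property names, then apply
        # the calculations defined once in the central template.
        values = {prop: readings[tag] for prop, tag in self.tag_map.items()}
        return {name: calc(values) for name, calc in self.template.calculated.items()}


BOTTLING_LINE = LineTemplate(
    fixed=("site", "line_name"),
    dynamic=("bottles_produced", "water_litres", "electricity_kwh"),
    calculated={
        "water_per_bottle": ratio("water_litres", "bottles_produced"),
        "kwh_per_bottle": ratio("electricity_kwh", "bottles_produced"),
    },
)

site_a = LineInstance(
    BOTTLING_LINE,
    {"site": "Site A", "line_name": "Line 1"},
    {"bottles_produced": "L1.Filler.Count", "water_litres": "L1.Water.Total",
     "electricity_kwh": "L1.Power.kWh"},
)
site_b = LineInstance(
    BOTTLING_LINE,
    {"site": "Site B", "line_name": "Line 4"},
    {"bottles_produced": "FILL04_CNT", "water_litres": "H2O_04",
     "electricity_kwh": "PWR_04"},
)

result_a = site_a.evaluate(
    {"L1.Filler.Count": 1000.0, "L1.Water.Total": 1500.0, "L1.Power.kWh": 250.0})
result_b = site_b.evaluate(
    {"FILL04_CNT": 2000.0, "H2O_04": 2500.0, "PWR_04": 400.0})
print(result_a, result_b)
```

Because the calculation lives in the template rather than in either site's application, neither site has to rename its tags, and a change to the definition of `water_per_bottle` is made in one place.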

With these established, KPIs can be defined, usually as aggregated numbers of dynamic or fixed properties across a period or event duration. Examples might include total throughput, percent change, or product totals by hour, but are certainly not limited to these basic examples. In fact, some customers have more than 200,000 KPIs defined and executed within Flow. The beauty of Flow’s approach lies in its ability to build and manage these KPIs in the corporately managed library, facilitating excellent data governance. With time, these models expand as more data analytics projects are undertaken, resulting in an ever-growing and adaptable definition of equipment, processes, machines, and sites. All previous projects are able to inherit the work of later projects, and vice versa.

The traditional approach of trying to govern or evolve non-centralized business rules and integration work is no longer effective. The need of the hour is a centralized workspace like that offered by Flow, enabling efficient management and expansion of projects.

Step Two: Enriching Data with Meaningful Context

The second step in the process pertains to the enhancement of raw data into a highly contextualized information repository. This enhancement eliminates the need for arduous point-to-point integration and prevents the loss of crucial work within the application layer.

Flow Software provides three distinct tools to facilitate this process: data source connectors, calendar management, and a robust calculation engine. Traditionally, data from various sources lacked interrelation, which often led to tedious data wrangling and valuable time spent by data scientists and analysts trying to provide context. This time-consuming process sometimes took months, if not years, to build analytical models, often with unsatisfactory results.

Flow Software introduces context to data at an earlier stage, involving operational technology (OT) professionals in adding context as part of their normal daily tasks as they interact with Flow. Crucially, it acknowledges that the value of analytics lies not merely in real-time data but mainly in context-rich historical data. This requires that historical data be made as easily available through a centralized data hub as real-time data is within an MQTT broker, a task that Flow is up to. As a result, enterprise teams work with contextualized data rather than raw, uncontextualized information. The data source connectors, calendar management, and calculation engine make this possible.

For data source connectors, Flow Software can interface with a wide array of historical sources, including SQL-based historians, specialized NoSQL historians like Aveva PI or Canary, older proprietary databases, and newer options like InfluxDB and Timescale. Flow also connects to relational or transactional data sources, offering connectors into every variety of SQL database, including PostgreSQL, MySQL, MSSQL, and Oracle. Additionally, Flow picks up real-time data from MQTT brokers, OPC servers, Kafka streams, and web APIs. Flow even provides web forms for manual data collection, making it well suited to categorizing downtime, setting production goals, capturing measurements from uninstrumented equipment, and correcting faulty values.

In terms of calendar management, Flow Software accommodates the unique operational timelines of each site, whether it begins its production day on a Monday at 6 a.m. or Sunday at midnight. It offers custom calendars that define shift patterns, financial years, and production years, and can even manage peculiar one-offs in OT spaces. This makes site-to-site comparisons across shifts, production days, and weeks straightforward, without losing each site's unique characteristics.
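As an illustration of what a site calendar has to resolve, the sketch below maps a timestamp to a production day and a shift. It is a simplified assumption, not Flow's calendar engine: the function names are hypothetical, shifts are assumed to be equal in length, and timestamps are naive local times with no daylight-saving handling.

```python
from datetime import date, datetime, time, timedelta
from typing import Sequence


def production_day(ts: datetime, day_start: time) -> date:
    """Return the production date a timestamp belongs to.

    Anything before the site's day start still counts toward the previous day.
    """
    if ts.time() < day_start:
        return ts.date() - timedelta(days=1)
    return ts.date()


def shift_name(ts: datetime, day_start: time, shifts: Sequence[str]) -> str:
    """Return the shift for a timestamp, assuming equal-length shifts."""
    start_of_day = datetime.combine(production_day(ts, day_start), day_start)
    shift_length = timedelta(hours=24) / len(shifts)
    index = int((ts - start_of_day) / shift_length)
    return shifts[index]


ts = datetime(2025, 1, 7, 2, 30)  # 02:30 on Tuesday morning

# Site A starts its production day at 06:00 and runs three shifts.
day_a = production_day(ts, time(6, 0))
shift_a = shift_name(ts, time(6, 0), ["day", "evening", "night"])

# Site B starts at midnight and runs two shifts.
day_b = production_day(ts, time(0, 0))
shift_b = shift_name(ts, time(0, 0), ["A", "B"])

print(day_a, shift_a)
print(day_b, shift_b)
```

The same instant belongs to Monday's night shift at one site and Tuesday's first shift at the other, which is why comparisons between sites need the calendar applied before any aggregation.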

Finally, the calculation engine of Flow Software brings all these elements together. Once the data sources are connected to the instances of the Common Information Model pushed from the corporate side, and time periods have been established, the calculation engine performs all associated calculations. It excels in data standardization, time normalization, data cleansing according to set rules, and slicing data by the correct time or event period.

As an illustration, consider Flow Software connecting to data from a SQL table and a tag within a historian database. Flow unites both data sources, creating a new calculated tag from two differently structured databases without losing any values. This process allows all raw data to be effectively cleansed based on the rules defined within the Common Information Model and further contextualized by the site’s own requirements.
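A minimal sketch of that kind of union follows, assuming an hourly production table in a relational database and irregular water-usage samples standing in for a historian tag. The table, column names, sample data, and functions are all hypothetical, and the in-memory SQLite database is only a stand-in for whatever SQL source a site actually uses.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Relational source: one row per hour (hypothetical table and columns).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE production (hour_start TEXT, units INTEGER)")
db.executemany(
    "INSERT INTO production VALUES (?, ?)",
    [("2025-01-06T06:00:00", 1000), ("2025-01-06T07:00:00", 800)],
)

# Historian-style source: irregular (timestamp, litres used) samples.
historian_samples: List[Tuple[datetime, float]] = [
    (datetime(2025, 1, 6, 6, 10), 400.0),
    (datetime(2025, 1, 6, 6, 50), 600.0),
    (datetime(2025, 1, 6, 7, 30), 1000.0),
]


def litres_by_hour(samples: List[Tuple[datetime, float]]) -> Dict[datetime, float]:
    """Normalize irregular samples onto hourly periods."""
    totals: Dict[datetime, float] = defaultdict(float)
    for ts, litres in samples:
        totals[ts.replace(minute=0, second=0, microsecond=0)] += litres
    return totals


def water_per_unit(conn: sqlite3.Connection,
                   samples: List[Tuple[datetime, float]]) -> Dict[datetime, Optional[float]]:
    """Calculated tag: litres of water per unit produced, per hour."""
    litres = litres_by_hour(samples)
    result: Dict[datetime, Optional[float]] = {}
    rows = conn.execute("SELECT hour_start, units FROM production ORDER BY hour_start")
    for hour_text, units in rows:
        hour = datetime.fromisoformat(hour_text)
        # Keep the hour even when the ratio is undefined, so no period is dropped.
        result[hour] = litres.get(hour, 0.0) / units if units else None
    return result


calculated_tag = water_per_unit(db, historian_samples)
for hour, value in calculated_tag.items():
    print(hour.isoformat(), value)
```

The two sources never need to share a schema. They only need to be brought onto a common time period before the calculation is applied.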

Event start and end triggers are evaluated, and event periods are created. Flow then aggregates the cleansed data and slices it into new KPI measures based on either time intervals like minute, hour, shift, day, week, month or year, or by event duration.
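The event side of this can be sketched as follows, assuming a line-state tag whose changes act as start and end triggers. The function names, state values, and sample data are hypothetical, and an event still open at the end of the data is deliberately left out rather than guessed at.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Sample = Tuple[datetime, object]
Period = Tuple[datetime, datetime]


def event_periods(samples: List[Sample], is_active: Callable[[object], bool]) -> List[Period]:
    """Turn time-ordered samples into (start, end) periods where is_active holds."""
    periods: List[Period] = []
    start = None
    for ts, value in samples:
        if is_active(value) and start is None:
            start = ts                   # start trigger
        elif not is_active(value) and start is not None:
            periods.append((start, ts))  # end trigger
            start = None
    return periods                       # a still-open event is not emitted


def total_in_period(samples: List[Tuple[datetime, float]], period: Period) -> float:
    """Aggregate a measure over one event period (start inclusive, end exclusive)."""
    start, end = period
    return sum(value for ts, value in samples if start <= ts < end)


def t(hour: int, minute: int) -> datetime:
    return datetime(2025, 1, 6, hour, minute)


line_state = [(t(6, 0), "RUN"), (t(6, 20), "STOP"), (t(6, 35), "RUN"),
              (t(7, 10), "STOP"), (t(7, 15), "RUN")]
bottle_counts = [(t(6, 5), 100.0), (t(6, 15), 120.0), (t(6, 40), 90.0), (t(7, 0), 110.0)]

downtime = event_periods(line_state, lambda s: s == "STOP")
downtime_total = sum((end - start for start, end in downtime), timedelta())

runs = event_periods(line_state, lambda s: s == "RUN")
bottles_per_run = [total_in_period(bottle_counts, run) for run in runs]

print(downtime_total, bottles_per_run)
```

Slicing by a time interval uses the same aggregation with fixed period boundaries (for example, each hour or shift) in place of the event periods.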

One of Flow Software's unique strengths is its ability to handle late data rules and events within historian data that arrive after calculations have been run, a common issue that breaks most applications in this field. Because Flow's mature product is state-aware and understands that data does not always arrive on schedule, recalculations are made automatically and new calculations are appropriately versioned.
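One way to picture late-data handling is a store that remembers which periods have had new inputs since the last calculation run and appends a new version instead of overwriting the old result. This is a conceptual sketch under that assumption. The `KpiStore` class and its methods are hypothetical and do not describe Flow's internal mechanism.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Set


class KpiStore:
    """Hourly totals with automatic recalculation and version history."""

    def __init__(self) -> None:
        self.raw: Dict[datetime, List[float]] = defaultdict(list)
        self.versions: Dict[datetime, List[float]] = defaultdict(list)
        self.dirty: Set[datetime] = set()

    def ingest(self, ts: datetime, value: float) -> None:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        self.raw[hour].append(value)
        # Mark the period as needing calculation, even if it was already calculated.
        self.dirty.add(hour)

    def run(self) -> None:
        """Calculate every period whose inputs changed since the last run."""
        for hour in sorted(self.dirty):
            self.versions[hour].append(sum(self.raw[hour]))
        self.dirty.clear()

    def current(self, hour: datetime) -> float:
        return self.versions[hour][-1]


store = KpiStore()
hour = datetime(2025, 1, 6, 6, 0)

store.ingest(datetime(2025, 1, 6, 6, 10), 100.0)
store.ingest(datetime(2025, 1, 6, 6, 40), 150.0)
store.run()                        # first calculation of the 06:00 hour

store.ingest(datetime(2025, 1, 6, 6, 55), 50.0)  # arrives after the run
store.run()                        # only the affected hour is recalculated

print(store.versions[hour], store.current(hour))
```

Keeping the earlier version alongside the corrected one means a report generated before the late value arrived can still be explained afterwards.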

To conclude, sharing context using traditional architecture is virtually impossible. Silos of information cannot be sustained, and the cross-pollination of context is critical. Therefore, centralizing analytical efforts is the only viable way to provide early context to data scientists and analysts, and this is precisely what Flow Software aims to achieve.

Step Three: Facilitating The Open Sharing Of Information

The crucial final area of focus is ensuring seamless information sharing across an organization, granting access to the appropriate individuals and applications. Flow Software offers five robust tools designed to achieve this goal, transforming the way data is managed and utilized.

Recently, replicating entire databases has become the fad, all as part of an attempt to gather raw data into a single data lake or warehouse so that analysts and teams can spend months or years trying to add context and order. This replication has included massive amounts of time series data held in both site and enterprise historians. Flow proposes a more efficient method. Rather than relocating data, Flow leaves raw data in validated databases, such as historians or SQL databases, and instead serves as a data hub between these data sources and data consumers, delivering raw data on demand. This creates a fluid pipeline, ensuring data arrives where and when needed, providing a reliable single source of truth for all data queries and synchronization.

Flow's approach begins with a KPI and event database, which stores outputs from the calculation engine for measures and events in a SQL database. This eliminates the need for replicating the entire historical record. Instead, the historian data is processed as required, based on the time periods of the KPIs, and the outputs are stored in the KPI and event database.

The notification engine then disseminates this information. Crucially, this information is integrated into existing team applications, such as Slack, email, text, or Microsoft Teams. This prevents the need for additional application learning and integration.

Furthermore, Flow dashboards offer a superior data visualization tool, designed with a focus on KPIs, reports, data tables, downtime event tracking, and manual contextualization. These dashboards simplify data sharing as PDFs or images and ensure comprehensive data table printing. They can also export data in CSV format and support comment strings within dashboards.

A distinguishing feature of Flow dashboards is the ability to interrogate data. If any aspect of the data presented on a Flow dashboard raises questions, users have the ability to delve deeper into the data, examining the underlying definitions and sources. For instance, clicking on a questionable KPI, such as a negative throughput percentage, will display the underlying definition and calculation of that KPI as defined in the Common Information Model profile, along with its actual number variables. This validation capability extends even further, allowing the user to explore the source of individual numbers, such as daily production, and how it contributes to the KPI.

The bottommost layer of information reveals how data is aggregated from the historian, passing both the tag path and raw data to the Flow dashboard, ensuring the user's confidence in the data's accuracy. All of this is achieved within a single dashboard, connecting to multiple data sources, eliminating the need to understand the underlying data infrastructure. If Flow is aware of it, so is the user.

This interrogative capacity is made possible through Flow's API, the backbone of all functionality within Flow. The API is publicly exposed and license-free, allowing the entire team to request data from Flow. This includes not just KPIs and events, but also raw data linked to profiles, the Common Information Model, and their definitions.

Flow Software's data sink feature provides a robust solution for publishing data on a trigger or schedule, extending far beyond the capabilities of standard APIs. This functionality allows for the insertion of information into SQL databases, publication to Kafka streams and MQTT brokers, and batch data delivery to data lakes, cloud tools, and various AI and machine learning tools.
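The publish-on-trigger pattern described here can be sketched generically. The code below is not Flow's data sink configuration: the `publish` function, the sink factories, the table layout, and the topic name are hypothetical, SQLite stands in for a SQL target, and a plain list stands in for a message broker so that the example stays self-contained.

```python
import json
import sqlite3
from typing import Callable, Dict, List

Record = Dict[str, object]
Sink = Callable[[Record], None]


def sql_sink(conn: sqlite3.Connection) -> Sink:
    """Sink that inserts each record into a SQL table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kpi (measure TEXT, period_start TEXT, value REAL)")

    def write(record: Record) -> None:
        conn.execute("INSERT INTO kpi VALUES (:measure, :period_start, :value)", record)
        conn.commit()
    return write


def message_sink(outbox: List[str], topic: str) -> Sink:
    """Stand-in for a broker client: appends 'topic payload' to a list."""
    def write(record: Record) -> None:
        outbox.append(f"{topic} {json.dumps(record, sort_keys=True)}")
    return write


def publish(record: Record, sinks: List[Sink],
            trigger: Callable[[Record], bool] = lambda r: True) -> bool:
    """Deliver a record to every sink when the trigger fires."""
    if not trigger(record):
        return False
    for sink in sinks:
        sink(record)
    return True


db = sqlite3.connect(":memory:")
outbox: List[str] = []
sinks = [sql_sink(db), message_sink(outbox, "site_a/line_1/kpi")]

record = {"measure": "water_per_bottle",
          "period_start": "2025-01-06T06:00:00", "value": 1.5}
sent = publish(record, sinks, trigger=lambda r: r["value"] is not None)

skipped = publish({"measure": "water_per_bottle",
                   "period_start": "2025-01-06T07:00:00", "value": None},
                  sinks, trigger=lambda r: r["value"] is not None)

print(sent, skipped, outbox)
```

Treating each destination as an interchangeable sink is what lets the same curated record reach a database, a stream, and a data lake without separate integration work for each consumer.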

This not only provides data teams with well-curated, ready-to-use data but also makes every aspect within Flow accessible: profiles, definitions, raw data from any connected tag, calculated tags used as filters or cleansing tools, and measures. The enterprise-wide scalability of Flow permits site-specific handling of individual context, including time patterns, shift names, and operational differences. All this information is readily available to teams via a connected, tiered Flow instance.

Both the data sinks and APIs function across all deployed Flow instances, at both the site and the enterprise level, allowing access to any data linked to the enterprise's Flow library. This streamlined access requires no prior knowledge of underlying site data sources; if it's linked to Flow's models, it is accessible, simplifying the data acquisition process for both raw data and richly contextualized information.

Strategy For Success

It is imperative to recognize the untenability of continuing to address analytics through stovepiped integration work within the application layer. Once this has been acknowledged, it is time to shift all future data projects to a centralized integration effort, scaling analytics for current and future success.

The benefits of this architectural shift become evident with every project undertaken. The organization can look forward to enhanced data and engineering governance, immediate access to highly contextualized information, seamless collaboration among subject matter experts, and independence from the constraints of single-vendor solutions for historian, SCADA, and analytics applications.

The question then arises: how do you transition data project integration to this new centralized approach? The answer lies in aligning the stakeholders at both the plant and enterprise level and adding value equally to both.

For site-based stakeholders, such as plant managers and process engineers, the benefits materialize in the consolidation of their KPIs into one convenient location. Reports, perhaps a decade old, fragmented and difficult to interpret, can now be integrated into one coherent system. This simplification process provides teams with a single, interrogable source for information, streamlining the addition and coding of data.

From an enterprise perspective, the executive team appreciates this approach for its facilitation of governance over business rules, engineering work, and information. The centralized, templatized deployment of equipment and process profiles considerably secures the entire operation.

Data scientists and analysts find this solution particularly appealing as it provides a single source of truth for both raw data and richly contextualized information. By implementing this strategy, an organization can successfully transform its data management and analytical practices, realizing long-term benefits and sustained growth.

If Not Now, When?

Flow Software recognizes that progress is driven by exceptional teamwork and is eager to embark on this transformative journey alongside you. If this strategy strikes a chord, consider who else within your organization might benefit from these insights. This article can be freely shared, or alternatively, a live session can be scheduled by reaching out to Flow Software directly.

After ensuring the necessary stakeholders are onboard, the next step is a live demonstration of Flow Software. This provides a firsthand look at how the software can be utilized to develop the reports and projects previously mentioned.
