Claranet | The Data Journey - Part Two: Streaming Analytics

November 30, 2020 Claranet Limited

Mike Fowler
PRINCIPAL DATA ENGINEER
CLARANET

Every business has data. Some organisations can spin gold from their data, but others merely collect it in the hopes of getting value someday. While there is no easy path to maximising the value of your data, there is at least a clear journey. In this second post of a four-part series, we'll look at how we can reduce the latency of our analysis. 

In our previous post, we looked at descriptive analytics, which is focused on the past and leaves you to draw conclusions based on hindsight. The trouble is that your conclusions can be stale, as they are based on old data. The latency of your analysis will be as long as the time since your last processed batch, which could be hours or even days ago. We can reduce the time between batches, but then we enter Zeno's Paradox: we never arrive at our destination, the present moment. Streaming analytics allows us to analyse data as it arrives.

Instead of processing all the data that has arrived since our last batch ran, we process each item of data as it happens. Typically, we introduce a pipeline that data is pushed to from our source systems. As the data flows through the pipeline, we perform the desired analysis, potentially enriching the data from other sources too. Once analysed, we can output the processed data to a real-time dashboard, or even to our data warehouse for further analysis. Pushing the data from the source systems can be a challenge: you may need to add hooks at key business lifecycle events to send the data. Many transactional databases also support change data capture, allowing changes to be relayed with tools such as Debezium straight into Kafka.
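To make the per-event idea concrete, here is a minimal sketch of the processing step. The function names, field names, and the in-memory user lookup are all hypothetical stand-ins; in production the feed would typically be a Kafka topic (perhaps populated by Debezium change-data-capture events) and the enrichment source might be a reference database or cache.

```python
import json

def enrich(event, user_table):
    """Join the incoming event with reference data (a hypothetical user lookup)."""
    user = user_table.get(event["user_id"], {})
    return {**event, "region": user.get("region", "unknown")}

def process_stream(raw_messages, user_table):
    """Handle each message as it arrives rather than waiting for a batch."""
    for raw in raw_messages:
        event = json.loads(raw)
        yield enrich(event, user_table)

# Simulated feed; in a real pipeline these messages would be consumed
# from a streaming platform such as Kafka.
users = {42: {"region": "EMEA"}}
feed = ['{"user_id": 42, "action": "purchase"}']
for result in process_stream(feed, users):
    print(result)
```

The key design point is that `process_stream` is a generator: each event is analysed and emitted the moment it arrives, so the downstream dashboard or warehouse sees results with per-event rather than per-batch latency.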

The value of near real-time processing comes from the fact that you can make decisions based on what's happening in your business right now. A competitive video gaming platform company that I worked with used streaming analytics to react to online bullying in its chat forums. Rather than relying on moderators or complaints from users, they could react the moment a comment arrived, preventing it from appearing and warning the author about their unacceptable behaviour.

There's more to detecting online bullying than simply reading incoming text comments, in fact a lot more. The detector was actually a machine-learned model, invoked as each comment arrived through the streaming pipeline. The combination of streaming analytics and machine learning inference is very powerful, and is one way to begin exploring the subject of our next post: predictive analytics.
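The shape of that combination can be sketched as a moderation step that scores each message in-stream and decides whether to publish it. The scorer below is a deliberately naive keyword check standing in for the trained model; in a real deployment this function would call an ML inference endpoint, and all names and thresholds here are illustrative assumptions.

```python
def toxicity_score(comment):
    """Stand-in for a trained classifier; a real system would call an
    ML inference service here and return its predicted probability."""
    blocked_terms = {"idiot", "loser"}
    words = set(comment.lower().split())
    return 1.0 if words & blocked_terms else 0.0

def moderate(comment, threshold=0.5):
    """Per-message decision made in-stream, before the comment is shown:
    block and warn the author, or publish."""
    if toxicity_score(comment) >= threshold:
        return {"published": False, "action": "warn_author"}
    return {"published": True, "action": None}

print(moderate("you are a loser"))
print(moderate("good game everyone"))
```

Because the decision happens inside the pipeline, the offending comment never reaches other users, which is exactly the latency advantage over batch moderation.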
