Community Stories: Sankey Diagram

How To Follow Data Flow In Real-time

By Tanja Zinchenko

Many companies hold an Innovation Week every now and then, and ours is no exception. During our last Innovation Week we implemented a Sankey Diagram, a good way to visualize our work and make our lives a bit easier. I would like to share this experience with you. It may be useful if you are also struggling to find a tool that shows how good your work is, or if you are simply a big fan of graphs and charts.

A Little Bit Of History

For the last year, I have been working for a payment service provider (PSP). Basically, we do one thing: make payments possible. But this one thing is like an iceberg: far more is hidden under water than is visible above. It seems easy: I go to the supermarket to buy groceries, I pay by card, and my money goes to the seller's account. Real life is a little more complicated. The money doesn't go to the merchant's account right away; first, it goes through us. We run a lot of checks, such as fraud checks, and only after that do we send the money on.

Our CTO is a huge fan of Spotify, which is why we organize our work according to the Spotify model. We have a couple of development centers in different countries, and all of them contain tribes. Each tribe is responsible for its own part of the system; the one I work in is responsible for the transaction backend. In a nutshell, we take in transactions, check them against various rules, and pass on the ones that clear the whole process.

Our project is a set of microservices that are connected to each other but not to any GUI. At Quarterly Reviews, our best shot was to show a screenshot from RabbitMQ or the output of a really big SQL query that returned a couple of values. As a result, our visibility was very low. While other tribes showed big, colorful interfaces with pretty buttons, all we could do was talk about how many transactions our system can handle per day, with nothing to show for it. And everybody knows how much managers like pretty, colorful pictures.

We tried to solve this by sending daily reports: how many transactions we received per day and per hour, and what the results of the checks were. But once again, it was only a table inside an email, even though we added some colors: green for success, yellow for less successful results. It solved the problem only partly, because it takes just 30 seconds to skim the email, and we wanted to give our managers something to play with for longer.

We tried giving them access to dashboards such as Kibana, but that idea failed completely. Then our Architect went on a business trip to visit our colleagues in Spain and found something simple yet brilliant. They had taken a world map (we have offices in different countries) and built the following visualization: every time one of "our" transactions happens somewhere, a highlighted point appears at that spot and flies towards HQ in Amsterdam. All of this runs on a big screen in the middle of the office.

The thing is, we cannot follow all transactions live: we receive all the data at once, at a later time. And even if we could, the map would be one big highlighted spot, because the number of transactions is huge. So technically this project didn't make much sense. But all our managers were extremely happy to see it. So innovative and interactive! That's how our Architect came up with the idea of a Sankey Diagram.

What Is A Sankey Diagram?

In a nutshell, it is a flow diagram in which the width of each band is proportional to the quantity flowing through it. A classic use today is showing energy flows in factories, split into production, utilization and losses, with sources such as gas, electricity or oil. It was a perfect fit for us because:

  • We also have flows with Correlated, Unrecoverable or Denied transactions
  • We also have sources from which we get those transactions
  • We have around 10 services that take part in the process and can serve as crosspoints
  • We have an endpoint
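To make this mapping concrete, here is a minimal Python sketch of the structure listed above: nodes (sources, services, outcomes, an endpoint) joined by links that each carry a transaction count. The source and service names, the numbers and the way outcomes connect to the endpoint are all hypothetical; only the outcome names come from our list. It checks the two properties that make a Sankey Diagram readable: band width is proportional to the link value, and in a closed system every intermediate node passes on as much as it receives.

```python
from collections import defaultdict

# Hypothetical example flows: (from node, to node, transaction count).
# Source and service names are invented; only the outcome names come from the text.
FLOWS = [
    ("source-A", "validation", 700),
    ("source-B", "validation", 300),
    ("validation", "fraud-check", 950),
    ("validation", "Unrecoverable", 50),
    ("fraud-check", "Correlated", 900),
    ("fraud-check", "Denied", 50),
    ("Correlated", "endpoint", 900),
]


def node_totals(flows):
    """Return {node: (inflow, outflow)} summed over all links."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for src, dst, value in flows:
        outflow[src] += value
        inflow[dst] += value
    return {node: (inflow[node], outflow[node]) for node in set(inflow) | set(outflow)}


def unbalanced_nodes(flows):
    """Intermediate nodes (with both inflow and outflow) where inflow != outflow."""
    return sorted(
        node
        for node, (inflow, outflow) in node_totals(flows).items()
        if inflow and outflow and inflow != outflow
    )


def band_widths(flows, max_width=100.0):
    """Width of each band, proportional to its value (largest link = max_width)."""
    largest = max(value for _, _, value in flows)
    return {(src, dst): value * max_width / largest for src, dst, value in flows}
```

In a live pipeline an imbalance is not automatically a bug (messages may still be waiting in a queue), but it is a quick way to spot links that were wired up wrongly.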

Interesting fact: the idea of the Sankey Diagram is not new. It has been in use since 1898, when Irish captain Matthew Sankey drew the energy efficiency of a steam engine in black and white. It looked rather plain back then, but kind people have since added color. Thankfully.

The Diagram is mostly used in science, particularly physics, but nothing prevents us from applying scientists' work to Internet technologies.

How To Implement A Sankey Diagram

As I explained earlier, our system has no way out to the UI world; it is backend only. That's why we had to build a web page for the Sankey Diagram from scratch.

You may have heard of Data-Driven Documents (d3.js), a JavaScript library for creating all kinds of charts and graphs. D3 works with SVG, Canvas and HTML.

It is used almost everywhere now; for example, The New York Times uses this library for its interactive stories. If you have never heard of it, here is a simple tutorial for beginners.

There is a dedicated plugin for d3.js called d3-sankey. The plugin offers two ways of creating a diagram: "Step-By-Step" or "Selection Of Blocks". We chose the second one.
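d3-sankey's layout works on a graph object with a `nodes` array and a `links` array, where every link has a `source`, a `target` and a `value`. With the default `nodeId` accessor, links refer to nodes by their index in `nodes`. The Python sketch below assumes a backend job prepares this JSON for the web page and converts named flows into that shape; the function name and the example flows are hypothetical, not our actual code.

```python
import json


def to_d3_sankey(flows):
    """Convert (source, target, value) triples into a {"nodes", "links"} graph.

    Links point at nodes by their position in the "nodes" list, which is what
    d3-sankey expects when its nodeId accessor is left at the default.
    """
    index = {}  # node name -> position in `nodes`
    nodes = []
    for src, dst, _ in flows:
        for name in (src, dst):
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"name": name})
    links = [
        {"source": index[src], "target": index[dst], "value": value}
        for src, dst, value in flows
    ]
    return {"nodes": nodes, "links": links}


# Hypothetical example: serialize the graph for the web page to fetch.
graph = to_d3_sankey([("source-A", "fraud-check", 700), ("fraud-check", "Denied", 20)])
print(json.dumps(graph, indent=2))
```

Keeping the names in `nodes` lets the page use them as labels. Note that the d3-sankey layout does not support cycles, so the flows have to form an acyclic graph.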

The data for our diagram comes from Elastic. We had set it up back when we were trying to get managers to use dashboards such as Kibana. That idea failed completely: our managers did not like it at all, had trouble using the dashboards and kept complaining about unfriendly interfaces. Personally, though, I am really happy with Kibana. It has data from all our services, and navigation is so easy that I have no trouble going through 5 environments with more than 10 services each.
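Hourly counts per outcome map naturally onto an Elasticsearch `date_histogram` aggregation with a nested `terms` aggregation. The sketch below builds such a search body and flattens the response. The field names `@timestamp` and `outcome` are assumptions about the index mapping, the `terms` field has to be a keyword field, and `fixed_interval` requires Elasticsearch 7.2 or later (older versions use `interval`).

```python
def hourly_outcomes_query(hours=24, time_field="@timestamp", outcome_field="outcome"):
    """Search body: count documents per hour, split by outcome, for the last `hours`."""
    return {
        "size": 0,  # only the aggregations are needed, not the documents themselves
        "query": {"range": {time_field: {"gte": f"now-{hours}h"}}},
        "aggs": {
            "per_hour": {
                "date_histogram": {"field": time_field, "fixed_interval": "1h"},
                "aggs": {"by_outcome": {"terms": {"field": outcome_field}}},
            }
        },
    }


def parse_hourly_outcomes(response):
    """Flatten the aggregation result into {hour_start_ms: {outcome: count}}."""
    return {
        hour["key"]: {b["key"]: b["doc_count"] for b in hour["by_outcome"]["buckets"]}
        for hour in response["aggregations"]["per_hour"]["buckets"]
    }
```

The body can be sent with any Elasticsearch client or as the JSON payload of a `_search` request.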

Data for our partners is sent via RabbitMQ, so we included that as well: the Sankey Diagram shows the sizes of the RabbitMQ queues, including the error queues.
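Queue sizes can be read from the HTTP API of the RabbitMQ management plugin: `GET /api/queues` returns a JSON array in which each queue object has a `name` and a `messages` count (ready plus unacknowledged). The sketch below assumes error queues are recognizable by an `.error` name suffix, which is a hypothetical convention, and turns the queue list into Sankey links, grouping all error queues into one band.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user, password):
    """GET /api/queues from the RabbitMQ management API (e.g. http://host:15672)."""
    request = urllib.request.Request(f"{base_url}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def queue_links(queues, source="RabbitMQ", error_suffix=".error"):
    """Turn queue objects into (source, target, depth) links for the diagram.

    Queues whose name ends with `error_suffix` (a hypothetical naming convention)
    are summed into one shared "error queues" band. Empty queues are skipped.
    """
    links = []
    error_total = 0
    for queue in queues:
        depth = queue.get("messages") or 0  # may be missing until stats are collected
        if not depth:
            continue
        if queue["name"].endswith(error_suffix):
            error_total += depth
        else:
            links.append((source, queue["name"], depth))
    if error_total:
        links.append((source, "error queues", error_total))
    return links
```

Keeping the parsing separate from the HTTP call makes `queue_links` easy to test with a saved response.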

The result was our Sankey Diagram. By default, it shows the flows for the last 24 hours, split into hourly snapshots. We also added a slider for navigating between snapshots and animated the transitions.
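The hourly snapshots and the slider can be modelled as a list of 24 link-count dictionaries, oldest first, with the slider value acting as an index into that list. This sketch assumes each event carries a timezone-aware timestamp plus a (source, target) pair, and it treats the current, still-running hour as the newest snapshot; both are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)


def build_snapshots(events, now, hours=24):
    """Bucket (timestamp, source, target) events into hourly link counts.

    Returns `hours` dicts {(source, target): count}, oldest first. The newest
    bucket is the current hour; events outside the window are ignored.
    """
    window_start = now.replace(minute=0, second=0, microsecond=0) - (hours - 1) * HOUR
    snapshots = [defaultdict(int) for _ in range(hours)]
    for timestamp, src, dst in events:
        slot = (timestamp - window_start) // HOUR  # timedelta // timedelta gives an int
        if 0 <= slot < hours:
            snapshots[slot][(src, dst)] += 1
    return [dict(snapshot) for snapshot in snapshots]


def snapshot_at(snapshots, slider_value):
    """Return the snapshot for a slider position, clamped to the valid range."""
    return snapshots[max(0, min(len(snapshots) - 1, slider_value))]


# Hypothetical usage: the slider at its right end shows the current hour.
now = datetime(2024, 1, 2, 10, 30, tzinfo=timezone.utc)
events = [(datetime(2024, 1, 2, 10, 5, tzinfo=timezone.utc), "source-A", "Denied")]
latest = snapshot_at(build_snapshots(events, now), 23)
```

Animating between two neighbouring snapshots is then a matter of interpolating each link's value on the page.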

Finally, we got something like this (the picture is a downloaded example, since our own data is under NDA :( )

As a tester, I use this diagram daily. It turned out to be an easy way to see whether something has gone wrong. For example, if a flow is unexpectedly wider than it was before, that is a sign of an error, and we need to investigate what went wrong.
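The "unexpectedly wider flow" check can also be written down as a simple rule. This is a minimal sketch comparing two snapshots, each a dict mapping (source, target) to a count; the growth factor of 2 and the minimum count of 10 are arbitrary, hypothetical thresholds that would need tuning against real traffic.

```python
def widened_flows(previous, current, ratio=2.0, min_count=10):
    """Links in `current` that grew by more than `ratio` times since `previous`.

    Snapshots are dicts {(source, target): count}; links missing from
    `previous` count as 0, and links below `min_count` are ignored as noise.
    """
    return sorted(
        link
        for link, value in current.items()
        if value >= min_count and value > ratio * previous.get(link, 0)
    )


# Hypothetical example: the Denied band more than doubled, Correlated did not.
previous_hour = {("fraud-check", "Denied"): 10, ("fraud-check", "Correlated"): 500}
current_hour = {("fraud-check", "Denied"): 45, ("fraud-check", "Correlated"): 520}
print(widened_flows(previous_hour, current_hour))  # [('fraud-check', 'Denied')]
```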

As a QA Lead, I prepare presentations for Sprint Reviews, and I always include screenshots from the Sankey Diagram. They are nice, colorful and show a lot.

Outcomes

Basically, the Diagram shows the same data as our daily emails: all transactions for the last 24 hours, broken down by hour. But the Sankey version is pretty and colorful, and thanks to the light animation it looks as if the data is changing in real time. The slider lets you look at older data.

We created this Diagram to increase our visibility. As a side effect, we got a good tool for quick testing. Can I call it "smoke testing"? I think I can. Every morning I check whether the Sankey shows me healthy data or something is burning.

I hope this was helpful.

Have fun with your own Sankey Diagram!

Author Bio:

Tanja Zinchenko, Test Architect with almost 15 years of experience in testing. Lives in the Netherlands. Co-organizer of the QA Amsterdam meetup, co-owner of 5 (five) cats.

https://www.linkedin.com/in/tzinchenko/