Data Visualization

An interactive visualization of supply and use tables with Sankey diagrams

Contents
1. Introduction
2. Project Structure
2.1 Required Technologies
3. Basic Understanding of D3 and Sankey Diagram
3.1 D3 (Data Driven Documents)
3.2 Sankey Diagram
4. Functional Flow of Circular Sankey
4.1 Loading and Parsing Data
4.2 A close look on SVG elements: Rectangles and Cubic Bézier Curves
4.2.1 How Rectangle and Cubic Bézier Curves work together
4.2.2 Coordinate transformation of Rectangles and Curves
4.3 Saving Data Into Database
4.4 Exporting Diagram as SVG
5. Managing User History
6. Conclusion
7. References

1. Introduction

Large volumes of scientific data are now available and need to be presented interactively, so data visualization has become an important part of software applications, especially on the web. The circular Sankey tool described here generates diagrams built from a few graphical shapes, mainly rectangles and curved lines: the rectangles represent the states of a system, and the curves represent the flows between those states.


2. Project Structure



2.1 Required Technologies

Circular Sankey is built on the open-source library D3.js and its Sankey library. We created a separate Sankey library, based on the existing one, to handle the circular property (described later). jQuery and plain JavaScript are used in a few places.
On the server side, I used ASP.NET 4.5 with C#, and a MySQL database to store user history and login authorization.
I used Microsoft Visual Studio 2015 as the IDE and MySQL Workbench 6.3 as the database IDE.


3. Basic Understanding of D3 and Sankey Diagram


3.1 D3 (Data Driven Documents)

Data-Driven Documents (or D3) is a JavaScript library for producing dynamic, interactive data visualizations for the web. D3 enables direct inspection and manipulation of the standard Document Object Model (DOM). Technically, D3.js is a thin layer on top of modern web standards (SVG, HTML5 and CSS) for building powerful data visualizations. It provides JavaScript functions to select elements, create SVG objects, style them, and add transitions or dynamic effects to them according to the user's requirements; the results can easily be decorated with CSS.
Large and complex datasets, typically parsed from JSON or CSV, can be bound to SVG objects with simple D3.js functions to generate rich text and graphic charts and diagrams.

3.2 Sankey Diagram

(From Wikipedia) Sankey diagrams are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity.
The illustration shows a Sankey diagram which represents all the primary energy flows into a factory. The widths of the bands are directly proportional to energy production, utilization and losses. The primary energy sources are gas, electricity and coal/oil, and they represent the energy inputs at the left-hand side of the Sankey diagram. Sankey diagrams can also visualize energy accounts and material flow accounts on a regional or national level, as well as cost breakdowns of items or services.

Let's look at the example below (see the source).

This Sankey diagram for an electric lamp shows that most of the electrical energy is transferred as heat rather than light.

Energy can be transferred usefully, stored or dissipated. It cannot be created or destroyed. Notice that 100 J of electrical energy is supplied to the lamp. Of this, 10 J is transferred to the surroundings as light energy. The remainder, 90 J (100 J – 10 J) is transferred to the surroundings as heat energy.

Let's look at another diagram, generated with an available online tool (SankeyMATIC) from the following input.

                        Wages [2000] Budget
                        Interest [25] Budget
                        Budget [500] Taxes
                        Budget [450] Housing
                        Budget [310] Food 
                        Budget [205] Transportation
                        Budget [400] Health Care
                        Budget [160] Other Necessities

In this particular diagram, we have 9 individual states (nodes) drawn as rectangles, and between these nodes we have 8 data flows (links) drawn as curves. These graphical shapes are SVG elements; we will mainly use SVG rectangles and SVG cubic Bézier curves.
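To make the node and link counts concrete, here is a minimal sketch that parses the "Source [value] Target" lines shown above into a nodes/links structure. The function name parseFlowLines and the regular expression are illustrative assumptions, not the parser used by the online tool or by our library.

```javascript
// Hypothetical helper: turn "Source [value] Target" lines into nodes and links.
function parseFlowLines(text) {
  const names = [];
  const links = [];
  const pattern = /^(.+?)\s*\[([\d.]+)\]\s*(.+)$/;
  text.split("\n").forEach(function (line) {
    const match = pattern.exec(line.trim());
    if (!match) return; // skip blank or malformed lines
    const source = match[1].trim();
    const target = match[3].trim();
    // register each node name once, in order of first appearance
    [source, target].forEach(function (name) {
      if (names.indexOf(name) === -1) names.push(name);
    });
    links.push({ source: source, target: target, value: Number(match[2]) });
  });
  return {
    nodes: names.map(function (name) { return { name: name }; }),
    links: links
  };
}

const budgetFlows = [
  "Wages [2000] Budget",
  "Interest [25] Budget",
  "Budget [500] Taxes",
  "Budget [450] Housing",
  "Budget [310] Food",
  "Budget [205] Transportation",
  "Budget [400] Health Care",
  "Budget [160] Other Necessities"
].join("\n");

const budgetGraph = parseFlowLines(budgetFlows);
console.log(budgetGraph.nodes.length, budgetGraph.links.length); // 9 8
```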

4. Functional Flow of Circular Sankey

Our circular Sankey follows the steps below to complete its working process.
1. Loading data.
2. Binding data to elements.
3. Transforming elements depending on the bound data.
4. Applying transitions to elements according to user interaction.


4.1 Loading and Parsing Data


The Sankey library cannot work on arbitrary data; the input must be well structured. It works directly with data in JSON or CSV format, as shown below.

.JSON
{
"nodes":[
{"name":"Barry"},
{"name":"Frodo"},
{"name":"Elvis"},
{"name":"Sarah"},
{"name":"Alice"}
],
"links":[
{"source":"Barry","target":"Elvis","value":2},
{"source":"Frodo","target":"Elvis","value":2},
{"source":"Frodo","target":"Sarah","value":2},
{"source":"Barry","target":"Alice","value":2},
{"source":"Elvis","target":"Sarah","value":2},
{"source":"Elvis","target":"Alice","value":2},
{"source":"Sarah","target":"Alice","value":4}
]}

.csv
               source,target,value
               Barry,Elvis,2
               Frodo,Elvis,2
               Frodo,Sarah,2
               Barry,Alice,2
               Elvis,Sarah,2
               Elvis,Alice,2
               Sarah,Alice,4
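The two formats carry the same information. As a rough sketch, the CSV form can be converted into the nodes/links JSON structure as follows. The function name csvToGraph is hypothetical, and the naive comma split assumes that no field contains a comma or quotes.

```javascript
// Hypothetical helper: convert "source,target,value" CSV text into { nodes, links }.
function csvToGraph(csvText) {
  const rows = csvText.trim().split("\n").map(function (line) {
    return line.trim().split(",");
  });
  rows.shift(); // drop the header row: source,target,value
  const links = rows.map(function (cells) {
    return { source: cells[0], target: cells[1], value: Number(cells[2]) };
  });
  // collect unique node names in order of first appearance
  const names = [];
  links.forEach(function (link) {
    [link.source, link.target].forEach(function (name) {
      if (names.indexOf(name) === -1) names.push(name);
    });
  });
  return {
    nodes: names.map(function (name) { return { name: name }; }),
    links: links
  };
}

const sampleCsv = [
  "source,target,value",
  "Barry,Elvis,2",
  "Frodo,Elvis,2",
  "Frodo,Sarah,2",
  "Barry,Alice,2",
  "Elvis,Sarah,2",
  "Elvis,Alice,2",
  "Sarah,Alice,4"
].join("\n");

const sampleGraph = csvToGraph(sampleCsv);
console.log(JSON.stringify(sampleGraph.nodes));
```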


However, we wanted to give users more control over the properties of nodes and flows, so our input data format is more complex. Instead of taking only the [name_of_source_node, name_of_target_node, value] properties, we take from the user the important properties of both nodes ([name_of_node] [color] [orientation] [width] [height] [x_position] [y_position]) and flows ([name_of_source_node] [value] [flow color] [name_of_target_node]). This is one of the main features that our circular Sankey tool adds over existing Sankey tools. If you are interested in how the Sankey library works with only the three [name_of_source_node, name_of_target_node, value] properties, I highly recommend reading this article (Sankey Diagrams: A Description of the d3.js Code).

We have two main ways to take data input.
1. By uploading a data file in .csv, .xlsx or .txt format.
The upload option takes a file from a user-specified directory and parses the data into the target format that we use for generating the Sankey diagram. Data parsing is complex because many different data formats exist, so reliable parsing into the target format is essential.
2. By typing directly into the provided input text boxes.

Data format for Nodes:
Name           Color           Orientation  Width  Height  X_position  Y_position
Agriculture    (0, 255, 0)     0            40     80      150         250
Waste water    (255, 0, 0)     20           40     100     380         50
Industry       (222, 83, 36)   90           30     80      800         200
Coal Mining    (167, 98, 36)   180          40     80      650         450
Methane        (94, 193, 36)   180          40     80      500         650
Nitrous Oxide  (0, 153, 51)    180          40     80      300         550

Data format for flows between Nodes:
Source         Value  Color            Target
Agriculture    12     (0, 191, 255)    Waste water
Waste water    10     (255, 0, 255)    Industry
Industry       6      (0, 64, 255)     Coal Mining
Coal Mining    11     (153, 0, 115)    Methane
Methane        12     (0, 153, 0)      Nitrous Oxide
Nitrous Oxide  12     (255, 0, 0)      Agriculture
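As an illustration of the parsing step, the sketch below turns one row of the node table into an object. The function name parseNodeRow and the property names (color, angle, width, height, x, y) are assumptions made for this example; the actual parser in the tool may differ. The pattern accepts either tabs or spaces between the columns.

```javascript
// Hypothetical helper: parse one node row such as
// "Industry  (222, 83, 36)  90  30  80  800  200" into an object.
function parseNodeRow(line) {
  const pattern =
    /^(.+?)\s*\((\d+),\s*(\d+),\s*(\d+)\)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$/;
  const m = pattern.exec(line.trim());
  if (!m) return null; // the row does not match the expected layout
  return {
    name: m[1],
    color: "rgb(" + m[2] + "," + m[3] + "," + m[4] + ")", // CSS color string
    angle: Number(m[5]),   // orientation in degrees
    width: Number(m[6]),
    height: Number(m[7]),
    x: Number(m[8]),
    y: Number(m[9])
  };
}

const industryNode = parseNodeRow("Industry\t(222, 83, 36)\t90\t30\t80\t800\t200");
console.log(industryNode);
```

Returning null for a malformed row lets the caller decide whether to skip the row or report it to the user.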


4.2 A close look on SVG elements: Rectangles and Cubic Bézier Curves

After parsing the data into our target format, the next step is to create the required SVG elements. Let's look at another simplified example figure (source).
As we can see, the figure contains three kinds of SVG elements: rectangles (nodes), flow curves (links) between the rectangles, and text (node titles). I am not going to describe how these elements are coded, as these building blocks are already documented (Sankey Diagrams: A Description of the d3.js Code). Instead, I will describe the distinctive feature of our circular Sankey: the circular behavior of the diagram. In the picture, every rectangle has an orientation of 0 degrees (in other words, the rectangles are upright), and the overall flow of the diagram runs from left to right, so the source is on the left and the target is on the right. What if we want something more flexible, for example a cycle in which data flows in a circular way rather than straight from left to right (see the figure below)? To make this idea a reality, the first step is to apply a transformation to the rectangles: instead of the default 0 degrees, each rectangle is rotated by a user-defined orientation value (0 <= orientation_value <= 360), together with other related adjustments (see section 4.2.2). Before that, we will look at how individual rectangles and curves are created and how they are connected (section 4.2.1):

4.2.1 How Rectangle and Cubic Bézier Curves work together

<svg width="50" height="50">
    <rect x="0" y="0" width="50" height="50" fill="green" />
</svg>


    var svgContainer = d3.select("body").append("svg")
        .attr("width", 200)
        .attr("height", 200);
    var rectangle = svgContainer.append("rect")
        .attr("x", 10)
        .attr("y", 10)
        .attr("width", 50)
        .attr("height", 100);

The two code segments above show the basics of creating an SVG rectangle, first in plain SVG markup and then with D3. The code is straightforward and needs no further explanation. The next segment is important for understanding how a cubic Bézier curve works.

<svg width="500" height="300">
   <path d="M100,250 C100,100 400,100 400,250" stroke="black" fill="transparent"/>
</svg>

The initial M directive moves the pen to the first point (100,250). Following the ‘C’ are three co-ordinates; the start control point (100,100), the end control point (400,100) and the final point we’re drawing to (400,250).
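To see how the four points determine the curve, the following sketch evaluates the standard cubic Bézier formula B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3 for the path above. The function name cubicBezierPoint is only illustrative; browsers perform this calculation internally when rendering the path.

```javascript
// Evaluate a cubic Bézier curve at parameter t (0 = start point, 1 = end point).
function cubicBezierPoint(p0, p1, p2, p3, t) {
  const u = 1 - t;
  const a = u * u * u;      // weight of the start point
  const b = 3 * u * u * t;  // weight of the first control point
  const c = 3 * u * t * t;  // weight of the second control point
  const d = t * t * t;      // weight of the end point
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y
  };
}

// Points taken from the path "M100,250 C100,100 400,100 400,250".
const midPoint = cubicBezierPoint(
  { x: 100, y: 250 }, { x: 100, y: 100 },
  { x: 400, y: 100 }, { x: 400, y: 250 },
  0.5
);
console.log(midPoint); // { x: 250, y: 137.5 }
```

The midpoint lies at y = 137.5 rather than y = 100: the curve is pulled towards the control points but does not pass through them.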

For the circular Sankey, the following block of code generates the rectangles.
var node = svg.append("g").selectAll(".node")
    .data(json_data.nodes)              // using node properties
    .enter().append("g")
    .attr("class", "node")              // use CSS class named node
    .attr("transform", function (d) {
        // allocate the position of X_coordinate and Y_coordinate from given node data
        return "translate(" + d.x + "," + d.y + ")";
    })
    .call(d3.behavior.drag()
        .origin(function (d) { return d; })
        .on("dragstart", function () { this.parentNode.appendChild(this); })
        .on("drag", dragmove));

node.append("rect")
    .attr("width", function (d) {
        node_width_counter++;
        return a_node_width[node_width_counter];    // manually set width from given node data
    })
    .attr("height", function (d) {
        return d.dy;                                // manually set height from given node data
    })
    .attr("transform", function (d) {
        return "rotate(" + d.angle + ")";           // manually set rotation value to each rectangle from given node data
    })
    .attr("shape-rendering", "crispEdges")
    .style("fill", function (d) {
        node_color_counter++;
        return node_color[node_color_counter];      // manually set node color from given node data
    })
    .style("fill-opacity", function (d) {
        return d.opacity || data_set.default_node_opacity;
    })
    .style("stroke-width", data_set.node_border || 0)
    .style("stroke", function (d) {
        return node_color[node_color_counter];
    })
    .append("title")
    .text(function (d) {
        return data_set.show_labels
            ? d.name + "\n" + units_format(d.value)
            : "";
    });

The original Sankey library sets all these rectangle properties automatically, which means the user has no control over them. That is useful for someone who wants to avoid complexity or does not care about setting the properties manually. Manual control, on the other hand, is useful and meaningful for someone who wants a more interactive diagram shaped by their own choices, including the circular property discussed above. These manual node properties give much more flexible nodes, but they only work once some other changes have been made (we will come to this point).

Now let's look at how the links (flows, drawn as curves) between the nodes (rectangles) are formed.
var link = svg.append("g").selectAll(".link")
    .data(json_data.links)
    .enter()
    .append("path")
    .attr("class", "link")
    .attr("d", path)
    .style("fill", "none")
    .style("stroke-width", function (d) { return Math.max(1, (d.dy)); })
    .style("stroke", function (d) {
        return d.color ? d.color
            : d.source.inherit_right ? d.source.color
            : d.target.inherit_left ? d.target.color
            : data_set.default_flow_inherit === "source" ? d.source.color
            : data_set.default_flow_inherit === "target" ? d.target.color
            : data_set.default_flow_color;
    })
    .style("stroke-opacity", function (d) {
        return d.opacity || data_set.default_flow_opacity;
    })
    .on('mouseover', function (d) {
        d3.select(this).style("stroke-opacity",
            d.opacity_on_hover
            || ((Number(data_set.default_flow_opacity) + 1) / 2));
    })
    .on('mouseout', function (d) {
        d3.select(this).style("stroke-opacity",
            d.opacity || data_set.default_flow_opacity);
    })
    .sort(function (a, b) { return b.dy - a.dy; });

This block of code is responsible for creating the cubic Bézier curves and is also quite straightforward: it generates a curve from the given link data. Let's look more closely at how the path data for each link is computed.
sankey.link = function () {
    var curvature = .5;
    function link(d) {
        var x0 = d.source.x + d.source.dx,          // d.source.x is the x position of the source node and d.source.dx is the width of the source node
            x1 = d.target.x,                        // d.target.x is the x position of the target node
            xi = d3.interpolateNumber(x0, x1),      // interpolate between x0 and x1
            x2 = xi(curvature),                     // curvature controls how curvy the link is (value ranges from 0-.9)
            x3 = xi(1 - curvature),
            y0 = d.source.y + d.sy + d.dy / 2,      // d.sy is the offset of this link within its source node, based on the link widths and the number of links at that node; d.dy is the thickness of the link
            y1 = d.target.y + d.ty + d.dy / 2;      // d.ty is the corresponding offset of this link within its target node
        return "M" + x0 + "," + y0
            + "C" + x2 + "," + y0
            + " " + x3 + "," + y1
            + " " + x1 + "," + y1;
    }
    link.curvature = function (_) {
        if (!arguments.length) return curvature;
        curvature = +_;
        return link;
    };
    return link;
};
So we can see how the coordinate values are calculated. From the positions of the two nodes (source and target), the link function computes the values needed to draw the cubic Bézier curve: the start point, the two control points and the end point (see section 4.2).
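A small worked example may help here. The sketch below repeats the same calculation as a standalone function, with the d3.interpolateNumber call replaced by plain linear interpolation so that it runs without D3. The function name straightLinkPath and the sample numbers are made up for illustration.

```javascript
// Standalone version of the straight (0 degree) link calculation.
function straightLinkPath(d, curvature) {
  const x0 = d.source.x + d.source.dx;          // right edge of the source node
  const x1 = d.target.x;                        // left edge of the target node
  const x2 = x0 + (x1 - x0) * curvature;        // x of the first control point
  const x3 = x0 + (x1 - x0) * (1 - curvature);  // x of the second control point
  const y0 = d.source.y + d.sy + d.dy / 2;      // vertical centre of the link at the source
  const y1 = d.target.y + d.ty + d.dy / 2;      // vertical centre of the link at the target
  return "M" + x0 + "," + y0
    + "C" + x2 + "," + y0
    + " " + x3 + "," + y1
    + " " + x1 + "," + y1;
}

// Hypothetical link: source node at (0,10) with width 20, target node at (120,50).
const sampleLink = {
  source: { x: 0, y: 10, dx: 20 },
  target: { x: 120, y: 50 },
  sy: 0, ty: 0, dy: 10
};
console.log(straightLinkPath(sampleLink, 0.5)); // M20,15C70,15 70,55 120,55
```

Both control points share the y value of their own end point, which is why the curve leaves the source and enters the target horizontally.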

Here comes our main challenge. We have our node properties, and we get our link properties from the function above. We already know that the coordinates of a link (a cubic Bézier curve) are based on the positions of its source and target nodes. Later, when we draw the rectangles from the user-given node data, we also apply the rotation value to each rectangle. Once a rectangle is rotated, the x and y coordinates of its corners change. Because the properties of the cubic Bézier curve (start point, first control point, second control point and end point) are calculated directly from the positions of the source and target rectangles, rotating the nodes means we also have to change the link function so that it recalculates the link properties from the new source and target positions. To make this adjustment, we have to understand the coordinate transformation.


4.2.2 Coordinate transformation of Rectangles and Curves

While working with upright rectangles (0 degree orientation), we saw that our link function works directly on the corner coordinates of the nodes. Therefore, we need to know the new coordinates of every node once it is rotated by a specified angle (let's say θ). In the figure below we can see how the coordinate points change after applying θ. The initial point (X2', Y2') is transformed into the new point (X2, Y2) using the formula
X2 = X1 + (width * cosθ)
Y2 = Y1 + (width * sinθ)
where (X1, Y1) is the position of the node (d.x and d.y in the code) about which the rectangle is rotated, width is the width of the node and θ is its orientation value. Width and orientation are user defined and can be found in the node data. Based on this coordinate transformation we can recalculate all the points of the cubic Bézier curves in the link function.
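The formula can be checked with a few numbers. This is a minimal sketch with a hypothetical function name, rotatedCorner; the angle is given in degrees, as in the node data, and converted to radians for Math.cos and Math.sin.

```javascript
// Apply X2 = X1 + width * cos(theta), Y2 = Y1 + width * sin(theta).
function rotatedCorner(x1, y1, width, angleDeg) {
  const theta = (Math.PI / 180) * angleDeg; // degrees to radians
  return {
    x: x1 + width * Math.cos(theta),
    y: y1 + width * Math.sin(theta)
  };
}

// A node positioned at (100, 100) with width 40.
console.log(rotatedCorner(100, 100, 40, 0));   // { x: 140, y: 100 }
console.log(rotatedCorner(100, 100, 40, 90));  // approximately { x: 100, y: 140 }
console.log(rotatedCorner(100, 100, 40, 180)); // approximately { x: 60, y: 100 }
```

With θ = 0 the formula gives the familiar right edge x + width, so the unrotated case is unchanged. Because the y axis of SVG points downwards, a positive angle appears as a clockwise rotation on screen.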

Let's have a look into our new link function.
sankey.link = function () {
    var curvature = .5;     // declared as in the original link function, so that link.curvature() has a value to read and set
    function link(d) {
        var x0 = d.source.x + d.source.dx,          // d.source.x is the x position of the source node and d.source.dx is the width of the source node
            x1 = d.target.x,                        // d.target.x is the x position of the target node
            xi = d3.interpolateNumber(x0, x1),      // interpolate between x0 and x1
            y0 = d.source.y + d.sy + d.dy / 2,      // d.sy is the offset of this link within its source node; d.dy is the thickness of the link
            y1 = d.target.y + d.ty + d.dy / 2,      // d.ty is the offset of this link within its target node
            x_2 = d.source.x + (d.source.dx * Math.cos((Math.PI / 180) * d.source.angle)),      // x2_source = d.source.x + (node_width * cosθ)
            y_2 = d.source.y + (d.source.dx * Math.sin((Math.PI / 180) * d.source.angle)),      // y2_source = d.source.y + (node_width * sinθ)
            x0_r = x_2 - ((d.sy + d.dy / 2) * Math.sin((Math.PI / 180) * d.source.angle)),      // final x0_r from x2_source with the value of d.sy and d.dy
            y0_r = y_2 + ((d.sy + d.dy / 2) * Math.cos((Math.PI / 180) * d.source.angle)),      // final y0_r from y2_source with the value of d.sy and d.dy
            x_2_t = d.target.x + (Math.cos((Math.PI / 180) * d.target.angle)),                  // x_2_t_target = d.target.x + (cosθ)
            y_2_t = d.target.y + (Math.sin((Math.PI / 180) * d.target.angle)),                  // y_2_t_target = d.target.y + (sinθ)
            x1_r = x_2_t - ((d.ty + d.dy / 2) * Math.sin((Math.PI / 180) * d.target.angle)),    // final x1_r from x_2_t_target with the value of d.ty and d.dy
            y1_r = y_2_t + ((d.ty + d.dy / 2) * Math.cos((Math.PI / 180) * d.target.angle)),    // final y1_r from y_2_t_target with the value of d.ty and d.dy
            distance = Math.sqrt((x0 - x1) * (x0 - x1) + (y0 - y1) * (y0 - y1)),                // Euclidean distance between source and target
            x2 = x0_r + Math.cos((Math.PI / 180) * d.source.angle) * distance * curvature,          // x position of the source control point
            y0_r_r = y0_r + Math.sin((Math.PI / 180) * d.source.angle) * distance * curvature,      // y position of the source control point
            x3 = x1_r + Math.cos((Math.PI / 180) * (d.target.angle + 180)) * distance * curvature,  // x position of the target control point
            y1_r_r = y1_r + Math.sin((Math.PI / 180) * (d.target.angle + 180)) * distance * curvature;  // y position of the target control point
        return "M" + x0_r + "," + y0_r
            + "C" + x2 + "," + y0_r_r
            + " " + x3 + "," + y1_r_r
            + " " + x1_r + "," + y1_r;
    }
    link.curvature = function (_) {
        if (!arguments.length) return curvature;
        curvature = +_;
        return link;
    };
    return link;
};


4.3 Saving Data Into Database

Circular Sankey offers the option of storing data. A unique job ID is created every time a user saves data into the database. A data grid shows all the job IDs that the user has created so far, and the user can easily view both the metadata and the actual diagram data. It is also possible to navigate back and recreate exactly the same diagram, with exactly the same values the user left last time. So no user history is lost.


4.4 Exporting Diagram as SVG

After completing the diagram, the user can download it in SVG format. Other image formats could be an option that we might focus on in the future.
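The export mechanism itself is not described here. One common approach, shown purely as an assumption-labelled sketch, is to encode the SVG markup as a data URI that a download link can point to. The function name svgToDataUri is hypothetical.

```javascript
// Hypothetical helper: wrap SVG markup in a data URI suitable for a download link.
function svgToDataUri(svgMarkup) {
  return "data:image/svg+xml;charset=utf-8," + encodeURIComponent(svgMarkup);
}

const exampleSvg =
  '<svg xmlns="http://www.w3.org/2000/svg" width="50" height="50">' +
  '<rect x="0" y="0" width="50" height="50" fill="green" /></svg>';
const exampleUri = svgToDataUri(exampleSvg);
console.log(exampleUri.slice(0, 40));
```

In a browser, the resulting string could be assigned to the href of an anchor element that carries a download attribute, so that clicking the link saves the diagram as an .svg file.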


5. Managing User History

Circular Sankey also has a user maintenance page, which is accessible only to admin users. An admin can fill in the registration fields and create a new user. Additional measures ensure that every user is unique and that no user can view or access the data or history of other users.



6. Conclusion

A graphical representation of an abstract idea is usually more meaningful and easier to understand than the raw data. When we have complex statistical data, representing it in pictorial form can be very useful, and this is what attracts people to data visualization. The Sankey diagram is a widely used graphical way to describe statistical data about industrial and material flows. The online tools currently available for creating Sankey diagrams from raw data are still too few, or not good enough, to meet users' growing expectations, and they often cannot handle exceptional cases. Circular behavior is one such case, or rather a user expectation, that has been discussed recently. We also found some other areas that could be of interest in the near future. A closer look into D3.js should help improve the performance of circular Sankey and meet the emerging needs of experts.