Data-Driven Documents (D3) is a JavaScript library for building powerful graphics to communicate information in datasets. It is also fair to say that for many, myself included, it has a non-standard approach to building the graphics. Often the learning curve can feel steep.
In this post we’ll look at using NetworkX — a Python library for exploring graph structures — to do some of the initial data processing for us. Then we’ll add the artistic finishing touches in JavaScript with D3. The full code can be found on my GitHub and an interactive version can be found here.
The data
I remember when I was first introduced to the CIA World Factbook, and I loved it. It holds a treasure trove of information about all of the countries in the world. It is just screaming for visualisations of the data to be made. On top of this, it has been converted to different formats on GitHub and — most importantly for us — to JSON.
The data is given per country using their two character ISO encoding. We’ll need the continent each country is in to access the data. First we’ll create that dictionary:
import requests
from bs4 import BeautifulSoup
import json
#Accessing the Webpage
result=requests.get('https://github.com/factbook/factbook.json')
#Getting the content of the webpage
content = result.content
#Beautiful-souping it
bsoup = BeautifulSoup(content, 'html5lib')
#Getting the table data cells from the page with a specific class - (Use Chrome Dev Tools)
tds = bsoup.find_all('td', class_='content')
#An array of the possible continents
continents= []
for td in tds[1:14]:
continents.append(td.text.split('\n')[1].split(' ')[-1])
code_to_continent= {}
#Key-value pairs of country code to their respective continent
for continent in continents:
if continent!='meta': url='https://github.com/factbook/factbook.json/tree/master/{c}'.format(c=continent)
countries_page=requests.get(url)
content=countries_page.content
bsoup=BeautifulSoup(content, 'html5lib')
tds=bsoup.find_all('td', class_='content')
for td in tds[1:]:
code_to_continent[td.text.split('\n')[1].split(' ')
[-1].split('.')[0]] =continent
The dictionary makes things a lot simpler when we want to access the URL for each country’s data.
The next step is to define a simple Country class to hold the data. While we’re at it, it would improve the visualisation if we could use actual country names — not their two character code — so we can find that information and store it for later use.
#A simple country class to store the data relating to the country
class Country:
def __init__(self, name, code):
self.name = name.lower()
self.code = code
#The country being exported to and the percentage
self.exporting_partners = {}
def add_partner(self, p, v):
self.exporting_partners[p] = v
# Different possible names pointing to one country code - we might have 'USA': 'us' and 'United States': 'us' etc.
names_to_code = {}
for code in code_to_continent:
url = 'https://raw.githubusercontent.com/factbook/factbook.json/master/{continent}/{code}.json'.format(continent=code_to_continent[code], code=code)
names_to_code[code] = code
json = requests.get(url).json()
if 'Country name' in json['Government']:
if 'conventional short form' in json['Government']['Country
name']:
names_to_code[json['Government']['Country name']['conventional
short form']['text'].lower()] = code
if 'conventional long form' in json['Government']['Country
name']:
names_to_code[json['Government']['Country name']
['conventional long form']['text'].lower()] = code
And now we’re finally ready to add the exporter information — this method isn’t perfect but it gets a majority of the information.
Don’t worry too much about the split() functions on the exporting partners. That’s just cleaning up some of the data so we only get the names and the percentages we want. Check out the GitHub page to see the extra names I had to add for the graph construction to work.
countries= {}
#For all of the country codes
for code in code_to_continent:
#Get the JSON Data
url='https://raw.githubusercontent.com/factbook/factbook.json/master/{continent}/{code}.json'.format(continent=code_to_continent[code], code=code)
json=requests.get(url).json()
if 'Country name'injson['Government']:
#We might miss a few if they don't have this
if 'conventional short form'in json['Government']['Country
name']:
name=json['Government']['Country name']['conventional
short form']['text']
if 'Exports - partners'injson['Economy']:
#Get the exporters
partners=json['Economy']['Exports - partners']
['text'].split(',')
country=Country(name, code)
for partner in partners:
#Process the data to drop some bits we don't need
and get the values we want
p=partner.split()
if p[-1][0] =='(':
country.add_partner(' '.join(p[0:-2]).lower(),
float(p[-2].split('%')[0]))
elif p[-1][0] =='e':
country.add_partner(' '.join(p[0:-3]).lower(),
float(p[-3].split('%')[0]))
else:
country.add_partner(' '.join(p[0:-1]).lower(),
float(p[-1].split('%')[0]))
countries[code] =country
NetworkX
NetworkX is a fairly sophisticated Python library for constructing, analysing, and — to a certain extent — exporting graph data structures. It is also really simple to use.
Now that we have the data we want stored in our Country objects, the code for creating our directed graph is very simple.
We can also add attributes to our nodes like the degree of the node and the name (not the ISO code). Once we have our data structure we can export it to a JSON format and dump it in a file ready to be used with D3.
import networkx as nx
# Create the empty directed graph
G = nx.DiGraph()
# Add the edges with a weight proportional to their percentage expressed as a decimal
# For consistency we're using the ISO-Codes
for country in countries:
for partner in countries[country].exporting_partners:
G.add_edge(names_to_code[countries[country].name],
names_to_code[partner], weight =
countries[country].exporting_partners[partner]/100)
# So we can visualise some cooler things later on we can also add some attributes to the nodes
degrees = nx.degree(G)
# The non-ISO code name
names= {}
for country in countries:
names[country] =countries[country].name
# Number of exporting partners
ds= {}
for name, dindegrees: ds[name] =d
nx.set_node_attributes(G, ds, 'degree')
nx.set_node_attributes(G, names, 'name')
# And finally export it to a JSON format compatible with D3
from networkx.readwrite import json_graph
data = json_graph.node_link_data(G)
# And then dump it in a file
with open('graph.json', 'w') as fp:
json.dump(data, fp)
What’s different about D3?
From the outset, Mike Bostock (the founder of D3) wanted to create a “reusable chart.” In his post on the subject, he highlights the key goals and missions for the D3 project. These can help us understand the syntactic structure it has.
The first, and most important, take-away is that charts should be implementable “as closures with getter-setter methods.” If you’re new to programming you may be confused as to what a closure is. Don’t worry! The big concept associated with closures is lexical scoping, which sounds a lot scarier than it really is. The basic idea behind it all is nested functions and how the inner function has access to the outer function’s variables.
Take a look at EXAMPLE 1 in the code below. Here we simply return the inner function, which has access to the arguments passed to the outer function. The variable we declare, closureOne , is a function and when we execute it with closureOne() we console.log(config.name) .
//EXAMPLE 1: A Closure - A function within a function (nested)
function closureOne(config){
returnfunction(){
console.log(config.name);
}
}
config={
'name': 'Joe Blogs'
}
var closureOne = closureOne(config);
closureOne();// 'Joe Blogs'
//EXAMPLE 2: A slightly more complicated D3-esque closure
function closureTwo(){
var nameOfPerson = 'Bob Smith';
function my(){
// Some Code Here;
}
my.fullName = function(value){
if(!arguments.length) return nameOfPerson;
nameOfPerson=value;
return my;
}
return my;
}
var closureTwo=closureTwo();
closureTwo.fullName('Alice Smith');
console.log(closureTwo.fullName())// 'Alice Smith';
console.log(closureTwo.nameOfPerson)// Undefined;
Closures Examples
In EXAMPLE 2, we declare variables within the scope of the outer function allowing the inner my function to have access to it. In the fullName function associated with the my function — a method — we can either set or get the nameOfPerson depending on it any arguments are passed. Notice how the developer does not have access to the variable nameOfPerson . The developer is forced to use our defined methods to update and access it, providing a level of security to our function.
This method of using closures is how D3 is coded. Take a look at the line function to see this in action. This video may also shed some light on the topic.
Programming in D3
Thankfully you don’t need to be a master of closures and D3 to create a visualisation. In fact, so long as you can copy and paste, you can usually get something running in no time at all thanks to Mike Bostock’s Blocks. This site has many open-source examples of building data visualisations in D3. You can use the code to create your own with your data.
To create the network for this tutorial, I used this example. To get it running on screen, all I had to change was the name of the .csv file.
Let’s look at some of the key lines of code making this visualisation work, and also the parts I’ve added to hopefully improve it further.
Just before we dive in, Andrew Dunkman wrote a great article helping explain D3 selection which I recommend reading.
// D3.select allows for DOM manipulation
// Here we get the SVG in our index.html and then get it's width and height
var svg = d3.select("svg"),
width = svg.attr("width"),
height = svg.attr("height");
// Here we create a the hover over 'tooltip' - note it is an element of the body and not the SVG
var div = d3.select("body")
.append("div")
.attr("class","tooltip")
.style("opacity",0);
// the forceSimulation method starts a new force simulation to which we then add forces
var simulation = d3.forceSimulation()
//d3.forceLink applies the force between source and target nodes
.force("link",d3.forceLink().id(function(d){return d.id;}))
//d3.forceManyBody acts like 'electrical charge' and keeps the
nodes separated
.force("charge",d3.forceManyBody())
//d3.forceCenter moves the position of the centre to the middle of
our SVG instead of the top left corner
.force("center",d3.forceCenter(width/2,height/2));
//d3.json(file, callback) - once we have the data we execute the code in the callback
d3.json("graph.json", function(error,graph){
if(error) throw error;
//Here we create the links in the graph - we first append a 'g' or a group to the SVG
var link = svg.append("g")
.attr("class","links")
.selectAll("line")
//Don't worry that there are no lines - when we use the
.enter() method we can create them
.data(graph.links)
//Our d3.json graph has the link information
.enter().append("line")
// For each of these we want to create a line
.attr("stroke-width", function(d)
{
return Math.sqrt(d.value);
});
// With some width function associated with the value of
the data point
var node=svg.append("g")
.attr("class","nodes")
.selectAll("circle")
.data(graph.nodes)
.enter().append("circle")
.attr("r",scaledSize).attr("fill", function(d)
{
return color(d.degree);
})
// All of the above should seem familiar from the 'links' code
above - what's different is the extra functionality we add
.on("mouseover",mouseOver(.7))
//two new functions for mousing over and mousing out which we
will look at
.on("mouseout",mouseOut)
.call(d3.drag()
// Call performed on the entire selection - d3.drag does
exactly what you would expect - again we'll
// take a look at the functions passed in.
.on("start",dragstarted)
.on("drag",dragged)
.on("end",dragended));
// Supplying the simulation with the nodes and telling it to perform the ticked function per tick
simulation
.nodes(graph.nodes)
.on("tick",ticked);
simulation
.force("link").links(graph.links);
//... more functions - take a look at the full code to see them});
To see the functions we’re calling like dragstarted , scaledSize , or mouseOut , be sure to look at the full code here. As an example, let’s see what’s happening when we click on a node.
//We first perform a selectAll on the circles of the SVG (i.e. the nodes of our graph)
//And if a node is clicked, we perform the function supplying that nodes values as an argument
svg.selectAll("circle").on("click", function(d){
//We first create a string for the country's name of the node clicked
let name = "Country: "+d.name.toUpperCase();
//We then set up the exporters string
let string = "Exporters: ";
//From our predefined object containing key-value pairs of linked
nodes we check to see what our node is linked to
//To make the linkedByIndex object we iterated over the links of
the graph and placed them in it
Object.keys(linkedByIndex[d.index]).forEach(key=>{
for(n of graph.nodes){
//Sometimes we might not have parsed the data perfectly so we
need to do some sanity checks
if(n.index==key&&n.name!=undefined){
//If everything is code we add it to the string
string+=(n.name.toUpperCase()+", ");
}
}
});
//And then manipulate the DOM using some vanilla JS
let country = document.getElementById("country");
country.innerHTML = name;
let exporters = document.getElementById("exporters");
exporters.innerHTML = string;
});
Conclusion
The code is messy, the visualisation isn’t perfect, and there is so much left to discuss and learn.
But that’s beside the point. Hopefully this post has been able to let you get your feet wet with NetworkX and D3 without throwing too much at you. We all have to start somewhere, and this can be your beginning to create insightful and powerful data visualisations.
If you’re stuck wondering what to tackle next, here are a few of suggestions:
Mike Bostock’s Towards Reusable Charts — this is a great example of someone explaining their thought process and then their implementation. This shows how his goals for the project affected his implementation.
D3 and React — two libraries fighting over the DOM, this is currently what I’m reading about, and seeing what are the best ways to utilise both on a project.
Elijah Meeks, Senior Data Visualization Engineer at Netflix —any of Elijah Meeks’ posts are a great resource and often shed light on the world of data visualisation.
Source: Medium - Patrick Ferris
The Tech Platform
Comments