Empty Pipes



Graph Proportions and Label Sizes

  • 18 Oct 2015
  • python
  • matplotlib
  • research

The example below shows how to create a graph that is both aesthetically pleasing and sensibly proportioned. I used these parameters to generate the majority of the figures in my PhD thesis. The savefig function can also be used to save a PDF or an SVG file for modification in your favorite image editor (e.g. Inkscape).



The code to generate this plot uses the seaborn library in an IPython notebook:

%load_ext autoreload
%autoreload 2
%pylab inline

import numpy as np
import seaborn as sns

plt.rc('font', family='Palatino')
sns.set_style('white')
sns.set_context("notebook", font_scale=1.0, rc={"lines.linewidth": 2})
plt.rc('text', usetex=True)    # use latex in the labels
plt.rcParams['figure.figsize'] = (1.3, 1.0)    # width, height in inches


font = {'family' : 'serif',
        'serif': 'Palatino',
        'weight' : 'bold',
        'size'   : 11}

matplotlib.rc('font', **font)

# create the dummy data
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
z = np.cos(x)

# plot that sucker
fig, ax = plt.subplots()
ax.plot(x, y, label='sin')
ax.plot(x, z, label='cos')

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.axvline(x=3, color='red', ls='dashed')
ax.set_title('Trigonometric Functions', y=1.08)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

plt.locator_params(nbins=7)    # limit the number of ticks on each axis
for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] +
         ax.get_xticklabels() + ax.get_yticklabels()):
    item.set_fontsize(10)

handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, bbox_to_anchor=(1.8, 0.85))

plt.savefig('img/trigonometric_functions.png', dpi=500, bbox_inches="tight")
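
To get a feel for how the label size relates to the figure size, it helps to do the arithmetic by hand: matplotlib measures the figure in inches and fonts in points (1/72 of an inch). The sketch below uses the values from the code above; the helper names are made up for illustration, and because bbox_inches="tight" re-crops the figure around the labels and the legend, the saved file will not match the nominal pixel size exactly.

```python
POINTS_PER_INCH = 72.0  # a typographic point is 1/72 of an inch


def pixel_size(figsize_inches, dpi):
    """Nominal pixel dimensions of a figure saved at the given dpi."""
    width_in, height_in = figsize_inches
    return (round(width_in * dpi), round(height_in * dpi))


def label_height_fraction(font_size_pt, figure_height_in):
    """Fraction of the figure height taken up by one line of text."""
    return (font_size_pt / POINTS_PER_INCH) / figure_height_in


# Values from the example above: a 1.3 x 1.0 inch figure, 10 pt labels, 500 dpi.
width_px, height_px = pixel_size((1.3, 1.0), 500)
print(width_px, height_px)  # 650 500
print("label height is {:.0%} of the figure height".format(
    label_height_fraction(10, 1.0)))
```

Keeping the figure small and the font size fixed is what makes the labels look large relative to the plot once the image is scaled to fit a page.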

Layouts Upon Layouts

  • 11 Oct 2015
  • javascript
  • d3.js
  • rna

One of the truly beautiful things about d3.js is how easy it is to combine multiple layouts into one graphic. The example below incorporates three different layouts to show the data along three different dimensions:

  1. RNA secondary structure. RNA is a molecule similar to DNA with the property that it folds back onto itself to form pairs with bases in its own sequence. Its secondary structure is often displayed as an eponymous diagram. The circles representing nucleotides are arranged using an rnaplot layout.

  2. Some quantity associated with each molecule (e.g. concentration). We can display this by scaling the secondary structure diagrams using a treemap layout. Each rnaplot is scaled as a treemap.

  3. Multiples of the above quantities (e.g. from different experiments). Four sets of structures and sizes are arranged using a grid layout.



So how do we create this layout chaining? We start with the d3-grid layout:

var rnaTreemap = rnaTreemapChart();
var rectGrid = d3.layout.grid();

var rectData = rectGrid(root);
var gMain = d3.select(divName)
    .append('svg')
    .selectAll('g')   // start from an empty selection so enter() yields one <g> per grid cell
    .data(rectData)
    .enter()
    .append('g')
    ... //position the g according to rectData
    .call(rnaTreemap);

Now we have one appropriately positioned <g> for each treemap, under which we will construct the rna-treemap layout:

// the rna-treemap layout
var chart = function(selection) {
    selection.each(function(data) {
        var treemap = d3.layout.treemap()

        d3.select(this)
        .append('g') // probably unnecessary (no semicolon here: the chain continues)
        .datum(data)
        .selectAll(".treemapNode")
        .data(treemap.nodes)
        .enter()
        .append('g')
        .call(rnaTreemapNode);

        function rnaTreemapNode(selection) {
            selection.each(function(d) {
                var chart = rnaPlot()

                if ('structure' in d) d3.select(this).call(chart)
            });
        }
    });

Finally, the rna-plot layout adds its own <g> and continues on to draw the circles associated with the RNA (not shown).

function chart(selection) {
    selection.each(function(data) {
        var rg = new RNAGraph(data.sequence, data.structure,
                              data.name)

        d3.select(this)
        .append('g')
        ...

And that’s it! Create layout function. Create child nodes bound to the data. Call layout function. Rinse, repeat! Take a look at the “Towards Reusable Charts” tutorial for an excellent introduction to creating a custom layout.


Addendum


Here’s the beginning of the JSON file used to create the plot:

[{"name": "graph",
  "children": [
    {"structure": "((..((....)).(((....))).))",
     "sequence": "CGCUUCAUAUAAUCCUAAUGACCUAU",
     "size": 50},
    {"structure": "((...........(((....))).))",
     "sequence": "CGCUUCAUAUAAUCCUAAUGACCUAU",
     "size": 40},
    {"structure": "..........................",
     "sequence": "CGCUUCAUAUAAUCCUAAUGACCUAU",
     "size": 20}
  ]
 },
 {"name": "graph",
  "children": [
    {"structure": "...........(((((.....)))))..",
     "sequence": "CGCUUCAUAUAAUCCUAAUGACCUAU",
     "size": 50},
    {"structure": "(((((((......)))).))).......",
     "sequence": "CGCUUCAUAUAAUCCUAAUGACCUAU",
     "size": 10}
  ]
 },
 ...]
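
The structure strings are written in dot-bracket notation: each matching pair of parentheses marks two bases that pair with each other, and a dot marks an unpaired base. As a rough illustration of how such a string maps to base pairs (this is a standalone Python sketch with a made-up function name, not the code the rnaplot layout uses), a stack is enough:

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs for matching parentheses
    in a dot-bracket string (0-based positions)."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for i, symbol in enumerate(structure):
        if symbol == '(':
            stack.append(i)
        elif symbol == ')':
            if not stack:
                raise ValueError("unmatched ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif symbol != '.':
            raise ValueError("unexpected symbol %r at position %d" % (symbol, i))
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


print(base_pairs("((..))"))  # [(0, 5), (1, 4)]

# The first structure from the JSON file above.
print(len(base_pairs("((..((....)).(((....))).))")), "base pairs")
```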

Wikipedia's Climate Data on an Interactive Map

  • 02 Sep 2015
  • maps
  • wikipedia
  • d3.js
  • leaflet

Introduction

One of my favorite things about Wikipedia is that most cities have a ‘weather box’ which shows historical climate data such as sunshine hours, maximum and minimum temperatures, precipitation and various other interesting statistics:



It’s fun to compare the values for different cities. Are summers in Vienna warmer than in Zürich (yes)? Is Seattle rainier than New York City (no!)? What are the sunniest regions in the world? What about the rainiest? This often involves jumping from page to page or opening up two browser windows to compare values. Couldn’t we make it easier? What if we could see all the values for every place for which there was data at once? What if we could show how the global weather changes over the course of the year?


Animations: Sunshine, Precipitation, Daily High

The animations above show how the world’s climate changes over the year, as documented in Wikipedia’s weather boxes. Sunshine mostly follows a predictable seasonal pattern: bright in the northern hemisphere from April to October and in the southern hemisphere for the rest of the year. A few exceptions stick out, such as the prominently cloudier regions over the equatorial land masses, which largely correspond to the rainforests of the Amazon, Mid-Western Africa and Indonesia, Malaysia, and Papua New Guinea.

These rainy regions are easily recognized in the middle animation above, which shows how the precipitation changes over the year. As expected, the rainiest regions are where we find rainforests near the equator, as well as along the coast of British Columbia and northern Washington in the US. A few rainy islands in the Pacific and South Atlantic are shown with disproportionately large areas due to the lack of any other weather stations nearby (see the description of the map below).

The temperature map is also as expected: the temperatures follow the seasons. Most striking, perhaps, is how much the temperatures change over the large landmasses of North America and Siberia, as compared to the oceanic regions. The astute eye may also notice persistently colder temperatures over Tibet, Mongolia and Central China due to their high elevation.

These animations were created by recording interactions with the map described below.

The Map

The map below contains a Voronoi diagram overlay where each cell is color coded according to the climate data for the location that defines the cell (default is sunshine). Moving the mouse over any cell will show the city it corresponds to as well as its climate data.
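
A Voronoi cell is simply the set of points that are closer to one location than to any other, which is also why isolated islands end up with such large cells. The Python sketch below illustrates that idea with a brute-force nearest-site lookup; the site names and coordinates are invented, plain planar distance is an assumption, and it is not the code used to build the overlay.

```python
import math


def nearest_site(point, sites):
    """Return the name of the site closest to `point`.

    `sites` maps a name to planar (x, y) coordinates. The returned
    site is the one whose Voronoi cell contains `point`.
    """
    px, py = point
    return min(sites,
               key=lambda name: math.hypot(sites[name][0] - px,
                                           sites[name][1] - py))


# Made-up sites in arbitrary planar units.
sites = {"island": (0.0, 0.0), "city_a": (10.0, 0.0), "city_b": (10.0, 1.0)}

print(nearest_site((3.0, 4.0), sites))   # island
print(nearest_site((9.0, 0.2), sites))   # city_a
```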

A time range can be selected using the circular control on the bottom right corner (only works on the desktop version). The letters refer to the months of the year. Dragging one of the handles will extend or contract the range, whereas dragging on the range itself will translate it.
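
Because the control is circular, a selected range can presumably run past December into January (say, November to February). The sketch below shows one way to handle such a wrapping range in Python. The function names and the sample values are made up, and averaging is an assumption about how a range is summarized, not a description of what the map does.

```python
def months_in_range(start, end):
    """Month indices (0 = January) from start to end inclusive,
    wrapping past December when end comes before start."""
    length = (end - start) % 12 + 1
    return [(start + k) % 12 for k in range(length)]


def mean_over_range(monthly_values, start, end):
    """Average of the 12 monthly values over the selected range."""
    months = months_in_range(start, end)
    return sum(monthly_values[m] for m in months) / len(months)


# Made-up monthly values, one per month starting with January.
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

print(months_in_range(10, 1))          # [10, 11, 0, 1]
print(mean_over_range(values, 10, 1))  # 5.5
```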

Different climate data overlays can be selected via the icon in the upper right corner.



How It's Made

Data Preparation

  1. Wikipedia dumps for all the pages are downloaded.
  2. For each article that has an associated location and weatherbox, I extract the name, latitude, longitude and weatherbox data and store it in a JSON file.
  3. This file is filtered to remove any entries that lack sunshine, precipitation, or high and low temperature data.
  4. The final file is used as input to climate-map.js.
  5. All of the code for parsing Wikipedia is on GitHub.
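
The filtering step can be sketched in a few lines of Python. The key names below ('sun', 'precipitation', 'high', 'low') and the sample entries are hypothetical, since the actual layout of the JSON file is not shown here; the real parsing code is in the repository mentioned above.

```python
# Hypothetical key names; the real file may use different ones.
REQUIRED_FIELDS = ("sun", "precipitation", "high", "low")


def has_all_fields(entry, required=REQUIRED_FIELDS):
    """True if every required field is present and not None."""
    return all(entry.get(field) is not None for field in required)


def filter_entries(entries):
    """Keep only the entries that carry all of the required climate data."""
    return [entry for entry in entries if has_all_fields(entry)]


# Made-up entries for illustration only.
entries = [
    {"name": "Complete", "sun": [1] * 12, "precipitation": [2] * 12,
     "high": [3] * 12, "low": [4] * 12},
    {"name": "No sunshine data", "precipitation": [2] * 12,
     "high": [3] * 12, "low": [4] * 12},
]

print([entry["name"] for entry in filter_entries(entries)])  # ['Complete']
```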

Interactive Map

  1. Country outlines were obtained from Johan Sundström’s world.geo.json GitHub repository.
  2. The circular brush was obtained from Elijah Meeks’ bl.ock.
  3. The map itself is displayed using leaflet.js.
  4. There’s a bottom layer using CartoDB’s Positron Layer, although this is usually covered up by the SVG containing the Voronoi diagram.
  5. There’s a middle layer containing the SVG element with all of the Voronoi cells.
  6. There’s a label-only CartoDB Positron Layer.
  7. Finally, on top of that, the circular brush is used to create the month selector control on the bottom right corner.
  8. The layer selector control on the upper right hand corner is a hacked facsimile of Leaflet.js’s Layers Control. It’s hacked because the different layers aren’t actually Leaflet layers, but rather different cross sections of the data. Selecting different options triggers a different data bind for the Voronoi cells in the SVG layer.

Related

A similar map is available for historical temperatures at halftone.co.