Empty Pipes



Wikipedia's Climate Data on an Interactive Map

  • 02 Sep 2015
  • maps
  • wikipedia
  • d3.js
  • leaflet

Introduction

One of my favorite things about Wikipedia is that most cities have a ‘weather box’ which shows historical climate data such as sunshine hours, maximum and minimum temperatures, precipitation and various other interesting statistics:



It’s fun to compare the values for different cities. Are summers in Vienna warmer than in Zürich (yes)? Is Seattle rainier than New York City (no!)? What are the sunniest regions in the world? What about the rainiest? This often involves jumping from page to page or opening up two browser windows to compare values. Couldn’t we make it easier? What if we could see all the values for every place for which there was data at once? What if we could show how the global weather changes over the course of the year?


[Animations: Sunshine, Precipitation, Daily High]

The animations above show how the world’s climate changes over the year, as documented in Wikipedia’s weather boxes. Sunshine mostly follows the seasons: the northern hemisphere is bright from April to October and the southern hemisphere during the rest of the year. A few exceptions stick out, such as the prominently cloudier regions over the equatorial land masses, which largely correspond to the rainforests of the Amazon, central-western Africa, and Indonesia, Malaysia and Papua New Guinea.

These rainy regions are easily recognized in the middle animation above, which shows how precipitation changes over the year. As expected, the rainiest regions are where we find rainforests near the equator, as well as along the coast of British Columbia and northern Washington in the US. A few rainy islands in the Pacific and South Atlantic are shown with disproportionately large areas due to the lack of any other weather stations nearby (see the description of the map below).

The temperature map is also as expected: temperatures follow the seasons. Most striking, perhaps, is how much the temperatures change over the large landmasses of North America and Siberia, as compared to the oceanic regions. The astute eye may also notice persistently colder temperatures over Tibet, Mongolia and Central China due to their high elevation.

These animations were created by recording interactions with the map described below.

The Map

The map below contains a Voronoi diagram overlay in which each cell is color coded according to the climate data for the location that defines that cell (the default is sunshine). Moving the mouse over any cell will show the city it corresponds to as well as its climate data.
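A Voronoi cell is the set of points that lie closer to its defining location than to any other, so finding the cell under a given point amounts to a nearest-site lookup. The sketch below illustrates that idea in Python. It is not the map's code: the great-circle distance, the function names and the three sample coordinates are all assumptions made for illustration, and the map itself may compute its cells in projected screen space.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_site(lat, lon, sites):
    """Return the site whose Voronoi cell contains (lat, lon)."""
    return min(sites, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))

# Illustrative sample only; the real data set has one entry per weather box.
SITES = [
    {"name": "Vienna", "lat": 48.21, "lon": 16.37},
    {"name": "Zürich", "lat": 47.37, "lon": 8.54},
    {"name": "Seattle", "lat": 47.61, "lon": -122.33},
]
```

With so few sites every cell is huge, which is the same effect that makes isolated islands cover large areas of the map.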

A time range can be selected using the circular control on the bottom right corner (only works on the desktop version). The letters refer to the months of the year. Dragging one of the handles will extend or contract the range, whereas dragging on the range itself will translate it.
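Because the control is circular, a selected range can wrap past December (November to February, for example). Here is a small sketch of how such a range might be expanded into months and used to aggregate monthly values. The function names and the choice of a plain average are assumptions for illustration, not taken from climate-map.js.

```python
def months_in_range(start, end):
    """Month indices (0 = January) from start to end inclusive, wrapping past December."""
    length = (end - start) % 12 + 1
    return [(start + i) % 12 for i in range(length)]

def mean_over_range(monthly_values, start, end):
    """Average of the twelve monthly values over the selected range."""
    months = months_in_range(start, end)
    return sum(monthly_values[m] for m in months) / len(months)
```

For a November-to-February selection, months_in_range(10, 1) yields the indices for November, December, January and February in that order.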

Different climate data overlays can be selected via the icon in the upper right corner.



How It's Made

Data Preparation

  1. Wikipedia dumps for all the pages are downloaded.
  2. For each article that has an associated location and weatherbox, I extract the name, latitude, longitude and weatherbox data and store it in a JSON file.
  3. This file is filtered to remove any entries that don’t have sun, precipitation, high and low temperatures.
  4. The final file is used as input to climate-map.js.
  5. All of the code for parsing Wikipedia is on GitHub.
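The filtering in step 3 could look roughly like the following. This is only a sketch: the key names ("sun", "precipitation", "high", "low"), the twelve-values-per-field layout and the function names are hypothetical, and the actual parsing code on GitHub may organize the data differently.

```python
import json

# Hypothetical key names; the real JSON file may label these fields differently.
REQUIRED_FIELDS = ("sun", "precipitation", "high", "low")

def is_complete(entry):
    """True if every required field holds twelve non-missing monthly values."""
    for field in REQUIRED_FIELDS:
        values = entry.get(field)
        if not isinstance(values, list) or len(values) != 12:
            return False
        if any(v is None for v in values):
            return False
    return True

def filter_complete(entries):
    """Keep only the entries that can be drawn on every overlay."""
    return [entry for entry in entries if is_complete(entry)]

def filter_file(in_path, out_path):
    """Read a JSON list of entries, drop incomplete ones, write the rest."""
    with open(in_path, encoding="utf-8") as f:
        entries = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(filter_complete(entries), f, ensure_ascii=False)
```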

Interactive Map

  1. Country outlines were obtained from Johan Sundström’s world.geo.json GitHub repository.
  2. The circular brush was obtained from Elijah Meeks’ bl.ock.
  3. The map itself is displayed using leaflet.js.
  4. There’s a bottom layer using CartoDB’s Positron layer, although this is usually covered up by the SVG containing the Voronoi diagram.
  5. There’s a middle layer containing the SVG element with all of the Voronoi cells.
  6. There’s a label-only CartoDB Positron layer.
  7. Finally, on top of that, the circular brush is used to create the month selector control on the bottom right corner.
  8. The layer selector control on the upper right hand corner is a hacked facsimile of Leaflet.js’s Layers Control. It’s hacked because the different layers aren’t actually Leaflet layers, but rather different cross sections of the data. Selecting different options triggers a different data bind for the Voronoi cells in the SVG layer.

Related

A similar map is available for historical temperatures at halftone.co.


Multi-Page Vertically Centered LaTeX Table

  • 04 Aug 2015
  • latex

Say we wanted to create a LaTeX table with vertically centered text. Say, furthermore, that our table was very long and we wanted it to span multiple pages automatically.

Then we might imagine that the table would have a heading…



Followed by an ending at the end of the page…



Which would be continued at the start of the next page…



Such a table can be created using the snippet of code below. It uses the array package for vertically centering the cell text and the longtable package for automatically breaking up the table across multiple pages.

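\documentclass{article}
\usepackage{array}      % m{...} columns for vertically centered cells
\usepackage{longtable}  % tables that break across pages
\usepackage{graphicx}   % \includegraphics

\begin{document}

\begin{longtable}{ >{\centering\arraybackslash} m{1cm} >{\centering\arraybackslash} m{4cm} >{\centering\arraybackslash} m{4cm}}
% Heading on the first page
PDB ID & Length & Coarse Grain Structure \\
\hline
\endfirsthead

% Heading on every following page
\multicolumn{3}{c}%
{ {\bfseries \tablename\ \thetable{} -- continued from previous page} } \\
\hline PDB ID & Length & Coarse Grain Structure \\
\hline 
\endhead

% Footer on every page except the last (the footer text is a placeholder)
\hline \multicolumn{3}{|r|}{Continued on next page} \\ \hline
\endfoot

% Footer on the last page
\hline \hline
\endlastfoot
1GID & 158 & \includegraphics[width=4cm]{gfx/cgs/1GID_A.png} \\
3D0U & 192 & \includegraphics[width=4cm]{gfx/cgs/3D0U_A.png} \\
4GXY & 192 & \includegraphics[width=4cm]{gfx/cgs/4GXY_A.png} \\
\end{longtable}

\end{document}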

Stack Exchange References:

  1. How to vertically-center the text of the cells?
  2. Make a table span multiple pages

Randomly Finding Someone on a Grid

  • 03 Aug 2015
  • javascript
  • d3.js
  • problem

A few months ago somebody asked the following question on AskReddit:


“If I wanted to randomly find someone in an amusement park, would my odds of finding them be greater if I stood still or roamed around?”


As usual, the comments were highly informative with various users running simulations to show that two people would generally find each other faster when both moved around. The question was lost in the detritus of my mind until a recent late-night conversation after a bioinformatics conference. The context in this conversation was different but the underlying question was the same.


“Are two proteins more likely to encounter each other when both are mobile or when one is attached to a cell membrane while the other is free to float around the interior of the cell?”


After bringing up the reddit post, I tried my best to explain why it would make sense that fixing the position of one person (or protein) would make it easier for the other to find it.

Alas, my memory had failed me and this conclusion was contrary to that suggested by the reddit comment simulations. What struck me, however, is that I couldn’t make a coherent case for why standing still is better, and wasn’t convinced by the other’s argument that moving around was better. Frustrated, I decided to make a little tool to perform a simulation of my own and to illustrate the phenomenon. So, without further ado, here we go.

Scenario 1: One person waits

One person (green circle) waits, while the other (red circle) searches. Both start at random positions at the start of each simulation.

Scenario 2: Both people move

Both people (green and red circles) move around. Both start at random positions at the start of each simulation.
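For readers who want to reproduce the two scenarios offline, here is a minimal Python sketch of such a simulation. It is not the code behind the animations. It assumes that a move goes to any of the up to eight surrounding cells, that both people move at the same time, and that they have met once they occupy the same cell. All function names are made up for this example.

```python
import random
from statistics import median

def neighbors(pos, n):
    """In-bounds cells surrounding pos on an n x n grid (diagonals included)."""
    x, y = pos
    return [
        (x + dx, y + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0) and 0 <= x + dx < n and 0 <= y + dy < n
    ]

def moves_to_meet(n, both_move, rng, max_moves=1_000_000):
    """Number of moves until two randomly placed people share a cell."""
    searcher = (rng.randrange(n), rng.randrange(n))
    other = (rng.randrange(n), rng.randrange(n))
    moves = 0
    while searcher != other and moves < max_moves:
        searcher = rng.choice(neighbors(searcher, n))
        if both_move:  # Scenario 2; in Scenario 1 the other person waits
            other = rng.choice(neighbors(other, n))
        moves += 1
    return moves

def median_moves(n, both_move, trials=1000, seed=0):
    """Median number of moves over many independent simulations."""
    rng = random.Random(seed)
    return median(moves_to_meet(n, both_move, rng) for _ in range(trials))
```

Calling median_moves(8, True) and median_moves(8, False) gives numbers to compare against the table in the Verdict section. The exact values depend on the assumed movement rules and on the seed.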

Verdict

To save you some time, I took the liberty of repeating these simulations for a variety of grid sizes and recorded the results.


Number of Moves Required to Meet

Grid Size   Both People Moving   One Person Waits
2x2         2                    2
4x4         11                   14
8x8         53                   73
16x16       257                  453

It’s clear that the better strategy is for both people to move at random than to have one person wait while the other moves randomly.

But wait…

Is this really a realistic scenario? How often do people just move around randomly? In most cases, when people are looking for something, they search in some methodical manner. The rest of this post will explore two fundamentally similar strategies and how they affect the search time.

Scanning Strategy

The first strategy involves simply scanning the grid up and down, left to right.
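One way to picture such a sweep is as a serpentine path that goes down one column, up the next, and so on from left to right. The sketch below generates that visiting order. The starting corner and the serpentine pattern are assumptions for illustration; the animation may start elsewhere or sweep differently.

```python
def scan_path(n):
    """Cells of an n x n grid in serpentine order: down column 0, up column 1, ..."""
    path = []
    for x in range(n):
        # Even columns are walked top to bottom, odd columns bottom to top,
        # so that consecutive cells are always adjacent.
        ys = range(n) if x % 2 == 0 else range(n - 1, -1, -1)
        path.extend((x, y) for y in ys)
    return path
```

Every cell is visited exactly once, and each step moves to a neighboring cell.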

Avoiding Strategy

The avoiding strategy involves keeping track of where the person has been, and at every point, visiting the least visited neighbor. If there is more than one least visited neighbor, pick one at random and continue.
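The rule described above can be sketched in a few lines of Python. As before, this is an illustration under assumptions rather than the code from the animation: neighbors include diagonals, the starting cell counts as visited once, and the function names are invented for this example.

```python
import random
from collections import Counter

def neighbors(pos, n):
    """In-bounds cells surrounding pos on an n x n grid (diagonals included)."""
    x, y = pos
    return [
        (x + dx, y + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0) and 0 <= x + dx < n and 0 <= y + dy < n
    ]

def avoiding_step(pos, n, visits, rng):
    """Move to the least visited neighbor, breaking ties at random."""
    options = neighbors(pos, n)
    fewest = min(visits[cell] for cell in options)
    least_visited = [cell for cell in options if visits[cell] == fewest]
    nxt = rng.choice(least_visited)
    visits[nxt] += 1  # remember that this cell has now been visited
    return nxt

def avoiding_walk(start, n, steps, seed=0):
    """Return the path taken and the visit counts after the given number of steps."""
    rng = random.Random(seed)
    visits = Counter({start: 1})
    path = [start]
    for _ in range(steps):
        path.append(avoiding_step(path[-1], n, visits, rng))
    return path, visits
```

The random tie-break is the only source of randomness in the walk.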

Centering Strategy

A reddit user recently pointed out that a reasonable strategy might be for one person to just go to the center, because there they would have a larger chance of being found by a randomly moving person. An illustration of this appears below, where the red person moves to the center and the green person wanders around randomly.
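Walking to the center is easy to express in code. The sketch below assumes diagonal moves are allowed and, since an 8x8 grid has no single middle cell, arbitrarily treats (4, 4) as the center. Both choices and the function names are assumptions for illustration.

```python
def step_toward(pos, target):
    """Take one step (possibly diagonal) from pos toward target."""
    def toward(a, b):
        # Move the coordinate by one in the direction of b, or leave it if equal.
        return a + (b > a) - (b < a)
    return (toward(pos[0], target[0]), toward(pos[1], target[1]))

def moves_to_reach(pos, target):
    """Number of steps needed to walk from pos to target."""
    moves = 0
    while pos != target:
        pos = step_toward(pos, target)
        moves += 1
    return moves

CENTER = (4, 4)  # assumed center cell of an 8x8 grid
```

Under these assumptions no cell of an 8x8 grid is more than four moves away from the center.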

Pairwise Comparison of Strategies

To determine what the best course of action is when two people are separated somewhere, we need to do a pairwise comparison of the strategies on an 8x8 grid:


Median Moves Required to Meet

            Standing   Random   Scanning   Avoiding   Centering
Standing    *          82       34         33         *
Random                 60       51         51         54
Scanning                        *          52         15
Avoiding                                   57         30
Centering                                             3


Generally, standing still is the best strategy if the other person is not moving around randomly. Even if they are, the best strategy appears to be to first go to the center and stay put there. Moving around randomly is only optimal if the other person is doing an avoiding walk. Scanning and avoiding work well in general, with one major caveat: if both people are scanning, it can lead to a situation where the two people never meet, no matter how long they walk.

The avoiding strategy avoids (heh) this outcome by introducing a little bit of stochasticity into the searching process: it picks a random position to go to when there are several equally little-visited options available.

Conclusion

The best strategy is for people to choose a point at which they meet if they get lost. When both people go to the center, it takes a median of 3 moves and will never take more than four moves (on an 8x8 grid).

Often, however, this doesn’t enter the conversation and an ad-hoc strategy must be selected. The second best option is for one person to stop moving and for the other to scan the search area looking for them. This also requires a certain amount of prior coordination to determine who will stop. If both people decide to stop, then it stands to reason that they’ll never find each other. Thus, when there is no agreed-upon strategy, it appears that the best choice is to walk around avoiding places that you’ve already been and taking a random turn here or there.

Finally, if you’re really lazy, it pays off to walk to the center and wait there. When the other person is walking around randomly, this will actually lead to a faster rendezvous than walking around randomly yourself.

Good luck!

Read on for a look at the distributions of finding times and the application that made all of this possible.

Appendix


The Distribution of the Number of Moves Required to Meet

[Grid of histograms, one per pairing of the strategies Standing, Random, Scanning, Avoiding and Centering]

Key Points from the Histograms:

  • Having one person sit still severely restricts the maximum number of moves that need to be taken if the other person is using the avoiding or scanning strategy.
  • One person standing and the other moving randomly can lead to an exorbitantly long search time.
  • Both people standing or both people scanning can lead to infinite search times.
  • Most of the time, the search will be faster rather than slower, but the times can vary a lot. Much of the bias toward faster search times is actually due to the fact that the starting points are chosen randomly, and closer starting points are more likely than further starting points.

In Three Dimensions

A few people wondered how different the results would be on a 3D grid. It’s a large-ish step to add 3D rendering to the current animations, but changing the code to use three dimensions wasn’t a huge deal. The results are in the table below.
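To give a sense of why the change is small, here is a sketch of a neighbor function that works for a grid of any dimension, so the same walking code can run on 2D and 3D positions. It assumes moves to all surrounding cells (diagonals included) and is not taken from the application's source.

```python
from itertools import product

def neighbors(pos, n):
    """In-bounds cells surrounding pos on a grid with side n, in any dimension."""
    result = []
    for delta in product((-1, 0, 1), repeat=len(pos)):
        if not any(delta):
            continue  # skip the all-zero offset, which is the cell itself
        cell = tuple(p + d for p, d in zip(pos, delta))
        if all(0 <= c < n for c in cell):
            result.append(cell)
    return result
```

An interior cell has 8 neighbors in 2D and 26 in 3D under this definition, which hints at why searches take longer in three dimensions.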


Median Moves Required to Meet

            Standing   Random   Avoiding
Standing    *          438      253
Random                 380      375
Avoiding                        344


The results are roughly analogous to those on the 2D grid. Naturally, it takes more moves for people to find each other in 3D, but the best pairing remains one person standing while the other avoids, and moving in a random fashion is better than standing when the other person is also moving randomly. For those who are interested, the histograms below show the distribution of search lengths on a 3D grid.


The Distribution of the Number of Moves Required to Meet on a 3D Grid

[Grid of histograms, one per pairing of the strategies Standing, Random and Avoiding]


Make Your Own Simulation

Here’s an application that you can use to run your own simulation.

Source

The source code for the application to run these simulations can be found on GitHub.