A Tale of Two Economies (Critique)

Introduction to Infographics and Data Visualization (Nov 16 – Dec 13, 2015)

Week 1 

We are required to critique one of the four infographics shared, after going through some readings.

– A Tale of Two Economies
– Asia’s Mobile and Internet Speeds
– Income Inequality
– Recent significant disasters in the Asia and Pacific regions

I was planning to pick Internet Speeds, but “A Tale of Two Economies” seemed far more interesting.

Here is my first critique:

1. Is the graphic successful at communicating its main points?
>> Yes, it is, covering population, land mass, economy, and other indicators.
2. Did the designer use the right data? Has the designer tried to simplify too much, to the point of forgetting the context of the data? Does the graphic need to include other variables, besides the ones the designer chose, to make the story clearer?
>> The designer has made good use of data, covering sufficient ground to present a comparison of the two.
3. Are the graphic forms chosen to represent the data the most appropriate? Do they let you extract meaning without having to read every single number? Can you make comparisons, see relationships, etc.? Can you see patterns and trends thanks to the graphic?
>> This is where creativity has taken over, making the Visual Maths hard.

1. POPULATION: I would have preferred bar charts, as used for Energy (maybe using people icons to depict population).

2. ECONOMY: Good use of space. The designer has tried to keep the orientation such that minimal tilting of the head is required, but I would have preferred simpler charts. Similarly, the creativity applied to GDP Per Capita and Per Capita Disposable Income could be moderated a bit.

3. BANG FOR YOUR BUCK: A very nice indicator to compare; however, the two red and white bubbles at the bottom leave me confused.
4. Does the graphic need better copy (headlines, explainers, etc.) to improve understanding? Remember that an infographic is not just visuals, but a combination of words and graphics.
>> No, the existing copy does a good job.
5. Try to redesign the graphic based on your thoughts about it. A rough, hand-drawn sketch will suffice, if you don’t know how to use any software tool.
>> Attached. I have not completed it, but simply put, I would take all the creativity out and replace it with orthodox bar charts.


Wk 1 Assignment

internet-use-rate-vs-urban-rate (Prologue)

Just found the following on the internet:


Nadieh Bremer studies the impacts of Urbanization, in her own words:

My winning entry to the “Visualizing Urban Expansion in East Asia” challenge organized by the World Bank and Visualizing.org. The interactive version takes you through a narrative about the impacts of Urbanization in almost 900 cities in East Asia between 2000 and 2010.

internet-use-rate-vs-urban-rate (Part 4)

Data Management and Visualization – Coursera Course

Assignment 4

The embedded pdf includes the code along with output.

(Coursera-Data Visualization-Assignment 4)

It’s time to get cracking …

The initial part, where the text file “gap minder” is read in and analyzed with “describe”, is self-explanatory. The mean and the spread (in terms of standard deviation) can be read directly from its output.

Internet - Describe

Similarly for the other variable:

Urban Rate - Describe
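The original code lives in the embedded PDF and is not reproduced here. As a rough sketch of what a “describe”-style summary reports, the snippet below computes the count, mean and standard deviation with the Python standard library only. The function name and the sample values are invented for illustration and are not taken from the GapMinder file.

```python
import statistics

def describe(values):
    """Return count, mean, sample standard deviation, min and max."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }

# Invented values for illustration only; not real GapMinder data.
internet_use_rate = [10.0, 20.0, 30.0, 40.0, 50.0]

summary = describe(internet_use_rate)
for name, value in summary.items():
    print(f"{name:>5}: {value:.2f}")
```

The mean gives the centre of the variable and the standard deviation its spread, which are the two figures read off the summary above.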

As the variables of interest, “Internet Rate” and “Urban Rate”, are both QUANTITATIVE, we analyze them using histograms. Options such as the number of bins are varied to study the distribution of the data.


And so on for the other variable.


To examine the bivariate relationship, a scatter plot is used.


Instead of using Urban Rate as the independent variable, I have used Internet Rate. The relationship seems to hold this way as well: as the Internet Rate increases, so does the Urban Rate. This may not simply be due to the fact that people move to bigger cities; it might very well be that smaller cities “grow” or get “modernized” in different aspects, including getting good internet.
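One way to see why the relationship “holds this way as well” is that the Pearson correlation coefficient is symmetric: swapping the two axes of the scatter plot does not change the strength or sign of the linear association. The sketch below is only an illustration under that assumption; the helper name and the sample values are invented, and the post itself does not say a correlation was computed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented values for illustration only; not real GapMinder data.
internet_rate = [5.0, 12.0, 30.0, 45.0, 80.0]
urban_rate = [20.0, 35.0, 50.0, 60.0, 85.0]

r_xy = pearson_r(internet_rate, urban_rate)
r_yx = pearson_r(urban_rate, internet_rate)
print(f"r(internet, urban) = {r_xy:.3f}")
print(f"r(urban, internet) = {r_yx:.3f}")
```

Both calls give the same value, so the choice of independent variable changes how the plot is read, not the association itself. Neither direction says anything about which variable causes the other.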

Internet Use Rate Vs Urban Rate

Data Management and Visualization – Coursera Course

Assignment 3

The embedded pdf includes the code along with output.

(Coursera-Data Visualization-Assignment 3)

Let us now discuss it bit by bit …

Although the assignment asks for three variables, I restricted the analysis to two. The GapMinder data set shared on the course website includes only quantitative variables and no categorical ones. I thought of adding “Continent” as a categorical variable but could not do so for lack of time. Similarly, as part of the research I could not find any established bin ranges for “Internet Usage” or “Urban Rate”, the variables I was exploring. I therefore simply experimented with various numbers of bins to explore the distribution of these variables.

  1. First – let us look into “Internet Usage”

I am skipping the details of reading data from a CSV file and importing the required libraries. We start by printing the dimensions of the data set, i.e. the count of rows and columns.
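For readers who want the skipped step, here is a minimal sketch of reading a CSV and printing its dimensions with the standard library. The inline sample stands in for the course file; its column names and values are assumptions made for this example, not the actual data.

```python
import csv
import io

# A tiny stand-in for the course file; column names and values are assumptions.
SAMPLE_CSV = """country,internetuserate,urbanrate
A,10.5,40.0
B,,55.2
C,81.0,
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)

n_rows = len(rows)               # data rows, header excluded
n_cols = len(reader.fieldnames)  # columns named in the header
print(f"{n_rows} rows x {n_cols} columns")
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file object.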


Next we set the variables of interest to numeric, and print a descriptive summary.
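Values read from a CSV arrive as text, so they have to be converted before any summary can be computed. The sketch below shows one way to do that, turning blank or non-numeric fields into `None`; the helper name and the sample fields are invented for illustration.

```python
def to_float(text):
    """Convert a CSV field to float; blank or non-numeric fields become None."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

# Invented raw fields for illustration: two valid numbers and three unusable entries.
raw = ["10.5", "", " ", "81.0", "n/a"]

numeric = [to_float(field) for field in raw]
print(numeric)
```

The `None` entries mark the missing observations that are dealt with in the next step.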

As stated in the assignment, “Data management includes such things as coding out missing data, coding in valid data, recoding variables, creating secondary variables and binning or grouping variables. Not everyone does all of these, but some is required.”

Let us start with null values. It does not make sense to impute the data (fill in the missing values with substitutes such as the average of the valid values), so all null values are removed.
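One possible reason to prefer removal, sketched below with invented numbers, is that filling gaps with the average keeps the mean unchanged but artificially shrinks the spread. This is an illustration of that single effect, not a reconstruction of the assignment code.

```python
import statistics

# Invented values for illustration; None marks a missing observation.
urban_rate = [20.0, None, 40.0, None, 60.0, 80.0]

# Option 1: remove the missing values.
valid = [v for v in urban_rate if v is not None]

# Option 2: impute the missing values with the mean of the valid ones.
mean_valid = statistics.mean(valid)
imputed = [mean_valid if v is None else v for v in urban_rate]

std_dropped = statistics.stdev(valid)
std_imputed = statistics.stdev(imputed)
print(f"mean (dropped) = {mean_valid:.2f}, mean (imputed) = {statistics.mean(imputed):.2f}")
print(f"std  (dropped) = {std_dropped:.2f}, std  (imputed) = {std_imputed:.2f}")
```

The imputed series piles extra values onto the mean, which would also add an artificial peak to any histogram drawn from it.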


Next, the frequency (the number of data values falling within each bin) is printed. The bin number is also printed against each data value, i.e. if the first value lies in the 3rd bin, 3 is printed at the first index.
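The two outputs described above can be sketched as follows, assuming equal-width bins spanning the minimum to the maximum of the data, with the maximum assigned to the last bin. The helper name and the values are invented for illustration.

```python
def bin_index(value, low, high, n_bins):
    """Return the 1-based index of the equal-width bin that holds value."""
    width = (high - low) / n_bins
    index = int((value - low) / width) + 1
    return min(index, n_bins)  # the maximum value belongs to the last bin

# Invented values for illustration only.
values = [2.0, 15.0, 27.0, 48.0, 50.0, 73.0, 99.0, 100.0]
n_bins = 5
low, high = min(values), max(values)

# Bin number for each value, in the original order of the data.
indices = [bin_index(v, low, high, n_bins) for v in values]
# Frequency: how many values fall into each bin.
counts = [indices.count(b) for b in range(1, n_bins + 1)]

print("bin of each value:", indices)
print("count per bin:    ", counts)
```

Changing `n_bins` is all that is needed to repeat the experiment with a different number of bins.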


Next, a histogram with 5 bins is plotted, overlaid with a dashed pink line indicating the average.


(Please ignore the title of the graph where it says Histogram with 4 bins)

Next, a histogram with 10 bins is plotted. The skew on the right clearly becomes evident.
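If “skew on the right” is read as a long right tail, it can also be checked numerically: the moment coefficient of skewness is positive and the mean sits above the median. The sketch below uses invented values and a hand-written helper; the post itself judges the skew from the plot only.

```python
def skewness(values):
    """Moment coefficient of skewness (population form): m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Invented values for illustration: many low rates and a few high ones.
rates = [1.0, 2.0, 3.0, 4.0, 5.0, 8.0, 20.0, 60.0, 90.0]

mean = sum(rates) / len(rates)
median = sorted(rates)[len(rates) // 2]  # middle value; the list has odd length
print(f"mean = {mean:.2f}, median = {median:.2f}, skewness = {skewness(rates):.2f}")
```

A value near zero would instead point to a roughly symmetric distribution.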


Next, a histogram in outline is plotted.


After experimenting with 20 bins, a histogram with cumulative probability on the y-axis is plotted.
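The cumulative version of a histogram is just the running total of the bin counts divided by the overall count. A minimal sketch, with invented bin counts:

```python
from itertools import accumulate

# Invented bin counts for illustration, e.g. from a 5-bin histogram.
counts = [8, 5, 4, 2, 1]
total = sum(counts)

# Running total of the counts, expressed as a share of all observations.
cumulative = [running / total for running in accumulate(counts)]

for i, share in enumerate(cumulative, start=1):
    print(f"up to bin {i}: {share:.2f}")
```

The last bar always reaches 1, and a steep early rise means that most observations sit in the lowest bins.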


Second – let us look into “Urban Rate”

The analysis follows the same steps as for the first variable. Null values are removed, and then different bin counts are tried to examine the distribution of the variable.


The distribution of the data is more uniform … let us dig more by increasing the bins


The hunch was right, the data is spread “almost” uniformly across the range.

Instead of recoding the variables or creating secondary variables, I preferred to experiment with bin counts, as these provide more visibility and there are no pre-defined ranges that work for “Internet Rate” or “Urban Rate”.