6  Resource Curse Theory: Natural Resources and Democracy

In this chapter, we continue analyzing the World in 2010 dataset. We have already downloaded it and placed it in our data folder. In case you need to download it again, it is available on Blackboard. It can also be downloaded from here. A codebook is also available on Blackboard.

Let’s get ready by clearing the environment, calling for the R packages we are going to use, and loading the dataset.

# Preliminaries #### 

# clear the environment
rm(list = ls())

# call the packages
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# load the data
df <- read.csv("data/world_in_2010.csv")

Recall the resource curse theory which proposes that natural resources are often a ‘curse’, obstructing democracy, impeding political and economic development, and leading to internal armed conflict. We are going to analyze data to see if there is a relationship between natural resources and such adverse outcomes.

We are going to start by exploring the relationship between natural resources and democracy.

\[ Natural \; resources \rightarrow No \; democracy \]

The first two variables we are interested in are as follows:

  1. Democracy score: v2x_polyarchy
  2. Natural resources as the share of the GDP: Natural_resources_rents_perc_of_GDP

The first variable, v2x_polyarchy, is the V-Dem’s Electoral Democracy Index, which measures the level of electoral democracy in a country in a given year (2010 in this case). It is a value between 0 and 1, where the higher the index score, the more democratic a country is. This is our outcome variable.

The second variable, Natural_resources_rents_perc_of_GDP, is the explanatory variable, measuring the share of contributions by natural resources to the GDP.

Our first goal is to create a cross tabulation where we can see the proportion of democracies in major natural resource producing countries. We will compare this figure to the proportion of democracies in countries that are not major natural resource producers.

Table 6.1 presents an example cross tabulation.

Table 6.1: Democracy and Natural Resources

                          Autocracy     Democracy     Total
Major Resource Producer   55 (79.71%)   14 (20.29%)   69 (100%)
Not Major Producer        37 (39.78%)   56 (60.22%)   93 (100%)
Total                     92 (56.79%)   70 (43.21%)   162 (100%)

Row percentages in parentheses sum up to 100%

We should start by exploring our variables of interest one by one. We are going to carry out a series of univariate analyses.

6.1 Measuring democracy: v2x_polyarchy

Let’s explore the data with a quick summary().

# summary of the polyarchy variable 
summary(df$v2x_polyarchy)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0200  0.3400  0.5600  0.5468  0.7800  0.9300 

In line with the description in the codebook, the values of v2x_polyarchy are between 0 and 1 (more precisely, the minimum is 0.02 and the maximum is 0.93). There are no missing values. It is apparent that v2x_polyarchy is a continuous numerical variable.

We can create a histogram to visually summarize the variable.

# Histogram of polyarchy
hist(df$v2x_polyarchy,
     breaks = seq(0, 1, 0.1),
     main = "Histogram of V-Dem Polyarchy Index", 
     xlab = "Level of democracy (Polyarchy Index)", 
     ylab = "Frequency"
     )
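
To see what the breaks argument does, here is a small sketch on made-up scores (the vector toy_scores is hypothetical and not taken from the dataset). seq(0, 1, 0.1) produces the bin edges, and hist() with plot = FALSE returns the counts per bin without drawing anything.

```r
# hypothetical scores, for illustration only
toy_scores <- c(0.02, 0.34, 0.56, 0.78, 0.93)

# bin edges from 0 to 1 in steps of 0.1: eleven edges define ten bins
bin_edges <- seq(0, 1, 0.1)
length(bin_edges)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(toy_scores, breaks = bin_edges, plot = FALSE)

# number of observations falling into each of the ten bins
h$counts
```

Fixing the breaks this way keeps the bins aligned with the 0 to 1 range of the index instead of letting R pick them.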

We can also create a boxplot.

# A box plot
boxplot(df$v2x_polyarchy, 
        main = "Distribution of V-Dem Polyarchy Index", 
        ylab = "Level of democracy (Polyarchy Index)",
        ylim = c(0,1) # y-axis should span from 0 to 1
        )

We should also quickly look at the most and the least democratic countries. We have already learned how to do this in base R, but let’s see another way using tidyverse.

# Least Democratic (most authoritarian) 10 countries
df |> 
  select(Country_Name, v2x_polyarchy) |> # selects only two variables
  arrange(v2x_polyarchy) |> # arranges (sorts) from lowest to highest
  head(n = 10) # select only the first 10 rows
   Country_Name v2x_polyarchy
1    Saudi Arab          0.02
2       Eritrea          0.07
3         Libya          0.08
4         Qatar          0.09
5         China          0.10
6       Lao PDR          0.10
7    Korea, Dem          0.10
8     Swaziland          0.13
9          Fiji          0.15
10      Myanmar          0.15

For the most democratic countries, we need to add another tidyverse function ‘desc()’ meaning descending.

# Most Democratic 10 countries
df |> 
  select(Country_Name, v2x_polyarchy) |> # selects only two variables
  arrange(desc(v2x_polyarchy)) |> # arranges (sorts) from highest to lowest (descending)
  head(n = 10) # select only the first 10 rows
   Country_Name v2x_polyarchy
1    United Kin          0.93
2        Sweden          0.93
3       Uruguay          0.93
4     Australia          0.92
5    Costa Rica          0.92
6        France          0.92
7    United Sta          0.92
8        Norway          0.91
9    Switzerlan          0.90
10      Germany          0.90
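
As an aside, dplyr also offers slice_min() and slice_max(), which combine the sorting and the head() step. The sketch below uses a made-up data frame (toy, country, and score are hypothetical names), so treat it as an illustration rather than part of our analysis.

```r
library(dplyr)

# hypothetical data, for illustration only
toy <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  score   = c(0.40, 0.10, 0.90, 0.70, 0.20)
)

# the two lowest scores (with_ties = FALSE returns exactly n rows)
lowest <- toy |>
  slice_min(order_by = score, n = 2, with_ties = FALSE)

# the two highest scores
highest <- toy |>
  slice_max(order_by = score, n = 2, with_ties = FALSE)

lowest
highest
```

By default these functions keep ties, which can return more than n rows; with_ties = FALSE mimics the behavior of head().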

6.2 Creating a categorical democracy variable from the numerical v2x_polyarchy variable

As we emphasized, v2x_polyarchy is a numerical variable. However, we would also like to conceptualize democracy as a binary variable: a country is either a democracy or not. To do so, we need to create a categorical (binary) variable from the numerical variable v2x_polyarchy. Only then can we calculate the rate of democracies and create a table similar to Table 6.1.

Let’s decide on a cut-point: any country with a v2x_polyarchy score at or below the cut-point is considered an autocracy, and any country with a score above it is categorized as a democracy. A common cut-point used in the literature is 0.60.

We need to create a new variable using this condition:

  • If a country’s v2x_polyarchy index is higher than 0.60, then it is a democracy.
  • If a country’s v2x_polyarchy index is 0.60 or lower, then it is an autocracy.

Let’s call this new variable regime_type.

# Creating the regime type variable
# start with an empty vector (of NAs):
df$regime_type <- NA

# fill the empty variable regime_type using the appropriate conditions:
df$regime_type[df$v2x_polyarchy > 0.60 ] <- "Democracy" # v2x_polyarchy is higher than 0.60 
df$regime_type[df$v2x_polyarchy <= 0.60 ] <- "Autocracy" # v2x_polyarchy is 0.60 or lower
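
The same recoding can also be written in one step with ifelse(). This is only a sketch on made-up scores (toy_scores is hypothetical); the two-step approach above is the one we use in this chapter.

```r
# hypothetical scores, including one exactly at the cut-point and one missing value
toy_scores <- c(0.02, 0.60, 0.61, 0.93, NA)

# ifelse(test, value_if_true, value_if_false)
toy_regime <- ifelse(toy_scores > 0.60, "Democracy", "Autocracy")

toy_regime
```

A score of exactly 0.60 ends up as an autocracy, and a missing score stays missing, which matches the conditions listed above.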

We created the new variable regime_type, but we should also do some checks for quality control. It is easy to make a mistake while creating a variable. We want to be sure that the new variable is error free.

# Some quality controls 

# start with a table:
table(df$regime_type, useNA = "always")

Autocracy Democracy      <NA> 
       95        71         0 

There are 95 autocracies and 71 democracies without any missing data. This looks good. Next, let’s make sure that our cut-point is working well.

No country with a v2x_polyarchy score of 0.60 or below should be categorized as a democracy. All countries with a v2x_polyarchy score above 0.60 should be democracies. Let’s quickly check:

# make sure to put v2x_polyarchy first, so it goes to rows.
# regime_type should go to columns (two categories)
# this way it is easier to see
table(df$v2x_polyarchy, df$regime_type)
      
       Autocracy Democracy
  0.02         1         0
  0.07         1         0
  0.08         1         0
  0.09         1         0
  0.1          3         0
  0.13         1         0
  0.15         3         0
  0.16         1         0
  0.17         3         0
  0.19         1         0
  0.2          2         0
  0.21         1         0
  0.22         3         0
  0.23         1         0
  0.24         4         0
  0.25         2         0
  0.27         1         0
  0.28         1         0
  0.29         1         0
  0.3          2         0
  0.31         2         0
  0.32         2         0
  0.33         1         0
  0.34         5         0
  0.35         4         0
  0.36         2         0
  0.37         2         0
  0.38         2         0
  0.42         1         0
  0.43         4         0
  0.44         2         0
  0.45         1         0
  0.46         2         0
  0.47         3         0
  0.48         3         0
  0.49         2         0
  0.5          2         0
  0.51         3         0
  0.52         2         0
  0.53         2         0
  0.55         1         0
  0.56         2         0
  0.57         4         0
  0.58         4         0
  0.59         1         0
  0.6          2         0
  0.62         0         2
  0.63         0         3
  0.64         0         1
  0.65         0         1
  0.66         0         2
  0.67         0         2
  0.68         0         5
  0.69         0         1
  0.7          0         4
  0.71         0         2
  0.73         0         1
  0.75         0         1
  0.76         0         3
  0.78         0         3
  0.79         0         3
  0.81         0         1
  0.82         0         1
  0.84         0         1
  0.85         0         3
  0.86         0         4
  0.87         0         2
  0.88         0         6
  0.89         0         7
  0.9          0         4
  0.91         0         1
  0.92         0         4
  0.93         0         3

Quality checks are for you

Data analysis usually involves many steps. We do not report preparatory steps.

Do not report quality checks either. The table above is informative for us (we are now sure that we did it correctly!), but it is not meaningful for the reader!

It looks like we did it!
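
When the full table is too long to read comfortably, a more compact check is to look at the lowest and highest score within each category. The sketch below uses made-up values (toy_scores and toy_regime are hypothetical) and base R's tapply().

```r
# hypothetical scores and the categories created from them
toy_scores <- c(0.10, 0.35, 0.60, 0.62, 0.88)
toy_regime <- ifelse(toy_scores > 0.60, "Democracy", "Autocracy")

# lowest and highest score within each category
group_min <- tapply(toy_scores, toy_regime, min)
group_max <- tapply(toy_scores, toy_regime, max)

group_min
group_max
```

If the coding is correct, the highest score among autocracies is at most 0.60 and the lowest score among democracies is above 0.60.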

If you studied the codebook, you might have realized that there is already a binary democracy variable available. In other words, we did not need to create this new variable, but we did it for the purposes of practice. We can also verify our approach.

# a binary democracy variable was already available in the dataset: 
summary(df$democracy)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.4277  1.0000  1.0000 
# a table:
table(df$democracy)

 0  1 
95 71 
# verify that we did it correctly:
table(df$democracy, df$regime_type)
   
    Autocracy Democracy
  0        95         0
  1         0        71

We verified the variable we created!

Let’s visually summarize it using a bar plot.

barplot(table(df$regime_type), 
        ylab = "Number of Countries" 
        )
Figure 6.1: The number of countries by regime type

6.2.1 Moving from raw frequencies to percentages

In Figure 6.1, we have visualized the number of countries, which refers to raw frequencies. In some instances, we may want to present percentages instead. We can quickly calculate and present them.

All we need to do is use the table() function together with square-bracket notation.

# raw frequencies:
table(df$regime_type)

Autocracy Democracy 
       95        71 
# the number of democracies
table(df$regime_type)[2]
Democracy 
       71 
# the number of all countries
sum(table(df$regime_type))
[1] 166

The number of democracies divided by the number of all countries will give us the proportion of democracies.

table(df$regime_type)[2] / sum(table(df$regime_type))
Democracy 
0.4277108 

This is the rate of democracies, but there are too many decimal places. Two decimal places are enough. Also, let’s present it as a percentage. Multiplying this figure by 100 and rounding it to two decimal places gives us what we want: a percentage.

# democracy rate:
dem_rate <- table(df$regime_type)[2] / sum(table(df$regime_type))
dem_perc <- dem_rate * 100
dem_perc <- round(dem_perc, 2)

# in percentages:
dem_perc
Democracy 
    42.77 

We can do the same for autocracies.

# autocracy rate:
aut_rate <- table(df$regime_type)[1] / sum(table(df$regime_type))
aut_perc <- aut_rate * 100
aut_perc <- round(aut_perc, 2)

# in percentages:
aut_perc
Autocracy 
    57.23 

Let’s put them together in an object.

regime_perc <- c(aut_perc, dem_perc)
regime_perc 
Autocracy Democracy 
    57.23     42.77 

Let’s do a quality check: these two numbers should add up roughly to 100.

sum(regime_perc)
[1] 100
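
A quick way to cross-check a proportion like this is to take the mean of a logical comparison. The sketch below uses made-up labels (toy_regime is hypothetical) and is not meant to replace the step-by-step calculation above.

```r
# hypothetical regime labels, for illustration only
toy_regime <- c("Democracy", "Autocracy", "Autocracy", "Democracy", "Autocracy")

# TRUE counts as 1 and FALSE as 0, so the mean is the share of democracies
dem_share <- mean(toy_regime == "Democracy")

# as a percentage, rounded to two decimal places
round(dem_share * 100, 2)
```

If the variable had missing values, mean() would return NA unless na.rm = TRUE is added.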

Now, we can do a barplot using percentages.

barplot(regime_perc)

Let’s try to make it prettier.

barplot(regime_perc,
        ylim = c(0, 60), # y-axis limits (coverage is from 0 to 60 )
        ylab = "Percentage of Countries",
        main = "Distribution of Regime Type in 2010"
        )
Figure 6.2: Percentage of autocracies and democracies across the world

Figure 6.2 looks nice. We can present such a nice graph in our reports.

6.3 Explanatory variable: natural resource production

Next, let’s move to our explanatory variable: natural resource production (Natural_resources_rents_perc_of_GDP). This variable gives us the share of natural resources in a country’s economic output. Again, I will start with a quick summary.

summary(df$Natural_resources_rents_perc_of_GDP)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   1.015   3.815   9.043  12.000  53.660       4 

This variable measures the share as a percentage, which means it should be between 0 and 100. As expected, all the numbers are in this range: the minimum is 0 and the maximum is 53.66. There are no odd figures such as values below 0 or above 100. However, we should be careful: there are 4 missing values.

Let’s check the countries with the missing data.

# Give me the country name such that natural resource variable is not available
df$Country_Name[ is.na(df$Natural_resources_rents_perc_of_GDP)]
[1] "Korea, Dem" "Somalia"    "Syrian Ara" "Timor-Lest"
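
The logic behind this line is worth spelling out: is.na() returns TRUE for each missing entry, and that logical vector can be used both to count and to select. Here is a minimal sketch on a made-up data frame (toy, country, and rents are hypothetical names).

```r
# hypothetical data with two missing values
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  rents   = c(12.5, NA, 0.3, NA)
)

# is.na() returns TRUE for missing entries; summing counts them
n_missing <- sum(is.na(toy$rents))
n_missing

# the same logical vector picks out the affected countries
missing_countries <- toy$country[is.na(toy$rents)]
missing_countries
```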

What about the top 10 countries that are most dependent on natural resources?

# find the countries that are most dependent on natural resources (top 10)
df |> 
  select(Country_Name, Natural_resources_rents_perc_of_GDP) |> # selects only two variables
  arrange(desc(Natural_resources_rents_perc_of_GDP)) |> # arranges (sorts) from highest to lowest
  head(n = 10) # select only the first 10 rows
   Country_Name Natural_resources_rents_perc_of_GDP
1         Libya                               53.66
2    Congo, Rep                               49.14
3    Mauritania                               48.70
4        Kuwait                               48.30
5          Iraq                               41.48
6    Saudi Arab                               40.93
7        Angola                               39.04
8      Mongolia                               38.66
9          Oman                               37.14
10   Equatorial                               34.71

I am not going to check the lowest because there are many countries that do not produce any natural resources. This part is not interesting.

Let’s visually summarize the variable. Starting with a histogram.

hist(df$Natural_resources_rents_perc_of_GDP,
     main = "Natural resources as the % of GDP", 
     xlab = "Percentage of GDP"
     )

We can also produce a box plot.

boxplot(df$Natural_resources_rents_perc_of_GDP,
     main = "Natural resources as the % of GDP", 
     ylab = "Percentage of GDP"
     )

Again, Natural_resources_rents_perc_of_GDP is a numerical variable, but we can create a categorical variable using it. This will allow us to do a cross tabulation.

6.4 Whether a country is a major natural resource producer or not

We need to decide on a cut-point to categorize countries according to whether they are a major natural resource producer or not. I suggest 5% because it is a nice round number. Looking at the data, it is also higher than the median (3.815).

We have two categories:

  • A country is considered a major natural resource producer if natural resource production accounts for 5% or more of its GDP.
  • A country is not considered a major natural resource producer if natural resource production accounts for less than 5% of its GDP.
# There is no variable categorizing a country as a major natural resource producer or not
# We need to create this
# let's decide a threshold: 5 percent or more of the GDP
df$nat_res <- NA
df$nat_res[df$Natural_resources_rents_perc_of_GDP >= 5] <- "Major Producer"
df$nat_res[df$Natural_resources_rents_perc_of_GDP < 5]  <- "Not Major Producer"
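
Another way to turn a numerical variable into categories is base R's cut(). The sketch below is only an illustration on made-up values (toy_rents is hypothetical), assuming the same 5% threshold.

```r
# hypothetical rent shares (% of GDP), including a value exactly at 5 and a missing value
toy_rents <- c(0, 4.99, 5, 12, 53.66, NA)

# right = FALSE makes the intervals closed on the left: [0, 5) and [5, Inf)
toy_nat_res <- cut(toy_rents,
                   breaks = c(0, 5, Inf),
                   labels = c("Not Major Producer", "Major Producer"),
                   right = FALSE)

table(toy_nat_res, useNA = "always")
```

Note that cut() returns a factor whose categories follow the order of the breaks, so "Not Major Producer" would appear first in tables. The character variable we created above is sorted alphabetically instead.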

Let’s see a frequency table.

# table of the new variable (also show missing)
table(df$nat_res, useNA = "always")

    Major Producer Not Major Producer               <NA> 
                69                 93                  4 

Let’s do a quality control check if you wish.

# check if categorization is correct:
# are countries below 5% coded as not major producer?
# are countries at or above 5% coded as major producer?
table(df$Natural_resources_rents_perc_of_GDP, df$nat_res)
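
Because this variable takes many distinct values, the table above is long. A shorter option is to state the expectations as conditions and let R test them with stopifnot(). This is a sketch on made-up values (toy_rents and toy_nat_res are hypothetical).

```r
# hypothetical rent shares and the categories created from them
toy_rents   <- c(0.5, 4.9, 5.0, 20.1, NA)
toy_nat_res <- rep(NA_character_, length(toy_rents))
toy_nat_res[toy_rents >= 5] <- "Major Producer"
toy_nat_res[toy_rents < 5]  <- "Not Major Producer"

# which() drops the NAs, so missing values are ignored in the first two checks
major <- which(toy_nat_res == "Major Producer")
minor <- which(toy_nat_res == "Not Major Producer")

# stopifnot() is silent when every condition holds and stops with an error otherwise
stopifnot(
  all(toy_rents[major] >= 5),
  all(toy_rents[minor] < 5),
  all(is.na(toy_nat_res) == is.na(toy_rents))
)
```

No output means the checks passed. The last condition confirms that the new variable is missing exactly where the original one is missing.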

6.5 Cross tabulate regime_type and nat_res

We have reached the stage where we can cross-tabulate to assess the relationship between natural resources and regime type. This brings us closer to producing Table 6.1.

Important: The order is important

The order determines which percentages (row or column) we report. In a graph, the horizontal axis (x-axis) is traditionally reserved for the independent variable and the vertical axis (y-axis) for the dependent variable. In a cross tabulation, the independent variable should go to the rows and the dependent variable should go to the columns.

# cross tabulate:
# first variable is for rows: explanatory variable
# second variable is for column: outcome variable
table(df$nat_res, df$regime_type)
                    
                     Autocracy Democracy
  Major Producer            55        14
  Not Major Producer        37        56

We have produced the raw frequencies! Next, we need to calculate respective percentages.

We have two questions:

  1. What is the proportion of democracies among the countries that are major natural resource producers?
  2. What is the proportion of democracies among the countries that are not major natural resource producers?

If the proportion of democracy is much lower among the countries that are major natural resource producers than the countries that are not major resource producers, then there is a relationship in the direction of what the resource curse theory expects.

There are \(55 + 14 = 69\) major resource producers. Only \(14 / 69 = 0.20\) of them are democracies whereas \(0.80\) of them are autocracies.

Let’s ask R to calculate it for us.

# Major resource producers are the first row
table(df$nat_res, df$regime_type)[1, ]
Autocracy Democracy 
       55        14 
# sum of the first row is the number of natural resource producers
sum(table(df$nat_res, df$regime_type)[1, ])
[1] 69
# respective proportions of regime type for natural resource producers
table(df$nat_res, df$regime_type)[1, ] / sum(table(df$nat_res, df$regime_type)[1, ])
Autocracy Democracy 
0.7971014 0.2028986 
# Let's keep this in an object:
natres_yes <- table(df$nat_res, df$regime_type)[1, ] / sum(table(df$nat_res, df$regime_type)[1, ])

# multiply it with 100 to make it percentages
natres_yes <- natres_yes * 100

# round it to two decimal points
natres_yes <- round(natres_yes, 2)

# see the output
natres_yes
Autocracy Democracy 
    79.71     20.29 
# sum of the two should be ~100
sum(natres_yes)
[1] 100

Next, we will do the same for countries that are not major natural resource producers.

# Not major resource producers are the second row
table(df$nat_res, df$regime_type)[2, ]
Autocracy Democracy 
       37        56 
# sum of the second row is the number of countries that are not major resource producers
sum(table(df$nat_res, df$regime_type)[2, ])
[1] 93
# respective proportions of regime type for not natural resource producers
table(df$nat_res, df$regime_type)[2, ] / sum(table(df$nat_res, df$regime_type)[2, ])
Autocracy Democracy 
0.3978495 0.6021505 
# Let's keep this in an object:
natres_no <- table(df$nat_res, df$regime_type)[2, ] / sum(table(df$nat_res, df$regime_type)[2, ])

# multiply it with 100 to make it percentages
natres_no <- natres_no * 100

# round it to two decimal points
natres_no <- round(natres_no, 2)

# see the output
natres_no
Autocracy Democracy 
    39.78     60.22 
# sum of the two should be ~100
sum(natres_no)
[1] 100
# Put the two together:
tbl_percentages <- rbind(natres_yes, natres_no)
tbl_percentages
           Autocracy Democracy
natres_yes     79.71     20.29
natres_no      39.78     60.22

Now we have everything to produce the table we want to produce.

# frequencies:
table(df$nat_res, df$regime_type)
                    
                     Autocracy Democracy
  Major Producer            55        14
  Not Major Producer        37        56
# percentages
rbind(natres_yes, natres_no)
           Autocracy Democracy
natres_yes     79.71     20.29
natres_no      39.78     60.22

Table 6.2: Democracy and Natural Resources

                          Autocracy     Democracy     Total
Major Resource Producer   55 (79.71%)   14 (20.29%)   69 (100%)
Not Major Producer        37 (39.78%)   56 (60.22%)   93 (100%)
Total                     92 (56.79%)   70 (43.21%)   162 (100%)

Row percentages in parentheses sum up to 100%

The rate of democracy is much lower (20.29%) among major natural resource producers compared to countries that are not major resource producers (60.22% of them are democracies).
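
The Total row and column of the table can be computed as well. The sketch below re-enters the four frequencies by hand into an object called counts (a name chosen for this illustration) and uses base R's addmargins().

```r
# the four frequencies from the cross tabulation, entered by hand
counts <- as.table(matrix(c(55, 37, 14, 56), nrow = 2,
                          dimnames = list(c("Major Producer", "Not Major Producer"),
                                          c("Autocracy", "Democracy"))))

# addmargins() appends a "Sum" row and a "Sum" column
with_totals <- addmargins(counts)
with_totals

# percentages for the total row: column totals divided by the grand total
round(colSums(counts) / sum(counts) * 100, 2)
```

In our own script, the same call would presumably work directly on table(df$nat_res, df$regime_type).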

6.5.1 A shortcut for calculating percentages

We calculated the respective percentages bit by bit. This is a good way of learning. There are also shortcuts to carry out the same calculations. For example, the prop.table() function is useful for calculating proportions in a table.

Let’s start playing with prop.table().

# table for natural resources and regime type
tbl_nr <- table(df$nat_res, df$regime_type)

# see the table
tbl_nr
                    
                     Autocracy Democracy
  Major Producer            55        14
  Not Major Producer        37        56
# prop.table() gives the cell proportions (when the margin is left undefined)
prop.table(tbl_nr)
                    
                      Autocracy  Democracy
  Major Producer     0.33950617 0.08641975
  Not Major Producer 0.22839506 0.34567901
# the four cell proportions add up to 1

We have produced cell proportions, but this is not what we want to report. We want to calculate row proportions, turn them into percentages, and report those. We will use the margin option.

# with margin = 1, prop.table() gives the row proportions
prop.table(tbl_nr, margin = 1) # margin 1 refers to rows 
                    
                     Autocracy Democracy
  Major Producer     0.7971014 0.2028986
  Not Major Producer 0.3978495 0.6021505
# Percentages
prop.table(tbl_nr, margin = 1) * 100 
                    
                     Autocracy Democracy
  Major Producer      79.71014  20.28986
  Not Major Producer  39.78495  60.21505
# round them
round(prop.table(tbl_nr, margin = 1) * 100, 2)
                    
                     Autocracy Democracy
  Major Producer         79.71     20.29
  Not Major Producer     39.78     60.22
# keep them in an object
tbl_nr_prc <- round(prop.table(tbl_nr, margin = 1) * 100, 2)
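
To get cells that look like the ones in Table 6.2, with the frequency followed by the percentage in parentheses, we can glue the two tables together as text. This is one possible sketch; counts and row_perc are names chosen for this illustration, and the frequencies are re-entered by hand.

```r
# the four frequencies from the cross tabulation, entered by hand
counts <- as.table(matrix(c(55, 37, 14, 56), nrow = 2,
                          dimnames = list(c("Major Producer", "Not Major Producer"),
                                          c("Autocracy", "Democracy"))))
row_perc <- round(prop.table(counts, margin = 1) * 100, 2)

# paste0() works element by element; sprintf("%.2f") keeps two decimal places
cells <- paste0(as.vector(counts), " (",
                sprintf("%.2f", as.vector(row_perc)), "%)")

# paste0() returns a plain vector, so restore the table shape and labels
cells <- matrix(cells, nrow = nrow(counts), dimnames = dimnames(counts))
cells
```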

6.5.2 Visualization

Creating a stacked bar graph is a good way of visualizing the relationship between regime type and natural resources.

Figure 6.3: The number of countries by regime type

Let’s break this apart one by one. First, the category on the x-axis is whether a country is a major natural resource producer or not. Second, the category shown by the stacked bars (color shading) refers to regime type. The two top bar chunks (in sky blue) represent democracy and the two lower bar chunks (in salmon red) represent autocracy.

Let’s try to generate this. What if we plug in a barplot() to our table?

# barplot function directly takes the table tbl_nr_prc as input:
barplot(tbl_nr_prc)

Ok, this is a start, but we are a little bit off. Most notably, x-axis categories refer to regime type. This needs to be swapped. Also, bars go higher than 100%, which is weird. This is because barplot() is stacking column wise, but our input is row wise. To swap it, we can simply use a transpose function: t(). This will swap the rows and columns.

# swapping rows and columns: let's see if it is working
t(tbl_nr_prc)
           
            Major Producer Not Major Producer
  Autocracy          79.71              39.78
  Democracy          20.29              60.22
# yes it is working!
# feed this into barplot
barplot(t(tbl_nr_prc))
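
We can confirm why the transpose fixes the bar heights without drawing anything: barplot() draws one bar per column, so each bar is as tall as its column sum. The sketch below re-enters the rounded row percentages by hand in an object called row_perc (a name chosen for this illustration).

```r
# the rounded row percentages, entered by hand
row_perc <- matrix(c(79.71, 39.78, 20.29, 60.22), nrow = 2,
                   dimnames = list(c("Major Producer", "Not Major Producer"),
                                   c("Autocracy", "Democracy")))

# one bar per column: here the bars would be Autocracy and Democracy,
# and neither column sums to 100
colSums(row_perc)

# after transposing, each column is one producer category and sums to 100
colSums(t(row_perc))
```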

We have created the substantive part. Next, add the legend.

barplot(t(tbl_nr_prc),
        legend.text = colnames(tbl_nr_prc)
        )

Finally, we can add titles.

barplot(t(tbl_nr_prc),
        main = "Regime Type and Natural Resources",
        ylab = "Percentage of Countries",
        legend.text = colnames(tbl_nr_prc)
        )

You could also change the colors if you like. We have already covered how to do this.
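
As a reminder, here is one way it could look. The color names "salmon" and "skyblue" and the legend position are my choices for this sketch, and the percentages are re-entered by hand in row_perc so that the example runs on its own.

```r
# the rounded row percentages, entered by hand
row_perc <- matrix(c(79.71, 39.78, 20.29, 60.22), nrow = 2,
                   dimnames = list(c("Major Producer", "Not Major Producer"),
                                   c("Autocracy", "Democracy")))

# one color per regime type, in the order of the rows of t(row_perc)
bar_colors <- c("salmon", "skyblue")

# legend.text = TRUE takes the legend labels from the row names of the input;
# args.legend passes extra options on to legend()
mids <- barplot(t(row_perc),
                col = bar_colors,
                main = "Regime Type and Natural Resources",
                ylab = "Percentage of Countries",
                legend.text = TRUE,
                args.legend = list(x = "topright")
                )
```

barplot() also returns the midpoints of the bars on the x-axis (stored in mids here), which is handy if you later want to add text above the bars.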

When reporting, do not forget to put figure captions and numbers (see Figure 6.3).

How to write it out?

# writing this graph into a .png file
png("stacked.png", res = 300, units = "cm", width = 12, height = 16)
barplot(t(tbl_nr_prc),
        main = "Regime Type and Natural Resources",
        ylab = "Percentage of Countries",
        legend.text = colnames(tbl_nr_prc)
        )
dev.off()

6.6 Task: Natural Resources and Civil Conflict

This is a challenge for you: assess the relationship between natural resources and civil conflict. The resource curse theory also posits that countries that rely heavily on natural resource production are vulnerable to internal armed conflict.

\[ Natural \; resources \rightarrow Civil \; Conflict \]

Take a look at the CivilConflict variable. It has three categories: civil war; minor civil conflict; no conflict. Using CivilConflict and nat_res, answer the questions below.

  1. Is there a relationship between natural resources and civil conflict?

    • Cross tabulate CivilConflict and nat_res.
    • For major natural resource producers, report the percentage of countries that are peaceful, in civil war, or facing minor conflict.
    • For countries that are not major natural resource producers, report the percentages for respective conflict categories.
    • Comparing these percentages, make a decision: is there a relationship between natural resources and civil conflict?
  2. Using the CivilConflict variable, create a new binary variable called conflict_bin, which indicates whether a country is peaceful or in conflict. Repeat question 1 using conflict_bin.

  3. Create a stacked bar graph similar to Figure 6.4.

Figure 6.4: Percentage of countries in conflict by natural resource production