May 26, 2021

Premier Provenance of Pollution

This is utterly embarrassing for the Philippines. No excuse, period.

Source: OWID









The world is going to be filled with more older people

 


The Stata code is:

#delimit ;

twoway
(line below5 year, lwidth(vthick) lcolor(red))
(line above65 year, lwidth(vthick) lcolor(blue))
,
subtitle("{bf:Young Children and Older People as a Percentage}" "{bf:of Global Population: 1950-2050}", color(white))
ylab(0 "0%" 5 "5%" 10 "10%" 15 "15%" 20 "20%", grid gmin gmax angle(0) labcolor(white) noticks)
xlab(1950(10)2050, labcolor(white) noticks)
xtitle("")
xsize(6)
legend(off)
graphregion(fcolor(ebblue))
plotregion(fcolor(white))
text(15.5 1965 "{bf:Age <5}")
text(4 1965 "{bf:Age 65+}")
note(" " "Source: United Nations. {it:World Population Prospects 2019}." "Available at: https://population.un.org/wpp/", size(small) color(white))
;

#delimit cr


Religiosity and Poverty

One of the most ironic things about religious societies is how everyone tries to spend substantial resources to celebrate religious festivals. Low-income families in particular will even go as far as borrowing money, essentially decreasing their future flows of disposable income. Maybe they derive some non-economic benefits from it, much like how people at the other end of the spectrum (a.k.a. rich people) get into philanthropy.

These thoughts are exactly what I found in the latest NBER paper by Eduardo Montero and Dean Yang. In the paper, entitled "Religious Festivals and Economic Development: Evidence from Catholic Saint Day Festivals in Mexico", Montero and Yang show that in Mexico, agriculturally-coinciding festivals have negative effects on household income and other development outcomes. They also lead to lower agricultural productivity and a higher share of the labor force in agriculture, consistent with agriculturally-coinciding festivals inhibiting the structural transformation of the economy. So why do these families persist in spending beyond their means? The authors show that agriculturally-coinciding festivals also lead to higher religiosity and social capital, potentially explaining why such festivals persist in spite of their negative growth consequences.

I guess this is one of those cases where the non-economic benefits outweigh the economic costs. These humans are still rational beings.



May 21, 2021

Replicating Economist Chart with Range Plot

In our recent paper, we expected government spending across countries to increase in 2020 as a response to the COVID pandemic, and to also fund programs that protect people, jobs, and businesses during the recession caused by the pandemic. Once economies start to bounce back in 2021, government spending is expected to fall for the next couple of years.

This has been analyzed by The Economist, and below I am replicating one of the charts they presented. For this one, I am using data from the IMF's World Economic Outlook.


The code is as follows:

keep if year>=2019 & year<=2021
keep if country=="Indonesia" | country=="Malaysia" | country=="Philippines" | country=="Vietnam"
egen low=min(debt),by(country)
egen high=max(debt),by(country)
reshape wide debt, i(country low high) j(year)
egen cty=group(country)
* labmask is user-written; if it is missing, run: ssc install labutil
labmask cty, val(country)

#delimit ;
twoway
(rcapsym high low cty, horizontal lcolor(black) msym(none))
(scatter cty debt2019, msym(S) msize(large) mcolor(ebblue))
(scatter cty debt2020, msym(S) msize(large) mcolor(cranberry))
(scatter cty debt2021, msym(S) msize(large) mcolor(gold))
,
title("{bf:Governments want to slow debt}" "{bf:expansion after jump in 2020}", justification(left) pos(11) span)
subtitle("(public debt as a % of GDP)", justification(left) pos(11) span)
ylab(1(1)4, valuelabel grid gmin gmax angle(0) noticks nogextend labsize(large))
ysca(lcolor(none) reverse)
ytitle("")
xlab(0(20)80, grid gmin gmax noticks labsize(large))
xsca(lcolor(none) alt)
xline(0, lcolor(black))
xsize(4)
plotregion(lcolor(none))
legend(order(2 "2019" 3 "2020" 4 "2021") row(1) pos(11) size(large) region(lcolor(none)) span)
note(" " "Source: IMF World Economic Outlook.", span size(medium))
;
#delimit cr

April 22, 2018

Stopping midway while running a do-file

If, like me, you always use a do-file even for short analyses (e.g. just one chart! It helps to always have a record of what you're doing; I even put a time stamp on mine), more often than not you want to see ONLY the result of some code in the middle of the do-file. For instance, somewhere in the middle of the do-file you have code for running a regression. If you just want to see the regression results without wading through all the other output, you have several options.

If you have more patience than me, you can highlight just the code whose results you want to see (the regression, for instance) and press Ctrl+D to run it. But this can be cumbersome: before running the regression you may need to prepare the data and generate new variables, so you would have to go back further and highlight a few more lines of code.

An alternative is to highlight nothing and simply press Ctrl+D. Of course, this runs the whole do-file, so the additional step is to search the results window for the regression results. With the output of every other analysis in the do-file also in the results window, I'm sure you will soon start to think there should be an easier way to do this.

And that easier way is to use "/*". Putting these two characters right after the code for the analysis you want to see (the regression in my example) opens a comment that is never closed, so the rest of the do-file is not run and its results are suppressed. Yes, you'll still see the results of the analyses in the earlier portion of the do-file (before the regression in my example), but at least all you need to do is look for the point where the results are suppressed, and right before that you will find the analysis you want to see (i.e., the regression results).
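As a minimal sketch of this trick, here is a made-up do-file that uses Stata's shipped auto dataset (the dataset and variables are only for illustration, not from my actual work). It relies on the behavior described above: everything after the unclosed "/*" is treated as a comment.

```stata
* Hypothetical do-file; auto.dta ships with Stata
sysuse auto, clear
generate lnprice = ln(price)

* The analysis whose results I want to see
regress lnprice mpg weight

/*
Everything from here to the end of the do-file sits inside a comment
that is never closed, so none of it is run.
summarize price
tabulate foreign
```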

Then again, if you're like me, you hate having to scroll back up, especially if the do-file is horribly long.

This brings us to what I think is the best and simplest solution. Just use "exit". Write this command right after the code whose results you want to see. When you press Ctrl+D without highlighting anything, the do-file runs and stops immediately after the output you are looking for appears in the results window. No need for scrolling up and down. Simple as that.

exit
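To make the placement concrete, here is a small sketch of a do-file with "exit" in the middle. It uses Stata's shipped auto dataset purely as a stand-in; the variables and the regression are hypothetical.

```stata
* Hypothetical do-file; auto.dta ships with Stata
sysuse auto, clear
generate lnprice = ln(price)

* The analysis whose results I want to see
regress lnprice mpg weight

* Stop the do-file here; nothing below this line is executed
exit

summarize price
tabulate foreign
```

When you want the full do-file to run again, just delete the exit line or comment it out.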

April 19, 2018

Changing x-axis labels in graph bar charts

The graph bar and twoway bar commands may seem to generate the same thing: bar charts. But they are actually different, and each has advantages and disadvantages relative to the other. For example, you can easily make stacked charts with graph bar.

graph bar domgheche1 shiche1 oopche1 extche1 othche1 if country=="Philippines" ///
, ///
over(year, label(labsize(small))) ///
stack ///
outergap(75) ///
bar(1, color(ltkhaki)) ///
bar(2, color(khaki)) ///
bar(3, color(red)) ///
bar(4, color(ltblue)) ///
subtitle("Source of health expenditure, Philippines") ///
ytitle("Share (%)" " ") ///
legend(order(1 "Domestic" "government" 2 "SHI and" "compulsory" "prepayment" 3 "OOP" 4 "External" 5 "Other") pos(6) row(1) symxsize(6) size(small))


One of the problems you may encounter (especially if you chart many instances of the variables you are graphing, years in this case) is a very crowded x-axis with too many labels (again, years in this case). One way to resolve this is to show only a few of the labels instead of all of them. For example, you might want to show only 5-year labels (2000, 2005, 2010, and 2015). You can do this with the relabel() suboption inside over(), as in the code below.

graph bar domgheche1 shiche1 oopche1 extche1 othche1 if country=="Philippines" ///
, ///
over(year, relabel(2 " " 3 " " 4 " " 5 " " 7 " " 8 " " 9 " " 10 " " 12 " " 13 " " 14 " " 15 " ") label(labsize(small))) ///
stack ///
outergap(75) ///
bar(1, color(ltkhaki)) ///
bar(2, color(khaki)) ///
bar(3, color(red)) ///
bar(4, color(ltblue)) ///
nofill ///
subtitle("Source of health expenditure, Philippines") ///
ytitle("Share (%)" " ") ///
legend(order(1 "Domestic" "government" 2 "SHI and" "compulsory" "prepayment" 3 "OOP" 4 "External" 5 "Other") pos(6) row(1) symxsize(6) size(small))


Keep in mind that with graph bar, you are graphing over categories of the x-variable, and the bars are numbered 1, 2, 3, etc. from left to right. So when relabeling, you have to identify each bar by its position, not by its value. In the example above, which covers 16 years, every year is relabeled to a blank except for 1 (2000), 6 (2005), 11 (2010), and 16 (2015).
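Typing out every position by hand gets tedious, so here is a rough sketch of building the relabel() list with a loop instead. The data are made up for illustration (16 years, one invented variable called oop), and the macro name blanks is my own; the idea is simply to blank out every position except 1, 6, 11, and 16.

```stata
* Made-up data: one observation per year, 2000 to 2015
clear
set obs 16
generate year = 1999 + _n
generate oop  = 40 + _n          // invented values, for illustration only

* Collect  position " "  pairs for every bar that should get a blank label
local blanks
forvalues i = 1/16 {
    if mod(`i' - 1, 5) != 0 {
        local blanks `"`blanks' `i' " ""'
    }
}

* Only positions 1, 6, 11, and 16 keep their year labels
graph bar oop, over(year, relabel(`blanks') label(labsize(small)))
```

The compound double quotes around the macro contents keep the embedded " " strings intact while the list is built up.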

Just to give you some background on the example above, the chart shows a breakdown of health expenditures by revenue source in the Philippines from 2000 to 2015. Health financing in any country can come from 5 revenue sources: (1) domestic government (a.k.a. tax revenue); (2) social health insurance and other compulsory prepayment; (3) out-of-pocket spending, or OOP; (4) external sources (international organizations and foreign governments); and (5) others (including other private sources). These charts are very important for monitoring trends in these sources, especially OOP. OOP is basically paying for health services directly "out of your pocket" right there and then. One of the tenets of Universal Health Coverage (UHC) is ensuring access to health care without facing financial hardship. Keep in mind, no one knows when they or a family member will get sick. So you will not only encounter a health shock, but also a financial shock if you have to pay for health care "out of your pocket."

Achieving UHC means that health care should be financed either by the government or through prepaid/pooling mechanisms such as SHI and other compulsory prepayment schemes. In this chart, the OOP component (red bars) should ideally be declining over time, while domestic government or SHI spending (khaki bars) should be increasing. Are we seeing this in the Philippines? Unfortunately, no. We instead see the opposite.

The source of the data is the World Health Organization's Global Health Expenditure Database.


Still fixing my blog

I've been busy with work at the World Bank so I have not gotten back to updating this blog. I didn't know that the images were down. I will work on fixing them in the next couple of days. And hopefully I can start blogging again.

July 7, 2016

Tabulate with both value and label

Remember that when you tabulate, you get the frequency and some statistics for the values, shown with their labels (I'm using data from the IMF's Government Finance Statistics database):

. tab soc_src, m

Source - Social |
              Contributions |      Freq.     Percent        Cum.
----------------------------+-----------------------------------
             OECD | General |      1,111       25.38       25.38
  GFS01 | General | Accrual |        139        3.18       28.56
     GFS01 | General | Cash |        219        5.00       33.56
  GFS01 | Central | Accrual |          5        0.11       33.68
     GFS01 | Central | Cash |        297        6.79       40.46
     GFS86 | Central | Cash |         12        0.27       40.74
GFS01 | Budgetary | Accrual |         33        0.75       41.49
   GFS01 | Budgetary | Cash |        148        3.38       44.87
                          . |      2,413       55.13      100.00
----------------------------+-----------------------------------
                      Total |      4,377      100.00

The "m" option is there so that missing observations are tabulated too. Now sometimes when I do a table (or tabulate), I have to add the option "nol" to see the actual values and not the labels assigned to them:

. tab soc_src, nol

Source - |
     Social |
Contributio |
         ns |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,111       56.57       56.57
          2 |        139        7.08       63.65
          3 |        219       11.15       74.80
          4 |          5        0.25       75.05
          5 |        297       15.12       90.17
          6 |         12        0.61       90.78
          7 |         33        1.68       92.46
          8 |        148        7.54      100.00
------------+-----------------------------------
      Total |      1,964      100.00

Now why not both? I mean, what does the value "0" refer to again? How about "8"? Well if you use the command "FRE", written by Ben Jann, tabulation is much clearer:

. fre soc_src

soc_src -- Source - Social Contributions
-----------------------------------------------------------------------------------
                                      |      Freq.    Percent      Valid       Cum.
--------------------------------------+--------------------------------------------
Valid   0 OECD | General              |       1111      25.38      56.57      56.57
        2 GFS01 | General | Accrual   |        139       3.18       7.08      63.65
        3 GFS01 | General | Cash      |        219       5.00      11.15      74.80
        4 GFS01 | Central | Accrual   |          5       0.11       0.25      75.05
        5 GFS01 | Central | Cash      |        297       6.79      15.12      90.17
        6 GFS86 | Central | Cash      |         12       0.27       0.61      90.78
        7 GFS01 | Budgetary | Accrual |         33       0.75       1.68      92.46
        8 GFS01 | Budgetary | Cash    |        148       3.38       7.54     100.00
        Total                         |       1964      44.87     100.00           
Missing .                             |       2413      55.13                      
Total                                 |       4377     100.00                      
-----------------------------------------------------------------------------------

To download the program, just type:

ssc install fre
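If you want to try the comparison without my data, here is a quick sketch using Stata's shipped auto dataset instead (the variable foreign is just a convenient labeled variable, not part of the example above).

```stata
* ssc install fre            // run once if fre is not installed yet
sysuse auto, clear

tabulate foreign             // shows the labels only
tabulate foreign, nolabel    // shows the underlying values only
fre foreign                  // shows values and labels side by side
```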

There was once another user-written program called "TABL" that did a similar thing, but it's no longer available on the servers.

Anyway, more later.

November 19, 2012

Using IVPROBIT: Do Farm Size and Farmer Attitudes Affect the Planting of Genetically-Modified Corn or Soybean Crops?

This is the question posed to us by Dr. Harvey James as an exercise. Using a survey of 3,000 agricultural producers that he and Dr. Mary Hendrickson did in 2006, we tested how farm size and attitudes toward genetically-modified organisms (GMO) affect a farmer's decision to plant GMO corn and/or soybean.

We limit the data to those that produced only corn and/or soybean crops because we want to focus on those farmers. The dependent variable is whether the farmer has experience planting a GMO crop (ADOPT), and the main regressors are the log of farm size in acres (LNACRES) and a positive attitude towards GMOs (ATTITUDE). As a side note, a good rule of thumb is to use the log of a variable if its values are very large.

We also consider other control variables, such as the age of the farmer (AGEGROUP), whether the farmer also raises livestock (LIVESTOCK), and whether the farmer attends church very often (CHURCH). We expect that older farmers are less likely to adopt (they rely more on traditional farming), that farmers who raise livestock are less likely to adopt, and that farmers who attend church more often are less likely to adopt (probably because more religious farmers view genetic manipulation negatively).

Since ADOPT is binary, we naturally use a binary choice model (probit in this case) as the econometric method. The results are as follows:



Well, except for CHURCH, we have the expected results. Older farmers and those that raise livestock are less likely to adopt GMOs. The more important results, however, are the first two regressors: the larger the farm, the more likely the farmer adopts; and (ho-hum) if a farmer has positive attitude towards GMOs, then he or she is more likely to plant GMOs.

Now here's the catch: there might be an endogeneity problem. If you are already planting GMOs, would you say that GMOs are not good for farmers? The answer is no. If you have already adopted GMOs, you are more likely to have a positive attitude toward them.

So, for endogeneity problems, we use an instrumental variables approach. Instrumental variable regressions are more common when the dependent variable is continuous.

What about binary variables such as ADOPT, which is a simple yes if you planted GMOs and no otherwise? Good thing there is also the instrumental variables probit model, which can easily be implemented in STATA with the IVPROBIT command.

Now, we take it as given that farmers who are optimistic about the future are likely to have a positive attitude towards GMO adoption, but that this optimism is otherwise unrelated to farmers' decisions to adopt GMOs or not. And so we use OPTIMISM as the instrumental variable.

Now, for the instrumental variables probit, there are actually two estimation methods: a two-step estimator (using the TWOSTEP option in STATA) and simultaneous maximum likelihood estimation (using the MLE option in STATA). We preferred the simplified, reduced-form approach and so we went for the two-step method. The results are as follows:
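For reference, the two specifications would be typed roughly as below. This is only a sketch: it assumes the survey data are already in memory and that the variables are stored under lowercase versions of the names used in this post, which may not match the actual dataset.

```stata
* Assumes the survey data are loaded, with hypothetical lowercase variable
* names: adopt, lnacres, attitude, agegroup, livestock, church, optimism

* Plain probit, treating attitude as exogenous
probit adopt lnacres attitude agegroup livestock church

* IV probit: attitude instrumented by optimism, two-step estimator
ivprobit adopt lnacres agegroup livestock church (attitude = optimism), twostep
```

The endogenous regressor and its instrument go inside the parentheses, and the exogenous regressors stay outside.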



Well, surprise surprise. The signs are still there, but only two remain significant: size of farm and livestock. Age and attitude don't affect GMO adoption? We will have to take that with suspicion. And we may be correct, because if you look at the Wald test of exogeneity (at the bottom of the table), the statistic is insignificant. That means we cannot reject that ATTITUDE is exogenous: THERE IS NO EVIDENCE OF ENDOGENEITY. Well, at least none for this sample. Remember, we just said there MAY be endogeneity. But looking at these results, we find none.

So given that, we can go back and report the probit result as the main one. Corn and/or soybean producers that have larger farms are more likely to adopt GM crops. Those producers that have positive attitudes towards GMOs are also more likely to adopt GM crops.

You'll have to put an additional "DUH" on that second one.

May 3, 2012

Put a Little Shade in your Graphs

Suppose you graph a time series and you want to highlight a certain range (for example, you have the U.S. real GDP series and you want to color the area between 1929 and 1939 to indicate the years of the Great Depression):

line rgdp year

Well, a simple way to do that is to add another variable and use the twoway graph command "AREA." To start with, your new variable has to be a constant equal to the highest value on your y-axis. So for example, in your graph of U.S. real GDP, the highest value shown is 3,500 billion constant 2005 dollars. So:

gen new=3500

Now, your new graph command should be:

twoway (area new year if year>=1929 & year<=1939) (line rgdp year)

You have to put the AREA plot first because each plot is drawn on top of the ones before it, so the line stays visible over the shading.
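Here is a self-contained sketch of the same idea that you can run without my data. The series is invented (it is not actual U.S. GDP), and instead of typing the maximum by hand it takes it from summarize; the variable name shade and the gray color are my own choices.

```stata
* Made-up series standing in for real GDP, 1920 to 1950
clear
set obs 31
generate year = 1919 + _n
generate rgdp = 800 + 30*_n       // invented values, for illustration only

* Constant equal to the highest value of the series
summarize rgdp, meanonly
generate shade = r(max)

* Shaded area first, then the line drawn on top of it
twoway (area shade year if inrange(year, 1929, 1939), color(gs14)) ///
       (line rgdp year), legend(off)
```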