## Loading required package: plyr
## Loading required package: plotrix

### Introduction:

America seems to be a country divided along cultural, political, and religious lines, to name a few. Or is it? This project investigates one such line: whether fear is associated with location in the United States.

### Data:

This project uses data from the General Social Survey (GSS), a sociological survey that collects data on the demographic characteristics and attitudes of residents of the United States.

Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing complexity of American society. The GSS aims to gather data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes; to examine the structure and functioning of society in general as well as the role played by relevant subgroups; to compare the United States to other societies in order to place American society in comparative perspective and develop cross-national models of human society; and to make high-quality data easily accessible to scholars, students, policy makers, and others, with minimal cost and waiting.

Each survey from 1972 to 2012 was an independently drawn sample of English-speaking persons 18 years of age or over, living in non-institutional arrangements within the United States. Starting in 2006 Spanish-speakers were added to the target population. Block quota sampling was used in 1972, 1973, and 1974 surveys and for half of the 1975 and 1976 surveys. Full probability sampling was employed in half of the 1975 and 1976 surveys and the 1977, 1978, 1980, 1982-1991, 1993-1998, 2000, 2002, 2004, 2006, 2008, 2010 and 2012 surveys. Also, the 2004, 2006, 2008, 2010 and 2012 surveys had sub-sampled non-respondents.

The two variables of interest are fear and region. From the codebook:

Fear: afraid to walk at night in neighborhood (is there any area right around here - that is, within a mile - where you would be afraid to walk alone at night?)

Region: region of interview

This is an observational study, i.e. the data were collected in a way that does not interfere with how they arise. The subjects are all noninstitutionalized, English- and Spanish-speaking persons 18 years of age or older living in the United States. Because the study is observational, no causal link can be established, although a (perhaps natural) association may be demonstrated. Because most of the surveys used full probability sampling, the results can be generalized to the population of interest. Non-response bias has been mitigated by conducting surveys outside of working hours and over the weekends.

### Exploratory data analysis:

Take a look at the data.

    table_fear <- table(data$fear, data$region);
    addmargins(table_fear);
    ##
    ##       New England Middle Atlantic E. Nor. Central W. Nor. Central
    ##   Yes         603            2168            2376             821
    ##   No          970            2885            4085            1676
    ##   Sum        1573            5053            6461            2497
    ##
    ##       South Atlantic E. Sou. Central W. Sou. Central Mountain Pacific
    ##   Yes           2936             934            1401      693    2078
    ##   No            3709            1375            1733     1338    2514
    ##   Sum           6645            2309            3134     2031    4592
    ##
    ##         Sum
    ##   Yes 14010
    ##   No  20285
    ##   Sum 34295

The raw counts do not tell us much (apart from showing that the data meet the sample-size conditions for inference), so we convert them to proportions within each region.

    prop_fear <- prop.table(table_fear, 2);
    prop_fear;
    ##
    ##       New England Middle Atlantic E. Nor. Central W. Nor. Central
    ##   Yes   0.3833439       0.4290520       0.3677449       0.3287946
    ##   No    0.6166561       0.5709480       0.6322551       0.6712054
    ##
    ##       South Atlantic E. Sou. Central W. Sou. Central  Mountain   Pacific
    ##   Yes      0.4418360       0.4045041       0.4470325 0.3412112 0.4525261
    ##   No       0.5581640       0.5954959       0.5529675 0.6587888 0.5474739
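
The column-wise conversion can be reproduced without the raw survey file. The following is a minimal sketch that rebuilds the table from the counts printed above and uses only base R; the names `counts`, `prop_by_region` and `by_hand` are illustrative and do not appear in the original analysis.

```r
# Counts copied from the table printed above (rows: fear, columns: region).
regions <- c("New England", "Middle Atlantic", "E. Nor. Central",
             "W. Nor. Central", "South Atlantic", "E. Sou. Central",
             "W. Sou. Central", "Mountain", "Pacific")
counts <- matrix(c(603, 970, 2168, 2885, 2376, 4085, 821, 1676, 2936, 3709,
                   934, 1375, 1401, 1733, 693, 1338, 2078, 2514),
                 nrow = 2,
                 dimnames = list(fear = c("Yes", "No"), region = regions))

# margin = 2 divides every cell by its column (region) total.
prop_by_region <- prop.table(counts, margin = 2)

# The same calculation written out explicitly.
by_hand <- sweep(counts, 2, colSums(counts), "/")

# Regions sorted by the share of respondents answering 'Yes'.
print(round(sort(prop_by_region["Yes", ]), 3))
```

Because each column sums to 1, the 'Yes' row alone is enough to compare regions.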

There seem to be marked differences between many of the proportions. Next we visualize the proportions with a bar plot and a mosaic plot.

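    # Side-by-side bars of the 'Yes' and 'No' proportions for each region.
    barplot(prop_fear,
            main = 'Fear By Region',
            col = cm.colors(2),
            legend = rownames(prop_fear),
            cex.names = 0.75,
            las = 2,
            beside = TRUE);

    # prop_region is not defined anywhere in this report; it is assumed to be
    # the region-by-fear transpose of prop_fear, so t(prop_fear) is used here.
    mosaicplot(t(prop_fear),
               main = 'Fear By Region',
               col = cm.colors(2),
               las = 2);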

There is some consistency (in every region fewer than half of the respondents report fear), but the proportions also differ markedly, from about 0.33 in W. Nor. Central to about 0.45 in Pacific.

### Inference:

Perform a chi-square test of independence of Fear and Region. The hypotheses for the test are as follows:

• H0: Fear and Region are independent - Fear is not associated with Region
• HA: Fear and Region are dependent - Fear is associated with Region

The conditions for the chi-square test are as follows:

• Independence - met by virtue of the nature of the survey:
  • the GSS is a random survey - respondents are picked at random
  • clearly the number of observations is less than 10% of population (see table_fear above)
  • each respondent contributes to one cell in the table
• Sample Size - each cell has at least 5 expected cases (see the expected counts in the output below).

    inference(data$fear, data$region,
              est = 'proportion',
              type = 'ht',
              method = 'theoretical',
              success = 'Yes',
              alternative = 'greater');
    ## Response variable: categorical, Explanatory variable: categorical
    ## Chi-square test of independence
    ##
    ## Summary statistics:
    ##      x
    ## y     New England Middle Atlantic E. Nor. Central W. Nor. Central
    ##   Yes         603            2168            2376             821
    ##   No          970            2885            4085            1676
    ##   Sum        1573            5053            6461            2497
    ##      x
    ## y     South Atlantic E. Sou. Central W. Sou. Central Mountain Pacific
    ##   Yes           2936             934            1401      693    2078
    ##   No            3709            1375            1733     1338    2514
    ##   Sum           6645            2309            3134     2031    4592
    ##      x
    ## y       Sum
    ##   Yes 14010
    ##   No  20285
    ##   Sum 34295
    ## H_0: Response and explanatory variable are independent.
    ## H_A: Response and explanatory variable are dependent.
    ## Check conditions: expected counts
    ##      x
    ## y     New England Middle Atlantic E. Nor. Central W. Nor. Central
    ##   Yes      642.59         2064.22         2639.41         1020.06
    ##   No       930.41         2988.78         3821.59         1476.94
    ##      x
    ## y     South Atlantic E. Sou. Central W. Sou. Central Mountain Pacific
    ##   Yes        2714.58          943.26         1280.28   829.69  1875.9
    ##   No         3930.42         1365.74         1853.72  1201.31  2716.1
    ##
    ##  Pearson's Chi-squared test
    ##
    ## data:  y_table
    ## X-squared = 247.8821, df = 8, p-value < 2.2e-16
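
To make the expected counts and the test statistic less of a black box, the sketch below recomputes them by hand from the printed counts and compares the result with base R's `chisq.test`. It is only an illustration under the assumption that the printed table is the full input to the test; the names `counts`, `expected`, `chi_sq` and `dof` are hypothetical and not part of the original analysis.

```r
# Counts copied from the summary table above (rows: fear, columns: region).
regions <- c("New England", "Middle Atlantic", "E. Nor. Central",
             "W. Nor. Central", "South Atlantic", "E. Sou. Central",
             "W. Sou. Central", "Mountain", "Pacific")
counts <- matrix(c(603, 970, 2168, 2885, 2376, 4085, 821, 1676, 2936, 3709,
                   934, 1375, 1401, 1733, 693, 1338, 2078, 2514),
                 nrow = 2,
                 dimnames = list(fear = c("Yes", "No"), region = regions))

# Under independence: expected count = row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected.
chi_sq <- sum((counts - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

# The built-in test, for comparison with the hand calculation.
builtin <- chisq.test(counts)
print(c(by_hand = chi_sq, builtin = unname(builtin$statistic)))
```

The smallest expected count is far above 5, which is what the sample-size condition requires.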

The inference function reports a very small p-value (less than 2.2e-16), so we reject the null hypothesis that Fear and Region are independent. If we plot the confidence intervals for the regional proportions side by side in ascending order, we would therefore expect some intervals to overlap but also some that do not overlap at all.

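    # table_region, prop_region and region_names are not defined earlier in this
    # report; they are assumed to be the region-by-fear versions of the tables above.
    table_region <- t(table_fear);
    prop_region  <- prop.table(table_region, 1);

    # Region indices in ascending order of the proportion answering 'Yes'.
    region_order <- c(4, 8, 3, 1, 6, 2, 5, 7, 9);
    region_names <- rownames(table_region)[region_order];

    proportions <- vector();
    ci_lowers   <- vector();
    ci_uppers   <- vector();

    for (i in region_order) {
        # 95% confidence interval for the proportion of 'Yes' in region i.
        result      <- prop.test(table_region[i, 1], n = sum(table_region[i, ]), alternative = 'two.sided');
        proportions <- c(proportions, prop_region[i, 1]);
        # plotCI expects the interval widths measured from the point estimate.
        ci_lowers   <- c(ci_lowers, prop_region[i, 1] - result$conf.int[1]);
        ci_uppers   <- c(ci_uppers, result$conf.int[2] - prop_region[i, 1]);
    }

    plotCI(proportions,
           uiw = ci_uppers,
           liw = ci_lowers,
           ylim = c(0.3, 0.5),
           main = 'Fear Proportions (With 95% CI)',
           xlab = 'Regions',
           ylab = 'Fear Proportion');

    # Label each point (x = position in the ordering, y = proportion) with its region.
    text(seq_along(proportions), proportions, labels = region_names, pos = 4, cex = 0.75, col = 'blue');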

As expected, some confidence intervals do not overlap, which is consistent with the result of the chi-square test.
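
The overlap comparison can also be checked numerically for individual pairs of regions. The following sketch uses the counts from the tables above and base R's `prop.test` with its default 95% level; the variable names are illustrative, and the choice of regions (the two extremes, plus two regions with similar proportions) is only an example.

```r
# 'Yes' counts and region totals taken from the contingency table above.
ci_wnc <- prop.test(821, 2497)$conf.int   # W. Nor. Central (lowest proportion)
ci_pac <- prop.test(2078, 4592)$conf.int  # Pacific (highest proportion)
ci_ne  <- prop.test(603, 1573)$conf.int   # New England
ci_esc <- prop.test(934, 2309)$conf.int   # E. Sou. Central

# Two intervals are disjoint when one ends before the other begins.
extremes_disjoint <- ci_wnc[2] < ci_pac[1]

# Two intervals overlap when each starts before the other ends.
neighbours_overlap <- ci_ne[1] < ci_esc[2] && ci_esc[1] < ci_ne[2]

print(c(extremes_disjoint = extremes_disjoint,
        neighbours_overlap = neighbours_overlap))
```

Non-overlapping intervals are a stronger condition than a significant chi-square test, so this check is a complement to the test rather than a substitute for it.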

### Conclusion:

The data provide strong evidence of an association between fear and region. It might be interesting to group the regions according to the US Census Bureau designated regions (see List of regions of the United States).

Other factors (cultural, political, etc.) could be involved, and further research using the GSS dataset could check whether similar associations are present for other variables (political inclination, education, gun ownership, etc.).
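
As a starting point for the regrouping mentioned above, the sketch below collapses the nine GSS regions (which correspond to the Census Bureau divisions) into the four Census Bureau regions and reruns the test. This is an assumption-laden illustration, not part of the original analysis: the counts are copied from the tables in this report, and the names `counts`, `census_region` and `by_census` are hypothetical.

```r
# Counts copied from the contingency table in this report.
regions <- c("New England", "Middle Atlantic", "E. Nor. Central",
             "W. Nor. Central", "South Atlantic", "E. Sou. Central",
             "W. Sou. Central", "Mountain", "Pacific")
counts <- matrix(c(603, 970, 2168, 2885, 2376, 4085, 821, 1676, 2936, 3709,
                   934, 1375, 1401, 1733, 693, 1338, 2078, 2514),
                 nrow = 2,
                 dimnames = list(fear = c("Yes", "No"), region = regions))

# Census Bureau region for each of the nine divisions, in column order.
census_region <- c("Northeast", "Northeast", "Midwest", "Midwest",
                   "South", "South", "South", "West", "West")

# rowsum() adds rows that share a group label, so the table is transposed
# to put regions in rows, summed by group, and transposed back.
by_census <- t(rowsum(t(counts), group = census_region, reorder = FALSE))
print(by_census)

# Proportion answering 'Yes' in each Census region, and the same test as before.
print(round(prop.table(by_census, margin = 2)["Yes", ], 3))
print(chisq.test(by_census))
```

With four regions the test has 3 degrees of freedom instead of 8.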

### References:

Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut /Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1

Persistent URL: http://doi.org/10.3886/ICPSR34802.v1

### Appendix:

    data[1:30, ];
    ##      fear          region
    ## 1614   No E. Nor. Central
    ## 1615   No E. Nor. Central
    ## 1616  Yes E. Nor. Central
    ## 1617   No E. Nor. Central
    ## 1618   No E. Nor. Central
    ## 1619  Yes Middle Atlantic
    ## 1620  Yes Middle Atlantic
    ## 1621  Yes Middle Atlantic
    ## 1622  Yes Middle Atlantic
    ## 1623   No Middle Atlantic
    ## 1624   No E. Nor. Central
    ## 1625   No E. Nor. Central
    ## 1626   No E. Nor. Central
    ## 1627   No E. Nor. Central
    ## 1628   No E. Nor. Central
    ## 1629   No E. Nor. Central
    ## 1630  Yes E. Nor. Central
    ## 1631   No E. Nor. Central
    ## 1632  Yes E. Nor. Central
    ## 1633  Yes        Mountain
    ## 1634  Yes        Mountain
    ## 1636   No        Mountain
    ## 1637   No        Mountain
    ## 1638   No E. Nor. Central
    ## 1639  Yes E. Nor. Central
    ## 1640  Yes E. Nor. Central
    ## 1641   No E. Nor. Central
    ## 1642   No E. Nor. Central
    ## 1643   No E. Nor. Central
    ## 1644  Yes E. Nor. Central