## Loading required package: plyr
## Loading required package: reshape2
## Loading required package: lmPerm
## Loading required package: PropCIs
## Loading required package: plotrix
America seems to be a country divided along cultural, political, and religious lines, to name a few. Or is it? This project investigates one of those lines: it looks for an association between fear and location in the United States.
This project uses data from the General Social Survey (GSS), a sociological survey that collects data on the demographic characteristics and attitudes of residents of the United States.
Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing complexity of American society. The GSS aims to gather data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes; to examine the structure and functioning of society in general as well as the role played by relevant subgroups; to compare the United States to other societies in order to place American society in comparative perspective and develop cross-national models of human society; and to make high-quality data easily accessible to scholars, students, policy makers, and others, with minimal cost and waiting.
Each survey from 1972 to 2012 was an independently drawn sample of English-speaking persons 18 years of age or over, living in non-institutional arrangements within the United States. Starting in 2006 Spanish-speakers were added to the target population. Block quota sampling was used in 1972, 1973, and 1974 surveys and for half of the 1975 and 1976 surveys. Full probability sampling was employed in half of the 1975 and 1976 surveys and the 1977, 1978, 1980, 1982-1991, 1993-1998, 2000, 2002, 2004, 2006, 2008, 2010 and 2012 surveys. Also, the 2004, 2006, 2008, 2010 and 2012 surveys had sub-sampled non-respondents.
The two variables of interest are fear and region. From the codebook:
Fear: afraid to walk at night in neighborhood (is there any area right around here - that is, within a mile - where you would be afraid to walk alone at night?)
Region: region of interview
This is an observational study, i.e. the data were collected in a way that does not interfere with how they arise. The subjects are all noninstitutionalized, English- and Spanish-speaking persons 18 years of age or older, living in the United States. Because this is an observational study with no random assignment, no causal link can be established, although a (perhaps natural) association may be demonstrated. Because most survey years used full probability sampling, the results can be generalized to the population of interest. Non-response bias has been mitigated by conducting surveys outside of working hours and over the weekends.
Take a look at the data.
table_fear <- table(data$fear, data$region);
addmargins(table_fear);
##      
##       New England Middle Atlantic E. Nor. Central W. Nor. Central
##   Yes         603            2168            2376             821
##   No          970            2885            4085            1676
##   Sum        1573            5053            6461            2497
##      
##       South Atlantic E. Sou. Central W. Sou. Central Mountain Pacific
##   Yes           2936             934            1401      693    2078
##   No            3709            1375            1733     1338    2514
##   Sum           6645            2309            3134     2031    4592
##      
##         Sum
##   Yes 14010
##   No  20285
##   Sum 34295
The raw counts tell us little, apart from showing that the data meet the sample-size condition for inference (every cell holds far more than 5 observations), so we convert them to proportions within each region.
prop_fear <- prop.table(table_fear, 2);
prop_fear;
##      
##       New England Middle Atlantic E. Nor. Central W. Nor. Central
##   Yes   0.3833439       0.4290520       0.3677449       0.3287946
##   No    0.6166561       0.5709480       0.6322551       0.6712054
##      
##       South Atlantic E. Sou. Central W. Sou. Central  Mountain   Pacific
##   Yes      0.4418360       0.4045041       0.4470325 0.3412112 0.4525261
##   No       0.5581640       0.5954959       0.5529675 0.6587888 0.5474739
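Because the raw GSS extract is not included here, the column proportions can be checked from the published counts alone. The following is a minimal sketch that rebuilds the contingency table by hand (the object names `regions`, `counts`, and `props` are illustrative and not part of the original analysis) and applies `prop.table` with `margin = 2`, so that each region's column sums to 1.

```r
# Counts copied from the table above; rows are fear responses, columns are regions.
regions <- c("New England", "Middle Atlantic", "E. Nor. Central",
             "W. Nor. Central", "South Atlantic", "E. Sou. Central",
             "W. Sou. Central", "Mountain", "Pacific")
counts <- rbind(
  Yes = c(603, 2168, 2376, 821, 2936, 934, 1401, 693, 2078),
  No  = c(970, 2885, 4085, 1676, 3709, 1375, 1733, 1338, 2514)
)
colnames(counts) <- regions

# margin = 2 divides each cell by its column (region) total.
props <- prop.table(counts, margin = 2)
print(round(props, 4))
```

Using `margin = 1` instead would give the share of all 'Yes' (or 'No') responses that each region contributes, which answers a different question.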
There seem to be marked differences between many of the proportions. Next we visualize the proportions with a bar plot and a mosaic plot.
barplot(prop_fear,
  main = 'Fear By Region',
  col = cm.colors(2),
  legend = rownames(prop_fear),
  cex.names = 0.75,
  las = 2,
  beside = TRUE);
# Region-by-fear proportions; assumes the undefined prop_region was the transpose of prop_fear.
mosaicplot(t(prop_fear),
  main = 'Fear By Region',
  col = cm.colors(2),
  las = 2);
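The plots show some consistency across regions, but also marked differences between many of the proportions.
We perform a chi-square test of independence between Fear and Region. The hypotheses for the test are as follows:
H_0: Fear and Region are independent; the proportion of respondents who report fear does not vary by region.
H_A: Fear and Region are dependent; the proportion of respondents who report fear varies by region.
The conditions for the chi-square test are as follows:
Independence: the sampled observations must be independent of one another (random sampling, a sample smaller than 10% of the population, and each respondent contributing to only one cell of the table).
Sample size: each cell must have an expected count of at least 5.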
inference(data$fear,
data$region,
est = 'proportion',
type = 'ht',
method = 'theoretical',
success = 'Yes',
alternative = 'greater');
## Response variable: categorical, Explanatory variable: categorical
## Chi-square test of independence
##
## Summary statistics:
##      x
## y     New England Middle Atlantic E. Nor. Central W. Nor. Central
##   Yes         603            2168            2376             821
##   No          970            2885            4085            1676
##   Sum        1573            5053            6461            2497
##      x
## y     South Atlantic E. Sou. Central W. Sou. Central Mountain Pacific
##   Yes           2936             934            1401      693    2078
##   No            3709            1375            1733     1338    2514
##   Sum           6645            2309            3134     2031    4592
##      x
## y       Sum
##   Yes 14010
##   No  20285
##   Sum 34295
## H_0: Response and explanatory variable are independent.
## H_A: Response and explanatory variable are dependent.
## Check conditions: expected counts
##      x
## y     New England Middle Atlantic E. Nor. Central W. Nor. Central
##   Yes      642.59         2064.22         2639.41         1020.06
##   No       930.41         2988.78         3821.59         1476.94
##      x
## y     South Atlantic E. Sou. Central W. Sou. Central Mountain Pacific
##   Yes        2714.58          943.26         1280.28   829.69  1875.9
##   No         3930.42         1365.74         1853.72  1201.31  2716.1
##
## Pearson's Chi-squared test
##
## data: y_table
## X-squared = 247.8821, df = 8, p-value < 2.2e-16
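The `inference` function is not part of base R, so as a cross-check the same test can be sketched with `chisq.test` applied directly to the counts reported above. This is only an illustration under that assumption: the object names (`counts`, `test`, `expected`, `x_squared`) are hypothetical, and the statistic is also computed by hand as the sum of (observed - expected)^2 / expected over all cells, to show where the number comes from.

```r
# Observed counts copied from the summary statistics above.
counts <- rbind(
  Yes = c(603, 2168, 2376, 821, 2936, 934, 1401, 693, 2078),
  No  = c(970, 2885, 4085, 1676, 3709, 1375, 1733, 1338, 2514)
)
colnames(counts) <- c("New England", "Middle Atlantic", "E. Nor. Central",
                      "W. Nor. Central", "South Atlantic", "E. Sou. Central",
                      "W. Sou. Central", "Mountain", "Pacific")

# Base R chi-square test of independence on the 2 x 9 table.
test <- chisq.test(counts)

# Expected count for a cell = row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Chi-square statistic computed by hand from observed and expected counts.
x_squared <- sum((counts - expected)^2 / expected)

# Degrees of freedom = (rows - 1) * (columns - 1).
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

print(round(expected, 2))
print(c(by_hand = x_squared, chisq_test = unname(test$statistic), df = df))
```

The smallest expected count is far above 5, so the sample-size condition holds for every cell.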
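The inference function returns a very small p-value (less than 2.2e-16), so we reject the null hypothesis that Fear and Region are independent. Given this result, if we plot the confidence intervals for each regional proportion side by side in ascending order, we should see some intervals that overlap and some that do not.
# table_region and prop_region are not defined in the code shown above; they are
# assumed here to be the region-by-fear transposes of table_fear and prop_fear.
table_region <- t(table_fear);
prop_region <- t(prop_fear);
# Region indices in ascending order of the proportion answering 'Yes'.
region_order <- c(4, 8, 3, 1, 6, 2, 5, 7, 9);
# Assumed definition: labels in the same order as the plotted points.
region_names <- rownames(table_region)[region_order];
proportions <- vector();
ci_lowers <- vector();
ci_uppers <- vector();
for (i in region_order) {
  result <- prop.test(table_region[i, 1], n = sum(table_region[i, ]), alternative = 'two.sided');
  proportions <- c(proportions, prop_region[i, 1]);
  # plotCI expects interval widths below and above the estimate, not endpoints.
  ci_lowers <- c(ci_lowers, prop_region[i, 1] - result$conf.int[1]);
  ci_uppers <- c(ci_uppers, result$conf.int[2] - prop_region[i, 1]);
}
plotCI(proportions,
  uiw = ci_uppers,
  liw = ci_lowers,
  ylim = c(0.3, 0.5),
  main = 'Fear Proportions (With 95% CI)',
  xlab = 'Regions',
  ylab = 'Fear Proportion');
# Label each point at its (index, proportion) position.
text(seq_along(proportions), proportions, labels = region_names, pos = 4, cex = 0.75, col = 'blue');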
As expected, some confidence intervals do not overlap, which agrees with the result of the chi-square test.
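The non-overlap can also be checked numerically for the two extreme regions. The sketch below assumes only the 'Yes' counts and region totals reported earlier, and uses `prop.test` (which returns a continuity-corrected Wilson score interval) for W. Nor. Central, the lowest proportion, and Pacific, the highest. The names `wnc`, `pac`, and `intervals_disjoint` are illustrative.

```r
# 'Yes' counts and region totals copied from the contingency table.
wnc <- prop.test(x = 821,  n = 2497)  # W. Nor. Central, lowest proportion
pac <- prop.test(x = 2078, n = 4592)  # Pacific, highest proportion

# 95% confidence intervals for the two proportions.
print(wnc$conf.int)
print(pac$conf.int)

# TRUE when the upper bound of the lowest region lies below
# the lower bound of the highest region, i.e. the intervals are disjoint.
intervals_disjoint <- wnc$conf.int[2] < pac$conf.int[1]
print(intervals_disjoint)
```

Comparing intervals pairwise like this is an informal check; it supports the chi-square result but does not replace it.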
The data provide strong evidence of an association between fear and region. It might be interesting to group the regions according to the US Census Bureau designated regions (see List of regions of the United States).
Other factors (cultural, political, etc.) could be involved, and further research could use the GSS dataset to see whether similar associations appear for other variables (political inclination, education, gun ownership, etc.).
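As a starting point for that grouping, the sketch below collapses the nine regions into the four Census Bureau regions (Northeast, Midwest, South, West), treating the nine GSS regions as the Census divisions of the same names. It works from the counts reported above rather than from the raw data, and the names `census` and `census_counts` are hypothetical.

```r
# Counts copied from the contingency table; columns are the nine GSS regions.
counts <- rbind(
  Yes = c(603, 2168, 2376, 821, 2936, 934, 1401, 693, 2078),
  No  = c(970, 2885, 4085, 1676, 3709, 1375, 1733, 1338, 2514)
)
colnames(counts) <- c("New England", "Middle Atlantic", "E. Nor. Central",
                      "W. Nor. Central", "South Atlantic", "E. Sou. Central",
                      "W. Sou. Central", "Mountain", "Pacific")

# Census Bureau region for each column, in the same order as colnames(counts).
census <- c("Northeast", "Northeast", "Midwest", "Midwest",
            "South", "South", "South", "West", "West")

# rowsum() adds up rows by group, so transpose, aggregate, and transpose back.
census_counts <- t(rowsum(t(counts), group = census))

# Proportion of 'Yes' and 'No' responses within each Census region.
print(round(prop.table(census_counts, margin = 2), 4))
```

The same chi-square test could then be run on `census_counts` to see whether the association persists at this coarser level.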
Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut /Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1
Persistent URL: http://doi.org/10.3886/ICPSR34802.v1
data[1:30, ];
## fear region
## 1614 No E. Nor. Central
## 1615 No E. Nor. Central
## 1616 Yes E. Nor. Central
## 1617 No E. Nor. Central
## 1618 No E. Nor. Central
## 1619 Yes Middle Atlantic
## 1620 Yes Middle Atlantic
## 1621 Yes Middle Atlantic
## 1622 Yes Middle Atlantic
## 1623 No Middle Atlantic
## 1624 No E. Nor. Central
## 1625 No E. Nor. Central
## 1626 No E. Nor. Central
## 1627 No E. Nor. Central
## 1628 No E. Nor. Central
## 1629 No E. Nor. Central
## 1630 Yes E. Nor. Central
## 1631 No E. Nor. Central
## 1632 Yes E. Nor. Central
## 1633 Yes Mountain
## 1634 Yes Mountain
## 1636 No Mountain
## 1637 No Mountain
## 1638 No E. Nor. Central
## 1639 Yes E. Nor. Central
## 1640 Yes E. Nor. Central
## 1641 No E. Nor. Central
## 1642 No E. Nor. Central
## 1643 No E. Nor. Central
## 1644 Yes E. Nor. Central