sleptim1
) impact physical (physhlth
) and mental (menthlth
)health?Research quesion 1: We compare sleep with bad physical and mental health days.
#Reinitialize data
data1 <- as_tibble(brfss2013)
#Data for Ohio (sleep 9 or less is selected since most individuals for under those categories)
data_ohio <- data1 %>%
filter(X_state=="Ohio" & sleptim1 <= 9 & physhlth!=0 & menthlth!=0) %>%
mutate(sleep_categories = cut(sleptim1, breaks = c(0,3,5,7,9),
labels = c("0-3","3-5","5-7","7-9")))
#Drop NAs
datamain_plot <- data_ohio[!is.na(data_ohio$physhlth), ]
datamain_plot <- datamain_plot[!is.na(datamain_plot$menthlth), ]
#Make boxplots
health1 <- c("physhlth", "menthlth")
p <- lapply(health1, function(x){
datamain <- data.frame("y_axis" = datamain_plot[[x]], "x_axis" = datamain_plot$sleep_categories)
ggplot(datamain, aes(y=y_axis,x=x_axis))+
geom_boxplot(notch=F, show.legend = F, outlier.shape = NA, aes(fill = factor(x_axis)))+
labs(title = paste(ifelse(x=="physhlth","Physical health","Mental health")),
x = "Sleep time (hours)",
y = paste(ifelse(x=="physhlth","Bad physical health days","Bad mental health days")))+
scale_fill_manual(values = rev(brewer.pal(4,"Blues")))+
theme(plot.title = element_text(hjust=0.5, face = "bold"),
text = element_text(size=8, face = "bold"),
axis.text.x = element_text(size = 8, face = "plain"))+
stat_summary(fun=mean, geom="point", shape=23, size=2)
})
#Plot them in a grid
plot_grid(plotlist = p)
#Create summary tables
tl <- lapply(health1, function(x){
datamain <- data.frame("y_axis" = datamain_plot[[x]], "x_axis" = datamain_plot$sleep_categories)
datamain_sum <- datamain %>%
group_by(x_axis) %>%
summarise_all(list(mean="mean", sd="sd", n="length"))
colnames(datamain_sum)<- c("Sleep Categories","Mean","Sd","N")
datamain_sum <- cbind(datamain_sum[,1],round(datamain_sum[,2:4],2))
print(kable(datamain_sum, format = "html",caption = paste(ifelse(x=="physhlth","Physical health summary",
"Mental health summary")),
escape = F) %>%
column_spec(1) %>%
kable_styling(full_width = F, latex_options = c("striped","hover", "condensed"),
font_size = 12.5) %>%
row_spec(0, font_size=10, color = "white", background = "grey"))
})
Sleep Categories | Mean | Sd | N |
---|---|---|---|
0-3 | 19.09 | 10.56 | 77 |
3-5 | 16.77 | 11.34 | 448 |
5-7 | 11.44 | 11.02 | 900 |
7-9 | 14.26 | 11.72 | 457 |
Sleep Categories | Mean | Sd | N |
---|---|---|---|
0-3 | 21.34 | 9.94 | 77 |
3-5 | 17.25 | 11.01 | 448 |
5-7 | 11.50 | 10.44 | 900 |
7-9 | 11.95 | 10.55 | 457 |
In the above boxplots, the line in the box specifies the median and the diamond box represents the mean. Please note that only individuals who reported as having at least one bad physical (physhlth
) and mental (menthlth
) day per year were included in the analysis. We observe that lower sleep leads to higher mental and physical problems. While the variances are higher for sleep categories, medians of both mental and physical health is above 10 for individuals getting less than five hours of sleep. Individuals earning less than 3 hours of sleep cab expect a median of 20 bad physical days and more than 25 bad mental days. Interestingly, we observe that people sleeping more that 7 hours are also in some cases likely to witness health problems.
Research quesion 2:
In this question, we examine if exercise level is tied to health outcomes i.e. are individuals who exercise more are likely to have better health? Note that we cannot establish causality, since health outcomes are tied to several indicators besides education. We use exerany2
(exercise in past 30 days) and compare it with physhlth
and menthlth
variables.
#Reinitialize data
data1 <- as_tibble(brfss2013)
#Data for Ohio (sleep 9 or less is selected since most individuals for under those categories)
data_ohio <- data1 %>%
filter(X_state=="Ohio")
#Drop NAs
datamain_plot <- data_ohio[!is.na(data_ohio$physhlth), ]
datamain_plot <- datamain_plot[!is.na(datamain_plot$menthlth), ]
datamain_plot <- datamain_plot[!is.na(datamain_plot$exerany2), ]
#Make boxplots
health1 <- c("physhlth", "menthlth")
p <- lapply(health1, function(x){
datamain <- data.frame("y_axis" = datamain_plot[[x]], "x_axis" = datamain_plot$exerany2)
ggplot(datamain, aes(y=y_axis,x=x_axis))+
geom_boxplot(notch=F, show.legend = F, outlier.shape = NA, aes(fill = factor(x_axis)))+
labs(title = paste(ifelse(x=="physhlth","Physical health","Mental health")),
x = "Exercise in past 30 days",
y = paste(ifelse(x=="physhlth","Bad physical health days","Bad mental health days")))+
theme(plot.title = element_text(hjust=0.5, face = "bold"),
text = element_text(size=8, face = "bold"),
axis.text.x = element_text(size = 8, face = "plain"))+
stat_summary(fun=mean, geom="point", shape=23, size=2)
})
#Plot them in a grid
plot_grid(plotlist = p)
#Create summary tables
tl <- lapply(health1, function(x){
datamain <- data.frame("y_axis" = datamain_plot[[x]], "x_axis" = datamain_plot$exerany2)
datamain_sum <- datamain %>%
group_by(x_axis) %>%
summarise_all(list(mean="mean", sd="sd", n="length"))
colnames(datamain_sum)<- c("Exercise in past 30 days","Mean","Sd","N")
datamain_sum <- cbind(datamain_sum[,1],round(datamain_sum[,2:4],2))
print(kable(datamain_sum, format = "html",caption = paste(ifelse(x=="physhlth","Physical health summary",
"Mental health summary")),
escape = F) %>%
column_spec(1) %>%
kable_styling(full_width = F, latex_options = c("striped","hover", "condensed"),
font_size = 12.5) %>%
row_spec(0, font_size=10, color = "white", background = "grey"))
})
Exercise in past 30 days | Mean | Sd | N |
---|---|---|---|
Yes | 3.3 | 7.69 | 7539 |
No | 7.5 | 11.28 | 3202 |
Exercise in past 30 days | Mean | Sd | N |
---|---|---|---|
Yes | 2.95 | 7.14 | 7539 |
No | 5.29 | 9.78 | 3202 |
While the median bad physical/mental for both yes
and no
is zero, we do observe higher variance for individuals not exercising and witnessing bad physical/mental days. The visualizations and summary tables confirm that individuals not exercising are more likely to have more bad physical and mental days.
Research quesion 3:
Next we test the relation between income level and physical and mental health. The intuition follows from the fact that more financially sound individuals should have better physical and mental health outcomes. This may due to indirect effects such as an individual’s ability to afford better quality food, housing and other amenities leading to a comfortable and healthy lifestyle.
#Reinitialize data
data1 <- as_tibble(brfss2013)
#Data for Ohio (sleep 9 or less is selected since most individuals for under those categories)
data_ohio <- data1 %>%
filter(X_state=="Ohio") %>%
mutate(inc_cat = income2) %>%
mutate(inc_cat = replace(inc_cat, inc_cat == "Less than $10,000"|inc_cat == "Less than $15,000"|
inc_cat == "Less than $20,000"|inc_cat == "Less than $25,000",
"Less than $25,000")) %>%
mutate(inc_cat = replace(inc_cat, inc_cat == "Less than $35,000"|inc_cat == "Less than $50,000", "Less than $50,000"))
#Drop NAs
datamain_plot <- data_ohio[!is.na(data_ohio$physhlth), ]
datamain_plot <- datamain_plot[!is.na(datamain_plot$menthlth), ]
datamain_plot <- datamain_plot[!is.na(datamain_plot$inc_cat), ]
#Make boxplots
health1 <- c("physhlth", "menthlth")
p <- lapply(health1, function(x){
datamain <- data.frame("y_axis" = datamain_plot[[x]], "x_axis" = datamain_plot$inc_cat)
ggplot(datamain, aes(y=y_axis,x=x_axis))+
geom_boxplot(notch=F, show.legend = F, outlier.shape = NA, aes(fill = factor(x_axis)))+
labs(title = paste(ifelse(x=="physhlth","Physical health","Mental health")),
x = "Income",
y = paste(ifelse(x=="physhlth","Bad physical health days","Bad mental health days")))+
scale_x_discrete(labels = function(x) lapply(strwrap(x, width = 10, simplify = FALSE), paste, collapse="\n"))+
scale_fill_manual(values = rev(brewer.pal(7,"Blues")))+
theme(plot.title = element_text(hjust=0.5, face = "bold"),
text = element_text(size=8, face = "bold"),
axis.text.x = element_text(size = 8, face = "plain"))+
stat_summary(fun=mean, geom="point", shape=23, size=2)
})
#Plot them in a grid
plot_grid(plotlist = p)
#Create summary tables
tl <- lapply(health1, function(x){
datamain <- data.frame("y_axis" = datamain_plot[[x]], "x_axis" = datamain_plot$inc_cat)
datamain_sum <- datamain %>%
group_by(x_axis) %>%
summarise_all(list(mean="mean", sd="sd", n="length"))
colnames(datamain_sum)<- c("Income Categories","Mean","Sd","N")
datamain_sum <- cbind(datamain_sum[,1],round(datamain_sum[,2:4],2))
print(kable(datamain_sum, format = "html",caption = paste(ifelse(x=="physhlth","Physical health summary",
"Mental health summary")),
escape = F) %>%
column_spec(1) %>%
kable_styling(full_width = F, latex_options = c("striped","hover", "condensed"),
font_size = 12.5) %>%
row_spec(0, font_size=10, color = "white", background = "grey"))
})
Income Categories | Mean | Sd | N |
---|---|---|---|
Less than $25,000 | 7.73 | 11.29 | 3204 |
Less than $50,000 | 3.78 | 8.27 | 2781 |
Less than $75,000 | 2.93 | 7.21 | 1583 |
$75,000 or more | 1.79 | 5.39 | 2438 |
Income Categories | Mean | Sd | N |
---|---|---|---|
Less than $25,000 | 6.41 | 10.49 | 3204 |
Less than $50,000 | 2.89 | 7.00 | 2781 |
Less than $75,000 | 2.53 | 6.48 | 1583 |
$75,000 or more | 1.70 | 5.14 | 2438 |
The visualizations and the summary table show that individuals earning below $25,000 are more likely to have bad physical and mental days. Thus, suggesting that lower income can lead to poor health outcomes. Conversely, individuals earning more than $75,000 generally are free from bad physical and mental days.
We have done an exploratory analysis on three questions, which examine individual relationships between a response and explanatory variable. While our analysis, shows promising trends, we must be aware of some shortcomings. Note that health outcomes are a function of several co-variates, thus it is difficult to draw causal conclusions from our analysis. A proper predictive model of health outcomes would require multiple linear regression.