In this question, we will continue to use the data from assignment 2. The dataset covers 184 US cities and contains 21 variables, ranging from city sustainability policy actions to population health status, along with local demographic information. Each variable is explained on sheet 2.
In assignment 2, you picked one research question from the three below and answered a few questions about the properties of the dependent and independent variables in that question.
- Do cities with higher levels of biking usage have fewer people diagnosed with physical health problems?
Review your answer in assignment 2 first, then answer the following question (5 points):
For those who picked question I or II, please conduct a simple regression analysis in Excel based on the question you selected, and report the results:
- Report the values of the intercept and the regression coefficient
The intercept and its 95% confidence interval appear in the Intercept row of the Excel coefficient table. The intercept is 1.96908377, with a lower 95% bound of 0.970889232 and an upper 95% bound of 2.9672783. The intercept is the predicted value of the dependent variable when the independent variable equals zero. The confidence interval gives the range of plausible values for it.
The regression coefficient (slope) is 0.084747, reported in the X Variable 1 row of the coefficient table. It belongs to the independent variable, biking usage, which is used to predict the dependent variable, physical health problems.
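Excel's Regression tool estimates the intercept and slope by ordinary least squares. The sketch below illustrates the closed-form formulas for a single predictor. It assumes small made-up `x` and `y` lists, not the 184-city data, and `ols_simple` is a hypothetical helper name.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line always passes through (x_bar, y_bar)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical toy data (NOT from the assignment dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.0, 2.4, 2.3, 2.6]
a, b = ols_simple(x, y)
print(a, b)  # roughly 1.89 and 0.13
```

The intercept `a` is the fitted value at `x = 0`, which is the same interpretation given for 1.96908377 above.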
- State the regression line equation
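$$Y = a + bX + u, \qquad \hat{Y} = 1.96908377 + 0.084747\,X$$
where Y is the dependent variable (physical health problems), X is the independent variable (biking usage), a is the intercept, b is the slope, and u is the error term (the part of Y that the line does not explain).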
- Based on the equation, state how the dependent variable is going to be changed if there is one unit change in the independent variable
In the analysis, X is the independent variable and Y is the dependent variable. The slope is b = 0.084747, so a one-unit increase in X raises the predicted value of Y by about 0.0847 units. A one-unit decrease lowers it by the same amount. Neither the intercept nor the F value enters this change, because $\hat{Y}(X+1) - \hat{Y}(X) = b$. Since b is positive, the fitted line predicts slightly more, not fewer, people diagnosed with physical health problems in cities with higher biking usage.
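The sketch below checks the one-unit-change rule using the intercept and slope reported above. The names `A`, `B`, and `predict` are illustrative only. For any starting value of X, the difference between two fitted values one unit apart equals the slope.

```python
A = 1.96908377  # intercept reported in the coefficient table
B = 0.084747    # slope (X Variable 1) reported in the coefficient table

def predict(x):
    """Fitted value Y-hat = a + b*x (the error term u averages to zero)."""
    return A + B * x

# Moving X from 10 to 11 (or any value to value + 1) changes Y-hat by B
change = predict(11.0) - predict(10.0)
print(round(change, 6))  # 0.084747
```

Also, `predict(0.0)` returns the intercept, which matches the interpretation of a as the predicted value when X is zero.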
- According to the statistical results, is the relationship significant?
The reported p-value of 0.000139 is below the 0.05 significance level. On that basis, the null hypothesis of no relationship is rejected in favor of the alternative hypothesis that biking usage and physical health problems are related. Two cautions apply.

First, the slope's p-value should be read from the X Variable 1 row (or from Significance F). With R² = 0.02311428 and n = 184, the implied F statistic is about 4.31 (p ≈ 0.04). The value 0.000139 therefore appears to be the intercept's p-value rather than the slope's. The relationship is still significant at the 0.05 level.

Second, the slope is positive, so the significant relationship runs opposite to the research question. Higher biking usage is associated with slightly more, not fewer, people diagnosed with physical health problems.
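The slope's significance can be cross-checked from R² alone. For a model with k predictors and n observations, the overall F statistic is F = (R²/k) / ((1 − R²)/(n − k − 1)). In simple regression, the slope's |t| equals the square root of F. The sketch below uses only reported values; `f_from_r2` is a hypothetical helper name.

```python
import math

def f_from_r2(r2, n, k=1):
    """Overall F statistic implied by R^2 with n observations and k predictors."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f_stat = f_from_r2(0.02311428, 184)   # simple regression reported above
t_slope = math.sqrt(f_stat)           # |t| of the slope when k = 1
print(round(f_stat, 3), round(t_slope, 3))  # about 4.306 and 2.075
```

A t of about 2.08 on 182 degrees of freedom corresponds to a two-sided p-value near 0.04, not 0.000139. Applying the same function to Table 4 (R² = 0.0029061) gives about 0.530, which matches the F value in Table 5.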
- Interpret R2
R² (the coefficient of determination) measures how well the linear regression model fits the data. It is the share of the variance in the dependent variable that the independent variable explains. Here R² = 0.02311428, so biking usage explains about 2.31% of the variation in physical health problems across the 184 cities. The remaining 97.69% is due to other factors. R² ranges from 0% to 100%. A low value does not contradict a statistically significant slope: it means the relationship, while detectable, is weak, and biking usage alone is a poor predictor of physical health problems.
- Think about one additional independent variable from the dataset as the control variable, use one sentence or two to justify why you think the additional variable could jointly explain the dependent variable.
The control variable chosen from the dataset is biking and walking trails, which records whether the city has implemented a policy to expand biking and walking trails. Such a policy is likely to be associated with biking usage. It may also affect physical health through walking and other outdoor activity. Including it therefore helps separate the effect of biking usage from the effect of the city's trail infrastructure.
- Then, conduct multiple regression analysis and report the findings (regression line equation, state how the dependent variable is going to be changed if there is one unit change in the independent variable, holding constant the impact of another independent variable; are the relationships significant? Interpret R2 and F test)
Table 4 summarizes the regression model. Multiple R is 0.0539082, R² is 0.0029061, adjusted R² is -0.002572, and the standard error of the estimate is 1.4478809. The model is based on 184 observations.
Table 4
Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.0539082 |
| R Square | 0.0029061 |
| Adjusted R Square | -0.002572 |
| Standard Error | 1.4478809 |
| Observations | 184 |
Table 5 shows the ANOVA output from Excel, with the sums of squares (SS) and mean squares (MS) for the regression and residual components. The F statistic is 0.53045, with a significance F of 0.467353602. The regression has 1 degree of freedom and the residual has 182, which means this output contains a single predictor. A model with both biking usage and the trails variable would show 2 and 181 degrees of freedom.
Table 5
Regression Analysis (ANOVA)
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 1.112014 | 1.112014 | 0.53045 | 0.467353602 |
| Residual | 182 | 381.5373 | 2.096359 | | |
| Total | 183 | 382.6493 | | | |
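All of the summary statistics in Table 4, and the F value in Table 5, can be recomputed from the ANOVA sums of squares. The sketch below does this using only the numbers copied from Table 5. The variable names are illustrative.

```python
import math

# Values copied from Table 5 (ANOVA)
ss_reg, ss_res, ss_tot = 1.112014, 381.5373, 382.6493
df_reg, df_res = 1, 182
n = df_reg + df_res + 1  # 184 observations

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res                 # F = MS Regression / MS Residual
r2 = ss_reg / ss_tot                     # share of variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - df_reg - 1)  # penalizes extra predictors
std_err = math.sqrt(ms_res)              # standard error of the estimate
multiple_r = math.sqrt(r2)               # Multiple R in Table 4

print(round(f_stat, 5), round(r2, 7), round(adj_r2, 6), round(std_err, 7))
```

The recomputed values agree with Table 4, which is a quick way to catch transcription typos in the text.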
Table 6 shows the coefficients, standard errors, t statistics, p-values, and 95% confidence intervals. The intercept is 0.7416667 (p = 0.012964), and the slope of X Variable 1 is 0.2308333 (p = 0.467354).
Table 6
Coefficients of the Regression Analysis
| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 0.7416667 | 0.295547 | 2.509467 | 0.012964 | 0.158526714 | 1.324807 | 0.158527 | 1.324807 |
| X Variable 1 | 0.2308333 | 0.316939 | 0.72832 | 0.467354 | -0.394514598 | 0.856181 | -0.39451 | 0.856181 |
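Each t Stat in Table 6 is the coefficient divided by its standard error. Each 95% bound is the coefficient plus or minus a critical t value times the standard error. The sketch below assumes a hard-coded two-sided 95% critical value of about 1.9731 for 182 degrees of freedom, rather than looking it up. `coef_summary` is a hypothetical helper name.

```python
def coef_summary(coef, se, t_crit):
    """Return (t statistic, lower bound, upper bound) for one coefficient."""
    t_stat = coef / se
    half_width = t_crit * se
    return t_stat, coef - half_width, coef + half_width

# Assumption: two-sided 95% critical t for 182 df, hard-coded (approximately 1.9731)
T_CRIT_182 = 1.9731

t_int, lo_int, hi_int = coef_summary(0.7416667, 0.295547, T_CRIT_182)
t_x, lo_x, hi_x = coef_summary(0.2308333, 0.316939, T_CRIT_182)
print(round(t_x, 5), round(lo_x, 4), round(hi_x, 4))
```

The slope's interval runs from a negative to a positive value, so it contains zero. This matches the slope's large p-value of 0.467354.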
State the Regression Equation
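After the control variable is added, the model becomes a multiple regression, with the biking and walking trails policy as the control variable:
$$Y = a + b_1X_1 + b_2X_2 + u$$
where Y is the dependent variable (physical health problems), $X_1$ is the independent variable (biking usage), $X_2$ is the control variable (trails policy), a is the intercept, $b_1$ and $b_2$ are the slopes, and u is the error term. Table 6 reports only one slope, so the fitted line it supports is $\hat{Y} = 0.7416667 + 0.2308333\,X$.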
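With two predictors, Excel solves the least-squares normal equations $X^\top X\beta = X^\top y$. The sketch below solves them in plain Python on hypothetical toy data built so that y = 1 + 0.5·x1 + 2·x2 exactly, so the fit should recover those coefficients. The helper names, the data, and the labels on `x1` and `x2` are all illustrative assumptions, not the assignment data.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - s) / aug[r][r]
    return coeffs

def ols_two_predictors(x1, x2, y):
    """Return [a, b1, b2] for y = a + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # design matrix with intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

# Hypothetical toy data: x1 ~ biking usage, x2 ~ trails policy (0/1)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
y = [1 + 0.5 * u + 2 * v for u, v in zip(x1, x2)]
a, b1, b2 = ols_two_predictors(x1, x2, y)

def predict(u, v):
    return a + b1 * u + b2 * v

# Raising x1 by one unit while x2 stays fixed changes Y-hat by b1
effect_of_x1 = predict(4.0, 1.0) - predict(3.0, 1.0)
print(round(a, 6), round(b1, 6), round(b2, 6), round(effect_of_x1, 6))
```

The last line shows what "holding the other variable constant" means in practice: $b_1$ is the change in the prediction when only $X_1$ moves.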
Based on the Equation State How the Dependent Variable Is Going to Be Changed if There Is One Unit Change in the Independent Variable
In a multiple regression, the slope $b_1$ is the change in Y for a one-unit change in $X_1$ while $X_2$ is held constant. Using the slope in Table 6, a one-unit increase in X raises the predicted value of Y by 0.2308333 units. Neither the intercept nor the F value enters this change.
According to the Statistical Results, Is the Relationship Significant?
The slope's p-value is 0.467354, which is greater than the 0.05 significance level, so we fail to reject the null hypothesis. This model gives no evidence of a relationship between biking usage and physical health problems. The p-value of 0.012964 belongs to the intercept. It only shows that the intercept differs from zero and says nothing about the relationship between the variables. The 95% confidence interval for the slope, from -0.394514598 to 0.856181, includes zero, which leads to the same conclusion.
Interpret R Squared
As before, R² is the share of the variance in the dependent variable that the independent variables jointly explain. Here R² = 0.0029061, so the model explains only about 0.29% of the variation in physical health problems. The adjusted R², which penalizes additional predictors, is -0.002572. A negative adjusted R² means the model fits no better than would be expected by chance.
Interpret F-Test
The F-test measures the overall significance of the regression. It tests the null hypothesis that all slope coefficients are zero by comparing the variance explained by the model (MS Regression) with the unexplained variance (MS Residual). Here F = 1.112014 / 2.096359 = 0.53045, with a significance F of 0.467353602. This is greater than 0.05, so the model is not statistically significant as a whole. The p-value of 0.012964 belongs to the intercept and should not be compared with the F value. These results therefore do not show that higher biking usage is associated with fewer people diagnosed with physical health problems. On their own, they do not support the recommendation that cities expand biking paths.