MBA5652 Columbia Southern: Correlation and Regression Analysis Using the Sun Coast Data Set
Using the Sun Coast data set, perform a correlation analysis, simple regression analysis, and multiple regression analysis, and interpret the results.
Please follow the Unit V Scholarly Activity template here to complete your assignment.
You will use the Microsoft Excel Data Analysis ToolPak for this assignment.
Example:
Correlation Analysis
Restate the hypotheses.
Provide data output results from Excel ToolPak.
Interpret the correlation analysis results.
Simple Regression Analysis
Restate the hypotheses.
Provide data output results from Excel ToolPak.
Interpret the simple regression analysis results.
Multiple Regression Analysis
Restate the hypotheses.
Provide data output results from Excel ToolPak.
Interpret the multiple regression analysis results.
The title and reference pages do not count toward the page requirement for this assignment. This assignment should be no less than two pages in length, follow APA-style formatting and guidelines, and use references and citations as necessary.
UNIT V STUDY GUIDE
Data Analysis: Correlation and Regression
Course Learning Outcomes for Unit V
Upon completion of this unit, students should be able to:
6. Differentiate between various research-based tools commonly used in businesses.
6.1 Determine the most appropriate statistical procedure to use from among correlation, simple
regression, and multiple regression to test hypotheses.
7. Test data for a business research project.
7.1 Establish whether to accept or reject null and alternative hypotheses by using correlation,
simple regression, and multiple regression.
Course/Unit Learning Outcomes and Learning Activities
Learning Outcome 6.1:
Unit Lesson
Video: How to Find Correlation in Excel with the Data Analysis Toolpak
Video: How to Use Excel-The PEARSON Function
Video: Excel 2016 Correlation Analysis
Video: How to Calculate a Correlation (and p value) in Microsoft Excel
Video: Correlation Coefficient in Excel
Video: How to Perform a Linear or Multiple Regression (Excel 2013)
Video: Multiple Regression Interpretation in Excel
Unit V Scholarly Activity
Learning Outcome 7.1:
Unit Lesson
Video: Excel 2016 Correlation Analysis
Video: How to Calculate a Correlation (and p value) in Microsoft Excel
Video: Correlation Coefficient in Excel
Video: Multiple Regression Interpretation in Excel
Unit V Scholarly Activity
Reading Assignment
In order to access the following resources, click the links below:
Glen, S. (2013, December 14). How to find correlation in Excel with the Data Analysis Toolpak [Video file].
Retrieved from https://www.youtube.com/watch?v=AjQA78tI39Q
TheRMUoHP Biostatistics Resource Channel. (2014, November 6). How to use Excel-The PEARSON
Function [Video file]. Retrieved from https://www.youtube.com/watch?v=JO-Gc5bEG70
Porterfield, T. (2017, May 18). Excel 2016 correlation analysis [Video file]. Retrieved from
MBA 5652, Research Methods
Quantitative Specialists. (2014, September 15). How to calculate a correlation (and p-value) in Microsoft
Excel [Video file]. Retrieved from https://www.youtube.com/watch?v=vFcxExzLfZI
MrSnyder88. (2009, November 8). Correlation coefficient in Excel [Video file]. Retrieved from
economistician.com. (2015, May 15). How to perform a linear or multiple regression (Excel 2013) [Video file].
Retrieved from https://www.youtube.com/watch?v=wBocR96UdyY
TheWoundedDoctor. (2013, May 6). Multiple regression interpretation in Excel [Video file]. Retrieved from
Unit Lesson
Data Analysis: Correlation and Regression
Unit IV discussed descriptive statistics and the importance of testing the data to ensure assumptions are met
before using parametric statistical procedures. When using descriptive statistics, the data that are collected
are described by the researcher both visually and statistically. The visual representation alone can reveal
information about whether assumptions are met. Although statistical tests differ in their assumptions,
normality is shared by the parametric tests covered here and is relatively easy to check with histograms.
When their assumptions are met, parametric tests are preferable because they are more powerful than
non-parametric tests, which have fewer assumptions. Regardless of the statistical procedure under consideration, the
assumptions must be met if the researcher is to have confidence in the validity of the results. Units V through
VII will focus on inferential statistics, which include the parametric tests of correlation, regression, t test, and
ANOVA.
Inferential Statistics
Unlike descriptive statistics, inferential statistics go beyond simply describing the data to making inferences,
or predictions, about a population. The inferences are often based on the characteristics of a sample.
Inferences, or predictions, are stated in the form of hypotheses. Results of statistical tests on samples are
used to generalize those results to a population (Zikmund, Babin, Carr, & Griffin, 2013). Descriptive statistics
and inferential statistics are not mutually exclusive. In fact, descriptive statistics should always precede
inferential statistics so that the assumptions of the statistical procedures being considered can be tested.
Populations, Samples, and Generalization
Statistical procedures are used to answer questions about a population. A population can be people or things,
such as a company's entire consumer base or the total units produced for a new product. A population can be
very large or very small. For example, a company may collect productivity data on its 100 employees because it
is interested in knowing if there is a relationship between the size of merit increases and job productivity.
The 100 employees represent the entire population, and collecting data from all of them is considered a census. Since data are
collected from all 100 employees, the company can be certain that the statistical results represent the
entire population. In many instances, however, it is impractical and cost prohibitive to collect data from all
participants in the population. In these scenarios, data are collected from a sample of the population. The
statistical results from the sample are then used to generalize the findings to the population. Using the
example above, now assume the company has a population of 200,000 employees. They decide to select a
random sample of 100 employees to whom they have provided various merit increases. Like the example
above, their interest is to understand if there is a relationship between the size of merit increases and job
productivity. If they determine that there is a statistically significant relationship between the size of merit
increases and productivity, they can generalize those results to the population of 200,000 employees. This
can inform their decision-making and planning regarding the size of raises to provide for the next fiscal year
and the productivity increase they can forecast. This is the function of inferential statistics.
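The sampling step in the example above can be sketched in Python. This is only a minimal illustration: it assumes employees are identified by hypothetical integer IDs from 0 to 199,999, and the function name draw_sample and the fixed seed are choices made here for reproducibility, not part of the lesson.

```python
import random

POPULATION_SIZE = 200_000  # employees in the example population
SAMPLE_SIZE = 100          # employees selected for the study


def draw_sample(population_size, sample_size, seed=None):
    """Return a simple random sample of hypothetical employee IDs, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(population_size), sample_size)


sample_ids = draw_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=42)
print(len(sample_ids))  # 100
```

Because every employee has the same chance of being selected, results computed on the sample can reasonably be generalized to the full population.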
Relationships or Differences
Statistical analysis can be simplified as either looking for relationships (or associations) between variables or
looking for differences between variables or groups. This unit considers statistical testing that looks for
relationships between variables. The statistical procedures highlighted to test for relationships will be
correlation, simple regression, and multiple regression. Correlation and regression analyses are parametric
tests. The chi-square test of independence is a corresponding non-parametric test for categorical variables.
Correlation
Although many course concepts in research methods may be new and foreign, correlation may feel more
familiar and comfortable. The concept of correlation makes intuitive sense to most people since relationships
between variables (e.g., years of education and income, safety training hours and lost time hours, and hours
of exercise and weight loss) occur frequently in daily life. Relationships naturally occurring between variables
can be positive or negative. Here, positive and negative do not imply a value judgment of good or bad; in
statistical terms, they describe the direction of the relationship.
An example of a positive relationship between variables is durable goods orders and the S&P 500 index.
When durable goods orders decrease, there is a decrease in the S&P 500 index. When durable goods orders
increase, there is an increase in the S&P 500 index. This is a positive relationship because both variables
move in the same direction. As one variable increases, the other increases. Conversely, when one variable
decreases, the other decreases.
An example of a negative relationship between variables is outdoor temperature and heating oil expenditures.
When the outdoor temperature increases, heating oil expenditures decrease. When the outdoor temperature
decreases, heating oil expenditures increase. This is a negative relationship because the variables move in
opposite directions. As one variable increases, the other decreases. Conversely, when one variable
decreases, the other increases.
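The idea of direction can be sketched in Python by checking the sign of the sample covariance. The function name relationship_direction and all of the numbers below are hypothetical, invented only to mirror the temperature and heating oil example; they are not real data.

```python
def relationship_direction(xs, ys):
    """Classify the direction of a relationship by the sign of the sample covariance."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    if covariance > 0:
        return "positive"
    if covariance < 0:
        return "negative"
    return "none"


# Hypothetical values: as temperature rises, heating oil spending falls.
outdoor_temp_f = [20, 30, 40, 50, 60]
heating_oil_cost = [310, 250, 190, 120, 60]
print(relationship_direction(outdoor_temp_f, heating_oil_cost))  # negative
```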
Another important distinction that must be understood is the difference between correlation and causation.
Even if a statistical test (e.g., Pearson's r) indicates a statistically significant relationship between variables, it
must never be said that one variable causes the change in the other variable. For example, there is a positive
correlation between ice cream sales and violent crime in New York City (both increase in the warmer months
of the year, and both decrease in the cooler months). It would be absurd to say that ice cream causes violent
crime, even though the relationship between variables does exist. This extreme example makes the point
that correlation does not mean causation. Causation can only be statistically shown via experimental research
designs, which have tight controls to manipulate variables.
Pearson Correlation Coefficient (r)
When conducting correlation analysis, the Pearson correlation coefficient (r) is the most commonly used
parametric measure of association between two variables (Norusis, 2008). The Pearson statistic is
represented by r, which is the standardized covariance between the variables, and measures the linear
relationship between variables (Field, 2005). The Pearson correlation coefficient is sometimes represented by
R, but this is normally used in the context of regression analysis. One can easily determine how to calculate r
using long-hand by referring to a statistics textbook, but it is much easier and faster to use statistical software
to quickly calculate the Pearson correlation coefficient. For the purposes of this course, it is most important to
understand what Pearson's r is, what it measures, and how to interpret it, rather than how to calculate it by
long-hand.
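For reference, the description of r as a standardized covariance corresponds to the usual textbook formula. The notation is introduced here for illustration and does not come from the lesson: x_i and y_i are the paired observations, bars denote sample means, s_x and s_y are the sample standard deviations, and n is the number of pairs.

```latex
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Dividing the covariance by both standard deviations removes the units of measurement, which is why r can be compared across different pairs of variables.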
When using correlation analysis, a hypothesis is tested that there is no statistically significant relationship
between variables. The null and alternative hypotheses would be stated as follows:
Ho1: There is no statistically significant relationship between X and Y.
Ha1: There is a statistically significant relationship between X and Y.
As mentioned above, the r statistic can indicate a positive relationship or a negative relationship between
variables. The r statistic can also indicate no relationship at all between variables. An r of +1 indicates a
perfect positive correlation, while an r of -1 indicates a perfect negative correlation (Field, 2005). The r statistic
will always fall between +1 and -1. An r of 0 indicates no correlation exists between variables.
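The boundary cases described above can be checked with a short Python sketch. The function pearson_r is written from scratch here as an illustration (it is not the Excel ToolPak procedure), and the small data sets are hypothetical values chosen so that the three cases are easy to see.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: the covariance of x and y scaled by the variability of each variable."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))    # perfect positive correlation
print(pearson_r(x, [10, 8, 6, 4, 2]))    # perfect negative correlation
print(pearson_r(x, [2, -1, -2, -1, 2]))  # no linear relationship
```

The third case is worth noting: the y values follow a clear U-shaped pattern, yet r is zero because r only measures the linear relationship.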
Correlation
When reviewing the literature for research articles, it is very common to find r statistics less than .5. Given the
fact that an r of 1 indicates a perfect correlation, a statistically significant r of .5 or less hardly seems large
enough to get excited about; however, the American Psychological Association would disagree.
The American Psychological Association (as cited in Kerr, Garvin, Heaton, & Boyle, 2006) concluded that
psychologists studying highly complex human behavior should be satisfied with correlations in the r = 0.10 to
0.20 range, and they should be generally pleased with correlations in the 0.25 to 0.35 area. The best new
variables typically increase predictions, for instance, of job performance between 1% and 4%. A 10%
contribution of emotional intelligence would be considered very large (Kerr et al., 2006).
Although there are no concrete guidelines for interpreting r and R², the following chart suggests some
general guidelines that are fairly consistent with other rule-of-thumb published guidelines.
Adapted from Guideline for Interpreting Correlation Coefficient by I. Phanny, 2014.
(https://www.slideshare.net/phannithrupp/guideline-for-interpreting-correlation-coefficient/2).
Coefficient of Determination (R²)
Pearson's r is useful by itself, but the closely related coefficient of determination (R²) is also very
informative. Simply squaring r produces R², which indicates the proportion of variability in one variable that is
explained by the other variable (Field, 2005). According to the American Psychological Association (as cited
in Kerr et al., 2006), a researcher should be generally pleased with a correlation of r = .25, which translates to
a coefficient of determination of R² = .0625. This means that the variable x explains 6.25% of the variability in
the variable y. Most statistical software programs will calculate both r and R² when running correlation
analysis, so it is easy to see the strength of the association and the explained variance. Again, it is important
not to confuse correlation with causation.
Examples of r and R²:
r = .10, R² = .01: explains 1% of the total variance between the variables being tested
r = .30, R² = .09: explains 9% of the total variance between the variables being tested
r = .50, R² = .25: explains 25% of the total variance between the variables being tested
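The arithmetic behind these examples is a single squaring step, sketched here in Python. The function name coefficient_of_determination is an illustrative choice; the r values are the ones used in this section.

```python
def coefficient_of_determination(r):
    """Square Pearson's r to get R², the proportion of variance explained."""
    if not -1 <= r <= 1:
        raise ValueError("r must fall between -1 and +1")
    return r ** 2


for r in (0.10, 0.25, 0.30, 0.50):
    r2 = coefficient_of_determination(r)
    print(f"r = {r:.2f}  R2 = {r2:.4f}  explains {r2:.2%} of the variance")
```

Because the value is squared, a negative r gives the same R² as a positive r of the same size; R² describes how much variance is explained, not the direction of the relationship.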
Interpreting Correlation Output Results
The following correlation analysis looked for a statistically significant relationship between the variables of
height and weight. The results show that there is a moderately strong correlation, r = .6 (Pearson's
Correlation). It is also necessary to assess whether the correlation is statistically significant using an alpha of
.05. The results indicate a p value of .023
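Comparing a p value with alpha, as the passage above does, can be sketched as a simple decision rule in Python. The function name decide is hypothetical, and the rule assumes the common convention of rejecting the null hypothesis when p is less than alpha.

```python
ALPHA = 0.05  # significance level used in the lesson's example


def decide(p_value, alpha=ALPHA):
    """Return the decision about the null hypothesis for a given p value."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p value must fall between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.023))  # reject the null hypothesis
```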