Properties, Limitations, Example Solved Problems - Karl Pearson’s Correlation Coefficient | 12th Statistics : Chapter 4 : Correlation Analysis

Chapter: 12th Statistics : Chapter 4 : Correlation Analysis

Karl Pearson’s Correlation Coefficient

When there exists some relationship between two measurable variables, we compute the degree of relationship using the correlation coefficient.

Co-variance

Let (X,Y) be a bivariate normal random variable for which V(X) and V(Y) exist. Then the covariance between X and Y is defined as

cov(X,Y) = E[(X-E(X))(Y-E(Y))] = E(XY) – E(X)E(Y)

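If (x_i, y_i), i = 1, 2, ..., n, is a set of n realisations of (X,Y), then the sample covariance between X and Y can be calculated from

cov(X,Y) = (1/n) Σ (x_i - x̄)(y_i - ȳ) = (1/n) Σ x_i y_i - x̄ ȳ

where x̄ and ȳ are the sample means of the x_i and y_i, and each sum runs over i = 1, 2, ..., n.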
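The sample covariance can be sketched in Python as follows. This is a minimal illustration, not the textbook's own working: the function names and the data are hypothetical, and the divisor n is an assumption chosen to mirror E(XY) – E(X)E(Y) (some texts divide by n - 1 instead).

```python
def sample_covariance(xs, ys):
    """Sample covariance using divisor n (assumption; some texts use n - 1)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Average product of deviations from the two sample means.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def sample_covariance_from_products(xs, ys):
    """Same quantity as mean(xy) - mean(x) * mean(y), the sample analogue of E(XY) - E(X)E(Y)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - (sum(xs) / n) * (sum(ys) / n)


# Hypothetical illustrative data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(sample_covariance(xs, ys))
print(sample_covariance_from_products(xs, ys))
```

Both functions should agree up to floating-point rounding, which is the sample counterpart of the identity E[(X-E(X))(Y-E(Y))] = E(XY) – E(X)E(Y).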

1. Karl Pearson’s coefficient of correlation

When X and Y are linearly related and (X,Y) has a bivariate normal distribution, the coefficient of correlation between X and Y is defined as

r_xy = cov(X,Y) / √[V(X) V(Y)]

This is also called the product moment correlation coefficient, which was defined by Karl Pearson.

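Based on a given set of n paired observations (x_i, y_i), i = 1, 2, ..., n, the sample correlation coefficient between X and Y can be calculated from

r = Σ (x_i - x̄)(y_i - ȳ) / √[ Σ (x_i - x̄)² Σ (y_i - ȳ)² ]

or, equivalently

r = [ n Σ x_i y_i - (Σ x_i)(Σ y_i) ] / √{ [ n Σ x_i² - (Σ x_i)² ] [ n Σ y_i² - (Σ y_i)² ] }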
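The two standard ways of computing the sample correlation coefficient, one from deviations about the means and one from raw sums, can be sketched as below. The function names and the data are hypothetical and only serve to show that the two forms give the same value.

```python
import math


def pearson_r_deviations(xs, ys):
    """r from deviations about the sample means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def pearson_r_raw_sums(xs, ys):
    """r from the raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Hypothetical illustrative data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_dev = pearson_r_deviations(xs, ys)
r_raw = pearson_r_raw_sums(xs, ys)
print(round(r_dev, 4), round(r_raw, 4))
```

The raw-sums form avoids computing the means first, which is why it suits hand calculation with a table of x, y, xy, x² and y² columns. Both functions fail with a division by zero if either variable is constant, since r is undefined in that case.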

2. Properties

1. The correlation coefficient between X and Y is the same as the correlation coefficient between Y and X (i.e., r_xy = r_yx).

2. The correlation coefficient is free from the units of measurement of X and Y.

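3. The correlation coefficient is unaffected by change of scale and origin.

Thus, if u_i = (x_i – A)/c and v_i = (y_i – B)/d, i = 1, 2, ..., n, where A and B are arbitrary values and c and d are non-zero constants of the same sign (in practice, both positive), then

r_xy = r_uv

If c and d have opposite signs, the magnitude of r is unchanged but its sign is reversed.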

Remark 1: If the values of the variables are not equally spaced, then take c = 1 and d = 1.
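The invariance under change of origin and scale can be checked numerically. The sketch below uses hypothetical data and hypothetical choices of A, B, c and d; `pearson_r` is an illustrative helper, not a library function. It also shows the sign reversal that occurs when c and d have opposite signs.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical, equally spaced x values and arbitrary y values.
xs = [65, 70, 75, 80, 85]
ys = [120, 150, 140, 170, 180]

# Assumed origins (A, B) and scales (c, d), chosen only for convenience.
A, c = 75, 5
B, d = 150, 10

us = [(x - A) / c for x in xs]
vs = [(y - B) / d for y in ys]

r_xy = pearson_r(xs, ys)
r_uv = pearson_r(us, vs)

# Opposite signs for c and d: same magnitude, reversed sign.
vs_flipped = [(y - B) / (-d) for y in ys]
r_uv_flipped = pearson_r(us, vs_flipped)

print(r_xy, r_uv, r_uv_flipped)
```

Working with the smaller u and v values is what makes the short-cut method convenient for hand calculation.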

Interpretation

The correlation coefficient lies between –1 and +1, i.e., –1 ≤ r ≤ 1.

· A positive value of ‘r’ indicates positive correlation.

· A negative value of ‘r’ indicates negative correlation.

· If r = +1, then the correlation is perfect positive.

· If r = –1, then the correlation is perfect negative.

· If r = 0, then the variables are uncorrelated.

· If |r| ≥ 0.7, then the correlation is of a high degree. In interpretation we use the adjective ‘highly’ (for example, ‘highly positively correlated’).

· If X and Y are independent, then r_xy = 0. However the converse need not be true.
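The interpretation rules listed above can be collected into a small helper. This is a sketch only: the function name `describe_correlation` is hypothetical, and applying the 0.7 cut-off to the absolute value of r (so that it also covers strong negative correlation) is an assumption rather than something stated in the list.

```python
def describe_correlation(r, high_cutoff=0.7):
    """Return a verbal description of r following the rules listed above.

    Assumption: the 'high degree' cut-off is applied to |r|.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "uncorrelated"
    direction = "positive" if r > 0 else "negative"
    if abs(r) == 1:
        return f"perfect {direction} correlation"
    if abs(r) >= high_cutoff:
        return f"high {direction} correlation"
    return f"{direction} correlation"


for value in (1.0, 0.95, 0.3, 0.0, -0.8, -1.0):
    print(value, "->", describe_correlation(value))
```

A description of "uncorrelated" here only rules out a linear relationship; it does not establish that the variables are independent.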

Example 4.1

The following data give the heights (in inches) of fathers and their eldest sons. Compute the correlation coefficient between the heights of fathers and sons using Karl Pearson’s method.

Solution:

Let x denote height of father and y denote height of son. The data is on the ratio scale.

We use Karl Pearson’s method.

Calculation

The heights of fathers and sons are positively correlated. This means that, on average, if fathers are tall then their sons will probably be tall, and if fathers are short then their sons will probably be short.

Short-cut method

Let A = 68, B = 69, c = 1 and d = 1

Note: The correlation coefficient computed by the direct method and by the short-cut method is the same.

Example 4.2

The following are the marks scored by 7 students in two tests in a subject. Calculate the coefficient of correlation from the data and interpret it.

Solution:

Let x denote marks in test-1 and y denote marks in test-2.

There is a high positive correlation between the marks in test-1 and test-2. That is, students who perform well in test-1 tend to perform well in test-2, and those who perform poorly in test-1 tend to perform poorly in test-2.

Students can also verify the result by using the short-cut method.

3. Limitations of Correlation

Although correlation is a powerful tool, there are some limitations in using it:

1. Outliers (extreme observations) strongly influence the correlation coefficient. If we see outliers in our data, we should be careful about the conclusions we draw from the value of r. Outliers may be dropped before the calculation if this is needed for a meaningful conclusion.

2. Correlation does not imply a causal relationship, that is, it does not show that a change in one variable causes a change in the other.
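The effect of an outlier described in the first limitation can be illustrated with a small experiment. The data below are hypothetical and `pearson_r` is an illustrative helper; the point is only that a single extreme observation can change r substantially, here even reversing its sign.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Hypothetical data with a clear upward trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 1, 4, 3, 6, 5, 8, 7]
r_without_outlier = pearson_r(xs, ys)

# The same data plus one extreme observation far from the trend.
xs_outlier = xs + [20]
ys_outlier = ys + [0]
r_with_outlier = pearson_r(xs_outlier, ys_outlier)

print(round(r_without_outlier, 3), round(r_with_outlier, 3))
```

Plotting the data before computing r is a simple way to spot such observations. Whether an outlier should actually be dropped depends on whether it is an error or a genuine value.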

NOTE

1. Uncorrelated: Uncorrelated (r = 0) implies no ‘linear relationship’, but a non-linear (curvilinear) relationship may still exist.

Example: Age and health care are related. Children and elderly people need much more health care than middle-aged persons, as seen from the following graph.

However, if we compute the linear correlation r for such data, it may be zero or close to zero, suggesting that age and health care are uncorrelated even though a non-linear relationship is present.
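A numerical version of this situation can be sketched with made-up numbers. The ages and the "health-care need" index below are hypothetical, constructed so that need is highest for the youngest and oldest and lowest at age 40; `pearson_r` is an illustrative helper.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical ages and a hypothetical U-shaped need index.
ages = [10, 20, 30, 40, 50, 60, 70]
need = [(age - 40) ** 2 for age in ages]

r_age_need = pearson_r(ages, need)
print(r_age_need)
```

Here need is completely determined by age, yet the linear correlation vanishes because the falling and rising halves of the curve cancel each other. This is also an instance of r = 0 without independence.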

2. Spurious Correlation: The word ‘spurious’, from Latin, means ‘false’ or ‘illegitimate’. A spurious correlation is an association suggested by the value of the correlation coefficient that may not exist in reality.
