CHI-SQUARE DISTRIBUTION AND ITS APPLICATIONS
Karl Pearson (1857-1936) was an English mathematician and
biostatistician. He founded the world's first university statistics department
at University College, London in 1911. In a paper published in 1900, he was the
first to examine whether observed data support a given specification. He
called this the 'Chi-square goodness of fit' test, which motivated research in
statistical inference and led to the development of statistics as a separate
discipline.
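The square of a standard normal variable is known as a chi-square
variable with 1 degree of freedom (d.f.). Thus,
if $X \sim N(\mu, \sigma^2)$, then it is known that $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$. Further, $Z^2$ is said to follow the $\chi^2$ distribution with 1 degree of freedom ($\chi^2$ is pronounced chi-square).
Note: i) If $X_i \sim N(\mu, \sigma^2)$, i = 1, 2, …, n are n iid random variables, then
$$\sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n$$
ii) If $\bar{X}$ and $s^2$ denote the sample mean and the sample variance of these n variables, then
$$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}$$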
Properties of the $\chi^2$ distribution
· It is a continuous distribution.
· The distribution has only one parameter, namely the degrees of freedom n.
· The shape of the distribution depends upon the d.f., n.
· The mean of the chi-square distribution is n and its variance is 2n.
· If U and V are independent random variables having $\chi^2$ distributions with n1 and n2 degrees of freedom respectively, then their sum U + V also follows a $\chi^2$ distribution, with n1 + n2 d.f.
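The mean and variance stated above can be checked numerically. The following is a minimal simulation sketch, not part of the original text: the function name `simulate_chi_square`, the number of draws and the seed are illustrative assumptions. It builds chi-square values as sums of squared standard normal variables and compares the sample mean and variance with n and 2n.

```python
import random
import statistics


def simulate_chi_square(n, draws=20000, seed=1):
    """Return `draws` simulated values of Z1^2 + ... + Zn^2, with Zi iid N(0, 1).

    The seed and the number of draws are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(draws)]


n = 4
samples = simulate_chi_square(n)
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

# Theory: mean = n and variance = 2n for a chi-square variable with n d.f.
print(f"sample mean     = {sample_mean:.2f} (theory: {n})")
print(f"sample variance = {sample_var:.2f} (theory: {2 * n})")
```

The simulated values only approximate n and 2n; the agreement improves as the number of draws grows.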
Applications of the $\chi^2$ distribution
· To test the variance of a normal population, using the statistic in note (ii).
· To test the independence of attributes.
· To test the goodness of fit of a distribution.
The sampling distributions of the test statistics used in the last
two applications are approximately chi-square distributions.
Procedure
Step 1 : Let µ and $\sigma^2$ be respectively the mean and the variance of the
normal population under study, where µ is unknown and $\sigma^2$ is the
parameter under test. If $\sigma_0^2$ is an admissible value of $\sigma^2$, then frame the
Null hypothesis as H0: $\sigma^2 = \sigma_0^2$ and
choose the suitable alternative hypothesis from
(i) H1: $\sigma^2 \neq \sigma_0^2$ (ii) H1: $\sigma^2 > \sigma_0^2$ (iii) H1: $\sigma^2 < \sigma_0^2$
Step 2 : Describe the sample/data and its descriptive measures. Let (X1,
X2, …, Xn) be a random sample
of n observations drawn from the population, where n is small (n
< 30).
Step 3 : Fix the desired level of significance α.
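Step 4 : Consider the test statistic
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
under H0. The sampling distribution of the test statistic under H0 is the chi-square distribution with (n–1) degrees of freedom.
Step 5 : Calculate the value of $\chi^2$ for the given sample as
$$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
Step 6 : Choose the critical value $\chi_e^2$
corresponding to α and H1 from
the following table.
Alternative hypothesis (H1) | Critical value ($\chi_e^2$)
$\sigma^2 \neq \sigma_0^2$ | $\chi^2_{n-1,\,1-\alpha/2}$ and $\chi^2_{n-1,\,\alpha/2}$
$\sigma^2 > \sigma_0^2$ | $\chi^2_{n-1,\,\alpha}$
$\sigma^2 < \sigma_0^2$ | $\chi^2_{n-1,\,1-\alpha}$
Step 7 : Decide on H0 choosing the suitable
rejection rule from the following table corresponding to H1.
Alternative hypothesis (H1) | Rejection rule
$\sigma^2 \neq \sigma_0^2$ | $\chi_0^2 \le \chi^2_{n-1,\,1-\alpha/2}$ or $\chi_0^2 \ge \chi^2_{n-1,\,\alpha/2}$
$\sigma^2 > \sigma_0^2$ | $\chi_0^2 > \chi^2_{n-1,\,\alpha}$
$\sigma^2 < \sigma_0^2$ | $\chi_0^2 < \chi^2_{n-1,\,1-\alpha}$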
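The procedure above can be sketched in code. This is a minimal illustration rather than a definitive implementation: the function names `variance_test_statistic` and `reject_two_sided` are assumptions introduced here, and the critical values are passed in by hand from a chi-square table instead of being computed. The figures are those of Example 2.6.

```python
def variance_test_statistic(sample, sigma0_sq):
    """Return (chi0_sq, df) for H0: sigma^2 = sigma0_sq, with the mean unknown."""
    n = len(sample)
    mean = sum(sample) / n
    # Sum of squared deviations about the sample mean, i.e. (n - 1) * s^2.
    sum_sq_dev = sum((x - mean) ** 2 for x in sample)
    return sum_sq_dev / sigma0_sq, n - 1


def reject_two_sided(chi0_sq, lower_critical, upper_critical):
    """Two-sided rule: reject H0 when chi0_sq falls in either tail."""
    return chi0_sq <= lower_critical or chi0_sq >= upper_critical


# Data of Example 2.6: weights (kg) of 8 students, hypothesised variance 48.
weights = [38, 42, 43, 50, 48, 45, 52, 50]
chi0_sq, df = variance_test_statistic(weights, sigma0_sq=48)

# Table values for 7 d.f. at alpha = 0.05 (two-sided), as quoted in Example 2.6.
reject = reject_two_sided(chi0_sq, lower_critical=1.69, upper_critical=16.01)
print(f"chi0^2 = {chi0_sq:.3f} with {df} d.f.; reject H0: {reject}")
```

Passing the critical values explicitly keeps the sketch free of external libraries; a statistics package could supply them instead.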
Example 2.6
The weights (in kg) of 8 students of class VII are 38, 42, 43,
50, 48, 45, 52 and 50. Test the hypothesis that the variance of the population
is 48 kg², assuming the population is normal and µ is unknown.
Solution:
Step 1 : Null Hypothesis H0: $\sigma^2$ = 48 kg²,
i.e. the population variance can be regarded as 48 kg².
Alternative hypothesis H1: $\sigma^2 \neq$ 48 kg²,
i.e. the population variance cannot be regarded as 48 kg².
Step 2 : The given sample information is
Sample size (n)= 8
Step 3 : Level of significance
α= 5%
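Step 4 : Test statistic
Under the null hypothesis H0,
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
follows the chi-square distribution with (n–1) d.f.
Step 5 : Calculation of test statistic
The value of chi-square under H0 is calculated
as under:
To find the sample mean $\bar{x}$ and the sample variance $s^2$,
we form the following table.
$x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$
38 | -8 | 64
42 | -4 | 16
43 | -3 | 9
50 | 4 | 16
48 | 2 | 4
45 | -1 | 1
52 | 6 | 36
50 | 4 | 16
Total 368 | 0 | 162
$$\bar{x} = \frac{368}{8} = 46, \qquad s^2 = \frac{162}{7} = 23.143, \qquad \chi_0^2 = \frac{7 \times 23.143}{48} = 3.375$$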
Step 6 : Critical values
Since H1 is a two-sided alternative, the
critical values at α = 0.05 are $\chi^2_{7,\,0.025} = 16.01$ and $\chi^2_{7,\,0.975} = 1.69$.
Step 7 : Decision
Since it is a two-tailed test, the critical region is determined by the
rejection rule $\chi_0^2 \le \chi^2_{7,\,0.975} = 1.69$ or $\chi_0^2 \ge \chi^2_{7,\,0.025} = 16.01$.
For the given sample information, the rejection
rule does not hold, since
$1.69 = \chi^2_{7,\,0.975} < \chi_0^2\ (= 3.375) < \chi^2_{7,\,0.025} = 16.01$.
Hence, H0 is not rejected in favour of H1.
Thus, the population variance can be regarded as 48 kg².
Example 2.7
A normal population has mean µ (unknown) and variance 9. A
sample of 9 observations has been taken and its variance is found to be
5.4. Test the null hypothesis H0: $\sigma^2$ = 9
against H1: $\sigma^2$ > 9 at 5% level of
significance.
Solution:
Step 1 : Null Hypothesis H0: $\sigma^2$ = 9,
i.e. the population variance is regarded as 9.
Alternative hypothesis H1: $\sigma^2$ > 9,
i.e. the population variance is regarded as greater than 9.
Step 2 : Data
Sample size (n) = 9
Sample variance ($s^2$) = 5.4
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
Under the null hypothesis H0,
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
follows the chi-square distribution with (n–1)
degrees of freedom.
Step 5 : Calculation of test statistic
The value of chi-square under H0 is calculated
as
$$\chi_0^2 = \frac{(9-1) \times 5.4}{9} = 4.8$$
Step 6 : Critical value
Since H1 is a one-sided alternative, the
critical value at α = 0.05 is $\chi_e^2 = \chi^2_{8,\,0.05} = 15.507$.
Step 7 : Decision
Since it is a one-tailed test, the critical region
is determined by the rejection rule $\chi_0^2 > \chi_e^2$.
For the given sample information, the rejection rule does not hold, since $\chi_0^2 = 4.8 < \chi^2_{8,\,0.05} = 15.507$. Hence, H0 is not rejected in favour of H1. Thus, the population variance can be regarded as 9.
Example 2.8
A normal population has mean µ (unknown) and variance
0.018. A random sample of 20 observations has been taken and its variance
is found to be 0.024. Test the null hypothesis H0: $\sigma^2$
= 0.018 against H1: $\sigma^2$ <
0.018 at 5% level of significance.
Solution:
Step 1 : Null Hypothesis H0: $\sigma^2$ = 0.018,
i.e. the population variance is regarded as 0.018.
Alternative hypothesis H1: $\sigma^2$ < 0.018,
i.e. the population variance is regarded as less than 0.018.
Step 2 : Data
Sample size (n) = 20
Sample variance ($s^2$) = 0.024
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
Under the null hypothesis H0,
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
follows the chi-square distribution with (n–1)
degrees of freedom.
Step 5 : Calculation of test statistic
The value of chi-square under H0 is calculated
as
$$\chi_0^2 = \frac{(20-1) \times 0.024}{0.018} \approx 25.3$$
Step 6 : Critical value
Since H1 is a one-sided alternative, the
critical value at α = 0.05 is $\chi_e^2 = \chi^2_{19,\,0.95} = 10.117$.
Step 7 : Decision
Since it is a one-tailed test, the critical region
is determined by the rejection rule $\chi_0^2 < \chi_e^2$.
For the given sample information, the rejection rule does not
hold, since $\chi_0^2 = 25.3 > \chi_e^2 = \chi^2_{19,\,0.95} = 10.117$.
Hence, H0 is not rejected in favour of H1.
Thus, the population variance can be regarded as 0.018.
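When only the sample size and sample variance are reported, as in the two one-sided examples above, the statistic follows directly from (n–1)s²/σ0². The sketch below is an illustration under assumptions: the helper names `chi_square_from_summary` and `reject_one_sided` are hypothetical, and the critical values 15.507 and 10.117 are the table values quoted in the text.

```python
def chi_square_from_summary(n, s_sq, sigma0_sq):
    """Test statistic (n - 1) * s^2 / sigma0^2 from summary statistics."""
    return (n - 1) * s_sq / sigma0_sq


def reject_one_sided(chi0_sq, critical, alternative):
    """One-sided rules: 'greater' rejects in the upper tail, 'less' in the lower."""
    if alternative == "greater":
        return chi0_sq > critical
    if alternative == "less":
        return chi0_sq < critical
    raise ValueError("alternative must be 'greater' or 'less'")


# H1: sigma^2 > 9 with n = 9, s^2 = 5.4; table value for 8 d.f. is 15.507.
chi_upper = chi_square_from_summary(n=9, s_sq=5.4, sigma0_sq=9)
reject_upper = reject_one_sided(chi_upper, critical=15.507, alternative="greater")

# H1: sigma^2 < 0.018 with n = 20, s^2 = 0.024; table value for 19 d.f. is 10.117.
chi_lower = chi_square_from_summary(n=20, s_sq=0.024, sigma0_sq=0.018)
reject_lower = reject_one_sided(chi_lower, critical=10.117, alternative="less")

print(f"upper-tailed: chi0^2 = {chi_upper:.3f}, reject H0: {reject_upper}")
print(f"lower-tailed: chi0^2 = {chi_lower:.3f}, reject H0: {reject_lower}")
```

In both cases the statistic falls outside the critical region, which agrees with the decisions reached in the worked solutions.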
Another important application of the $\chi^2$ test is the
testing of independence of attributes.
Attributes: Attributes are qualitative characteristics, such as levels of
literacy, employment status, etc., which are quantified in terms
of levels/scores.
Contingency table: Testing the independence of two attributes is an important
statistical application in which the data pertaining to the attributes
are cross-classified in the form of a two-dimensional table. The levels of
one attribute are arranged in rows and those of the other in columns. Such an
arrangement in the form of a table is called a contingency table.
Computational steps for testing the independence of attributes:
Step 1 : Framing the hypotheses
Null hypothesis H0: The two attributes are independent
Alternative hypothesis H1: The two attributes are not independent.
Step 2 : Data
The data set is given in the form of an m × n contingency table of observed frequencies $O_{ij}$ (i = 1, 2, …, m; j = 1, 2, …, n).
Compute expected frequencies $E_{ij}$ corresponding to each cell
of the contingency table, using the formula
$$E_{ij} = \frac{R_i \times C_j}{N}$$
where,
N = Total sample size
$R_i$ = Row sum corresponding to ith row
$C_j$ = Column sum corresponding to jth column
Step 3 : Level of significance
Fix the desired level of significance α
Step 4 : Calculation
Calculate the value of the test statistic as
$$\chi_0^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Step 5 : Critical value
The critical value is obtained from the table of $\chi^2$ with (m–1)(n–1) degrees of freedom
at the given level of significance α, as $\chi^2_{(m-1)(n-1),\,\alpha}$.
Step 6 : Decision
Decide on rejecting or not rejecting the null
hypothesis by comparing the calculated value of the test statistic with the
table value. If $\chi_0^2 \ge \chi^2_{(m-1)(n-1),\,\alpha}$, reject H0.
Note:
· N, the total frequency, should be reasonably large, say greater than 50.
· No theoretical cell frequency should be less than 5. If a cell frequency is
less than 5, it should be pooled with the preceding or succeeding cell so that
the pooled frequency is not less than 5.
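The computational steps above can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, and the 2 × 3 table of observed frequencies is hypothetical data invented for the demonstration, not taken from the text.

```python
def expected_frequencies(observed):
    """E_ij = R_i * C_j / N for every cell of a contingency table."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def chi_square_independence(observed):
    """Return (chi0_sq, df) for the test of independence of two attributes."""
    expected = expected_frequencies(observed)
    chi0_sq = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi0_sq, df


# Hypothetical 2 x 3 table (N = 150, every expected frequency is at least 5).
table = [
    [30, 20, 10],
    [20, 30, 40],
]
expected = expected_frequencies(table)
chi0_sq, df = chi_square_independence(table)
print("expected frequencies:", expected)
print(f"chi0^2 = {chi0_sq:.3f} with {df} d.f.")
```

The resulting statistic would then be compared with the table value for (m–1)(n–1) d.f. at the chosen α, as described in Step 6.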
Example 2.9
The following table gives the performance of 500 students
classified according to age in a computer test. Test whether the attributes age
and performance are independent at 5% level of significance.
Step 1 : Null hypothesis H0: The attributes age and performance are
independent.
Alternative hypothesis H1: The attributes age and performance are not
independent.
Step 2 : Data
Compute expected frequencies $E_{ij}$ corresponding
to each cell of the contingency table, using the formula
$$E_{ij} = \frac{R_i \times C_j}{N}$$
where,
N = Total sample size
$R_i$ = Row sum corresponding to ith row
$C_j$ = Column sum corresponding to jth column
Step 3 : Level of significance
α = 5%
Step 4 : Calculation
Calculate the value of the test statistic as
$$\chi_0^2 = \sum_{i}\sum_{j}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
For the given table, this gives
$\chi_0^2$ = 22.152 with degrees of freedom (3–1)(2–1) = 2.
Step 5 : Critical value
From the chi-square table, the critical value
at 5% level of significance is $\chi^2_{(3-1)(2-1),\,0.05} = \chi^2_{2,\,0.05} = 5.991$.
Step 6 : Decision
As the calculated value $\chi_0^2$ = 22.152 is greater
than the critical value $\chi^2_{2,\,0.05}$ = 5.991,
the null hypothesis H0 is
rejected. Hence, the performance and age of students are not independent.
The following example will illustrate the procedure.
Example 2.10
A survey was conducted with 500 female students of which 60% were
intelligent, 40% had uneducated fathers, while 30% of the not intelligent
female students had educated fathers. Test the hypothesis that the education of
fathers and intelligence of female students are independent.
Solution:
Step 1 : Null hypothesis H0: The attributes are independent, i.e.
there is no association between education of fathers and intelligence of female
students.
Alternative hypothesis H1: The attributes are not independent, i.e. there is an association between education of
fathers and intelligence of female students.
Step 2 : Data
The observed frequencies (O) have been computed from the
given information as under.
Step 3 : Level of significance
α = 5%
Step 4 : Calculation
Calculate the value of the test statistic as
$$\chi_0^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$
where,
a = 620, b = 380, c = 550, d = 450 and N = 2000
Step 5 : Critical value
From the chi-square table, the critical value at 5%
level of significance is $\chi^2_{1,\,0.05}$ = 3.841.
Step 6 : Decision
Since the calculated value $\chi_0^2$ = 10.092 is greater than the critical value $\chi^2_{1,\,0.05}$ = 3.841, the null hypothesis H0 is rejected. Hence, education of fathers and intelligence of female students are not independent.
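For a 2 × 2 table with cell frequencies a, b (first row) and c, d (second row), the statistic is commonly computed with the shortcut N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)]. The sketch below applies that shortcut to the values a, b, c, d quoted in Step 4; the function name `chi_square_2x2` is an assumption introduced for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square statistic for a 2 x 2 contingency table.

    a, b are the cells of the first row and c, d those of the second row.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Cell frequencies as quoted in Step 4 of the solution (N = 2000).
chi0_sq = chi_square_2x2(620, 380, 550, 450)
critical = 3.841  # table value for 1 d.f. at the 5% level, from the text
print(f"chi0^2 = {chi0_sq:.3f}; reject H0: {chi0_sq >= critical}")
```

The shortcut is algebraically the same as summing (O − E)²/E over the four cells, so either route should give the same value.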
Another important application of the chi-square distribution is
testing the goodness of fit of a pattern or distribution fitted to given data. This
application was regarded as one of the most important inventions in mathematical
sciences during the 20th century. Goodness of fit indicates the closeness of the
observed frequencies to the expected frequencies. If the curves of these
two distributions do not coincide or appear to diverge much, the fit is said to be
poor. If the two curves do not diverge much, the fit is fair.
Step 1 : Framing of hypothesis
Null hypothesis H0 : The fit is appropriate for the
given data set
Alternative hypothesis H1 : The fit is not appropriate for the
given data set
Step 2 : Data
Calculate the expected frequencies ($E_i$) using an
appropriate theoretical distribution such as Binomial or Poisson.
Step 3 : Select the desired level of significance α
Step 4 : Test statistic
The test statistic is
$$\chi^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i}$$
where
k = number of classes
$O_i$ and $E_i$ are respectively the observed and expected frequency of the
ith class such that $\sum_{i=1}^{k} O_i = \sum_{i=1}^{k} E_i = N$.
If any $E_i$ is found to be less than 5, the
corresponding class frequency may be pooled with the preceding or succeeding
classes such that the $E_i$'s of all classes are greater than or
equal to 5. It may be noted that the value of k is determined after
pooling the classes.
The approximate sampling distribution of the test statistic under H0
is the chi-square distribution with k–1–s d.f., s being the
number of parameters to be estimated.
Step 5 : Calculation
Calculate the value of chi-square as
$$\chi_0^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i}$$
The above steps in calculating the chi-square can be summarized in
a table with the columns: class, $O_i$, $E_i$, $O_i - E_i$, $(O_i - E_i)^2$ and $(O_i - E_i)^2 / E_i$, the total of the last column being $\chi_0^2$.
Step 6 : Critical value
The critical value is obtained from the table of $\chi^2$ with k–1–s d.f.
for a given level of significance α.
Step 7 : Decision
Decide on rejecting or not rejecting the null hypothesis by
comparing the calculated value of the test statistic with the table value, at
the desired level of significance.
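The pooling rule and the statistic described in Steps 4 and 5 can be sketched as below. This is one possible reading of the rule, stated as an assumption: classes are accumulated from left to right until the pooled expected frequency reaches 5, and any small classes left over at the end are pooled with the preceding class. The function names and the observed frequencies are hypothetical.

```python
def pool_small_classes(observed, expected, minimum=5):
    """Pool adjacent classes until every pooled expected frequency >= minimum."""
    pooled_o, pooled_e = [], []
    acc_o = acc_e = 0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o = acc_e = 0
    if acc_o or acc_e:
        # Left-over small classes at the end are pooled with the preceding class.
        if not pooled_e:
            raise ValueError("total expected frequency is below the minimum")
        pooled_o[-1] += acc_o
        pooled_e[-1] += acc_e
    return pooled_o, pooled_e


def goodness_of_fit(observed, expected, estimated_parameters=0):
    """Return (chi0_sq, df) with df = k - 1 - s, k counted after pooling."""
    pooled_o, pooled_e = pool_small_classes(observed, expected)
    chi0_sq = sum((o - e) ** 2 / e for o, e in zip(pooled_o, pooled_e))
    return chi0_sq, len(pooled_e) - 1 - estimated_parameters


# Hypothetical observed frequencies; the last two expected values are below 5.
observed = [43, 35, 14, 5, 2, 1]
expected = [41, 37, 16, 5, 1, 0]
pooled_o, pooled_e = pool_small_classes(observed, expected)
chi0_sq, df = goodness_of_fit(observed, expected, estimated_parameters=1)
print(pooled_o, pooled_e, f"chi0^2 = {chi0_sq:.3f}", f"d.f. = {df}")
```

Because k is counted after pooling, the six original classes here give k = 4 and, with one estimated parameter, 2 degrees of freedom.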
Example 2.11
Five coins are tossed 640 times and the following results were
obtained.
Fit a binomial distribution to the above data.
Solution:
Step 1 : Null hypothesis H0: Fitting of binomial distribution is
appropriate for the given data.
Alternative hypothesis H1: Fitting of binomial distribution is not
appropriate to the given data.
Step 2 : Data
Compute the expected frequencies:
n = number of coins tossed at a time = 5
Let X denote the number of heads (success) in n
tosses
N = number of times experiment is repeated = 640
To find the mean of the distribution
The probability mass function of the binomial
distribution is:
$$p(x) = {}^{n}C_x\, p^x q^{n-x}, \quad x = 0, 1, \ldots, n \qquad (2.1)$$
The mean of the binomial distribution is np.
For x = 0 and p = q = 0.5, the equation (2.1) becomes
$$P(X = 0) = P(0) = {}^{5}C_0\, (0.5)^5 = 0.03125$$
The expected frequency at x = N × P(x)
The expected frequency at x = 0: N × P(0)
= 640 × 0.03125
= 20
We use the recurrence formula to find the other
expected frequencies.
The expected frequency at x+1 is
$$\frac{n - x}{x + 1} \times \frac{p}{q} \times \text{Expected frequency at } x$$
Table of expected frequencies:
x | 0 | 1 | 2 | 3 | 4 | 5
Expected frequency | 20 | 100 | 200 | 200 | 100 | 20
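The recurrence for the binomial expected frequencies can be sketched in a few lines. The function name `binomial_expected_frequencies` is an assumption made for illustration; the values n = 5, p = 0.5 and N = 640 are those used in the solution.

```python
def binomial_expected_frequencies(n, p, total):
    """Expected frequencies N * P(x), x = 0..n, built with the recurrence
    E(x + 1) = (n - x) / (x + 1) * (p / q) * E(x)."""
    q = 1 - p
    freqs = [total * q ** n]  # E(0) = N * q^n
    for x in range(n):
        freqs.append(freqs[-1] * (n - x) / (x + 1) * (p / q))
    return freqs


expected = binomial_expected_frequencies(n=5, p=0.5, total=640)
for x, e in enumerate(expected):
    print(f"x = {x}: expected frequency = {e:.0f}")
```

Starting from the single value at x = 0 avoids evaluating the binomial coefficient separately for each class.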
Step 3 : Level of significance
α= 5%
Step 4 : Test statistic
$$\chi^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i}$$
Step 5 : Calculation
The test statistic is computed from the observed and expected frequencies as $\chi_0^2$ = 0.575.
Step 6 : Critical value
Degrees of freedom = k – 1 – s = 6 – 1 – 1 = 4
Critical value for 4 d.f. at 5% level of significance is
9.488, i.e., $\chi^2_{4,\,0.05}$ = 9.488
Step 7 : Decision
As the calculated $\chi_0^2$ (= 0.575) is less than the critical value $\chi^2_{4,\,0.05}$ = 9.488, we do not reject the null hypothesis. Hence, the fitting of binomial distribution is appropriate.
Example 2.12
A packet consists of 100 ball pens. The distribution of the number
of defective ball pens in each packet is given below:
Examine whether a Poisson distribution is appropriate for the above
data at 5% level of significance.
Solution:
Step 1 : Null hypothesis H0: Fitting of Poisson distribution is appropriate
for the given data.
Alternative hypothesis H1: Fitting of Poisson distribution is not
appropriate for the given data.
Step 2 : Data
The expected frequencies are computed as under:
To find the mean of the distribution.
The probability mass function of the Poisson distribution is:
$$p(x) = \frac{e^{-m} m^x}{x!}, \quad x = 0, 1, 2, \ldots \qquad (2.2)$$
In the case of the Poisson distribution, mean (m) = $\bar{x}$ = 0.9.
At x = 0, equation (2.2) becomes
$$P(0) = e^{-0.9} = 0.4066$$
The expected frequency at x is N × P(x).
Therefore, the expected frequency at 0 is
N × P(0)
= 100 × 0.4066
= 40.66
We use the recurrence formula to find the other
expected frequencies.
The expected frequency at x+1 is
$$\frac{m}{x + 1} \times \text{Expected frequency at } x$$
Table of expected frequencies (rounded to the nearest integer):
x | 0 | 1 | 2 | 3 | 4 | 5
Expected frequency | 41 | 37 | 16 | 5 | 1 | 0
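The Poisson recurrence can be sketched in the same way. The function name `poisson_expected_frequencies` and the choice of stopping at x = 5 are assumptions for illustration; m = 0.9 and N = 100 are the values used in the solution.

```python
import math


def poisson_expected_frequencies(m, total, max_x):
    """Expected frequencies N * P(x), x = 0..max_x, built with the recurrence
    E(x + 1) = m / (x + 1) * E(x), starting from E(0) = N * exp(-m)."""
    freqs = [total * math.exp(-m)]
    for x in range(max_x):
        freqs.append(freqs[-1] * m / (x + 1))
    return freqs


expected = poisson_expected_frequencies(m=0.9, total=100, max_x=5)
rounded = [round(e) for e in expected]
for x, (e, r) in enumerate(zip(expected, rounded)):
    print(f"x = {x}: expected frequency = {e:.2f} (rounded: {r})")
```

The unrounded value at x = 0 differs slightly in later decimals from a hand calculation that uses the four-decimal probability 0.4066.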
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
$$\chi^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i}$$
Step 5 : Calculation
The test statistic is computed from the observed and expected frequencies as $\chi_0^2$ = 51.253.
Note: In the expected frequency column (E), the cell
frequencies 1 and 0 are less than 5. Hence, they are combined (pooled) with either the succeeding
or the preceding class so that the total is not less than 5. Here they have been
pooled with the preceding expected frequency 5. The corresponding observed
frequencies are pooled in the same way.
Step 6 : Critical value
Degrees of freedom = (k – 1 – s) = 4 – 1 – 1 = 2
Critical value for 2 d.f. at 5% level of significance is
5.991, i.e., $\chi^2_{2,\,0.05}$ = 5.991
Step 7 : Decision
The calculated $\chi_0^2$ (= 51.253) is greater than the critical value (5.991) at 5% level of significance. Hence, we reject H0, i.e., fitting of Poisson distribution is not appropriate for the given data.
Example 2.13
A sample of 800 students appeared for a competitive examination. It
was found that 320 students have failed, 270 have secured a third grade, 190
have secured a second grade and the remaining students qualified in first
grade. The general opinion is that the above grades are in the ratio 4:3:2:1
respectively. Test the hypothesis that the general opinion about the grades is
appropriate at 5% level of significance.
Solution:
Step 1 : Null hypothesis H0: The result in four grades follows the ratio
4:3:2:1
Alternative hypothesis H1: The result in four grades does not follow the
ratio 4:3:2:1
Step 2 : Data
Compute expected frequencies:
Under H0, the expected frequencies of the four grades are:
4/10 × 800 = 320; 3/10 × 800 = 240; 2/10 × 800 =
160; 1/10 × 800 = 80
Step 3 : Test statistic
The test statistic is computed using the following table.
Grade | $O_i$ | $E_i$ | $O_i - E_i$ | $(O_i - E_i)^2$ | $(O_i - E_i)^2 / E_i$
Failed | 320 | 320 | 0 | 0 | 0.000
Third | 270 | 240 | 30 | 900 | 3.750
Second | 190 | 160 | 30 | 900 | 5.625
First | 20 | 80 | -60 | 3600 | 45.000
Total | 800 | 800 | | | 54.375
The test statistic is calculated as
$$\chi_0^2 = \sum_{i=1}^{4}\frac{(O_i - E_i)^2}{E_i} = 54.375$$
Step 4 : Critical value
The critical value of $\chi^2$ for 3 d.f. at 5% level
of significance is 7.81, i.e., $\chi^2_{3,\,0.05}$ = 7.81
Step 5 : Decision
As the calculated value of $\chi_0^2$
(= 54.375) is greater than the critical value $\chi^2_{3,\,0.05}$
= 7.81, reject H0.
Hence, the results of the four grades do not follow the ratio 4:3:2:1.
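The arithmetic of this example can be re-checked with a short sketch. The function names `expected_from_ratio` and `chi_square_statistic` are assumptions introduced here; the observed frequencies are those of the example, with the first-grade count taken as 800 − 320 − 270 − 190 = 20.

```python
def expected_from_ratio(ratio, total):
    """Split `total` into expected frequencies proportional to `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [320, 270, 190, 20]  # failed, third, second, first grade
expected = expected_from_ratio([4, 3, 2, 1], total=sum(observed))
chi0_sq = chi_square_statistic(observed, expected)
critical = 7.81  # table value for 3 d.f. at the 5% level, from the text
print(f"expected = {expected}")
print(f"chi0^2 = {chi0_sq:.3f}; reject H0: {chi0_sq > critical}")
```

No parameter is estimated from the data here, so the degrees of freedom are simply k − 1 = 3.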
Example 2.14
The following table shows the distribution of digits in numbers
chosen at random from a telephone directory.
Test whether the digits occur equally often in the directory
at 5% level of significance.
Solution:
Step 1 : Null hypothesis H0: The digits occur equally often in the
directory.
Alternative hypothesis H1: The digits do not occur equally often in
the directory.
Step 2 : Data
The expected frequency for each digit = 10000/10 = 1000
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
The test statistic is computed from the observed and expected frequencies as
$$\chi_0^2 = \sum_{i=1}^{10}\frac{(O_i - E_i)^2}{E_i} = 58.542$$
Step 5 : Critical value
Critical value for 9 d.f. at 5% level of significance
is 16.919, i.e., $\chi^2_{9,\,0.05}$ = 16.919
Step 6 : Decision
Since the calculated $\chi_0^2$ (= 58.542) is greater than the critical
value $\chi^2_{9,\,0.05}$ = 16.919, reject H0. Hence, the digits are not uniformly distributed in
the directory.