General Procedure for Chi-Square Distribution and Its Applications: Properties, Procedure Steps, Example Solved Problems

**CHI-SQUARE DISTRIBUTION AND ITS APPLICATIONS**

Karl Pearson (1857-1936) was an English mathematician and
biostatistician. He founded the world's first university statistics department
at University College, London in 1911. In a paper published in 1900, he was the first to examine whether
observed data support a given specification. He
called it the 'chi-square goodness of fit' test, which motivated research in
statistical inference and led to the development of statistics as a separate
discipline.

The square of a standard normal variable is known as a chi-square
variable with 1 degree of freedom (d.f.). Thus,

if *X* ~ *N*(*µ*, *σ*^{2}), then it is known
that *Z* = (*X* – *µ*)/*σ* ~ *N*(0, 1). Further, *Z*^{2}
is said to follow the *χ*^{2} distribution with 1 degree of freedom (*χ*^{2}
is pronounced as chi-square).

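**Note: **i) If *X*_{i} ~ *N*(*µ*, *σ*^{2}), *i* = 1, 2, …, *n* are *n* independent random variables, then Σ_{i=1}^{n} ((*X*_{i} – *µ*)/*σ*)^{2} follows the *χ*^{2} distribution with *n* d.f.

ii) Also, (1/*σ*^{2}) Σ_{i=1}^{n} (*X*_{i} – *X̄*)^{2} follows the *χ*^{2} distribution with (*n* – 1) d.f., where *X̄* is the sample mean.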

The *χ*^{2} distribution has the following properties:

· It is a continuous distribution.

· The distribution has only one parameter, *i.e.*, *n* d.f.

· The shape of the distribution depends upon the d.f., *n*.

· The mean of the chi-square distribution is *n* and the variance is 2*n*.

· If *U* and *V* are independent random variables having *χ*^{2}
distributions with degrees of freedom *n*_{1} and *n*_{2}
respectively, then their sum *U* + *V* also has a *χ*^{2}
distribution, with d.f. *n*_{1} + *n*_{2}.
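The mean and variance properties can be checked empirically. The following is a minimal simulation sketch, not part of the original text: the seed, the choice *n* = 4, the number of draws and the helper name `chi_square_draw` are all assumptions made for illustration.

```python
import random
from statistics import mean, variance

random.seed(12345)  # fixed seed (an arbitrary choice) so the run is repeatable


def chi_square_draw(n):
    """Sum of squares of n independent standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))


n = 4  # degrees of freedom, chosen only for illustration
draws = [chi_square_draw(n) for _ in range(20000)]

sample_mean = mean(draws)     # should be close to n
sample_var = variance(draws)  # should be close to 2n
print(round(sample_mean, 2), round(sample_var, 2))
```

With 20000 draws the printed values should lie close to 4 and 8, in line with the stated mean *n* and variance 2*n*.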

The *χ*^{2} distribution has the following applications:

· To test the variance of the normal population,
using the statistic in note (ii).

· To test the independence of attributes.

· To test the goodness of fit of a distribution.

The sampling distributions of the test statistics used in the last
two applications are approximately chi-square distributions.

*Procedure*

**Step 1 : **Let *µ* and *σ*^{2} be
respectively the mean and the variance of the normal population under study,
where *σ*^{2} is the parameter under test and *µ* is unknown. If *σ*_{0}^{2}
is an admissible value of *σ*^{2}, then frame the

**Null hypothesis** as *H*_{0}: *σ*^{2} = *σ*_{0}^{2} and
choose the suitable alternative hypothesis from

(i) *H*_{1}: *σ*^{2} ≠ *σ*_{0}^{2} (ii) *H*_{1}: *σ*^{2} > *σ*_{0}^{2} (iii) *H*_{1}: *σ*^{2} < *σ*_{0}^{2}

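**Step 2 : **Describe the sample/data and its descriptive measures. Let (*X*_{1},
*X*_{2}, …, *X*_{n}) be a random sample of *n* observations drawn from the population, with sample mean *X̄* = (1/*n*) Σ *X*_{i} and sample variance *s*^{2} = (1/(*n* – 1)) Σ (*X*_{i} – *X̄*)^{2}.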

**Step 3 : **Fix the desired level of significance *α*.

**Step 4 : **Consider
the test statistic *χ*^{2} = (*n* – 1)*s*^{2} / *σ*_{0}^{2} under *H*_{0}. For a normal population, the sampling
distribution of the test statistic under
*H*_{0} is the chi-square distribution with (*n* – 1) degrees of freedom.

**Step 5 : **Calculate the value of *χ*^{2} for the given
sample as *χ*_{0}^{2} = (*n* – 1)*s*^{2} / *σ*_{0}^{2}.

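**Step 6 : **Choose the critical value *χ*_{e}^{2}
corresponding to *α* and *H*_{1} as follows: for *H*_{1}: *σ*^{2} ≠ *σ*_{0}^{2}, the pair *χ*^{2}_{n–1, α/2} and *χ*^{2}_{n–1, 1–α/2}; for *H*_{1}: *σ*^{2} > *σ*_{0}^{2}, *χ*^{2}_{n–1, α}; for *H*_{1}: *σ*^{2} < *σ*_{0}^{2}, *χ*^{2}_{n–1, 1–α}.

**Step 7 : **Decide on *H*_{0} by choosing the
rejection rule corresponding to *H*_{1}: for the two-sided alternative, reject *H*_{0} if *χ*_{0}^{2} ≥ *χ*^{2}_{n–1, α/2} or *χ*_{0}^{2} ≤ *χ*^{2}_{n–1, 1–α/2}; for *σ*^{2} > *σ*_{0}^{2}, reject if *χ*_{0}^{2} > *χ*^{2}_{n–1, α}; for *σ*^{2} < *σ*_{0}^{2}, reject if *χ*_{0}^{2} < *χ*^{2}_{n–1, 1–α}.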
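The decision step for the three alternatives can be sketched in code. This is a minimal illustration under stated assumptions, not part of the original procedure: the function name `reject_h0` and its arguments are hypothetical, and the critical values are assumed to be read from a chi-square table by the caller.

```python
def reject_h0(chi0, alternative, lower=None, upper=None):
    """Return True when H0 is rejected.

    chi0        : calculated value of the test statistic
    alternative : "two-sided", "greater" or "less"
    lower       : lower-tail critical value (needed for "two-sided" and "less")
    upper       : upper-tail critical value (needed for "two-sided" and "greater")
    """
    if alternative == "two-sided":
        return chi0 <= lower or chi0 >= upper
    if alternative == "greater":
        return chi0 > upper
    if alternative == "less":
        return chi0 < lower
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Illustrative call: 7 d.f., alpha = 0.05, two-sided table values 1.69 and 16.01.
# The statistic 20.0 is a made-up value used only to show a rejection.
print(reject_h0(20.0, "two-sided", lower=1.69, upper=16.01))
```

Passing the critical values in, rather than computing them, keeps the sketch free of third-party libraries and mirrors the table lookup described in Step 6.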

**Example 2.6**

The weights (in kg) of 8 students of class VII are 38, 42, 43,
50, 48, 45, 52 and 50. Test the hypothesis that the variance of the population
is 48 kg^{2}, assuming the population is normal and *µ* is unknown.

*Solution:*

**Step 1 : ****Null Hypothesis** *H*_{0}: *σ*^{2} = 48 kg^{2}.

*i.e.*, the population variance can be regarded as 48 kg^{2}.

**Alternative hypothesis** *H*_{1}: *σ*^{2} ≠ 48 kg^{2}.

*i.e.*, the population variance cannot be regarded as 48 kg^{2}.

**Step 2 : **The given sample information is

Sample size (*n*)= 8

**Step 3 : ****Level of significance**

*α* = 5%

**Step 4 : ****Test statistic**

Under the null hypothesis *H*_{0}, the statistic

*χ*^{2} = (*n* – 1)*s*^{2} / *σ*_{0}^{2}

follows the chi-square distribution with (*n* – 1) d.f.

**Step 5 : **Calculation of test statistic

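The value of chi-square under *H*_{0} is calculated
as under:

To find the sample mean *x̄* and the sample variance *s*^{2},
we tabulate the deviations *x*_{i} – *x̄* and their squares. Here Σ*x*_{i} = 368, so *x̄* = 368/8 = 46, and Σ(*x*_{i} – *x̄*)^{2} = 162. Hence *s*^{2} = 162/7 ≈ 23.14 and

*χ*_{0}^{2} = (*n* – 1)*s*^{2} / *σ*_{0}^{2} = 162/48 = 3.375.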
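The calculation for the eight weights can be reproduced with a few lines of code. This is a sketch for checking the arithmetic only; the variable names are assumptions, and the sample variance is taken with divisor *n* – 1.

```python
from statistics import mean, variance

weights = [38, 42, 43, 50, 48, 45, 52, 50]  # data from the example (kg)
sigma0_sq = 48                              # hypothesised population variance

n = len(weights)
x_bar = mean(weights)        # sample mean
s_sq = variance(weights)     # sample variance with divisor n - 1
chi0 = (n - 1) * s_sq / sigma0_sq

print(x_bar, round(s_sq, 2), round(chi0, 3))
```

The statistic equals 162/48, which agrees with the value 3.375 used in the decision step of this example.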

**Step 6 : ****Critical values**

Since *H*_{1} is a two-sided alternative, the
critical values at *α* = 0.05 are *χ*^{2}_{7, 0.025}
= 16.01 and *χ*^{2}_{7, 0.975} = 1.69.

**Step 7 : ****Decision**

Since it is a two-tailed test, the elements of the
critical region are determined by the

rejection rule *χ*_{0}^{2} ≤ *χ*^{2}_{7, 0.975} = 1.69 or *χ*_{0}^{2} ≥ *χ*^{2}_{7, 0.025} = 16.01.

For the given sample information, the rejection
rule does not hold, since

1.69 = *χ*^{2}_{7, 0.975} < *χ*_{0}^{2} (= 3.375) < *χ*^{2}_{7, 0.025} = 16.01.

Hence, *H*_{0}
is not rejected in favour of *H*_{1}.
Thus, the population variance can be regarded as 48 kg^{2}.

**Example 2.7**

A normal population has mean *µ* (unknown) and variance 9. A
sample of 9 observations has been taken and its variance is found to be
5.4. Test the null hypothesis *H*_{0}: *σ*^{2} = 9
against *H*_{1}: *σ*^{2} > 9 at 5% level of
significance.

**Step 1 : ****Null Hypothesis** *H*_{0}: *σ*^{2} = 9.

*i.e.*, the population variance can be regarded as 9.

**Alternative hypothesis** *H*_{1}: *σ*^{2} > 9.

*i.e.*, the population variance is regarded as greater than 9.

**Step 2 : ****Data**

Sample size (*n*) = 9

Sample variance (s^{2}) = 5.4

**Step 3 : ****Level of significance **

*α* = 5%

**Step 4 : ****Test statistic**

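Under the null hypothesis *H*_{0}, the statistic

*χ*^{2} = (*n* – 1)*s*^{2} / *σ*_{0}^{2}

follows the
chi-square distribution with (*n* – 1)
degrees of freedom.

**Step 5 : ****Calculation of test statistic**

The value of chi-square under *H*_{0} is calculated
as *χ*_{0}^{2} = (*n* – 1)*s*^{2} / *σ*_{0}^{2} = (8 × 5.4)/9 = 4.8.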

**Step 6 : ****Critical value**

Since *H*_{1} is a one-sided alternative, the
critical value at *α* = 0.05 is *χ*_{e}^{2} = *χ*^{2}_{8, 0.05} = 15.507.

**Step 7 : ****Decision**

Since it is a one-tailed test, the elements of the critical region
are determined by the rejection rule *χ*_{0}^{2} > *χ*_{e}^{2}.

For the given sample information, the rejection rule does not hold,
since *χ*_{0}^{2} = 4.8 < *χ*_{e}^{2} = 15.507. Hence, *H*_{0} is not rejected in favour of *H*_{1}. Thus, the population variance can be regarded as 9.

**Example 2.8**

A normal population has mean *µ* (unknown) and variance
0.018. A random sample of 20 observations has been taken and its variance
is found to be 0.024. Test the null hypothesis *H*_{0}: *σ*^{2}
= 0.018 against *H*_{1}: *σ*^{2} <
0.018 at 5% level of significance.

**Step 1 : ****Null Hypothesis** *H*_{0}: *σ*^{2} = 0.018.

*i.e.*, the population variance can be regarded as 0.018.

**Alternative hypothesis** *H*_{1}: *σ*^{2} < 0.018.

*i.e.*, the population variance is regarded as less than 0.018.

**Step 2 : ****Data**

Sample size (*n*) = 20

Sample variance (s^{2}) = 0.024

**Step 3 : ****Level of significance**

*α* = 5%

**Step 4 : ****Test statistic**

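Under the null hypothesis *H*_{0}, the statistic

*χ*^{2} = (*n* – 1)*s*^{2} / *σ*_{0}^{2}

follows the
chi-square distribution with (*n* – 1)
degrees of freedom.

**Step 5 : ****Calculation of test statistic**

The value of chi-square under *H*_{0} is calculated
as *χ*_{0}^{2} = (*n* – 1)*s*^{2} / *σ*_{0}^{2} = (19 × 0.024)/0.018 = 25.33.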

**Step 6 : ****Critical value**

Since *H*_{1} is a one-sided alternative, the
critical value at *α* = 0.05 is *χ*_{e}^{2} = *χ*^{2}_{19, 0.95} = 10.117.

**Step 7 : ****Decision**

Since it is a one-tailed test, the elements of the critical region
are determined by the rejection rule *χ*_{0}^{2} < *χ*_{e}^{2}.

For the given sample information, the rejection rule does not
hold, since *χ*_{0}^{2} = 25.33 > *χ*_{e}^{2} = 10.117.

Hence, *H*_{0} is not rejected in favour of *H*_{1}.
Thus, the population variance can be regarded as 0.018.
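When only the sample size and the sample variance are reported, as in the two preceding problems, the statistic can be computed directly from those summaries. The sketch below assumes that the reported sample variance uses the divisor *n* – 1; the function name `chi_square_from_summary` is hypothetical.

```python
def chi_square_from_summary(n, s_sq, sigma0_sq):
    """Test statistic (n - 1) * s^2 / sigma0^2 from summary values."""
    return (n - 1) * s_sq / sigma0_sq


# n = 9, s^2 = 5.4, hypothesised variance 9
chi_first = chi_square_from_summary(9, 5.4, 9)

# n = 20, s^2 = 0.024, hypothesised variance 0.018
chi_second = chi_square_from_summary(20, 0.024, 0.018)

print(round(chi_first, 3), round(chi_second, 3))
```

The two values are then compared with the table values for 8 and 19 degrees of freedom respectively.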

Another important application of the *χ*^{2} test is the
testing of independence of attributes.

**Attributes: **Attributes are qualitative characteristics such as levels of
literacy, employment status, *etc*., which are quantified in terms
of levels/scores.

**Contingency table: **Testing the independence of two attributes is an important
statistical application in which the data pertaining to the attributes
are cross-classified in the form of a two-dimensional table. The levels of
one attribute are arranged in rows and those of the other in columns. Such an
arrangement in the form of a table is called a contingency table.

Computational steps for testing the independence of attributes:

**Step 1 : ****Framing the hypotheses**

**Null hypothesis** *H*_{0}: The two attributes are independent.

**Alternative hypothesis** *H*_{1}: The two attributes are not independent.

**Step 2 : ****Data**

The data set is given in the form of a contingency table with *m* rows and *n* columns.
Compute the expected frequency *E*_{ij} corresponding to each cell
of the contingency table, using the formula

*E*_{ij} = (*R*_{i} × *C*_{j}) / *N*

where,

*N* = Total sample size

*R*_{i} = Row sum corresponding to the *i*^{th} row

*C*_{j} = Column sum corresponding to the *j*^{th} column

**Step 3 : ****Level of significance**

Fix the desired level of significance *α*.

**Step 4 : ****Calculation**

Calculate the value of the test statistic as

*χ*_{0}^{2} = Σ_{i=1}^{m} Σ_{j=1}^{n} (*O*_{ij} – *E*_{ij})^{2} / *E*_{ij}

where *O*_{ij} is the observed frequency of the (*i*, *j*)^{th} cell.

**Step 5 : ****Critical value**

The critical value is obtained from the table of *χ*^{2} with (*m* – 1)(*n* – 1) degrees of freedom
at the given level of significance *α*, as *χ*^{2}_{(m–1)(n–1), α}.

**Step 6 : ****Decision**

Decide on rejecting or not rejecting the null
hypothesis by comparing the calculated value of the test statistic with the
table value. If *χ*_{0}^{2}
≥ *χ*^{2}_{(m–1)(n–1), α}, reject *H*_{0}; otherwise, do not reject *H*_{0}.

**Note:**

· *N*, the total frequency, should be
reasonably large, say greater than 50.

· No theoretical cell frequency should
be less than 5. If a cell frequency is less than 5, the cell should be pooled
with the preceding or
succeeding cell so that the pooled frequency is greater than 5.
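The computational steps for an *m* × *n* contingency table can be sketched as a short function. This is an illustration under stated assumptions: the function name `chi_square_independence` is hypothetical and the 2 × 3 table of counts is made-up data, not taken from any example in this text.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    Expected frequency of cell (i, j) = row sum i * column sum j / total.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)

    chi0 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            chi0 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi0, dof


# Hypothetical 2 x 3 table of observed frequencies (illustration only).
counts = [[30, 20, 10],
          [20, 30, 40]]
chi0, dof = chi_square_independence(counts)
print(round(chi0, 3), dof)
```

The returned statistic is compared with the table value for the returned degrees of freedom at the chosen level of significance.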

**Example 2.9**

The following table gives the performance of 500 students
classified according to age in a computer test. Test whether the attributes age
and performance are independent at 5% level of significance.

**Step 1 : ****Null hypothesis**** ***H*_{0}: The attributes age and performance are
independent.

**Alternative hypothesis ***H*_{1}: The attributes age and performance are not
independent.

**Step 2 : ****Data**

Compute the expected frequency *E*_{ij} corresponding
to each cell of the contingency table, using the formula

*E*_{ij} = (*R*_{i} × *C*_{j}) / *N*

where,

*N* = Total sample size

*R*_{i} = Row sum corresponding to the *i*^{th} row

*C*_{j} = Column sum corresponding to the *j*^{th} column

**Step 3 : ****Level of significance **

*α* = 5%

**Step 4 : ****Calculation**

Calculate the value of the test statistic as

*χ*_{0}^{2} = Σ_{i} Σ_{j} (*O*_{ij} – *E*_{ij})^{2} / *E*_{ij}

This chi-square test statistic is calculated as
follows:

*χ*_{0}^{2} = 22.152 with degrees of freedom (3 – 1)(2 – 1) = 2

**Step 5 : ****Critical value**

From the chi-square table,
the critical value
at 5% level
of significance is *χ*^{2}_{(3–1)(2–1), 0.05}
= *χ*^{2}_{2, 0.05} = 5.991.

**Step 6 : ****Decision**

As the calculated value *χ*_{0}^{2} = 22.152 is greater
than the critical value *χ*^{2}_{2, 0.05} = 5.991,
the null hypothesis *H*_{0} is
rejected. Hence, the performance and age of students are not independent.

The following example illustrates the procedure.

**Example 2.10**

A survey was conducted with 500 female students, of whom 60% were
intelligent and 40% had uneducated fathers, while 30% of the not intelligent
female students had educated fathers. Test the hypothesis that the education of
fathers and the intelligence of female students are independent.

**Step 1 : ****Null hypothesis** *H*_{0}: The attributes are independent, *i.e.*,
there is no association between the education of fathers and the intelligence of female
students.

**Alternative hypothesis** *H*_{1}: The attributes are not independent, *i.e.*, there is an association between the education of
fathers and the intelligence of female students.

**Step 2 : ****Data**

The observed frequencies (*O*) have been computed from the
given information as under.

**Step 3 : ****Level of significance **

*α* = 5%

**Step 4 : ****Calculation**

Calculate the value of the test statistic as

*χ*_{0}^{2} = *N*(*ad* – *bc*)^{2} / [(*a* + *b*)(*c* + *d*)(*a* + *c*)(*b* + *d*)]

where
*a* = 620, *b* = 380, *c* = 550, *d* = 450 and *N* = 2000.
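For a 2 × 2 table with cell frequencies *a*, *b* (first row) and *c*, *d* (second row), the statistic has a well-known shortcut form, *N*(*ad* – *bc*)^{2} / [(*a* + *b*)(*c* + *d*)(*a* + *c*)(*b* + *d*)]. The sketch below applies that shortcut to the values of *a*, *b*, *c*, *d* quoted above; the function name `chi_square_2x2` is an assumption.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square statistic for a 2 x 2 contingency table.

    a, b are the first-row cell frequencies; c, d are the second-row ones.
    """
    total = a + b + c + d
    numerator = total * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


chi0 = chi_square_2x2(620, 380, 550, 450)
print(round(chi0, 3))
```

Rounded to three decimals, this gives the value 10.092 used in the decision step.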

**Step 5 : ****Critical value**

From the chi-square table, the critical value at 5%
level of significance is *χ*^{2}_{1, 0.05} = 3.841.

**Step 6 : ****Decision**

As the calculated value *χ*_{0}^{2} = 10.092 is greater than the critical
value *χ*^{2}_{1, 0.05}
= 3.841, the null hypothesis *H*_{0}
is rejected. Hence, education of fathers and intelligence of female students
are not independent.

Another important application of the chi-square distribution is
testing the goodness of fit of a pattern or distribution fitted to given data. This
application has been regarded as one of the most important inventions in the mathematical
sciences during the 20th century. Goodness of fit indicates the closeness of
the observed frequencies to the expected frequencies. If the curves of these
two distributions do not coincide or appear to diverge much, the fit is poor. If the two curves do not diverge much, the fit is fair.

**Step 1 : ****Framing of hypothesis**

**Null hypothesis** *H*_{0}: The fit is appropriate for the
given data set.

**Alternative hypothesis** *H*_{1}: The fit is not appropriate for the
given data set.

**Step 2 : ****Data**

Calculate the expected frequencies (*E*_{i}) using an
appropriate theoretical distribution such as the binomial or Poisson.

**Step 3 : **Select the desired level of significance *α*.

**Step 4 : ****Test statistic**

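The test statistic is

*χ*^{2} = Σ_{i=1}^{k} (*O*_{i} – *E*_{i})^{2} / *E*_{i}

where
*k* = number of classes

*O*_{i} and *E*_{i} are respectively the observed and expected frequencies of the
*i*^{th} class, such that Σ*O*_{i} = Σ*E*_{i} = *N*.

If any *E*_{i} is found to be less than 5, the
corresponding class frequency may be pooled with the preceding or succeeding
classes such that every *E*_{i} is at least 5. The observed frequencies are pooled correspondingly.

The approximate sampling distribution of the test statistic under *H*_{0}
is the chi-square distribution with (*k* – 1 – *s*) d.f., where *s* is the number of parameters estimated from the data.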

**Step 5 : ****Calculation**

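Calculate the value of chi-square as

*χ*_{0}^{2} = Σ_{i=1}^{k} (*O*_{i} – *E*_{i})^{2} / *E*_{i}

The above steps in calculating the chi-square can be summarized in
the form of a table with columns *O*_{i}, *E*_{i}, *O*_{i} – *E*_{i}, (*O*_{i} – *E*_{i})^{2} and (*O*_{i} – *E*_{i})^{2}/*E*_{i}.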

**Step 6 : ****Critical value**

The critical value is obtained from the table of *χ*^{2}
for a given level of significance *α*.

**Step 7 : ****Decision**

Decide on rejecting or not rejecting the null hypothesis by
comparing the calculated value of the test statistic with the table value, at
the desired level of significance.

**Example 2.11**

Five coins are tossed 640 times and the following results were
obtained.

Fit binomial distribution to the above data.

*Solution:*

**Step 1** **: ****Null hypothesis**** ***H*_{0}: Fitting of binomial distribution is
appropriate for the given data.

**Alternative hypothesis ***H*_{1}: Fitting of binomial distribution is not
appropriate to the** **given data.

**Step 2 : ****Data**

Compute the expected frequencies:

*n *= number of coins tossed at a time = 5

Let *X* denote the number of heads (success) in *n*
tosses

*N *= number of times experiment is repeated = 640

To find mean of the distribution

The probability mass function of the binomial
distribution is:

*p*(*x*) = ^{n}C_{x}
*p*^{x} *q*^{n–x}, *x* = 0, 1, ...,
*n* (2.1)

The mean of the binomial distribution is *np*.

For *x* =
0, with *n* = 5 and *p* = *q* = 0.5, equation (2.1) becomes

*P*(*X* = 0) = *P*(0) = ^{5}C_{0} (0.5)^{5} = 0.03125

The expected frequency at *x* is *N* × *P*(*x*).

The expected frequency at *x* = 0 is *N* × *P*(0)

= 640 × 0.03125
= 20

We use the recurrence formula to find the other
expected frequencies.

The expected frequency at *x* + 1 is

[(*n* – *x*)/(*x* + 1)] × (*p*/*q*) ×
Expected frequency at *x*
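The recurrence for binomial expected frequencies can be sketched as follows. It starts from *N* × *q*^{n} at *x* = 0 and multiplies by [(*n* – *x*)/(*x* + 1)] × (*p*/*q*) at each step. The function name `binomial_expected` is an assumption, and *p* = 0.5 is taken from the value (0.5)^{5} used for *P*(0) above.

```python
def binomial_expected(n, p, total):
    """Expected frequencies total * P(X = x), x = 0..n, via the recurrence."""
    q = 1 - p
    expected = [total * q ** n]  # expected frequency at x = 0
    for x in range(n):
        # frequency at x + 1 from the frequency at x
        expected.append(expected[-1] * (n - x) / (x + 1) * (p / q))
    return expected


frequencies = binomial_expected(n=5, p=0.5, total=640)
print([round(f) for f in frequencies])
```

For *n* = 5, *p* = 0.5 and *N* = 640 the expected frequencies are 20, 100, 200, 200, 100 and 20, which sum to 640.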

**Table of expected frequencies:**

**Step 3 : ****Level of significance**

*α* = 5%

**Step 4 : ****Test statistic**

**Step 5 : ****Calculation**

The test statistic is computed as under:

**Step 6 : ****Critical value**

Degrees of freedom = *k* – 1 – *s* = 6 – 1 – 1 = 4

Critical value for 4 d.f. at 5% level of significance is
9.488, *i.e.*, *χ*^{2}_{4, 0.05} = 9.488

**Step 7 : ****Decision**

As the calculated *χ*_{0}^{2} (= 0.575) is less
than the critical value *χ*^{2}_{4, 0.05} = 9.488, we do not reject the null hypothesis.
Hence, the fitting of binomial distribution is appropriate.

**Example 2.12**

A packet consists of 100 ball pens. The distribution of the number
of defective ball pens in each packet is given below:

Examine whether Poisson distribution is appropriate for the above
data at 5% level of significance.

*Solution:*

**Step 1 : ****Null hypothesis**** ***H*_{0}: Fitting of Poisson distribution is appropriate
for the given data.

**Alternative hypothesis ***H*_{1}: Fitting of Poisson distribution is not
appropriate for the** **given data.

**Step 2 : ****Data**

The expected frequencies are computed as under:

To find the mean of the distribution.

The probability
mass function of the Poisson distribution is:

*p*(*x*) = *e*^{–m} *m*^{x} / *x*!, *x* = 0, 1, 2, ... (2.2)

In
the case of the Poisson distribution, mean (*m*)
= *x̄* = 0.9.

At
*x* = 0, equation (2.2) becomes

*P*(0) = *e*^{–0.9} = 0.4066

The
expected frequency at *x* is *N* × *P*(*x*).

Therefore, the expected frequency at 0 is

*N* × *P*(0)

= 100 × 0.4066

= 40.66

We use the recurrence formula to find the other
expected frequencies.

The expected frequency at *x* + 1 is

[*m* / (*x* + 1)] × Expected frequency at *x*
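The Poisson recurrence can be sketched in the same way. The function name `poisson_expected` is hypothetical, and the choice of six classes (*x* = 0 to 5) is an assumption made for illustration, since the data table is not reproduced here.

```python
import math


def poisson_expected(m, total, classes):
    """Expected frequencies total * P(X = x), x = 0..classes-1, via the recurrence."""
    expected = [total * math.exp(-m)]  # expected frequency at x = 0
    for x in range(classes - 1):
        # frequency at x + 1 = [m / (x + 1)] * frequency at x
        expected.append(expected[-1] * m / (x + 1))
    return expected


frequencies = poisson_expected(m=0.9, total=100, classes=6)
rounded = [round(f) for f in frequencies]
print(rounded)
```

With *m* = 0.9 and *N* = 100 the first expected frequency is about 40.66, and rounding to the nearest integer gives 41, 37, 16, 5, 1 and 0.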

Table
of expected frequency distribution (on rounding to the nearest integer)

**Step 3 : ****Level of significance**

Î± = 5%

**Step 4 : ****Test statistic**

**Step 5 : ****Calculation**

The test statistic is computed as under:

**Note: **In the above table, the cell
frequencies 0 and 1 in the expected frequency column (*E*) are less than 5. Hence, we combine (pool) them with either the succeeding
or the preceding one such that the total is made greater than 5. Here we have
pooled them with the preceding frequency 5 such that the total frequency is made greater
than 5. Correspondingly, the cell frequencies in the observed frequencies are pooled.

**Step 6 : ****Critical value**

Degrees of freedom = (*k* – 1 – *s*) = 4 – 1 – 1 = 2

Critical value for 2 d.f. at 5% level of significance is
5.991, *i.e.*, *χ*^{2}_{2, 0.05} = 5.991

**Step 7 : ****Decision**

The calculated *χ*_{0}^{2} (= 51.253) is
greater than the critical value (5.991) at 5% level of significance. Hence, we
reject *H*_{0}, *i.e.*, the fitting of Poisson distribution is not
appropriate for the given data.

**Example 2.13**

A sample of 800 students appeared for a competitive examination. It
was found that 320 students failed, 270 secured a third grade, 190
secured a second grade and the remaining students qualified in first
grade. The general opinion is that the above grades are in the ratio 4:3:2:1
respectively. Test the hypothesis that the general opinion about the grades is
appropriate at 5% level of significance.

**Step 1 : ****Null hypothesis** *H*_{0}: The results in the four grades follow the ratio
4:3:2:1.

**Alternative hypothesis** *H*_{1}: The results in the four grades do not follow the
ratio 4:3:2:1.

**Step 2 : ****Data**

Compute expected frequencies:

Under *H*_{0}, the expected frequencies of the four grades are:

4/10 × 800 = 320; 3/10 × 800 = 240; 2/10 × 800 =
160; 1/10 × 800 = 80

**Step 3 : ****Test statistic**

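The test statistic is computed from the observed frequencies 320, 270, 190 and 20 (the remaining 800 – 780 students) and the expected frequencies 320, 240, 160 and 80.

The test statistic is calculated as

*χ*_{0}^{2} = Σ (*O*_{i} – *E*_{i})^{2} / *E*_{i} = 0/320 + 900/240 + 900/160 + 3600/80 = 0 + 3.75 + 5.625 + 45 = 54.375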
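The goodness-of-fit statistic for the grade data can be checked with a short script. This is a sketch for verifying the arithmetic; the variable names are assumptions, and the fourth observed frequency is obtained as 800 – (320 + 270 + 190) = 20.

```python
observed = [320, 270, 190, 20]  # failed, third, second, first grade
ratio = [4, 3, 2, 1]            # ratio claimed under H0

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

# Sum of (O - E)^2 / E over the four classes
chi0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1  # no parameter is estimated from the data

print(expected, chi0, dof)
```

The statistic works out to 54.375 with 3 degrees of freedom, the value used in the decision step of this example.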

**Step 4 : ****Critical value**

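The critical value of *χ*^{2} for 3 d.f. at 5% level
of significance is 7.81, *i.e.*, *χ*^{2}_{3, 0.05} = 7.81

**Step 5 : ****Decision**

As the calculated value of *χ*_{0}^{2}
(= 54.375) is greater than the critical value *χ*^{2}_{3, 0.05} = 7.81, the null hypothesis *H*_{0} is rejected. Hence, the results in the four grades do not follow the ratio 4:3:2:1.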

**Example 2.14**

The following table shows the distribution of digits in numbers
chosen at random from a telephone directory.

Test whether the digits occur with equal frequency in the directory
at 5% level of significance.

**Step 1 : ****Null hypothesis** *H*_{0}: The digits occur equally often in the
directory.

**Alternative hypothesis** *H*_{1}: The digits do not occur equally often in
the directory.

**Step 2 : ****Data**

The expected frequency for each digit = 10000/10 = 1000

**Step 3 : ****Level of significance**** **

*α* = 5%

**Step 4 : ****Test statistic**

The test statistic is computed using the following table.

The test statistic is calculated as

**Step 5 : ****Critical value**

Critical value for 9 d.f. at 5% level of significance
is 16.919, *i.e.*, *χ*^{2}_{9, 0.05} = 16.919

**Step 6 : ****Decision**

Since the calculated *Ï‡ _{0}*

Tags : Properties, Procedure Steps, Example Solved Problems | Statistics , 12th Statistics : Chapter 2 : Tests Based on Sampling Distributions I

