
Chapter: 12th Statistics : Chapter 2 : Tests Based on Sampling Distributions I

Chi-Square Distribution and Its Applications


Karl Pearson (1857-1936) was an English mathematician and biostatistician. He founded the world’s first university statistics department at University College, London in 1911. In a paper published in 1900, he was the first to examine whether observed data support a given specification. He called this the ‘chi-square goodness of fit’ test. It motivated research in statistical inference and led to the development of statistics as a separate discipline.

 

1. Chi-Square distribution

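The square of a standard normal variable is known as a chi-square variable with 1 degree of freedom (d.f.). Thus,

if X ~ N(µ, σ²), then it is known that $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$. Further, $Z^2$ is said to follow the χ² distribution with 1 degree of freedom (χ² is pronounced chi-square).

Note: i) If Xi ~ N(µ, σ²), i = 1, 2, …, n are n iid random variables, then

$\sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_{n}$

ii) If µ in (i) is replaced by the sample mean $\bar{X}$, then

$\sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sigma} \right)^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$, where $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ is the sample variance.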
 

2. Properties of χ² distribution

· It is a continuous distribution.

· The distribution has only one parameter, namely the number of degrees of freedom n.

· The shape of the distribution depends upon the d.f. n.

· The mean of the chi-square distribution is n and its variance is 2n.

· If U and V are independent random variables having χ² distributions with n1 and n2 degrees of freedom respectively, then their sum U + V also has a χ² distribution, with n1 + n2 d.f.
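The mean and variance stated above follow from writing a chi-square variable as a sum of squares. The short derivation below is a sketch that assumes Z1, …, Zn are iid N(0, 1) and uses the standard normal moment E(Z⁴) = 3, which is not derived in this chapter.

```latex
E(Z^2) = \operatorname{Var}(Z) + [E(Z)]^2 = 1 + 0 = 1, \qquad
\operatorname{Var}(Z^2) = E(Z^4) - [E(Z^2)]^2 = 3 - 1 = 2

\chi^2_n = \sum_{i=1}^{n} Z_i^2
\;\Rightarrow\;
E(\chi^2_n) = n \cdot 1 = n, \qquad
\operatorname{Var}(\chi^2_n) = n \cdot 2 = 2n \quad \text{(by independence)}
```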

 

3. Applications of chi-square distribution

· To test the variance of the normal population, using the statistic in note (ii).

· To test the independence of attributes.

· To test the goodness of fit of a distribution.

The sampling distributions of the test statistics used in the last two applications are approximately chi-square distributions.

 

4. Test of Hypotheses for population variance of the normal population (Population mean is assumed to be unknown)

Procedure

Step 1 : Let µ and σ² be respectively the mean and the variance of the normal population under study, where both µ and σ² are unknown. If $\sigma_0^2$ is an admissible value of σ², then frame the

null hypothesis as H0 : $\sigma^2 = \sigma_0^2$ and choose the suitable alternative hypothesis from

(i) H1 : $\sigma^2 \neq \sigma_0^2$ (ii) H1 : $\sigma^2 > \sigma_0^2$ (iii) H1 : $\sigma^2 < \sigma_0^2$

Step 2 : Describe the sample/data and its descriptive measures. Let (X1, X2, …, Xn) be a random sample of n observations drawn from the population, where n is small (n < 30).

Step 3 : Fix the desired level of significance α.

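Step 4 : Consider the test statistic

$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$, where $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ is the sample variance,

under H0. The sampling distribution of the test statistic under H0 is the chi-square distribution with (n–1) degrees of freedom.

Step 5 : Calculate the value of χ² for the given sample as

$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2}$

Step 6 : Choose the critical value $\chi_e^2$ corresponding to α and H1 from the following table.

Alternative hypothesis (H1) | Critical value ($\chi_e^2$)
$\sigma^2 \neq \sigma_0^2$ | $\chi^2_{n-1,\,1-\alpha/2}$ and $\chi^2_{n-1,\,\alpha/2}$
$\sigma^2 > \sigma_0^2$ | $\chi^2_{n-1,\,\alpha}$
$\sigma^2 < \sigma_0^2$ | $\chi^2_{n-1,\,1-\alpha}$

Step 7 : Decide on H0 by choosing the suitable rejection rule from the following table corresponding to H1.

Alternative hypothesis (H1) | Rejection rule
$\sigma^2 \neq \sigma_0^2$ | $\chi_0^2 \le \chi^2_{n-1,\,1-\alpha/2}$ or $\chi_0^2 \ge \chi^2_{n-1,\,\alpha/2}$
$\sigma^2 > \sigma_0^2$ | $\chi_0^2 > \chi^2_{n-1,\,\alpha}$
$\sigma^2 < \sigma_0^2$ | $\chi_0^2 < \chi^2_{n-1,\,1-\alpha}$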
 

Example 2.6

The weights (in kg) of 8 students of class VII are 38, 42, 43, 50, 48, 45, 52 and 50. Test the hypothesis that the variance of the population is 48 kg², assuming the population is normal and µ is unknown.

Solution:

Step 1 : Null hypothesis H0: σ² = 48 kg²

i.e. the population variance can be regarded as 48 kg².

Alternative hypothesis H1: σ² ≠ 48 kg²

i.e. the population variance cannot be regarded as 48 kg².

Step 2 : The given sample information is

Sample size (n)= 8

Step 3 : Level of significance

α = 5%

Step 4 : Test statistic

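Under the null hypothesis H0,

$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$

follows the chi-square distribution with (n–1) d.f.

Step 5 : Calculation of test statistic

The value of chi-square under H0 is calculated as under:

To find the sample mean $\bar{x}$ and the sample variance s², we form the following table.

x | $x - \bar{x}$ | $(x - \bar{x})^2$
38 | −8 | 64
42 | −4 | 16
43 | −3 | 9
50 | 4 | 16
48 | 2 | 4
45 | −1 | 1
52 | 6 | 36
50 | 4 | 16
Total = 368 | 0 | 162

$\bar{x} = 368/8 = 46$ and $s^2 = 162/7 = 23.143$, so that

$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{162}{48} = 3.375$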

Step 6 : Critical values

Since H1 is a two-sided alternative, the critical values at α = 0.05 are $\chi^2_{7,\,0.025} = 16.01$ and $\chi^2_{7,\,0.975} = 1.69$.

Step 7 : Decision

Since it is a two-tailed test, the elements of the critical region are determined by the rejection rule

$\chi_0^2 \le \chi^2_{7,\,0.975} = 1.69$ or $\chi_0^2 \ge \chi^2_{7,\,0.025} = 16.01$.

For the given sample information, the rejection rule does not hold, since

$1.69 = \chi^2_{7,\,0.975} < \chi_0^2\ (= 3.375) < \chi^2_{7,\,0.025} = 16.01$.

Hence, H0 is not rejected in favour of H1. Thus, the population variance can be regarded as 48 kg².
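The arithmetic of Example 2.6 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the two critical values are copied from the chi-square table quoted in Step 6 rather than computed.

```python
# Weights (kg) of the 8 students from Example 2.6.
weights = [38, 42, 43, 50, 48, 45, 52, 50]
sigma0_sq = 48  # hypothesised population variance under H0 (kg^2)

n = len(weights)
mean = sum(weights) / n
ss = sum((x - mean) ** 2 for x in weights)  # sum of squared deviations
s_sq = ss / (n - 1)                         # sample variance
chi0 = (n - 1) * s_sq / sigma0_sq           # test statistic with n - 1 d.f.

# Table values for 7 d.f. at alpha = 0.05 (two-sided), taken from Step 6.
lower, upper = 1.69, 16.01
reject = chi0 <= lower or chi0 >= upper

print(f"mean = {mean}, s^2 = {s_sq:.3f}, chi0^2 = {chi0:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```

The statistic lies between the two critical values, so the sketch reaches the same decision as Step 7.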

 

Example 2.7

A normal population has mean µ (unknown) and variance 9. A sample of size 9 observations has been taken and its variance is found to be 5.4. Test the null hypothesis H0: σ2 = 9 against H1: σ2 > 9 at 5% level of significance.

Solution:

Step 1 : Null hypothesis H0: σ² = 9

i.e. the population variance can be regarded as 9.

Alternative hypothesis H1: σ² > 9

i.e. the population variance is regarded as greater than 9.

Step 2 : Data

Sample size (n) = 9

Sample variance (s2) = 5.4

Step 3 : Level of significance

α = 5%

Step 4 : Test statistic

Under the null hypothesis H0,

$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$

follows the chi-square distribution with (n–1) degrees of freedom.

Step 5 : Calculation of test statistic

The value of chi-square under H0 is calculated as

$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{8 \times 5.4}{9} = 4.8$

Step 6 : Critical value

Since H1 is a one-sided alternative, the critical value at α = 0.05 is $\chi_e^2 = \chi^2_{8,\,0.05} = 15.507$.

Step 7 : Decision

Since it is a one-tailed test, the elements of the critical region are determined by the rejection rule $\chi_0^2 > \chi_e^2$.

For the given sample information, the rejection rule does not hold, since $\chi_0^2 = 4.8 < \chi^2_{8,\,0.05} = 15.507$. Hence, H0 is not rejected in favour of H1. Thus, the population variance can be regarded as 9.

 

Example 2.8

A normal population has mean µ (unknown) and variance 0.018. A random sample of size 20 observations has been taken and its variance is found to be 0.024. Test the null hypothesis H0: σ2 = 0.018 against H1: σ2 < 0.018 at 5% level of significance.

Solution:

Step 1 : Null hypothesis H0: σ² = 0.018

i.e. the population variance can be regarded as 0.018.

Alternative hypothesis H1: σ² < 0.018

i.e. the population variance is regarded as less than 0.018.

Step 2 : Data

Sample size (n) = 20

Sample variance (s2) = 0.024

Step 3 : Level of significance

α= 5%

Step 4 : Test statistic

Under the null hypothesis H0,

$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$

follows the chi-square distribution with (n–1) degrees of freedom.

Step 5 : Calculation of test statistic

The value of chi-square under H0 is calculated as

$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{19 \times 0.024}{0.018} = 25.3$

Step 6 : Critical value

Since H1 is a one-sided alternative, the critical value at α = 0.05 is $\chi_e^2 = \chi^2_{19,\,0.95} = 10.117$.

Step 7 : Decision

Since it is a one-tailed test, the elements of the critical region are determined by the rejection rule $\chi_0^2 < \chi_e^2$.

For the given sample information, the rejection rule does not hold, since $\chi_0^2 = 25.3 > \chi_e^2 = \chi^2_{19,\,0.95} = 10.117$.

Hence, H0 is not rejected in favour of H1. Thus, the population variance can be regarded as 0.018.
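When only the sample size and sample variance are given, as in Examples 2.7 and 2.8, the statistic reduces to a one-line formula. The sketch below uses a hypothetical helper name, `variance_test_statistic`, and takes the critical values from the table values quoted in the examples.

```python
def variance_test_statistic(n, s_sq, sigma0_sq):
    """Return (n - 1) * s^2 / sigma0^2, which has n - 1 d.f. under H0."""
    return (n - 1) * s_sq / sigma0_sq

# Example 2.7: H1 is sigma^2 > 9, so H0 is rejected only for large values.
chi_27 = variance_test_statistic(n=9, s_sq=5.4, sigma0_sq=9)
reject_27 = chi_27 > 15.507   # table value for 8 d.f., upper 5% point

# Example 2.8: H1 is sigma^2 < 0.018, so H0 is rejected only for small values.
chi_28 = variance_test_statistic(n=20, s_sq=0.024, sigma0_sq=0.018)
reject_28 = chi_28 < 10.117   # table value for 19 d.f., lower 5% point

print(f"Example 2.7: chi0^2 = {chi_27:.3f}, reject H0: {reject_27}")
print(f"Example 2.8: chi0^2 = {chi_28:.3f}, reject H0: {reject_28}")
```

The only difference between the two one-sided tests is the direction of the comparison with the critical value.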

 

5. Test of Hypotheses for independence of Attributes

Another important application of χ2 test is the testing of independence of attributes.

Attributes: Attributes are qualitative characteristics such as levels of literacy, employment status, etc., which are quantified in terms of levels/scores.

Contingency table: Testing the independence of two attributes is an important statistical application in which the data pertaining to the attributes are cross-classified in the form of a two-dimensional table. The levels of one attribute are arranged in rows and those of the other in columns. Such an arrangement in the form of a table is called a contingency table.

Computational steps for testing the independence of attributes:

Step 1 : Framing the hypotheses

Null hypothesis H0: The two attributes are independent

Alternative hypothesis H1: The two attributes are not independent.

Step 2 : Data

The data set is given in the form of a contingency table. Compute the expected frequencies Eij corresponding to each cell of the contingency table, using the formula

$E_{ij} = \frac{R_i \times C_j}{N}$


where,

N = Total sample size

Ri = Row sum corresponding to ith row

C j = Column sum corresponding to jth column

Step 3  :  Level of significance

Fix the desired level of significance α

Step 4  :  Calculation

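Calculate the value of the test statistic as

$\chi_0^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

where Oij is the observed frequency of the cell in the ith row and jth column.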


Step 5  :  Critical value

The critical value is obtained from the table of χ² with (m–1)(n–1) degrees of freedom, where m and n are the numbers of rows and columns of the contingency table, at the given level of significance α, as $\chi^2_{(m-1)(n-1),\,\alpha}$.

Step 6  :  Decision

Decide on rejecting or not rejecting the null hypothesis by comparing the calculated value of the test statistic with the table value. If $\chi_0^2 \ge \chi^2_{(m-1)(n-1),\,\alpha}$, reject H0.

Note:

· N, the total frequency, should be reasonably large, say greater than 50.

· No theoretical (expected) cell frequency should be less than 5. If a cell frequency is less than 5, it should be pooled with the preceding or succeeding cell so that the pooled frequency is not less than 5.
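The computational steps above can be sketched in Python for a general contingency table. The function name `chi_square_independence` and the 2 × 2 table are hypothetical, made up purely to illustrate the formula for Eij and the double sum. They are not data from this chapter.

```python
def chi_square_independence(table):
    """Return (statistic, d.f.) for a table of observed counts given as rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total  # E_ij = R_i * C_j / N
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 2 table of observed frequencies (illustration only).
observed = [[30, 20],
            [20, 30]]
stat, df = chi_square_independence(observed)
print(f"chi0^2 = {stat:.3f} with {df} d.f.")
```

For this made-up table every expected frequency is 25, so the statistic is 4.0 with 1 d.f., which exceeds the tabulated 5% value 3.841 for 1 d.f.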

 

Example 2.9

The following table gives the performance of 500 students classified according to age in a computer test. Test whether the attributes age and performance are independent at 5% level of significance.


Solution:

Step 1 : Null hypothesis H0: The attributes age and performance are independent.

Alternative hypothesis H1: The attributes age and performance are not independent.

Step 2 : Data

Compute the expected frequencies Eij corresponding to each cell of the contingency table, using the formula

$E_{ij} = \frac{R_i \times C_j}{N}$


where,

N = Total sample size

Ri = Row sum corresponding to ith row

Cj = Column sum corresponding to jth column


Step 3 : Level of significance

α = 5%

Step 4 : Calculation

Calculate the value of the test statistic as

$\chi_0^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$


This chi-square test statistic is calculated as follows:


= 22.152 with degrees of freedom (3–1)(2–1) = 2

Step 5 : Critical value

From the chi-square table, the critical value at 5% level of significance is $\chi^2_{(3-1)(2-1),\,0.05} = \chi^2_{2,\,0.05} = 5.991$.

Step 6 : Decision

As the calculated value $\chi_0^2 = 22.152$ is greater than the critical value $\chi^2_{2,\,0.05} = 5.991$, the null hypothesis H0 is rejected. Hence, the performance and age of students are not independent.
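For 2 degrees of freedom the tabulated critical value can be reproduced by hand, because the upper-tail probability of the χ² distribution with 2 d.f. has a closed form. This is a standard property of the distribution, not something derived in this chapter. For other degrees of freedom the table is still needed.

```latex
P\left(\chi^2_2 > x\right) = e^{-x/2}
\;\Rightarrow\;
\chi^2_{2,\,\alpha} = -2 \ln \alpha,
\qquad
\chi^2_{2,\,0.05} = -2 \ln(0.05) \approx 5.991
```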

The following example illustrates the procedure for a 2 × 2 contingency table.

 

Example 2.10

A survey was conducted with 500 female students of which 60% were intelligent, 40% had uneducated fathers, while 30 % of the not intelligent female students had educated fathers. Test the hypothesis that the education of fathers and intelligence of female students are independent.

Solution:

Step 1 : Null hypothesis H0: The attributes are independent, i.e. there is no association between education of fathers and intelligence of female students.

Alternative hypothesis H1: The attributes are not independent, i.e. there is association between education of fathers and intelligence of female students.

Step 2 : Data

The observed frequencies (O) have been computed from the given information as under.


Step 3 : Level of significance

α = 5%

Step 4 : Calculation

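Calculate the value of the test statistic as

$\chi_0^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$

where a = 620, b = 380, c = 550, d = 450 and N = 2000

$\chi_0^2 = \frac{2000 \times (620 \times 450 - 380 \times 550)^2}{1000 \times 1000 \times 1170 \times 830} = 10.092$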


Step 5 : Critical value

From the chi-square table, the critical value at 5% level of significance is $\chi^2_{1,\,0.05} = 3.841$.

Step 6 : Decision

As the calculated value $\chi_0^2 = 10.092$ is greater than the critical value $\chi^2_{1,\,0.05} = 3.841$, the null hypothesis H0 is rejected. Hence, education of fathers and intelligence of female students are not independent.
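For a 2 × 2 table the statistic can be obtained directly from the four cell frequencies with the usual shortcut formula N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)]. The sketch below assumes that a and b form the first row and c and d the second row, and plugs in the values of a, b, c and d quoted in Step 4. The function name `chi_square_2x2` is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (1 d.f.) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

chi0 = chi_square_2x2(620, 380, 550, 450)
critical = 3.841  # table value for 1 d.f. at the 5% level, from Step 5

print(f"chi0^2 = {chi0:.3f}")
print("Reject H0" if chi0 > critical else "Do not reject H0")
```

The shortcut gives the same value as summing (O − E)²/E over the four cells, so it is only a convenience for 2 × 2 tables.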

 

6. Tests for Goodness of Fit

Another important application of the chi-square distribution is testing the goodness of a pattern or distribution fitted to given data. This application was regarded as one of the most important inventions in mathematical sciences during the 20th century. Goodness of fit indicates the closeness of the observed frequencies to the expected frequencies. If the curves of these two distributions do not coincide or appear to diverge much, the fit is said to be poor. If the two curves do not diverge much, the fit is fair.

 

Computational steps for testing the significance of goodness of fit:

Step 1 : Framing of hypothesis

Null hypothesis H0 : The goodness of fit is appropriate for the given data set

Alternative hypothesis H1 : The goodness of fit is not appropriate for the given data set

Step 2 : Data

Calculate the expected frequencies (Ei) using appropriate theoretical distribution such as Binomial or Poisson.

Step 3 : Select the desired level of significance α

Step 4 : Test statistic

The test statistic is

$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$

where k = number of classes

Oi and Ei are respectively the observed and expected frequency of the ith class such that

$\sum_{i=1}^{k} O_i = \sum_{i=1}^{k} E_i = N$

If any Ei is found to be less than 5, the corresponding class frequency may be pooled with preceding or succeeding classes such that the Ei's of all classes are greater than or equal to 5. It may be noted that the value of k is determined after pooling the classes.

The approximate sampling distribution of the test statistic under H0 is the chi-square distribution with k – 1 – s d.f., s being the number of parameters estimated from the data.

Step 5 : Calculation

Calculate the value of chi-square as

$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$


The above steps in calculating the chi-square can be summarized in the form of the table as follows:

Step 6 : Critical value

The critical value is obtained from the table of χ² with k – 1 – s d.f. for a given level of significance α.

Step 7 : Decision

Decide on rejecting or not rejecting the null hypothesis by comparing the calculated value of the test statistic with the table value, at the desired level of significance.
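The calculation in Steps 4 and 5, together with the degrees of freedom k − 1 − s, can be sketched as a small Python function. The name `goodness_of_fit` and the die-rolling data are hypothetical, chosen only to show the mechanics. The classes are assumed to have been pooled already where necessary.

```python
def goodness_of_fit(observed, expected, estimated_params=0):
    """Return (statistic, d.f.) for observed and expected class frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - estimated_params  # k - 1 - s
    return stat, df

# Hypothetical data: 120 rolls of a die, H0 says each face is equally likely.
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6
stat, df = goodness_of_fit(observed, expected)
print(f"chi0^2 = {stat:.3f} with {df} d.f.")
```

For these made-up counts the statistic is 2.9 with 5 d.f., well below the tabulated 5% value 11.070, so H0 would not be rejected.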

 

Example 2.11

Five coins are tossed 640 times and the following results were obtained.


Fit a binomial distribution to the above data and test the goodness of fit.

Solution:

Step 1 : Null hypothesis H0: Fitting of binomial distribution is appropriate for the given data.

Alternative hypothesis H1: Fitting of binomial distribution is not appropriate to the given data.

Step 2 : Data

Compute the expected frequencies:

n = number of coins tossed at a time = 5

Let X denote the number of heads (success) in n tosses

N = number of times experiment is repeated = 640

To find mean of the distribution


The probability mass function of the binomial distribution is:

$p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$, x = 0, 1, ..., n                            (2.1)

The mean of the binomial distribution is np. Equating it to the sample mean $\bar{x}$ gives the estimate $p = \bar{x}/n$. For the given data p = 0.5 and hence q = 1 − p = 0.5.

For x = 0, the equation (2.1) becomes

P(X = 0) = p(0) = 5C0 (0.5)^5 = 0.03125

The expected frequency at x is N × p(x).

The expected frequency at x = 0 is N × p(0)

= 640 × 0.03125 = 20

We use the recurrence formula to find the other expected frequencies.

The expected frequency at x + 1 is

$\frac{n - x}{x + 1} \cdot \frac{p}{q}$ × Expected frequency at x


Table of expected frequencies:

x | 0 | 1 | 2 | 3 | 4 | 5 | Total
Expected frequency | 20 | 100 | 200 | 200 | 100 | 20 | 640


Step 3 : Level of significance

α = 5%

Step 4 : Test statistic

$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$


Step 5 : Calculation

The test statistic is computed as under:



Step 6 : Critical value

Degrees of freedom = k – 1 – s = 6 – 1 – 1 = 4

Critical value for 4 d.f. at 5% level of significance is 9.488, i.e., $\chi^2_{4,\,0.05} = 9.488$

Step 7 : Decision

As the calculated $\chi_0^2$ (= 0.575) is less than the critical value $\chi^2_{4,\,0.05} = 9.488$, we do not reject the null hypothesis. Hence, the fitting of binomial distribution is appropriate.
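The recurrence used in Step 2 of Example 2.11 is easy to reproduce. The sketch below starts from the expected frequency at x = 0 and multiplies by (n − x)/(x + 1) × p/q at each step. The function name `binomial_expected` is illustrative, and p = 0.5 is the value used in the example.

```python
def binomial_expected(n, p, trials):
    """Expected frequencies N * p(x), x = 0..n, built with the recurrence."""
    q = 1 - p
    expected = [trials * q ** n]  # expected frequency at x = 0
    for x in range(n):
        # Expected frequency at x + 1 from the expected frequency at x.
        expected.append((n - x) / (x + 1) * (p / q) * expected[-1])
    return expected

freqs = binomial_expected(n=5, p=0.5, trials=640)
for x, e in enumerate(freqs):
    print(x, round(e, 2))
```

With p = q the ratio p/q is 1, so the expected frequencies are simply 640 times the binomial coefficients divided by 32, and they add up to 640.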

 

Example 2.12

A packet consists of 100 ball pens. The distribution of the number of defective ball pens in each packet is given below:


Examine whether a Poisson distribution is appropriate for the above data at 5% level of significance.

Solution:

Step 1 : Null hypothesis H0: Fitting of Poisson distribution is appropriate for the given data.

Alternative hypothesis H1: Fitting of Poisson distribution is not appropriate for the given data.

Step 2 : Data

The expected frequencies are computed as under:

To find the mean of the distribution.



The probability mass function of the Poisson distribution is:

$p(x) = \frac{e^{-m} m^{x}}{x!}$, x = 0, 1, 2, ...                            (2.2)

In the case of the Poisson distribution, mean (m) = $\bar{x}$ = 0.9.

At x = 0, equation (2.2) becomes

$p(0) = \frac{e^{-0.9} (0.9)^{0}}{0!} = e^{-0.9} = 0.4066$

The expected frequency at x is N × p(x).

Therefore, the expected frequency at 0 is

N × p(0)

= 100 × 0.4066

= 40.66

We use the recurrence formula to find the other expected frequencies.

The expected frequency at x + 1 is

[m / (x + 1)] × Expected frequency at x


Table of expected frequency distribution (on rounding to the nearest integer)

x | 0 | 1 | 2 | 3 | 4 | 5 | Total
Expected frequency | 41 | 37 | 16 | 5 | 1 | 0 | 100


Step 3 : Level of significance

α = 5%

Step 4 : Test statistic

$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$


Step 5 : Calculation

The test statistic is computed as under:


Note: In the above table, the cell frequencies 0 and 1 in the expected frequency column (E) are less than 5. Hence, we combine (pool) them with either the succeeding or the preceding class such that the total is made greater than 5. Here we have pooled them with the preceding frequency 5. The corresponding cells in the observed frequency column are pooled in the same way.


Step 6 : Critical value

Degrees of freedom = (k – 1 – s) = 4 – 1 – 1 = 2

Critical value for 2 d.f. at 5% level of significance is 5.991, i.e., $\chi^2_{2,\,0.05} = 5.991$

Step 7 : Decision

The calculated $\chi_0^2$ (= 51.253) is greater than the critical value (5.991) at 5% level of significance. Hence, we reject H0, i.e., fitting of Poisson distribution is not appropriate for the given data.
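The expected frequencies of Example 2.12 and the pooling described in the note can be sketched as follows. The helper names are hypothetical, and the range x = 0 to 5 is an assumption inferred from the pooling note. The pooling helper only merges classes at the upper end of the table into the preceding class, which is the case that arises here.

```python
import math

def poisson_expected(mean, trials, max_x):
    """Expected frequencies N * p(x), x = 0..max_x, built with the recurrence."""
    expected = [trials * math.exp(-mean)]  # expected frequency at x = 0
    for x in range(max_x):
        expected.append(mean / (x + 1) * expected[-1])
    return expected

def pool_small_classes(freqs, minimum=5):
    """Merge trailing classes below `minimum` into the preceding class."""
    pooled = list(freqs)
    while len(pooled) > 1 and pooled[-1] < minimum:
        last = pooled.pop()
        pooled[-1] += last
    return pooled

raw = poisson_expected(mean=0.9, trials=100, max_x=5)
rounded = [round(e) for e in raw]
pooled = pool_small_classes(rounded)

print([round(e, 2) for e in raw])
print(rounded)
print(pooled, "-> k =", len(pooled))
```

After pooling, four classes remain, which matches k = 4 in Step 6. The observed frequencies would have to be pooled over the same classes before computing the statistic.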

 

Example 2.13

A sample of 800 students appeared for a competitive examination. It was found that 320 students failed, 270 secured a third grade, 190 secured a second grade and the remaining students qualified in first grade. The general opinion is that the above grades are in the ratio 4:3:2:1 respectively. Test the hypothesis that the general opinion about the grades is appropriate at 5% level of significance.

Solution:

Step 1 : Null hypothesis H0: The results in the four grades follow the ratio 4:3:2:1

Alternative hypothesis H1: The results in the four grades do not follow the ratio 4:3:2:1

Step 2 : Data

Compute expected frequencies:

Under the assumption on H0, the expected frequencies of the four grades are:

4/10 × 800 = 320 ; 3/10 × 800 = 240; 2/10 × 800 = 160; 1/10 × 800 =80

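Step 3 : Level of significance

α = 5%

Step 4 : Test statistic

The test statistic is computed using the following table.

Grade | O | E | O − E | (O − E)² | (O − E)²/E
Failed | 320 | 320 | 0 | 0 | 0.000
Third grade | 270 | 240 | 30 | 900 | 3.750
Second grade | 190 | 160 | 30 | 900 | 5.625
First grade | 20 | 80 | −60 | 3600 | 45.000
Total | 800 | 800 | 0 | | 54.375

The test statistic is calculated as

$\chi_0^2 = \sum \frac{(O - E)^2}{E} = 54.375$

Step 5 : Critical value

The critical value of χ² for 3 d.f. at 5% level of significance is 7.81, i.e., $\chi^2_{3,\,0.05} = 7.81$

Step 6 : Decision

As the calculated value of $\chi_0^2$ (= 54.375) is greater than the critical value $\chi^2_{3,\,0.05} = 7.81$, reject H0. Hence, the results of the four grades do not follow the ratio 4:3:2:1.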
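The figures of Example 2.13 can be verified with a short script. The observed count for the first grade (20) is obtained as 800 − 320 − 270 − 190. The variable names are illustrative and the critical value is the table value quoted in the example.

```python
# Observed numbers of students: failed, third grade, second grade, first grade.
observed = [320, 270, 190, 20]
ratio = [4, 3, 2, 1]  # hypothesised ratio under H0

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]
chi0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical = 7.81  # table value for 3 d.f. at the 5% level
print("expected:", expected)
print(f"chi0^2 = {chi0:.3f}")
print("Reject H0" if chi0 > critical else "Do not reject H0")
```

No parameter is estimated from the data here, so the degrees of freedom are k − 1 = 3.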

 

Example 2.14

The following table shows the distribution of digits in numbers chosen at random from a telephone directory.


Test whether the digits occur equally frequently in the directory at 5% level of significance.

Step 1 : Null hypothesis H0: The digits occur equally frequently in the directory.

Alternative hypothesis H1: The digits do not occur equally frequently in the directory.

Step 2 : Data

The expected frequency for each digit = 10000/10 = 1000

Step 3 : Level of significance

α = 5%

Step 4 : Test statistic

The test statistic is computed using the following table.


The test statistic is calculated as


Step 5  :  Critical value

Critical value for 9 d.f. at 5% level of significance is 16.919, i.e., $\chi^2_{9,\,0.05} = 16.919$

Step 6  :  Decision

Since the calculated $\chi_0^2$ (= 58.542) is greater than the critical value $\chi^2_{9,\,0.05} = 16.919$, reject H0. Hence, the digits are not uniformly distributed in the directory.

 
