Central Limit Theorem Worksheet

1.
Which is not part of the five-number summary?
I. Q1 and Q3
II. The mean
III. The median
IV. The smallest and largest data values
 a. III only b. I only c. IV only d. II only

Solution:

A boxplot is built from five specific values: the lowest value of the data set, Q1, the median (Q2), Q3, and the highest value of the data set.

These values are called the five-number summary of the data set.

So, of the four choices, only the mean is not part of the five-number summary.

2.
Which of the following box plots represents the number of calculators sold during a randomly selected week?
8, 12, 23, 5, 9, 15, 3.

 a. Figure 4 b. Figure 2 c. Figure 1 d. Figure 3

Solution:

Arrange the data in order: 3, 5, 8, 9, 12, 15, 23

The median of the above data is 9.

Q1 = 5
[Q1 is the 25th percentile.]

Q3 = 15
[Q3 is the 75th percentile.]

Draw a box around Q1 and Q3, draw a vertical line through the median and connect the upper and lower values.
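
As a cross-check, the quartile steps above can be sketched in Python. This is a minimal sketch, assuming the convention this worksheet appears to use: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the middle value left out of both halves when the count is odd. The helper name `five_number_summary` is hypothetical and is not part of the worksheet.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumed convention: Q1 and Q3 are the medians of the lower and
    upper halves of the ordered data; for an odd count the middle
    value belongs to neither half.
    """
    if len(values) < 2:
        raise ValueError("need at least two data values")
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]     # values before the median position
    upper = data[-half:]    # values after the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])


calculators = [8, 12, 23, 5, 9, 15, 3]
print(five_number_summary(calculators))
```

Other quartile conventions, such as interpolation-based ones, can give slightly different Q1 and Q3 values for the same data, so the box drawn from them may differ slightly too.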

3.
P50 corresponds to ____.
I. Q2
II. IQR
III. MR
IV. D5
 a. II and IV only b. II and III only c. I only d. I and IV only

Solution:

Deciles are denoted by D1, D2, D3, ... , D9 and they correspond to Percentiles P10, P20, P30, ..., P90.

So, P50 corresponds to D5.

Quartiles are denoted by Q1, Q2 and Q3 and they correspond to P25, P50, P75.

So, P50 corresponds to Q2.

But IQR = Q3 - Q1, and midrange (MR) = (lowest value + highest value)/2

So, P50 does not correspond to IQR or MR.

So, P50 is the same as Q2 and D5.
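
The correspondence between P50, Q2 and D5 can also be checked numerically. The sketch below uses `statistics.quantiles` from the Python standard library on an illustrative sample (the data values are an assumption, chosen only for the demonstration) and compares the middle quartile, the fifth decile and the fiftieth percentile with the median.

```python
from statistics import median, quantiles

data = [3, 5, 8, 9, 12, 15, 23]    # illustrative sample, not from this question

q2 = quantiles(data, n=4)[1]       # second of the 3 quartile cut points
d5 = quantiles(data, n=10)[4]      # fifth of the 9 decile cut points
p50 = quantiles(data, n=100)[49]   # fiftieth of the 99 percentile cut points

# All three cut points coincide with the median of the sample.
print(q2 == d5 == p50 == median(data))
```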

4.
Which of the following statement(s) is/are true?
I. If a student secured 75th percentile in an exam, then it means that he scored 75 out of 100.
II. A statistic that tells the number of standard deviations a data value is above or below the mean is called $z$ score.
III. The distribution is positively skewed if the mode is to the left of the median and the mean is to the right of the median.
 a. III only b. II and III only c. II only d. I, II and III only

Solution:

If a student gets 75 correct answers out of possible 100, he obtains a percentage score of 75 and there is no indication of his position with respect to the rest of the class.

If a student's score corresponds to 75th percentile, then he/she did better than 75% of the students in his/her class.

The z score represents the number of standard deviations that a data value falls above or below the mean.

The distribution is positively skewed if the mode is to the left of the median and the mean is to the right of the median.

So, only statements II and III are true.

5.
Identify the five-number summary for the number of previous jobs held by each of six applicants for a clerk's post at BRR company, given the data 2, 4, 5, 6, 8, 9.
 a. 4, 5.5, 8, 4 b. 2, 4, 5.5, 8, 9 c. 2, 4, 8, 9, 4 d. 2, 4, 5.5, 8, 9, 4

Solution:

The lowest value of the data set, Q1, the median, Q3 and the highest value of the data set are referred to as the five-number summary of the data set.

Lowest value of given data set is 2.

Highest value of given data set is 9.

Median = Q2 = (5 + 6)/2 = 5.5

Q1 is 4
[Q1 is the 25th percentile: the median of the lower half 2, 4, 5.]

Q3 is 8
[Q3 is the 75th percentile: the median of the upper half 6, 8, 9.]

So, five-number summary of the given data set is 2, 4, 5.5, 8, 9.

6.
Which of the following statement(s) is/are true?
I. Mode is the measure of central tendency used in exploratory data analysis (EDA).
II. Extremely low data value in the data set can be considered as an outlier.
III. Percentiles are the same as percentages.
 a. II only b. I, II and III only c. II and III only d. I and III only

Solution:

The measure of central tendency used in EDA is the median and not the mode.

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values in the data set.

Percentile = [(number of values below X) + 0.5] / (total number of values) · 100%

So, percentiles are position measures that indicate the position of an individual in a group; they are not the same as percentages.

Therefore, only statement II is true.
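
The percentile formula above (the number of values below X, plus 0.5, divided by the total number of values, times 100) can be written as a short function. This is a minimal sketch; the function name `percentile_rank` and the sample scores are assumptions made for illustration.

```python
def percentile_rank(values, x):
    """Percentile of x: (number of values below x + 0.5) / total * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) / len(values) * 100


scores = [3, 5, 8, 9, 12, 15, 23]   # illustrative data
print(round(percentile_rank(scores, 12)))
```

Here four of the seven values lie below 12, so the result depends only on the position of 12 within the group, not on the size of the value itself.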

7.
Recognize the incorrect statement(s).
I. When all data for a variable are transformed into $z$ scores, the resulting distribution will have a mean of 1 and a standard deviation of 0.
II. The $z$ score represents the number of standard deviations that a data value falls above or below the mean.
III. Percentages are position measures used to compare the relative position of an individual in a group.
IV. Percentiles divide the data set into 100 equal groups.
 a. I only b. I and III only c. II and IV only d. IV only

Solution:

When all data for a variable are transformed into z scores, the resulting distribution will have a mean of 0 and a standard deviation of 1.

The z score represents the number of standard deviations that a data value falls above or below the mean.

Percentiles are position measures used to compare the relative position of an individual in a group. (A percentile is not the same as a percentage.)

Percentiles divide the data set into 100 equal groups.

Therefore, statements I and III are incorrect.
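
The claim about z scores (mean 0, standard deviation 1 after the transformation) can be checked with a short sketch. It assumes the population standard deviation (`statistics.pstdev`) is used in the z score; the function name `z_scores` and the sample data are illustrative assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert each value to (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)   # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


data = [2, 4, 5, 6, 8, 9]    # illustrative data
z = z_scores(data)

# Up to floating-point rounding, the mean is 0 and the deviation is 1.
print(abs(mean(z)) < 1e-9, abs(pstdev(z) - 1) < 1e-9)
```

If the sample standard deviation (`statistics.stdev`) is used instead, the same property holds as long as the spread of the z scores is also measured with `stdev`.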

8.
The times (in minutes) taken by Moore and James to swim 2000 yards in each of 8 sessions in a pool are given.
 Moore   James
 34.02   36.12
 34.12   35.37
 35.72   35.57
 34.72   35.43
 34.05   36.05
 34.13   34.85
 36.17   34.75
 36.07   34.18
Compare the distributions using box plots.
 a. Both the median and the variation are larger for James. b. Both the median and the variation are larger for Moore. c. The median is higher for the distribution of James and the variation is larger for the distribution of Moore. d. The median is higher for the distribution of Moore and the variation is larger for the distribution of James.

Solution:

Arrange the data in order for Moore: 34.02, 34.05, 34.12, 34.13, 34.72, 35.72, 36.07, 36.17

Q2 = (34.13 + 34.72)/2 = 34.425

Q1 = (34.05 + 34.12)/2 = 34.085

Q3 = (35.72 + 36.07)/2 = 35.895

Arrange the data in order for James: 34.18, 34.75, 34.85, 35.37, 35.43, 35.57, 36.05, 36.12

Q2 = (35.37 + 35.43)/2 = 35.4

Q1 = (34.75 + 34.85)/2 = 34.8

Q3 = (35.57 + 36.05)/2 = 35.81

Draw the box plots for each distribution on the same graph.

From the box plots, the distribution of James's times has a higher median than Moore's. The variation (spread) of Moore's times is larger than the variation of James's times.
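
The comparison can be reproduced without drawing the plots. The sketch below computes the median and the interquartile range (IQR = Q3 - Q1) for each swimmer, assuming the same median-of-halves convention for quartiles used in the solution and taking the IQR, which is the width of the box, as the measure of variation. The helper name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data), median(data[-half:])


moore = [34.02, 34.12, 35.72, 34.72, 34.05, 34.13, 36.17, 36.07]
james = [36.12, 35.37, 35.57, 35.43, 36.05, 34.85, 34.75, 34.18]

for name, times in (("Moore", moore), ("James", james)):
    q1, q2, q3 = quartiles(times)
    print(f"{name}: median = {q2:.3f}, IQR = {q3 - q1:.2f}")
```

With these data, the IQR is about 1.81 minutes for Moore and about 1.01 minutes for James, which agrees with reading the wider box off the plot.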

9.
The owner of a restaurant recorded the number of visitors for a period of one week. Examine the nature of distribution of the data, using a box plot. 256, 542, 340, 460, 380, 429, 412
 a. symmetric b. positively skewed c. negatively skewed d. approximately symmetric

Solution:

Arrange the data in order: 256, 340, 380, 412, 429, 460, 542

Q2 = 412
[Q2 is the 50th percentile.]

Q1 = 340
[Q1 is the 25th percentile.]

Q3 = 460
[Q3 is the 75th percentile.]

Draw a graph for the data and locate Q1, Q2 and Q3.

Since the median falls to the right of the center, the distribution is negatively skewed.
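
The rule used above can be expressed as a small check. This is a rough sketch, assuming "the center" means the midpoint between Q1 and Q3 (the center of the box); the function name `box_skew` is hypothetical, and a full reading of a box plot would also look at the lengths of the whiskers.

```python
def box_skew(q1, q2, q3):
    """Classify skew from where the median sits inside the box."""
    center = (q1 + q3) / 2
    if q2 > center:
        return "negatively skewed"    # median right of center
    if q2 < center:
        return "positively skewed"    # median left of center
    return "approximately symmetric"


print(box_skew(340, 412, 460))
```

For the restaurant data the center of the box is 400 and the median is 412, so the check returns "negatively skewed".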

10.
The median price of different books in a bookstall is $35. The difference between twice the minimum price and half the maximum price in the bookstall gives the median price, which is also equal to the midrange. The IQR of the prices is $5. The price of the book corresponding to the 25th percentile is $32. Identify the five-number summary.
 a. 28, 32, 35, 37, 42 b. 28, 42, 32, 35, 37 c. 28, 37, 42 d. 12, 24, 35, 40, 56

Solution:

Let the minimum price of the book be x and the maximum price of the book be y.

2x - y/2 = 35
4x - y = 70 ... (1)
[Given median = 35.]

2x - y/2 = (x + y)/2
[Given median = midrange.]
4x - y = x + y
3x - 2y = 0 ... (2)

Solve (1) and (2):

8x - 2y = 140
[Multiply (1) by 2.]
3x - 2y = 0
____________
5x = 140
x = 140 / 5 = 28

If x = 28, then y = 42
[From (2), 3x = 2y.]

So, the minimum price in the bookstall is 28 and the maximum price in the bookstall is 42.

The price of the book corresponding to the 25th percentile is $32.

So, Q1 = 32
[Q1 is the 25th percentile.]

Q3 - Q1 = 5
Q3 - 32 = 5, Q3 = 37
[Given IQR = 5.]

The five-number summary is 28, 32, 35, 37, 42.
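
The algebra can be verified with a short script. The solution above uses elimination; the sketch below instead solves equations (1) and (2) with Cramer's rule, using exact fractions to avoid rounding. The helper name `solve_2x2` is hypothetical.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# Equation (1): 4x - y = 70, equation (2): 3x - 2y = 0
x, y = solve_2x2(4, -1, 70, 3, -2, 0)
q1 = 32                  # given 25th percentile
q3 = q1 + 5              # IQR = Q3 - Q1 = 5
median_price = (x + y) / 2   # median equals the midrange

print(x, q1, median_price, q3, y)
```

The printed values are the minimum, Q1, median, Q3 and maximum in order, matching the five-number summary found above.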