AP Test - Statistics

Topics Covered

The AP Statistics course is modeled on a one-semester, non-calculus-based college statistics class. The emphasis is on conceptual understanding and interpretation rather than complicated arithmetic. There are four basic themes. Exploring data (describing patterns and departures from patterns) covers 20-30% of the exam. Sampling and experimentation (planning and conducting a study) covers 10-15% of the exam. Probability and random variables (producing models using probability and anticipating patterns) covers 20-30% of the exam. The last theme, statistical inference (estimating population parameters and testing hypotheses), covers 30-40% of the exam. See the outline below for the content covered in each theme.

| Section | Type | # of Qs | % of Final Grade | Time Limit |
|---|---|---|---|---|
| 1 | Multiple Choice | 40 | 50% | 90 minutes |
| 2 | Free Response | 6 | 0.75*50% | 90 minutes |
| | Investigative Task | 1* | 0.25*50% | 30 minutes** |

*The investigative task is the 6th free-response question, not a separate entity.
**The 30 minutes is a portion of the total time given for Section 2.

The AP Statistics exam is 3 hours long and contains 2 sections – multiple choice questions and free response questions. Each section of the exam is worth 50% of the final exam grade. Section 1 is 90 minutes long and contains 40 multiple choice questions. Section 2 is 90 minutes long as well and contains 6 free response questions, one of which is the investigative task. The first five questions are worth 75% of the grade for section 2. The sixth question is worth 25% of the grade for section 2, so students should allocate more time to it.

Graphing calculators are allowed. The calculator memory will not be cleared, although it can only contain programs, not notes. A formula sheet is provided, seen below.

For more information, refer to the long official handout: http://apcentral.collegeboard.com/apc/public/repository/ap-statistics-course-description.pdf

In section 1, one point is added for each correct answer and one quarter of a point is deducted for each incorrect answer. The student’s raw score is multiplied by 1.25, for a maximum of 50 points. In section 2, each of the 6 free response questions is scored on a scale from 0 to 4. The questions are scored holistically, so a student’s answer does not have to be perfect. The raw score for questions 1-5 is multiplied by 1.875 and the raw score for question 6 is multiplied by 3.125, for a maximum of 50 points. Sum the two section scores to get the composite score.

| Composite Scoring Range | AP Grade |
|---|---|
| 60-100 | 5 |
| 45-59 | 4 |
| 32-44 | 3 |
| 23-31 | 2 |
| 0-22 | 1 |
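The scoring arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this guide: the function names `composite_score` and `ap_grade` are made up for the example, and treating each range's lower bound as a cutoff for fractional composite scores is an assumption.

```python
def composite_score(mc_correct, mc_incorrect, frq_scores):
    """Composite score (0-100) following the scoring rules described in this guide.

    mc_correct, mc_incorrect: counts among the 40 multiple-choice questions.
    frq_scores: six free-response scores, each 0-4; the last one is Question 6.
    """
    if len(frq_scores) != 6:
        raise ValueError("expected six free-response scores")
    # Section 1: +1 per correct answer, -1/4 per incorrect answer, scaled by 1.25.
    section1 = 1.25 * (mc_correct - 0.25 * mc_incorrect)
    # Section 2: Questions 1-5 are scaled by 1.875, Question 6 by 3.125.
    section2 = 1.875 * sum(frq_scores[:5]) + 3.125 * frq_scores[5]
    return section1 + section2


def ap_grade(composite):
    """Map a composite score to an AP grade using the lower bounds in the table.

    Assumption: a fractional composite score falls in the range whose lower
    bound it has reached (the guide does not say how fractions are handled).
    """
    for cutoff, grade in [(60, 5), (45, 4), (32, 3), (23, 2)]:
        if composite >= cutoff:
            return grade
    return 1


# Hypothetical student: 30 correct, 10 incorrect, and a 2 on every free-response question.
example = composite_score(30, 10, [2, 2, 2, 2, 2, 2])
example_grade = ap_grade(example)
```

A perfect paper (40 correct, six scores of 4) gives 50 + 37.5 + 12.5 = 100 points, which matches the top of the 60-100 range.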

The grade distribution has changed little in recent years. This table shows the proportion of students who received a 1, 2, 3, 4, or 5 in each of the past 4 years.

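| Grade (by year) | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|
| 5 | 0.128 | 0.121 | 0.126 | 0.122 |
| 4 | 0.224 | 0.213 | 0.202 | 0.209 |
| 3 | 0.235 | 0.251 | 0.25 | 0.257 |
| 2 | 0.182 | 0.177 | 0.188 | 0.181 |
| 1 | 0.231 | 0.238 | 0.234 | 0.231 |
| Mean Grade | 2.84 | 2.8 | 2.81 | 2 |
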
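A mean grade is the mean of a discrete random variable: each grade is weighted by the proportion of students who earned it. The sketch below applies that idea to the 2013 column of the table. The names `dist_2013`, `mean_grade`, and `sd_grade` are illustrative only, and because the tabulated proportions are rounded, a value computed this way may differ slightly from an officially reported mean.

```python
from math import sqrt

# Proportion of students earning each grade in 2013, copied from the table.
dist_2013 = {5: 0.122, 4: 0.209, 3: 0.257, 2: 0.181, 1: 0.231}


def mean_grade(dist):
    """Weighted mean of the grades; divides by the total in case rounded
    proportions do not sum to exactly 1."""
    total = sum(dist.values())
    return sum(grade * p for grade, p in dist.items()) / total


def sd_grade(dist):
    """Standard deviation of the grade, treated as a discrete random variable."""
    total = sum(dist.values())
    mu = mean_grade(dist)
    variance = sum(p * (grade - mu) ** 2 for grade, p in dist.items()) / total
    return sqrt(variance)


mean_2013 = mean_grade(dist_2013)  # roughly 2.81 from these rounded proportions
sd_2013 = sd_grade(dist_2013)
```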
  1. The YouTube channel ProfRobBob is very helpful for reviewing the topics taught in AP Statistics. Link to the playlist: http://www.youtube.com/watch?v=CUuWMwJ1Juw&list=PLC8478000586FA6F9
  2. Learning through hands-on projects is extremely helpful.
  3. The questions are not presented in order of difficulty, so identify the ones you find easiest and do them first.
  4. You have about two minutes for each multiple-choice question, 12-13 minutes for Questions 1-5 of the free response section, and 25-30 minutes for the investigative task. Avoid spending too much time on any one question.
  5. Note that the investigative task (Question #6) may contain material you’ve never studied, since the goal of such a question is to see how well you reason statistically.
  6. Mark down the date of the exam (May 9, 2014).

Link: http://www.education.com/study-help/article/tips-exam1/?page=3

  1. When asked to describe a one-variable data set, always discuss shape, center, and spread.
  2. Understand how skewness can be used to differentiate between the mean and the median.
  3. Know how transformations of a data set affect summary statistics (mean, median, mode, interquartile range, standard deviation, skewness, etc).
  4. “Normal” refers to a specific distribution. If you were not given the specific distribution, write “approximately normal” or “bell-shaped” instead of “normal.”
  5. Correlation is not causation. A lack of correlation does not mean that there is no relationship (the relationship might be nonlinear).
  6. Use a residual plot to determine if a linear model is appropriate.
  7. Interpret the slope and y-intercept of a least-squares regression line.
  8. Read computer regression output.
  9. Know the definition of a simple random sample (SRS).
  10. An experiment that uses blocking cannot be a completely randomized design.
  11. Know why one uses randomization versus why one uses blocking.
  12. Know what blinding and confounding variables are.
  13. Know how to create a simulation for a probability problem.
  14. Know how to differentiate between independent events and mutually exclusive events. Know why mutually exclusive events with nonzero probabilities can’t be independent (look at the definitions).
  15. Find the mean and standard deviation of a discrete random variable.
  16. Recognize binomial and geometric situations.
  17. Hypotheses are about parameters, never about statistics.
  18. Know the four steps of any inference procedure.
  19. In inference problems, show, not declare, that the conditions necessary to do the procedure are present.
  20. Know Type I and Type II errors and the power of a test.
  21. For confidence interval questions, you need three things: justify that the conditions necessary to construct the interval are present, construct the interval, and interpret the interval in context.
  22. Label your graphs.
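Tip 3 in the list above (how transformations affect summary statistics) can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module. The data set `scores` is made up for illustration, and the helper name `summary` is not from any course material.

```python
import statistics


def summary(data):
    """Return a few summary statistics for a one-variable data set."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample standard deviation
        "iqr": q3 - q1,
    }


scores = [2, 4, 4, 5, 7, 9, 11]       # hypothetical data
shifted = [x + 10 for x in scores]    # add a constant to every value
scaled = [3 * x for x in scores]      # multiply every value by a constant

base = summary(scores)
after_shift = summary(shifted)
after_scale = summary(scaled)

# Adding a constant moves the measures of center but leaves spread unchanged.
# Multiplying by a positive constant multiplies center and spread alike.
```

Comparing `base` with `after_shift` shows the mean and median each rising by 10 while the standard deviation and IQR stay the same. Comparing `base` with `after_scale` shows all four values tripling.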

SECTION, TOPIC AREA (PERCENT OF EXAM), AND TOPICS

I. Exploring Data (20%-30%)

A. Graphical display of distributions of one-variable data (dot plot, stem plot, histogram, ogive).
B. Summarizing distributions of one-variable data (center, spread, position, box plots, changing units).
C. Comparing distributions of one-variable data.
D. Exploring two-variable data (scatter plots, linearity, regression, residuals, transformations).
E. Exploring categorical data (tables, bar charts, marginal and joint frequencies, conditional relative frequencies).

II. Sampling and Experimentation (10%-15%)

A. Methods of data collection (census, survey, experiment, observational study).
B. Planning and conducting surveys (populations and samples, randomness, sources of bias, sampling methods, esp. SRS).
C. Experiments (treatments and control groups, random assignment, replication, sources of bias, confounding, placebo effect, blinding, randomized design, block design).
D. Generalizability of results.

III. Anticipating Patterns (Probability and Random Variables) (20%-30%)

A. Probability (relative frequency, law of large numbers, addition and multiplication rules, conditional probability, independence, random variables, simulation, mean and standard deviation of a random variable).
B. Combining independent random variables (means and standard deviations).
C. The normal distribution.
D. Sampling distributions (mean, proportion, difference between two means, difference between two proportions, central limit theorem, simulation, t-distribution, chi-square distribution).

IV. Statistical Inference (30%-40%)

A. Estimation (population parameters, margin of error, point estimators, confidence interval for a proportion, confidence interval for the difference between two proportions, confidence interval for a mean, confidence interval for the difference between two means, confidence interval for the slope of a least-squares regression line).
B. Tests of significance (logic of hypothesis testing, Type I and Type II errors, power of a test, inference for means and proportions, chi-square test, test for the slope of a least-squares line).

  1. In the scatterplot of y versus x shown above, the least squares regression line is superimposed on the plot. Which of the following points has the largest residual?
    a. A
    b. B
    c. C
    d. D
    e. E
  2. A candy company claims that 10 percent of its candies are blue. A random sample of 200 of these candies is taken, and 16 are found to be blue. Which of the following tests would be most appropriate for establishing whether the candy company needs to change its claim?
    a. Matched pairs t-test
    b. One-sample proportion z-test
    c. Two-sample t-test
    d. Two-sample proportion z-test
    e. Chi-square test of association
  3. In a test of H0: µ=8, a sample of size 220 leads to a p-value of 0.034. Which of the following must be true?
    a. A 95% confidence interval for µ calculated from these data will not include µ=8
    b. At the 5% level if H0 is rejected, the probability of a Type II error is 0.034
    c. The 95% confidence interval for µ calculated from these data will be centered at µ=8
    d. The null hypothesis should be rejected at the 5% level
    e. The sample size is insufficient to draw a conclusion with 95% confidence
  4. Courtney has constructed a cricket out of paper and rubber bands. According to the instructions for making the cricket, when it jumps it will land on its feet half of the time and on its back the other half of the time. In the first 50 jumps, Courtney’s cricket landed on its feet 35 times. In the next 10 jumps, it landed on its feet only twice. Based on this experience, Courtney can conclude that
    a. The cricket was due to land on its feet less than half the time during the final 10 jumps, since it had landed too often on its feet during the first 50 jumps.
    b. A confidence interval for estimating the cricket’s true probability of landing on its feet is wider after the final 10 jumps than it was before the final 10 jumps
    c. A confidence interval for estimating the cricket’s true probability of landing on its feet after the final 10 jumps is exactly the same as it was before the final 10 jumps
    d. A confidence interval for estimating the cricket’s true probability of landing on its feet is more narrow after the final 10 jumps than it was before the final 10 jumps
    e. A confidence interval for estimating the cricket’s true probability of landing on its feet based on the initial 50 jumps does not include 0.2, so there must be a defect in the cricket’s construction to account for the poor showing in the final 10 jumps.
  Link to Free Response Questions Tests 1998-2013: http://apcentral.collegeboard.com/apc/members/exam/exam_information/8357.html

  1. Each full carton of Grade A eggs consists of 1 randomly selected empty cardboard container and 12 randomly selected eggs. The weights of such full cartons are approximately normally distributed with a mean of 840 grams and a standard deviation of 7.9 grams.

    a. What is the probability that a randomly selected full carton of Grade A eggs will weigh more than 850 grams?

    The weights of the empty cardboard containers have a mean of 20 grams and a standard deviation of 1.7 grams. It is reasonable to assume independence between the weights of the empty cardboard containers and the weights of the eggs. It is also reasonable to assume independence among the weights of the 12 eggs that are randomly selected for a full carton.

    Let the random variable X be the weight of a single randomly selected Grade A egg.

    b. What is the mean of X?
    c. What is the standard deviation of X?
  1. Tropical storms in the Pacific Ocean with sustained winds that exceed 74 miles per hour are called typhoons. Graph A below displays the number of recorded typhoons in two regions of the Pacific Ocean—the Eastern Pacific and the Western Pacific—for the years from 1997 to 2010.



    a. Compare the distributions of yearly frequencies of typhoons for the two regions of the Pacific Ocean for the years from 1997 to 2010.
    b. For each region, describe how the yearly frequencies changed over the time period from 1997 to 2010.
    c. A moving average for data collected at regular time increments is the average of data values for two or more consecutive increments. The 4-year moving averages for the typhoon data are provided in the table below. For example, the Eastern Pacific 4-year moving average for 2000 is the average of 22, 16, 15, and 21, which is equal to 18.50. Show how to calculate the 4-year moving average for the year 2010 in the Western Pacific. Write your value in the appropriate place in the table.

      | Year | Number of Typhoons in the Eastern Pacific | Eastern Pacific 4-year moving average | Number of Typhoons in the Western Pacific | Western Pacific 4-year moving average |
      |---|---|---|---|---|
      | 1997 | 22 | | 33 | |
      | 1998 | 16 | | 27 | |
      | 1999 | 15 | | 36 | |
      | 2000 | 21 | 18.50 | 37 | 33.25 |
      | 2001 | 19 | 17.75 | 37 | 34.25 |
      | 2002 | 19 | 18.50 | 39 | 37.25 |
      | 2003 | 17 | 19.00 | 30 | 35.75 |
      | 2004 | 17 | 18.00 | 34 | 35.00 |
      | 2005 | 17 | 17.50 | 26 | 32.25 |
      | 2006 | 25 | 19.00 | 34 | 31.00 |
      | 2007 | 19 | 19.50 | 28 | 30.50 |
      | 2008 | 20 | 20.25 | 27 | 28.75 |
      | 2009 | 23 | 21.75 | 28 | 29.25 |
      | 2010 | 18 | 20.00 | 18 | |

    d. Graph B below shows both yearly frequencies (connected by dashed lines) and the respective 4-year moving averages (connected by solid lines). Use your answer in part c to complete the graph.


    e. Consider graph B.
      i. What information is more apparent from the plots of the 4-year moving averages than from the plots of the yearly frequencies of typhoons?
      ii. What information is less apparent from the plots of the 4-year moving averages than from the plots of the yearly frequencies of typhoons?

Sample Questions (Multiple Choice)

1. a. A
2. b. One Sample proportion z-test
3. a. A 95% confidence interval for µ calculated from these data will not include µ=8
4. d. A confidence interval for estimating the cricket’s true probability of landing on its feet is more narrow after the final 10 jumps than it was before the final 10 jumps
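Answers 2 and 4 can be checked numerically. The sketch below uses only Python's standard library. The function names are illustrative, the two-sided alternative for the candy question is an assumption about how the claim would be tested, and the interval for the cricket uses the usual large-sample formula for a proportion.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(successes, n, p0):
    """Two-sided one-sample z-test for a proportion; returns (z, p_value)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def margin_of_error(successes, n, confidence=0.95):
    """Margin of error of a large-sample confidence interval for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


# Question 2: 16 blue candies out of 200, claimed proportion 0.10.
z, p_value = one_prop_z_test(16, 200, 0.10)

# Question 4: 35 of the first 50 jumps, then 37 of all 60 jumps.
moe_first_50 = margin_of_error(35, 50)
moe_all_60 = margin_of_error(37, 60)
interval_is_narrower = moe_all_60 < moe_first_50
```

For the candy data, z is about -0.94, which is not strong evidence against the 10 percent claim. For the cricket, the margin of error shrinks slightly once all 60 jumps are used, which agrees with answer d.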

Sample Questions (Free Response)

a
Let W denote the weight of a randomly selected full carton of eggs. W ~ N(840, 7.9²).
The z-score for a weight of 850 grams is z = (850-840)/(7.9) = 1.27
The z-table shows that P[W>850] = P[Z>1.27] = 1-P[Z<1.27] = 1-0.8980 = 0.1020
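The same probability can be computed without a z-table. This is a small sketch using `NormalDist` from Python's standard library. The variable names are illustrative. The result differs slightly from the table-based 0.1020 because the z-score is not rounded to 1.27 first.

```python
from statistics import NormalDist

# Full-carton weight W, modeled as normal with mean 840 g and SD 7.9 g.
carton = NormalDist(mu=840, sigma=7.9)

z = (850 - 840) / 7.9             # about 1.27
p_over_850 = 1 - carton.cdf(850)  # P(W > 850), about 0.10
```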

b
Let W represent the weight of a randomly selected full carton of eggs, P the weight of the packaging, and Xᵢ the weight of the ith egg, for i = 1, 2, …, 12.
Note that W = P + X₁ + X₂ + … + X₁₂.
E(W) = E(P) + E(X₁) + … + E(X₁₂) by the linearity of expectation.
Since E(X₁) = E(X₂) = … = E(X₁₂), E(W) = E(P) + 12*E(Xᵢ).
Given that E(W) = 840 and E(P) = 20, 840 = 20 + 12*E(Xᵢ) → E(Xᵢ) = 68.33 grams.

c
Because of independence, Var(W) = Var(P) + Var(X₁) + Var(X₂) + … + Var(X₁₂).
Since Var(X₁) = … = Var(X₁₂), this simplifies to Var(W) = Var(P) + 12*Var(Xᵢ). Given SD(W) = 7.9 and SD(P) = 1.7, Var(W) = 7.9² and Var(P) = 1.7².
→ 7.9² = 1.7² + 12*Var(Xᵢ) → Var(Xᵢ) = 4.96 → SD(Xᵢ) = √(4.96) = 2.23 grams.
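The arithmetic in parts b and c can be reproduced in a few lines. This is a sketch with illustrative variable names. It relies on the same independence assumption stated in the question: means add, and variances (not standard deviations) add.

```python
from math import sqrt

mean_w, sd_w = 840, 7.9   # full carton
mean_p, sd_p = 20, 1.7    # empty container
n_eggs = 12

# Means add: E(W) = E(P) + 12 * E(X)
mean_x = (mean_w - mean_p) / n_eggs

# Variances add for independent variables: Var(W) = Var(P) + 12 * Var(X)
var_x = (sd_w**2 - sd_p**2) / n_eggs
sd_x = sqrt(var_x)
```

A common mistake is to subtract standard deviations directly, (7.9 - 1.7)/12, which gives a different and incorrect value.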

Sample Question (Investigative Task)

1a
The Western Pacific Ocean had more typhoons than the Eastern Pacific Ocean in all but one of these years. The average seems to have been about 31 typhoons per year in the Western Pacific Ocean, which is higher than the average of about 19 typhoons per year in the Eastern Pacific Ocean. The Western Pacific Ocean also saw more variability (in number of typhoons per year) than the Eastern Pacific Ocean; for example, the range of the frequencies for the Western Pacific is about 21 typhoons and only 10 typhoons for the Eastern Pacific.
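The figures quoted in this comparison can be reproduced from the yearly counts in the question's table. A minimal sketch follows, with illustrative names (`center_and_spread`, `west_more_years`).

```python
# Yearly typhoon counts for 1997-2010, copied from the table in the question.
eastern = [22, 16, 15, 21, 19, 19, 17, 17, 17, 25, 19, 20, 23, 18]
western = [33, 27, 36, 37, 37, 39, 30, 34, 26, 34, 28, 27, 28, 18]


def center_and_spread(counts):
    """Return (mean, range) as simple measures of center and variability."""
    return sum(counts) / len(counts), max(counts) - min(counts)


east_mean, east_range = center_and_spread(eastern)
west_mean, west_range = center_and_spread(western)

# Number of years in which the Western Pacific had strictly more typhoons.
west_more_years = sum(w > e for w, e in zip(western, eastern))
```

The Western Pacific has strictly more typhoons in 13 of the 14 years. The exception is 2010, when both regions recorded 18.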

1b
The Western Pacific Ocean had a decreasing trend in number of typhoons per year over this time period, especially from about 2001 through 2010. In contrast, the Eastern Pacific Ocean was fairly consistent in the number of typhoons per year over this time period, with a slight increasing trend in the later years from 2005 through 2010.

1c
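The four-year moving average for the year 2010 in the Western Pacific Ocean is (28+27+28+18)/4 = 25.25.
The values written in the table are as follows:

| Year | Eastern Pacific typhoons | Eastern Pacific 4-year moving average | Western Pacific typhoons | Western Pacific 4-year moving average |
|---|---|---|---|---|
| 2008 | 20 | 20.25 | 27 | 28.75 |
| 2009 | 23 | 21.75 | 28 | 29.25 |
| 2010 | 18 | 20.00 | 18 | 25.25 |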
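The whole moving-average column can be generated with a short function. This sketch assumes the trailing definition used in the question, where each year is averaged with the three years before it. The function name `moving_average` is illustrative.

```python
def moving_average(values, window=4):
    """Trailing moving averages: the entry for position i averages
    values[i - window + 1] through values[i]."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


years = list(range(1997, 2011))
# Western Pacific yearly counts, copied from the table in the question.
western = [33, 27, 36, 37, 37, 39, 30, 34, 26, 34, 28, 27, 28, 18]

# The first average belongs to the fourth year (2000), so skip three years.
western_avg = dict(zip(years[3:], moving_average(western)))
```

Looking up `western_avg[2010]` gives the 25.25 computed by hand above, and the earlier entries match the values printed in the question's table.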

1d

1e.i
The overall trends across this time period were more apparent with the moving averages than with the original frequencies. The moving averages reduce variability, making more apparent the overall decreasing trend in number of typhoons in the Western Pacific Ocean and the slight increasing trend in the number of typhoons in the Eastern Pacific Ocean.

1e.ii
The year-to-year variability in number of typhoons is less apparent with the moving averages than with the original frequencies.