# Quarter Wit, Quarter Wisdom: When a Little Information is Enough to Solve a GMAT Problem

We have reviewed what standard deviation is in a past post. We know what data is necessary to calculate the standard deviation of a set, but in some cases, we could actually do with a lot less information than the average test-taker may think they need.

Let’s explore this idea through an example GMAT data sufficiency question:

What is the standard deviation of a set of numbers whose mean is 20?

Statement 1: The absolute value of the difference of each number in the set from the mean is equal.
Statement 2: The sum of the squares of the differences from the mean is greater than 100.

We need to determine whether the information we have been given is sufficient to get us the exact value of the standard deviation of a particular set of numbers. To find the standard deviation of a set, we need to know the deviation of each term from the mean so that we can square those deviations, sum the squares, divide the sum by the number of terms, and then find the square root.
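For readers who like to see the recipe spelled out, here is a minimal Python sketch of those four steps. The function name `population_sd` and the sample set are my own choices for illustration, and the sketch assumes the "divide by the number of terms" (population) form described above.

```python
import math

def population_sd(values):
    """Square each deviation from the mean, sum, divide by n, take the root."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / n)

# Hypothetical set with mean 20; squared deviations are 4, 1, 0, 1, 4.
example = [18, 19, 20, 21, 22]
print(population_sd(example))  # square root of 10/5
```

Notice that the function needs every element (or every deviation) to do its job, which is exactly why the data sufficiency question is interesting.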

Essentially, to find the standard deviation we either need to know each element of the set, or we need to know the deviation of each element from the mean (which will also give us the number of terms), or we need to know the sum of the squared deviations and the number of terms in the set.

The question stem here tells us that the mean of the set is 20. We have no other information about any of the actual elements of the set or the number of elements. With this in mind, let’s examine each of the statements:

Statement 1: The absolute value of the difference of each number in the set from the mean is equal.

With this statement, we don’t actually know what the absolute value of the difference is. We also don’t know how many elements there are. The set could be something like:

19, 21 (each term is exactly 1 away from the mean 20)
or
18, 18, 22, 22 (each term is exactly 2 away from the mean 20)
etc.

The standard deviation in each case will be different. We don’t know the elements of the set and we don’t know the number of elements in the set. Because of this, there is no way for us to know the value of the standard deviation – this statement alone is not sufficient.
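A quick check of the two example sets, sketched in Python with the standard library's `statistics.pstdev` (the population standard deviation). The variable names are mine.

```python
from statistics import pstdev

set_a = [19, 21]          # each term is 1 away from the mean 20
set_b = [18, 18, 22, 22]  # each term is 2 away from the mean 20

for values in (set_a, set_b):
    mean = sum(values) / len(values)
    distances = {abs(v - mean) for v in values}  # one entry means "all equal"
    print(values, "mean:", mean, "distances:", distances, "SD:", pstdev(values))
```

Both sets satisfy Statement 1, yet their standard deviations differ (1 and 2), so the statement cannot pin down a single value.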

Statement 2: The sum of the squares of the differences from the mean is greater than 100.

“Greater than 100” encompasses a large range of numbers – it could be any value larger than 100. Again, we cannot find the exact standard deviation of the set, so this statement is also not sufficient alone.

Using both statements together, we still do not have any idea of what the elements of the set are or what the sum of the squares of the differences from the mean is. We also still don’t know the number of elements. Hence, both statements together are not sufficient, so the answer is E.
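To see this concretely, here is a small Python sketch with two sets of my own choosing (they do not come from the question). Each has a mean of 20 and satisfies both statements, yet the standard deviations differ.

```python
from statistics import pstdev

def describe(values):
    """Return (mean, all distances equal?, sum of squared deviations, SD)."""
    mean = sum(values) / len(values)
    distances = {abs(v - mean) for v in values}
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    return mean, len(distances) == 1, sum_of_squares, pstdev(values)

# Hypothetical sets: both have mean 20, equal distances, and a sum above 100.
first = describe([9, 31])   # every term is 11 away from 20
second = describe([8, 32])  # every term is 12 away from 20
print(first)
print(second)
```

Two valid sets with two different standard deviations is all it takes to confirm that the combined statements are not sufficient.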

Now, let us add just one more piece of information to the problem in this similar question:

What is the standard deviation of a set of 7 numbers whose mean is 20?

Statement 1: The absolute value of the difference of each number in the set from the mean is equal.
Statement 2: The sum of the squares of the differences from the mean is greater than 100.

What would you expect the answer to be? Still E, right? The sum of the squared deviations is still unknown and the exact elements of the set are still unknown – all we know is the number of elements. Actually, this information is already too much. All we need to know is that the number of elements is odd and suddenly we can find the standard deviation.

Here is why:

Statement 1 is quite tricky.

If we have an odd number of elements, in which case can the absolute values of the differences of each number in the set from the mean be equal?

Think about it – the mean of the set is 20. What could a possible set look like such that the mean is 20 and the absolute values of the differences of each number in the set from the mean are equal? Try to think of such a set with just 3 elements. Can you come up with one?

19, 19, 21? No, the mean is not 20

19, 20, 21? No, the absolute value of the difference of each number in the set from the mean is not equal. 19 is 1 away from mean but 20 is 0 away from mean.

Note that in this case, the only possible set that could fit the given criteria is one consisting of just an odd number of 20s (all elements in this set must be 20). Only then can each number be equidistant from the mean, i.e. each number would be 0 away from the mean. If all the elements of a set are equal, then obviously the standard deviation of the set is 0. It doesn’t matter how many elements it has; it doesn’t matter what the mean is! In this case, Statement 1 alone is sufficient so the answer would be A.
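If you would rather test this than take it on faith, here is a brute-force Python sketch. It is only an illustration under my own assumptions: the function name `equidistant_sets` is hypothetical, and the search is limited to whole-number distances from 0 to 5.

```python
from itertools import product

MEAN = 20

def equidistant_sets(n, max_distance=5):
    """Sets of n values, each exactly k from MEAN (k = 0..max_distance),
    whose mean is MEAN. Returned as a sorted list of sorted tuples."""
    found = set()
    for k in range(max_distance + 1):
        # Each element sits either k below or k above the mean.
        for signs in product((-1, 1), repeat=n):
            values = tuple(sorted(MEAN + sign * k for sign in signs))
            if sum(values) == MEAN * n:
                found.add(values)
    return sorted(found)

for n in (3, 5, 7):
    print(n, equidistant_sets(n))  # odd sizes: only the all-20s set survives
print(4, equidistant_sets(4))      # even size: sets such as 19, 19, 21, 21 appear
```

The reason is simple: for the mean to stay at 20, as many elements must sit above the mean as below it, and an odd count cannot be split evenly unless the common distance is 0.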

Takeaway:
If a set has an even number of terms, the absolute values of the distances of each term from the mean can be equal without all the terms being the same. But if a set has an odd number of terms and the absolute values of the distances of each term from the mean are equal, all the terms in the set must be the same and will be equal to the mean.

Karishma, a Computer Engineer with a keen interest in alternative Mathematical approaches, has mentored students in the continents of Asia, Europe and North America. She teaches the GMAT for Veritas Prep and regularly participates in content development projects such as this blog!

# Solving GMAT Standard Deviation Problems By Using as Little Math as Possible

The other night I taught our Statistics lesson, and when we got to the section of class that deals with standard deviation, there was a familiar collective groan – not unlike the groan one encounters when doing compound interest, or any mathematical concept that, when we learned it in school, involved an intimidating-looking formula.

So, I think it’s time for me to coin an axiom: the more painful the traditional formula associated with a given topic, the simpler the actual calculations will be on the GMAT. (Please note, though the axiom is awaiting official mathematical verification by Veritas’ hard-working team of data scientists, the anecdotal evidence in support of the axiom is overwhelming.)

So, let’s talk standard deviation. If you’re like my students, your first thought is to start assembling a list of increasingly frantic questions: Do we need to know that horrible formula I learned in Stats class? (No.) Do we need to know the relationship between variance and standard deviation? (You just need to know that there is a relationship, and that if you can solve for one, you can solve for the other.) Etc.

So, rather than droning on about what we don’t need to know, let’s boil down what we do need to know about standard deviation. The good news – it isn’t much. Just make sure you’ve internalized the following:

• The standard deviation is a measure of the dispersion of the elements of the set around the mean. The farther away the terms are from the mean, the larger the standard deviation.
• If we were to increase or decrease each element of the set by “x,” the standard deviation would remain unchanged.
• If we were to multiply each element of the set by “x,” the standard deviation would also be multiplied by “x” (strictly, by the absolute value of “x,” since a standard deviation is never negative).
• If the mean of a set is “m” and the standard deviation is “d,” then to say that something is within 3 standard deviations of the mean is to say that it falls within the interval of (m – 3d) to (m + 3d). And to say that something is within 2 standard deviations of the mean is to say that it falls within the interval of (m – 2d) to (m + 2d).
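The shifting, scaling, and interval rules above can be sanity-checked with a few lines of Python. This is a rough sketch using a small set of my own choosing; the helper `within` is a hypothetical name, not something from the GMAT.

```python
import math
from statistics import pstdev

data = [1, 2, 3]  # hypothetical starting set
d = pstdev(data)

shifted = [v + 5 for v in data]   # add the same value to every element
tripled = [v * 3 for v in data]   # multiply every element by 3
flipped = [v * -3 for v in data]  # multiply every element by -3

print(d, pstdev(shifted))  # unchanged by the shift
print(pstdev(tripled))     # 3 times d
print(pstdev(flipped))     # also 3 times d, never negative

def within(value, m, d, k):
    """True if value lies within k standard deviations d of the mean m."""
    return m - k * d <= value <= m + k * d

print(within(26, 20, 2, 3))  # 26 is inside the interval 14 to 26
print(within(27, 20, 2, 3))  # 27 is outside it
```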

That’s basically it. Not anything to get too worked up about. So, let’s see some of these principles in action to substantiate the claim that we won’t have to do too much arithmetical grinding on these types of questions:

If d is the standard deviation of x, y, z, what is the standard deviation of x+5, y+5, z+5 ?

A) d
B) 3d
C) 15d
D) d+5
E) d+15

If our initial set is x, y, z, and our new set is x+5, y+5, and z+5, then we’re adding the same value to each element of the set. We already know that adding the same value to each element of the set does not change the standard deviation. Therefore, if the initial standard deviation was d, the new standard deviation is also d. We’re done – the answer is A. (You can see this with a simple example. If your initial set is {1, 2, 3} and your new set is {6, 7, 8} the dispersion of the set clearly hasn’t changed.)

Surely the questions get harder than this, you say. They do, but if you know the aforementioned core concepts, they’re all quite manageable. Here’s another one:

Some water was removed from each of 6 tanks. If the standard deviation of the volumes of water at the beginning was 10 gallons, what was the standard deviation of the volumes at the end?

1) For each tank, 30% of water at the beginning was removed
2) The average volume of water in the tanks at the end was 63 gallons

We know the initial standard deviation. We want to know if it’s possible to determine the new standard deviation after water is removed. To the statements we go!

Statement 1: If 30% of the water is removed from each tank, we know that each term in the set is multiplied by the same value: 0.7. Well, if each term in a set is multiplied by 0.7, then the standard deviation of the set is also multiplied by 0.7. If the initial standard deviation was 10 gallons, then the new standard deviation would be 10*(0.7) = 7 gallons. And we don’t even need to do the math – it’s enough to see that it’s possible to calculate this number. Therefore, Statement 1 alone is sufficient.

Statement 2: Knowing the average of a set is not going to tell us very much about the dispersion of the set. To see why, imagine a simple case in which we have two tanks, and the average volume of water in the tanks is 63 gallons. It’s possible that each tank has exactly 63 gallons and, if so, the standard deviation would be 0, as everything would equal the mean. It’s also possible to have one tank that had 126 gallons and another tank that was empty, creating a standard deviation that would, of course, be significantly greater than 0. So, simply knowing the average cannot possibly give us our standard deviation. Statement 2 alone is not sufficient to answer the question. Since Statement 1 alone is sufficient and Statement 2 alone is not, the answer is A.
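Here is a small Python sketch of both statements. The six starting volumes are invented for illustration (chosen only so that their standard deviation is 10 gallons); the question itself never gives them.

```python
import math
from statistics import mean, pstdev

# Statement 1: hypothetical starting volumes with a standard deviation of 10.
start = [80, 80, 80, 100, 100, 100]
end = [v * 0.7 for v in start]     # 30% of the water removed from each tank
print(pstdev(start), pstdev(end))  # the SD is scaled by 0.7 as well

# Statement 2: the same average can hide very different spreads.
level = [63, 63]   # both tanks hold 63 gallons
uneven = [126, 0]  # one full tank, one empty tank
print(mean(level), pstdev(level))
print(mean(uneven), pstdev(uneven))
```

Any other starting volumes with a standard deviation of 10 would give the same scaled result, which is why we never need the actual volumes.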

Maybe at this point you’re itching for more of a challenge. Let’s look at a slightly tougher one:

7.51; 8.22; 7.86; 8.36
8.09; 7.83; 8.30; 8.01
7.73; 8.25; 7.96; 8.53

A vending machine is designed to dispense 8 ounces of coffee into a cup. After a test that recorded the number of ounces of coffee in each of 1000 cups dispensed by the vending machine, the 12 amounts listed above, in ounces, were selected from the data. If the 1000 recorded amounts have a mean of 8.1 ounces and a standard deviation of 0.3 ounces, how many of the 12 listed amounts are within 1.5 standard deviations of the mean?

A) Four
B) Six
C) Nine
D) Ten
E) Eleven

Okay, so the standard deviation is 0.3 ounces. We want the values that are within 1.5 standard deviations of the mean. 1.5 standard deviations would be (1.5)(0.3) = 0.45 ounces, so we want all of the values that are within 0.45 ounces of the mean. If the mean is 8.1 ounces, this means that we want everything that falls between a lower bound of (8.1 – 0.45) and an upper bound of (8.1 + 0.45). Put another way, we want the number of values that fall between 8.1 – 0.45 = 7.65 and 8.1 + 0.45 = 8.55.

Looking at our 12 values, we can see that only one value, 7.51, falls outside of this range. If we have 12 total values and only 1 falls outside the range, then the other 11 are clearly within the range, so the answer is E.
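The count is easy to confirm with a short Python sketch. The variable names are mine; the numbers are the 12 listed amounts and the mean and standard deviation given in the question.

```python
amounts = [
    7.51, 8.22, 7.86, 8.36,
    8.09, 7.83, 8.30, 8.01,
    7.73, 8.25, 7.96, 8.53,
]
MEAN, SD, K = 8.1, 0.3, 1.5

margin = K * SD  # 1.5 standard deviations, about 0.45 ounces
inside = [a for a in amounts if abs(a - MEAN) <= margin]
outside = [a for a in amounts if abs(a - MEAN) > margin]

print(len(inside), "inside;", "outside:", outside)
```

None of the listed amounts sits right on a boundary, so floating-point rounding does not affect the count here.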

As you can see, there’s very little math involved, even on the more difficult questions.

Takeaway: remember the axiom that the more complex-looking the formula is for a concept, the simpler the calculations are likely to be on the GMAT. An intuitive understanding of a topic will always go a lot further on this test than any amount of arithmetical virtuosity.

By David Goldstein, a Veritas Prep GMAT instructor based in Boston.

# Quarter Wit, Quarter Wisdom: Using the Standard Deviation Formula on the GMAT

We have discussed standard deviation (SD). We know what the formula is for finding the standard deviation of a set of numbers, but we also know that the GMAT will not ask us to actually calculate the standard deviation because the calculations involved would be way too cumbersome. It is still a good idea to know this formula, though, as it will help us compare standard deviations across various sets – a concept we should know well.

Today, we will look at some GMAT questions that involve sets with similar standard deviations such that it is hard to tell which will have a higher SD without properly understanding the way it is calculated. Take a look at the following question:

Which of the following distributions of numbers has the greatest standard deviation?

(A) {-3, 1, 2}
(B) {-2, -1, 1, 2}
(C) {3, 5, 7}
(D) {-1, 2, 3, 4}
(E) {0, 2, 4}

At first glance, these sets all look very similar. If we try to plot them on a number line, we will see that they also have similar distributions, so it is hard to say which will have a higher SD than the others. Let’s quickly review their deviations from the arithmetic means:

For answer choice A, the mean = 0 and the deviations are 3, 1, 2
For answer choice B, the mean = 0 and the deviations are 2, 1, 1, 2
For answer choice C, the mean = 5 and the deviations are 2, 0, 2
For answer choice D, the mean = 2 and the deviations are 3, 0, 1, 2
For answer choice E, the mean = 2 and the deviations are 2, 0, 2

We don’t need to worry about the arithmetic means (they just help us calculate the deviation of each element from the mean); our focus should be on the deviations. The SD formula squares the individual deviations and then adds them, then the sum is divided by the number of elements and finally, we find the square root of the whole term. So if a deviation is greater, its square will be even greater and that will increase the SD.

If the deviation increases and the number of elements increases, too, then we cannot be sure what the final effect will be – an increased deviation increases the SD but an increase in the number of elements increases the denominator and hence, actually decreases the SD. The overall effect as to whether the SD increases or decreases will vary from case to case.

First, we should note that answers C and E have identical deviations and numbers of elements, hence, their SDs will be identical. This means the answer is certainly not C or E, since Problem Solving questions have a single correct answer.

Let’s move on to the other three options:

For answer choice A, the mean = 0 and the deviations are 3, 1, 2
For answer choice B, the mean = 0 and the deviations are 2, 1, 1, 2
For answer choice D, the mean = 2 and the deviations are 3, 0, 1, 2

Comparing answer choices A and D, we see that they share the nonzero deviations 3, 1, and 2, but D has one more element, whose deviation is 0. The sum of the squared deviations is therefore the same for both, while D’s denominator is greater, so the SD of answer D is smaller than the SD of answer A. This leaves us with options A and B:

For answer choice A, the mean = 0 and the deviations are 3, 1, 2
For answer choice B, the mean = 0 and the deviations are 2, 1, 1, 2

Now notice that although two deviations of answers A and B are the same, answer choice A has a higher deviation of 3 but fewer elements than answer choice B. This means the SD of A will be higher than the SD of B, so the SD of A will be the highest. Hence, our answer must be A.
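On test day you would never compute these values, but if you want to confirm the reasoning afterward, a short Python sketch does it. It uses `statistics.pstdev` from the standard library, which divides by the number of elements as described above; the dictionary layout is my own.

```python
from statistics import pstdev

choices = {
    "A": [-3, 1, 2],
    "B": [-2, -1, 1, 2],
    "C": [3, 5, 7],
    "D": [-1, 2, 3, 4],
    "E": [0, 2, 4],
}

sds = {label: pstdev(values) for label, values in choices.items()}
for label, sd in sds.items():
    print(label, round(sd, 3))

largest = max(sds, key=sds.get)
print("Greatest SD:", largest)
```

The output also confirms the shortcut used earlier: C and E come out with the same SD, and D lands below A.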

Let’s try another one:

Which of the following data sets has the third largest standard deviation?

(A) {1, 2, 3, 4, 5}
(B) {2, 3, 3, 3, 4}
(C) {2, 2, 2, 4, 5}
(D) {0, 2, 3, 4, 6}
(E) {-1, 1, 3, 5, 7}

How would you answer this question without calculating the SDs? We need to arrange the sets in increasing SD order. Upon careful examination, you will see that the number of elements in each set is the same, and the mean of each set is 3.

Deviations of answer choice A: 2, 1, 0, 1, 2
Deviations of answer choice B: 1, 0, 0, 0, 1 (lowest SD)
Deviations of answer choice C: 1, 1, 1, 1, 2
Deviations of answer choice D: 3, 1, 0, 1, 3
Deviations of answer choice E: 4, 2, 0, 2, 4 (highest SD)

Obviously, option B has the lowest SD (the deviations are the smallest) and option E has the highest SD (the deviations are the greatest). This means we can automatically rule these answers out, as they cannot have the third largest SD.

Deviations of answer choice A: 2, 1, 0, 1, 2
Deviations of answer choice C: 1, 1, 1, 1, 2
Deviations of answer choice D: 3, 1, 0, 1, 3

Out of these options, answer choice D has a higher SD than answer choice A, since it has higher deviations of two 3s (whereas A has deviations of two 2s). Also, C is more tightly packed than A, with four deviations of 1. If you are not sure why, consider this:

The sum of the squared deviations for C will be 1 + 1 + 1 + 1 + 4 = 8
The sum of the squared deviations for A will be 4 + 1 + 0 + 1 + 4 = 10

So, A will have a higher SD than C but a lower SD than D. Arranging from lowest to highest SD, we get: B, C, A, D, E. Answer choice A has the third highest SD, and therefore, A is our answer.
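The same ordering can be checked with a few lines of Python. Because all five sets have the same number of elements, ranking by the sum of the squared deviations gives the same order as ranking by SD, so this sketch skips the division and the square root. The helper name `sum_of_squared_deviations` is my own.

```python
choices = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 3, 3, 3, 4],
    "C": [2, 2, 2, 4, 5],
    "D": [0, 2, 3, 4, 6],
    "E": [-1, 1, 3, 5, 7],
}

def sum_of_squared_deviations(values):
    """Sum of squared distances from the mean (no division, no square root)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Order the answer choices from lowest to highest spread.
ranking = sorted(choices, key=lambda label: sum_of_squared_deviations(choices[label]))
third_largest = ranking[-3]

print(ranking)
print("Third largest SD:", third_largest)
```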

Although we didn’t need to calculate the actual SD, we used the concepts of the standard deviation formula to answer these questions.