Help! My Practice Test Score Seems Wrong!

So you’ve taken your GMAT practice test, looked at your score, and investigated a little further. If you’re like many GMAT candidates, you’ve tried to determine how your score was calculated by:

  • Looking at the number you answered correctly vs. the number you answered incorrectly, and comparing that to other tests you’ve taken.
  • Analyzing your “response pattern” – how many correct answers did you have in a row? Did you have any strings of consecutive wrong answers?

And if you’ve taken at least a few practice tests, you’ve probably encountered at least one exam for which you looked at your score, looked at those dimensions above, and thought “I think my score is flawed” or “I think the test is broken.” If you’re taking a computer-adaptive exam powered by Item Response Theory (such as the official GMAT Prep tests or the Veritas Prep Practice Tests), here’s why your perception of your score may not match up with your actual, valid score:

The number of right/wrong answers is much less predictive than you think.
Your GMAT score is not a function of the number you answered correctly divided by the number you answered overall. The test’s adaptive nature is more sophisticated than that – essentially, its job is to serve you questions that help it narrow in on your true score. And to do so, it has to test your upper threshold by serving you questions that you’ll probably get wrong. For example, say your true score is an incredibly high 790. Your test might look something like:

Are you better than average?  (You answer a 550-level question correctly.)

Ok, are you better than a standard deviation above average? (You answer a 650-level question correctly.)

Ok, you’re pretty good. But are you better than 700 good? (You answer a 700-level question correctly.)

Wow, you’re really good. But are you 760+ good? (You answer a 760-level question correctly.)

If you’re 760+ level, are you better or worse than 780? (You answer a 780-level question correctly.)

Well, here goes…are you perfect? (You answer an 800-level question incorrectly.)

Ok, so maybe one or more of those earlier questions was a fluke. Are you better than 760? (You answer a 760-level question correctly.)

Are you sure you’re not an 800-level student? (You answer an 800-level question incorrectly.)

Ok, but you’re definitely better than 780, right? (You answer a 780-level question correctly.)

Are you sure you’re not 800-level? (You answer an 800-level question incorrectly.)

And this goes on, because the test has to ask you 37 Quant and 41 Verbal questions. As the test goes on and you answer questions at your own ability level correctly, it then has to ask the next level up to see if it should increase its estimate of your ability.

The point being: because the system is designed to home in on your ability level, just about everyone misses several questions along the way. The percentage of questions you answer correctly is not a good predictor of your score, because aspects like the difficulty level of each question carry substantial weight. So don’t simply count rights/wrongs on the test, because that practice omits the crucial IRT factor of difficulty level.
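To see why counting rights and wrongs falls short, here is a minimal sketch. It is not the GMAT's actual scoring algorithm. It assumes a simple logistic model in which the chance of a correct answer depends only on the gap between ability and question difficulty, a hypothetical `SCALE` of 50 points, and a brute-force search for the ability level that best explains the responses. Both made-up examinees answer 6 of 10 questions correctly, but on questions of very different difficulty.

```python
import math

SCALE = 50.0  # assumed: score points per unit of the logistic curve; illustrative only


def p_correct(ability, difficulty):
    """Assumed logistic model: a 50% chance when ability equals difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty) / SCALE))


def estimate_ability(responses):
    """Grid-search the ability (200-800, step 10) that maximizes the likelihood.

    responses: list of (difficulty, answered_correctly) pairs.
    """
    best_ability, best_loglik = None, -math.inf
    for ability in range(200, 801, 10):
        loglik = 0.0
        for difficulty, correct in responses:
            p = p_correct(ability, difficulty)
            loglik += math.log(p if correct else 1.0 - p)
        if loglik > best_loglik:
            best_ability, best_loglik = ability, loglik
    return best_ability


# Hypothetical response patterns: both examinees get 6 of 10 right.
strong = [(550, True), (650, True), (700, True), (760, True), (780, True),
          (800, False), (760, True), (800, False), (780, False), (800, False)]
average = [(550, True), (600, False), (550, True), (600, False), (550, True),
           (580, True), (600, False), (580, True), (600, False), (560, True)]

strong_estimate = estimate_ability(strong)
average_estimate = estimate_ability(average)
print(strong_estimate, average_estimate)
```

Same percent correct, very different estimates: the first examinee's misses came on the hardest questions, while the second examinee's came on questions near the middle of the scale.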

Now, savvier test-takers will then often take this next logical step: “I looked at my response pattern of rights/wrongs and based on that it looks like the system should give me a higher score than it did.” Here’s the problem with that:

Of the “ABCs” of Item Response Theory, Difficulty Level is Only One Element (B)…
…and even at that, it’s not exactly “difficulty level” that matters, per se. Each question in an Item Response Theory exam carries three metrics along with it, the A-parameter, B-parameter, and C-parameter. Essentially, those three parameters measure:

A-parameter: How heavily should the system value your performance on this one question?

Like most things with “big data,” computer adaptive testing deals in probabilities. Each question you answer gives the system a better sense of your ability, but each comes with a different degree of certainty.  Answering one item correctly might tell the system that there’s a 70% likelihood that you’re a 700+ scorer while answering another might only tell it that there’s a 55% likelihood. Over the course of the test, the system incorporates those A-parameters to help it properly weight each question.

For example, consider that you were able to ask three people for investment advice: “Should I buy this stock at $20/share?” Your friend who works at Morgan Stanley is probably a bit more trustworthy than your brother who occasionally watches CNBC, but you don’t want to totally throw away your brother’s opinion either. Then, if the third person is Warren Buffett, you probably don’t care at all what the other two had to say; if it’s your broke uncle, though, you’ll weight him at zero and rely more on the opinions of the other two. The A-parameter acts as a statistical filter on “which questions should the test listen to most closely?”

B-parameter: This is essentially the “difficulty” metric but technically what it measures is more “at which ability level is this problem most predictive?”

Again, Item Response Theory deals in probabilities, so the B-parameter is essentially measuring the range of ability levels at which the probability of a correct answer jumps most dramatically. So, for example, on a given question, 25% of all examinees at the 500-550 level get it right; 35% of all those at the 550-600 level get it right; but then 85% of users between 600 and 650 get it right. The B-parameter would tell the system to serve that to examinees that it thinks are around 600 but wants to know whether they’re more of a 580 or a 620, because there’s great predictive power right around that 600 line.

Note that you absolutely cannot predict the B-parameter of a question simply by looking at the percentage of people who got it right or wrong! What really matters is who got it right and who got it wrong, which you can’t tell by looking at a single number. If you could go under the hood of our testing system or another CAT, you could pretty easily find a question that has a “percent correct” statistic that doesn’t seem to intuitively match up with that item’s B-parameter. So, save yourself the heartache of trying to guess the B-parameter, and trust that the system knows!

C-parameter: How likely is it that a user will guess the correct answer? Naturally, with 5 choices this metric is generally close to 20%, but since people often don’t guess quite “randomly” this is a metric that varies slightly and helps the system, again, determine how to weight the results.
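The A/B/C naming matches the textbook three-parameter logistic (3PL) model. The article doesn't say which exact formula the GMAT or the Veritas Prep tests use, so treat the following as a sketch of the standard model only. The item values and the standardized ability scale (0 = average) are made up for illustration.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct answer under the standard 3PL model.

    theta: examinee ability
    a: discrimination (how sharply the probability rises near b)
    b: difficulty (the ability level where the curve is steepest)
    c: guessing floor (chance of a correct answer at very low ability)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: fairly discriminating, above-average difficulty,
# five answer choices so the guessing floor is set near 20%.
item = {"a": 1.5, "b": 1.0, "c": 0.2}

for theta in (-2.0, 0.0, 1.0, 2.0, 3.0):
    print(f"ability {theta:+.1f}: P(correct) = {p_correct_3pl(theta, **item):.2f}")
```

In this model a larger `a` makes the curve steeper around `b`, which is why a single answer on a high-A question tells the system more than an answer on a low-A one.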

With that mini-lesson accomplished, what does that mean for you? Essentially, you can’t simply look at the progression of right/wrong answers on your test and predict how that would turn into a score. You simply don’t know the A value and can only start to predict the “difficulty levels” of each problem, so any qualitative prediction of “this list of answers should yield this type of score” doesn’t have a high probability of being accurate.  Furthermore, there’s:

Question delivery values “content balance” more than you think.
If you followed along with the A/B/C parameters, you may be taking the next logical step which is, “But then wouldn’t the system serve the high A-value (high predictive power) problems first?” which would then still allow you to play with the response patterns for at least a reasonable estimate. But that comes with a bit more error than you might think, largely because the test values a fair/even mix of content areas a bit more than people realize.

Suppose, for example, that you’re not really all that bright, but you had the world’s greatest geometry teacher in high school and have enough of a gambling addiction that you’re oddly good with probability. If your first several – high A-value – problems are Geometry, Probability, Geometry, Geometry, Geometry, Probability… you might get them all right, and those answers would carry so much predictive weight that the test pegs you as a genius and never actually figures out that you’re a fraud.

To make sure that all subject areas are covered and that you’re evaluated fairly, the test is programmed to put a lot of emphasis on content balancing, even though it means you’re not always presented with the single question that would give the system the most information about you.

If you have already seen a lot of Geometry questions and no Probability questions, and the best (i.e., highest A-value) question at the moment is another Geometry question, then the system may very well choose a Probability question instead. The people who program the test don’t give the system a lot of leeway in this regard: all topics need to be covered at about the same rate from one test taker to the next.
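One way to picture that trade-off is the sketch below. It is purely hypothetical: the `max_share` cap, the item fields, and the selection rule are assumptions made for illustration, not how the GMAT or any real testing engine balances content.

```python
from collections import Counter


def pick_next_item(candidates, served_topics, max_share=0.4):
    """Pick the highest-A candidate whose topic is not over-represented.

    candidates: list of dicts with hypothetical keys "id", "topic", "a".
    served_topics: topics of the questions already delivered.
    max_share: assumed cap on any one topic's share of the test so far.
    """
    counts = Counter(served_topics)
    total = len(served_topics)

    def over_represented(topic):
        return total > 0 and counts[topic] / total > max_share

    allowed = [item for item in candidates if not over_represented(item["topic"])]
    pool = allowed or candidates  # fall back if every topic is over the cap
    return max(pool, key=lambda item: item["a"])


served = ["Geometry", "Geometry", "Geometry", "Algebra"]
candidates = [
    {"id": "G7", "topic": "Geometry", "a": 1.8},
    {"id": "P2", "topic": "Probability", "a": 1.1},
]
choice = pick_next_item(candidates, served)
print(choice["id"])
```

Here Geometry already makes up three of the four questions served, so the Probability item is chosen even though the Geometry item has the higher A value.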

So simply put: Some questions count more than others, and they may come later in the test as opposed to earlier, so you can’t quite predict which problems carry the most value.

Compounding that is:

Some questions don’t count at all.
On the official GMAT and on the Veritas Prep Practice Tests, some questions are delivered randomly for the express purpose of gathering information to determine the A, B, and C parameters for use in future tests. These problems don’t count at all toward your score, so your run of “5 straight right answers” may only be a run of 3 or 4 straight.

And then of course there is the fact that:

Every test has a margin of error.
The official GMAT suggests that your score is valid with a margin of error of +/- 30 points, meaning that if you score a 710 the test is extremely confident that your true ability is between 680 and 740, but also that it wouldn’t be surprised if tomorrow you scored 690 or 720. That 710 represents the best estimate of your ability level for that single performance, but not an absolutely precise value.

Similarly, any practice test you take will give you a good prediction of your ability level but could vary by even 30-40 points on either side and still be considered an exceptionally good practice test.

So for the above reasons, a test administered using Item Response Theory is difficult to try to score qualitatively: IRT involves several metrics and nuances that you just can’t see. And, yes, some outlier exams will not seem to pass the “sniff test” – the curriculum & instruction team here at Veritas Prep headquarters has seen its fair share of those, to be sure.

But time and time again the data demonstrates that Item Response Theory tests provide very reliable estimates of scores; a student whose “response pattern” and score seem incompatible typically follows up that performance with a very similar score amidst a more “believable” response pattern a week later.

What does that mean for you?

  • As hard as it is to resist, don’t spend your energy and study time trying to disprove Item Response Theory. The only score that really matters is the score on your MBA application, so use your time/energy to diagnose how you can improve in preparation for that test.
  • Look at your practice tests holistically. If one test doesn’t seem to give you a lot to go on in terms of areas for improvement, hold it up against the other tests you’ve taken and see what patterns stand out across your aggregate performance.
  • View each of your practice test scores more as a range than as an exact number. If you score a 670, that’s a good indication that your ability is in the 650-690 range, but it doesn’t mean that somehow you’ve “gotten worse” than last week when you scored a 680.

A personal note from the Veritas Prep Academics team:
Having worked with Item Response Theory for a few years now, I’ve seen my fair share of tests that don’t look like they should have received the score that they did. And, believe me, the first dozen or more times I saw that my inclination was, “Oh no, the system must be flawed!” But time and time again, when we look under the hood with the psychometricians and programmers who consulted on and built the system, Item Response Theory wins.

If you’ve read this far and are still angry/frustrated that your score doesn’t seem to match what your intuition tells you, I completely understand and have been there, too. But that’s why we love Item Response Theory and our relationship with the psychometric community: we’re not using our own intuition and insight to try to predict your score, but rather using the scoring system that powers the actual GMAT itself and letting that system assess your performance.

With Item Response Theory, there are certainly cases where the score doesn’t seem to precisely match the test, but after dozens of my own frustrated/concerned deep dives into the system I’ve learned to trust the system.  Don’t try to know more than IRT; just try to know more than most of the other examinees and let IRT properly assign you the score you’ve earned.

Getting ready to take the GMAT? We have free online GMAT seminars running all the time. And as always, be sure to follow us on Facebook, YouTube, Google+ and Twitter!

By Brian Galvin and Scott Shrum.

Breaking Down Changes in the New Official GMAT Practice Tests: Unit Conversions in Shapes

Recently, GMAC released two more official practice tests. Though the GMAT is not going to test completely new concepts – if the test changed from year to year, it wouldn’t really be standardized – we can get a sense of what types of questions are more likely to be emphasized by noting how official materials change over time. I thought it might be interesting to take these practice tests and break down any conspicuous trends I detected.

In the Quant section of the first new test, there was one type of question that I’d rarely encountered in the past, but saw multiple times within a span of 20 problems. It involves unit conversions in two or three-dimensional shapes.

Like many GMAT topics, this concept isn’t difficult so much as it is tricky, lending itself to careless mistakes if we work too fast. If I were to draw a line that was one foot long, and I asked you how many inches it was, you wouldn’t have to think very hard to recognize that it would be 12 inches.

But what if I drew a box that had an area of 1 square foot, and I asked you how many square inches it was? If you’re on autopilot, you might think that’s easy. It’s 12 square inches. And you better believe that on the GMAT, that would be a trap answer. To see why it’s wrong, consider a picture of our square:

[Figure: a square with each side labeled 1 foot]

We see that each side is 1 foot in length. If each side is 1 foot in length, we can convert each side to 12 inches in length. Now we have the following:

[Figure: the same square with each side labeled 12 inches]

Clearly, the area of this shape isn’t 12 square inches, it’s 144 square inches: 12 inches * 12 inches = 144 inches^2.

Another way to think about it is to put the unit conversion into equation form. We know that 1 foot = 12 inches, so if we wanted the unit conversion from feet^2 to inches^2, we’d have to square both sides of the equation in order to have the appropriate units. Now (1 foot)^2 = (12 inches)^2, or 1 foot^2 = 144 inches^2.  So converting from square feet to square inches requires multiplying by a factor of 144, not 12.
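The same idea can be written as a tiny helper, shown here as a sketch. The `convert` function and its `dimensions` argument are names made up for illustration: the linear factor is raised to the number of dimensions before it is applied.

```python
INCHES_PER_FOOT = 12


def convert(value, linear_factor, dimensions):
    """Convert a length (1), area (2), or volume (3) using a linear factor."""
    return value * linear_factor ** dimensions


length_in = convert(1, INCHES_PER_FOOT, 1)   # 1 ft   -> 12 in
area_in2 = convert(1, INCHES_PER_FOOT, 2)    # 1 ft^2 -> 144 in^2
volume_in3 = convert(1, INCHES_PER_FOOT, 3)  # 1 ft^3 -> 1728 in^3
print(length_in, area_in2, volume_in3)
```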

Let’s see this concept in action. (I’m using an older official question to illustrate – I don’t want to rob anyone of the joy of encountering the recently released questions with a fresh pair of eyes.)

If a rectangular room measures 10 meters by 6 meters by 4 meters, what is the volume of the room in cubic centimeters? (1 meter = 100 centimeters)

A) 24,000
B) 240,000
C) 2,400,000
D) 24,000,000
E) 240,000,000

First, we can find the volume of the room by multiplying the dimensions together: 10*6*4 = 240 cubic meters. Now we want to avoid the trap of thinking, “Okay, 100 centimeters is 1 meter, so 240 cubic meters is 240*100 = 24,000 cubic centimeters.”  Remember, the conversion ratio we’re given is for converting meters to centimeters – if we’re dealing with 240 cubic meters, or 240 meters^3, and we want to find the volume in cubic centimeters, we’ll need to adjust our conversion ratio accordingly.

If 1 meter = 100 centimeters, then (1 meter)^3 = (100 centimeters)^3, and 1 meter^3 = 1,000,000 centimeters^3. [100 = 10^2 and (10^2)^3 = 10^6, or 1,000,000.] So if 1 cubic meter = 1,000,000 cubic centimeters, then 240 cubic meters = 240*1,000,000 cubic centimeters, or 240,000,000 cubic centimeters, and our answer is E.

Alternatively, we can do all of our conversions when we’re given the initial dimensions. 10 meters = 1000 centimeters. 6 meters = 600 centimeters. 4 meters  = 400 centimeters. 1000 cm * 600 cm * 400 cm = 240,000,000 cm^3. (Notice that when we multiply 1000*600*400, we can simply count the zeroes. There are 7 total, so we know there will be 7 zeroes in the correct answer, E.)
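As a quick check of both routes, here is a short sketch (the variable names are ours, not part of the question). It also computes the trap answer that comes from forgetting to cube the conversion factor.

```python
CM_PER_M = 100
dims_m = (10, 6, 4)

# Route 1: find the volume in cubic meters, then cube the conversion factor.
volume_m3 = dims_m[0] * dims_m[1] * dims_m[2]
route_1 = volume_m3 * CM_PER_M ** 3

# Route 2: convert each dimension to centimeters first, then multiply.
l_cm, w_cm, h_cm = (d * CM_PER_M for d in dims_m)
route_2 = l_cm * w_cm * h_cm

# The trap: applying the linear factor once to a cubic quantity.
trap = volume_m3 * CM_PER_M

print(f"{route_1:,} {route_2:,} {trap:,}")
```

Both routes agree on answer E, and the trap value matches answer A.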

Takeaway: Make sure you’re able to do unit conversions fluently, and that if you’re dealing with two or three-dimensional space, that you adjust your conversion ratios accordingly. If you’re dealing with a two-dimensional shape, you’ll need to square your initial ratio. If you’re dealing with a three-dimensional shape, you’ll need to cube your initial ratio. The GMAT is just as much about learning what traps to avoid as it is about relearning the elementary math that we’ve long forgotten.

*GMATPrep question courtesy of the Graduate Management Admission Council.

Plan on taking the GMAT soon? We have GMAT prep courses starting all the time. And be sure to follow us on Facebook, YouTube, Google+ and Twitter!

By David Goldstein, a Veritas Prep GMAT instructor based in Boston. You can find more articles written by him here.

GMAT Tip of the Week: Your Mind Is Playing Tricks On You

Of all the song lyrics of all the hip hop albums of all time, perhaps the one that captures the difficulty of the GMAT the most comes from the Geto Boys:

It’s f-ed up when your mind is playing tricks on you.

The link above demonstrates a handful of ways that your mind can play tricks on you when you’re in the “fog of war” during the GMAT, but here, four Hip Hop Months later in the middle of yet another election season that has many Millennial MBA aspirants feeling the Bern, it’s time to detail one more. Consider this Critical Reasoning problem:

Among the one hundred most profitable companies in the United States, nearly half qualify as “socially responsible companies,” including seven of the top ten most profitable on that list. This designation means that these companies donate a significant portion of their revenues to charity; that they adhere to all relevant environmental and product safety standards; and that their hiring and employment policies encourage commitments to diversity, gender pay equality, and work-life balance.

Which of the following conclusions can be drawn based on the statements above?

(A) Socially responsible companies are, on average, more profitable than other companies.
(B) Consumers prefer to purchase products from socially responsible companies whenever possible.
(C) It is possible for any company to be both socially responsible and profitable.
(D) Companies do not have to be socially responsible in order to be profitable.
(E) Not all socially responsible companies are profitable.

How does your mind play tricks on you here? Check out these statistics from the Veritas Prep Practice Tests:

[Figure: answer-choice statistics for this question from the Veritas Prep Practice Tests]

When you look at the two most popular answer choices, there’s a stark difference in what they mean outside the context of the problem. The most popular – but incorrect – answer says what you want it to say. You want social responsibility to pay off, for companies to be rewarded for doing the right thing. But it’s the words that don’t appeal to your heart and/or conscience that are the most important on these problems, and the justification for “any company” to be both socially responsible and profitable isn’t there in the argument.

Sure, several companies in the top 10 and top 100 are both socially responsible and profitable, but ANY company means that if you pick any given company, that particular company has to be capable of both. And it may very well be that in certain industries, the profit margins are too slim for that to be possible.

Say, for example, that in one of the commodities markets there simply isn’t any brand equity for social responsibility, and the top competitors are so focused on pushing out competition that any cost outside of productivity would put a company into the red. It’s not a thought you necessarily want to have, but it’s a possible outcome given the prompt, and it invalidates answer (C). Since Inference answers MUST BE TRUE, C just doesn’t meet that standard.

Which brings you to D, the correct but unpopular answer. That’s not what your heart and conscience want to conclude at all – you’d love for there to be a world in which consumers reject any products that aren’t made by companies taking the moral high ground. But if you look specifically at the facts of the argument, 3 of the top 10 most profitable companies and more than half of the top 100 are not socially responsible. So answer choice D is airtight – it’s not what you want to hear, but it’s definitely true based on the argument.

The lesson? Once you get that MBA you have the opportunity to change the world, but while you’re in the GMAT test center doing Critical Reasoning problems, you can only draw conclusions based on the facts that they give you. Don’t let your outside opinions frame the way that you read the problem. If you know that you have some personal interest in the topic, that’s a sign that you’ll need to be even more literal about what’s written. Your mind can play tricks on you – as it did for nearly half of test-takers here – so know that on test day you have to get it under control.

Getting ready to take the GMAT? We have free online GMAT seminars running all the time. And as always, be sure to follow us on Facebook, YouTube, Google+ and Twitter!

By Brian Galvin.

Success Story Part 3: "The Final Days, and (*eek*)… Results."

(This is the third in a series of blog posts in which Julie DeLoyd, a Veritas Prep GMAT alumna-turned-instructor, will tell the story of her experience through the MBA admissions process. Julie will begin her MBA program at Chicago Booth this fall. You can also read Part 1 and Part 2 to learn Julie’s whole story.)

I had invested 42 hours of summer evenings learning about the ins and outs of the GMAT, and the time finally came for me to do the work on my own. I booked my test date for 5 weeks after my class ended, giving me enough time to go on tour one last time before I really hunkered down.

My band toured the Midwest for about 10 days, driving on vegetable oil fuel and breaking a lot of strings along the way. While another girl was driving, I’d pull out my Veritas Prep books and work on a few problems each day. I wasn’t absorbing too much, honestly, but it was good to keep my GMAT brain active. When I dropped off the girls at the airport, it was time to really get down to business. I set up a study schedule for myself for the last 3 weeks.

With 21 days to go and 4 practice tests completed, my schedule looked something like this:
Monday Morning: Sentence Correction
Monday Afternoon: Practice Test

Tuesday Morning: Go over results of Practice Test
Tuesday Afternoon: Geometry

Wednesday Morning: Reading Comp
Wednesday Afternoon: Practice Test

Thursday Morning: Go over results of Practice Test
Thursday Afternoon: Critical Reasoning

Friday Morning: Combinatorics and Probability
Friday Afternoon: Problem Solving

Saturday Morning: Practice Test
Saturday Afternoon: Go over results of Practice Test

Sunday: Eat good food, ride my bike, spend time with dogs and lovey

Yes, it was a little intense, but it wasn

GMAT Tip of the Week

Make No Mistake

(This is one of a series of GMAT tips that we offer on our blog.)

If you’re serious about GMAT preparation, you have undoubtedly taken (or will take) a series of practice tests to replicate the test experience and work on pacing, stamina, etc. (At Veritas Prep, we offer our students 15 CAT exams, and you can access a free GMAT practice exam on our web site). Most tests have some useful diagnostic features that will demonstrate the time you took on each question, your performance on major question categories, etc. But as you have read in this space previously, one true key to peak performance is to be aware of the errors that you tend to make in particular, and the software isn’t quite advanced enough to highlight those for you.

To better understand your own mistakes in your own way, a simple activity is to go back through 2-3 practice tests that you’ve taken and label each of your mistakes with 4-6 keywords of your own. Examples could include “Data Sufficiency, didn’t forget first statement” or “Strengthen argument, misread conclusion”. The tests themselves may alert you to the fact that you missed, say, 6 of 15 Data Sufficiency questions, but with closer inspection you can determine just which errors you’re making when you do miss them. Then, if you scan those keywords to find repeats (Microsoft Excel can do this pretty easily for you), you’ll have a better idea of just which mistakes you’re prone to making, and you can focus your attention on them.
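If you'd rather script the tally than use a spreadsheet, a minimal sketch might look like the following. The mistake log is invented for illustration, and the "category, error" label format is just one possible convention.

```python
from collections import Counter

# Hypothetical mistake log: one "category, error" label per missed question.
mistake_log = [
    "Data Sufficiency, didn't forget first statement",
    "Strengthen argument, misread conclusion",
    "Data Sufficiency, didn't forget first statement",
    "Geometry, unit conversion",
    "Strengthen argument, misread conclusion",
    "Data Sufficiency, assumed integer",
]

# Count exact repeats of a label, and misses per question category.
label_counts = Counter(mistake_log)
category_counts = Counter(label.split(",")[0].strip() for label in mistake_log)

# Print only the labels that show up more than once.
for label, count in label_counts.most_common():
    if count > 1:
        print(f"{count}x {label}")
```

The repeated labels are the errors worth drilling first, and the category counts show where those errors cluster.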

Your strategy for GMAT success should certainly include becoming more comfortable with the various skills and question types required, but you’ll likely find that you can increase your score just as significantly by minimizing the errors you tend to make most commonly. Minimize your errors, and maximize your score!

For more help on the GMAT, take a look at Veritas Prep’s GMAT preparation course options.