A common question we get from students who have just completed our free GMAT practice goes something like this: “I just got a score of X, but I see that I got Y questions right and Z questions wrong… How can my score be so low/high?” Embedded in this question is a bit of a lack of understanding of how a computer-adaptive test (CAT) works. Let’s dig in…
How A Computer-Adaptive Test Works
CATs such as the GMAT are built on algorithms that use something called Item Response Theory, or IRT. The IRT system has two main functions: item administration (determining which questions to give you) and ability estimation (calculating your score). Each function informs the other. Once the ability estimate is confident that you’re above average, for example, the system delivers the questions most likely to help it determine “just how far above average?” That means you’ll miss several questions even if you’re in the 90th percentile, because the system is trying to determine whether you’re above that level, and the only way to know is to keep testing your upper limit.
Now, that explanation strips out a good amount of IRT nuance, and it basically says this: once the system has homed in on your ability, you should theoretically get about half of the remaining questions wrong. In the idealized version, if your true ability level puts you at the 60th percentile among all GMAT test takers, you would get all the 70th-percentile questions wrong and all the 50th-percentile questions right, and the system would keep bouncing you between those levels. Real test takers are not that tidy (you will sometimes get really easy questions wrong and super hard ones right, after all), but the picture is close enough for a good understanding of the scoring system.
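To make the “half right, half wrong” idea concrete, here is a minimal sketch of how an IRT model might turn ability and question difficulty into a probability of answering correctly. It assumes a two-parameter logistic (2PL) model; the function name `p_correct`, the ability scale, and every number below are illustrative assumptions, not the GMAT’s actual model or parameters.

```python
import math

def p_correct(theta: float, difficulty: float, discrimination: float = 1.0) -> float:
    """Probability of a correct answer under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical ability on a standard-normal scale (about the 60th percentile).
ability = 0.25

for label, difficulty in [("easier", -0.5), ("matched", 0.25), ("harder", 1.0)]:
    print(f"{label:8s} question: P(correct) = {p_correct(ability, difficulty):.2f}")
```

Under these assumptions, a question matched to your ability is a coin flip, while easier and harder questions are only somewhat more or less likely to be answered correctly. That is why a well-targeted test leaves you with plenty of wrong answers, and why missing a harder question is likely rather than guaranteed.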
Getting Questions Wrong Means the System Is Working
So what does all of that bouncing around mean? Once the test has a close read on your true ability level, AND assuming that the test has in its arsenal enough questions to keep challenging you at that level, you should then start to miss a lot of questions. After all, if you’re still getting a lot of questions right, then the system must not have you pegged at the right ability level. Or — and you will see this with a lot of practice tests available on the market — it knows your ability level, but it doesn’t have enough questions at that ability level to keep challenging you.
And get this: according to IRT, it doesn’t take too long to get there. Within just six or seven questions the system usually has a pretty good feel for your ability level. So, if you take the Quant section of a GMAT practice test and the system figures you out after about seven questions, then you should spend the next 30 questions bouncing around your ability level. Of course, your answer sequence from that point forward won’t be a perfect “right, wrong, right, wrong…” but you will probably start to get a good percentage of questions wrong. Or you’ll spend an unreasonable amount of time trying to get questions right and then pay the price at the end of the section when you run out of time, in which case you will have a bunch of “wrong” responses recorded at the end.
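The back-and-forth between item administration and ability estimation can be illustrated with a toy simulation. This is only a sketch under loud assumptions: a 2PL response model, a standard normal starting belief about ability, questions pitched exactly at the current estimate, and an unlimited question pool. The names `simulate_section` and `GRID` and all numbers are hypothetical, and a real CAT uses a far more elaborate selection algorithm.

```python
import math
import random

# Candidate ability values from -4.0 to 4.0 (arbitrary illustrative scale).
GRID = [g / 10 for g in range(-40, 41)]

def p_correct(theta, difficulty, a=1.5):
    """Assumed 2PL probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - difficulty)))

def simulate_section(true_theta, n_items=37, rng=None):
    """Return the list of right/wrong outcomes for one simulated section."""
    rng = rng or random.Random()
    # Start from an (unnormalized) standard normal belief about ability.
    weights = [math.exp(-0.5 * g * g) for g in GRID]
    outcomes = []
    for _ in range(n_items):
        total = sum(weights)
        estimate = sum(g * w for g, w in zip(GRID, weights)) / total
        # Item administration: pitch the next question at the current estimate.
        difficulty = estimate
        correct = rng.random() < p_correct(true_theta, difficulty)
        outcomes.append(correct)
        # Ability estimation: reweight each candidate ability by how well
        # it explains the response just observed.
        weights = [
            w * (p_correct(g, difficulty) if correct else 1.0 - p_correct(g, difficulty))
            for g, w in zip(GRID, weights)
        ]
    return outcomes

rng = random.Random(0)
sections = [simulate_section(true_theta=1.0, rng=rng) for _ in range(200)]
late_rate = sum(sum(s[7:]) for s in sections) / (200 * 30)
print(f"Share of questions 8-37 answered correctly: {late_rate:.2f}")
```

In this toy setup, a simulated test taker who is well above average should still answer only around half of the later questions correctly, because each question is pitched near the current ability estimate.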
This Is Where the Math Gets Fancy
What’s really happening with the ability estimation is that it’s calculating the probability of someone with your responses having each score. And here’s where conventional wisdom in online GMAT forums tends to miss the nuance of IRT: We see “You get a question right it gives you a harder question / You get it wrong it gives you an easier one,” but that’s still too simplistic. What the system is really doing after each response is using all of your responses to date to estimate the probability of your having each score, and not all questions carry equal weight.
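Here is a minimal sketch of what “estimating the probability of your having each score” could look like, using Bayes’ rule over a small grid of candidate ability levels. The 2PL model, the standard normal prior, the function `ability_posterior`, and the sample responses are all assumptions chosen for illustration; the actual GMAT algorithm and its score scale are not described in this post.

```python
import math

def p_correct(theta, difficulty, a=1.0):
    """Assumed 2PL probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - difficulty)))

def ability_posterior(responses, grid):
    """Probability of each candidate ability given all responses so far.

    responses: list of (difficulty, discrimination, answered_correctly).
    """
    weights = [math.exp(-0.5 * g * g) for g in grid]  # assumed normal prior
    for difficulty, a, correct in responses:
        for i, g in enumerate(grid):
            p = p_correct(g, difficulty, a)
            weights[i] *= p if correct else 1.0 - p
    total = sum(weights)
    return [w / total for w in weights]

grid = [g / 2 for g in range(-6, 7)]  # candidate abilities from -3.0 to 3.0
# Hypothetical response history: (difficulty, discrimination, correct?)
responses = [(0.0, 1.0, True), (0.5, 1.2, True), (1.0, 1.5, False), (0.5, 0.8, True)]
posterior = ability_posterior(responses, grid)

for g, prob in zip(grid, posterior):
    print(f"ability {g:+.1f}: {prob:.3f} {'#' * round(prob * 40)}")
```

Every response reshapes the whole distribution, so the estimate depends on which questions you answered and how informative each one was, not on a simple count of rights and wrongs.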
Again, the IRT system relies heavily on probability: some questions are much more potent at determining whether you’re above or below a certain threshold, and others are a little less telling. The system takes these weights into account, particularly as your score moves. These weights also have to account for content delivery. The system might want to ask you a “more potent” Sentence Correction question (meaning one that will give the system a lot of information about you based on how you respond) but need to deliver another Reading Comprehension passage instead, and so those RC questions might not carry the same weight as the questions before them. All of this is constantly happening in the background as you move through the GMAT.
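One common way to put a number on how “potent” a question is comes from the item information function. The sketch below assumes a 2PL model, where information equals the squared discrimination times P(1 − P); the item labels and parameter values are hypothetical and are not drawn from real GMAT questions.

```python
import math

def p_correct(theta, difficulty, a):
    """Assumed 2PL probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - difficulty)))

def item_information(theta, difficulty, a):
    """Fisher information of one 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, difficulty, a)
    return a * a * p * (1.0 - p)

theta = 0.5  # hypothetical current ability estimate
# Hypothetical items: name -> (difficulty, discrimination)
items = {
    "potent, on target": (0.5, 2.0),
    "less telling, on target": (0.5, 0.7),
    "potent, off target": (2.5, 2.0),
}
for name, (difficulty, a) in items.items():
    info = item_information(theta, difficulty, a)
    print(f"{name:24s} information = {info:.3f}")
```

Under these assumptions, a highly discriminating question near your ability level tells the system far more than a weakly discriminating one, and even a “potent” question says little when its difficulty is far from your level.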
So, the next time you hear someone recounting the number or percentage of questions they got right in their last practice test, just smile and nod. They probably don’t know much about CATs or Item Response Theory. We’ll let you decide whether to let them in on it or not!
By Brian Galvin