RICHARD INNES
3/21/23
Common Core state tests don’t produce “common” results
KEY TAKEAWAYS:
- Despite the goal of creating cross-state testing with equivalent scoring standards, research from the National Center for Education Statistics (NCES) shows that neither the Smarter Balanced Assessment Consortium (SBAC) nor the Partnership for Assessment of Readiness for College and Careers (PARCC) produced such uniform results.
- While actual state performance standards were not equivalent for either PARCC or SBAC, poorly designed graphics in several years of NCES research reports create the incorrect impression that the testing consortia were producing equivalent scores across different states.
- Parents need to understand that even if their state is still a member of PARCC or SBAC, it isn't valid to compare their child's state test results to results for children in other states in the same test consortium.
- Parents also need to understand that even if their child's state test results show proficiency, their child may not be proficient when measured against a more demanding standard such as NAEP's.
Remember those Common Core state tests from the Smarter Balanced Assessment Consortium (SBAC) and the Partnership for Assessment of Readiness for College and Careers (PARCC) that were supposed to provide equivalent scoring from state to state?
Well, check out the graphic in Figure 1. It shows how the performance level each state's 4th grade reading test reported as proficient in 2019 mapped onto the grade 4 reading scale of the National Assessment of Educational Progress (NAEP). Let's talk about what Figure 1 shows us.
Figure 1
First, Figure 1 was assembled using data found in Table A-1 of the National Center for Education Statistics (NCES) report titled "Mapping State Proficiency Standards Onto the NAEP Scales, Results From the 2019 NAEP Reading and Mathematics Assessments, Technical Notes." Figure 1 shows the equivalent NAEP Scale Score (the scale runs from 0 to 500) for the performance level that each individual state's fourth grade reading exam reported as proficient work in 2019.
For example, Virginia (VA), which appears at the far left of Figure 1, had the lowest state test standard for reading proficiency. Its Grade 4 reading proficiency standard maps to a NAEP Scale Score equivalent of only 200, lower than that of any other state.
On the other side of Figure 1, Tennessee (TN) had a much higher proficiency standard in its state assessment, mapping out to a NAEP Scale Score equivalent of 238.
The 38-point difference between Virginia's and Tennessee's proficiency standards becomes more meaningful when you consider that "Analysts who study NAEP often use 10 points on the NAEP scale as a back of the envelope estimate of one year's worth of learning." By that rule of thumb, Tennessee's proficiency bar sits nearly four grade levels above Virginia's.
As you examine Figure 1, also note that states belonging to either the PARCC or SBAC testing groups are identified. Louisiana (LA), for example, is part of PARCC, while Delaware (DE) is an SBAC state.
One of the key ideas behind the Common Core test consortia was that they would create common tests that would be scored exactly the same across the different states that used their tests. This was to be done so valid cross-state comparisons could be directly conducted between states in the same consortium.
But Figure 1 clearly shows that didn't happen, at least not in 2019. For example, Louisiana, the PARCC state with the lowest proficiency standard, mapped its Grade 4 reading proficiency standard to a NAEP equivalent of just 217. Meanwhile, the top PARCC state in Figure 1, Maryland (MD), had a much higher equivalent score of 231.
Clearly, the 2019 PARCC scoring standards for Grade 4 reading are highly uneven from state to state. The 14-point difference between Louisiana's and Maryland's proficiency standards equates to something on the order of nearly an extra year and a half of learning required to earn a proficient score in Maryland. Parents in Louisiana cannot and should not compare their child's score directly to the scores of children in Maryland, regardless of PARCC promises.

The spread of the standard for proficiency across the SBAC states is also notable. The Delaware (DE) standard for reading proficiency maps to a NAEP equivalent Scale Score of just 218. Compare that to another SBAC state, Montana (MT), whose state testing standard for reading proficiency maps to a NAEP Scale Score of 229. That 11-point difference works out to more than a year of extra learning required to reach proficient status in Montana as compared to Delaware.
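To make the rule-of-thumb arithmetic concrete, here is a minimal sketch in Python. It uses the NAEP scale-score equivalents quoted above and the 10-points-per-year heuristic, which is only a rough back-of-the-envelope conversion, not an official NAEP statistic:

```python
# Rough sketch: convert gaps between state proficiency standards
# (expressed as NAEP scale-score equivalents from the NCES mapping)
# into approximate "years of learning."

POINTS_PER_YEAR = 10  # back-of-the-envelope heuristic only

# NAEP Scale Score equivalents for Grade 4 reading proficiency (2019 mapping)
standards = {
    "VA": 200,  # lowest standard overall
    "TN": 238,  # highest standard overall
    "LA": 217,  # lowest PARCC state
    "MD": 231,  # highest PARCC state
    "DE": 218,  # lowest SBAC state discussed
    "MT": 229,  # highest SBAC state discussed
}

def learning_gap(low_state: str, high_state: str) -> float:
    """Approximate years of learning separating two states' proficiency bars."""
    points = standards[high_state] - standards[low_state]
    return points / POINTS_PER_YEAR

for low, high in [("VA", "TN"), ("LA", "MD"), ("DE", "MT")]:
    print(f"{low} vs {high}: {standards[high] - standards[low]} points "
          f"~ {learning_gap(low, high):.1f} years of learning")

# Output:
# VA vs TN: 38 points ~ 3.8 years of learning
# LA vs MD: 14 points ~ 1.4 years of learning
# DE vs MT: 11 points ~ 1.1 years of learning
```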
Clearly, just as happened with PARCC, promises that SBAC would provide equivalent test results across participating states just didn't pan out. Based on the state test to NAEP mapping conducted by the NCES, citizens in states that remain in either consortium should not try to compare their results with those of other states in the same consortium. In fact, per Figure 1, there doesn't appear to be much comparability between any of the state tests.
But... there is more.
Check out the graph in Figure 2. This is essentially just a cut and paste of a figure that appears on Page 6 in the main “Mapping State Proficiency Standards Onto the NAEP Scales, Results From the 2019 NAEP Reading and Mathematics Assessments” report from the NCES, which is a separate document from the Technical Notes.
Note: I added the material in red to the NCES report’s graph to facilitate interpretation.
Figure 2
In Figure 2, the plot of each state's NAEP equivalent score for proficiency also includes whiskers for the plus or minus sampling error, formally called the standard error. As a sample-based assessment, NAEP's scores always carry sampling error. I didn't put similar whiskers on the plot I created in Figure 1, but the standard errors are available in Table A-1 of the Technical Notes document for those who are interested.
Now, here’s the first problem with Figure 2: the graph indicates state test scoring in all of the SBAC states is to exactly the same standard for proficiency. But as we can see in the actual individual state results shown in Figure 1, in reality those SBAC states didn't have anywhere near equal standards for proficiency.
However, in this misleading graphic, which is extracted from the main report rather than the separate Technical Notes publication, each individual SBAC state is depicted as though its test standard for proficiency maps onto exactly the same NAEP equivalent score as every other consortium state's.
What’s going on here is the NCES report averaged together the results for all of the states in the SBAC consortium and then showed all of the states mapping to that same overall average result.
Even the standard error whiskers for all the SBAC states appear equal in size. That, too, disagrees with the actual individual SBAC states' standard errors as reported in the Technical Notes, which show that plus or minus two standard errors for the SBAC states ranges from 1.8 to 3.8 points.
Thus, Figure 2 creates the false impression that all the SBAC state tests were scored against the same standard for proficiency. You have to read the main NCES report carefully to understand what is actually shown here.
The way this graph should have been assembled, if an overall SBAC average were to be shown, was to display just one point, labeled SBAC, for that overall average score. But that didn't happen.
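To illustrate what went wrong in the graphic, here is a small sketch of the difference between plotting each state's own mapped score and plotting one pooled consortium average for every member state. The Delaware and Montana scores are the real mapped values from Figure 1, but the per-state standard errors are hypothetical, chosen only so that plus or minus two standard errors lands in the 1.8-to-3.8-point range the Technical Notes report:

```python
# Contrast between what the NCES graphic shows (one pooled average,
# with identical whiskers, repeated for every consortium state) and
# what the underlying per-state mapping data actually look like.

# Real NAEP scale-score equivalents for two SBAC states (Figure 1);
# the standard errors are HYPOTHETICAL illustration values.
sbac_states = {
    "DE": (218, 1.0),  # (mapped score, hypothetical standard error)
    "MT": (229, 1.4),
    # ... the remaining SBAC states would be listed here
}

# What the report's graphic effectively does: compute one pooled average
# and plot it as if it were every member state's own result.
avg_score = sum(score for score, _ in sbac_states.values()) / len(sbac_states)

for state, (score, se) in sbac_states.items():
    lo, hi = score - 2 * se, score + 2 * se
    print(f"{state}: actual mapping {score} (whisker {lo:.1f} to {hi:.1f}) "
          f"vs. plotted consortium average {avg_score:.1f}")

# The printout shows the 11-point spread between DE and MT that the
# averaged presentation completely hides.
```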
A second problem in Figure 2 is that the very same display approach was used to portray the performance of the three PARCC states. Even a quick look back at Figure 1 will confirm this is also incorrect.
The graph in Figure 2, which comes from the main NCES report, could easily mislead people into thinking PARCC and SBAC were performing as intended, providing equivalent and comparable results for all the participating states in each consortium.
In fact, that sort of problem appears in an Education Week news report about similar mapping of the 2015 test results. In that article, a figure similar to Figure 2 is presented without any notation that the scores for PARCC and SBAC states are not plotted using actual individual state results but instead show the same overall average result as each consortium state's mapping. That presentation is also incorrect, but it really isn't Education Week's fault; the NCES report isn't very transparent about what is going on.
In the end, the real data for each state, as reported in Figure 1, make it very clear: in 2019, neither PARCC nor SBAC was using one uniform proficiency standard for Grade 4 reading across all of its consortium states. Comparing results from SBAC-based Delaware's state test to SBAC-based Montana's would be no more valid than comparing results for non-consortium states like South Carolina and Georgia.
By the way, SBAC and PARCC were part of a very expensive, $350 million experiment. For whatever reason, that experiment clearly didn’t deliver as promised.
How long has this been going on?
An earlier version of the NCES mapping report covers data from 2015. That report contains a graphic very similar to Figure 2 and equally misleading, cut and pasted here as Figure 3.
Figure 3
As with the 2019 data presentation shown in Figure 2, the 2015 presentation shows all SBAC and PARCC results using the overall average score for all the states in each consortium instead of each state's actual, and different, score. However, the 2015 report includes, right in the report itself, a table with actual mapping results for each state. As with the 2019 data, the mapped results for both the SBAC and PARCC states most definitely are not the same across all of the states in the same consortium.
And, that is a problem.
These inaccurate presentations of PARCC and SBAC results have appeared in NCES reports for a number of years, creating the illusion that PARCC and SBAC were performing as desired. You have to dig into data tables buried much later in the 2015 report, and into an entirely different document for the 2019 report, to determine that the presentations are misleading.
All of this raises deeper questions for parents, even in states outside PARCC and SBAC. With so much variation in the scoring standards for state testing, how confident can parents be that a score of Proficient on their state's test accurately represents their child's ability?
After all, even when the K-12 community set out purposely to score state tests uniformly, it was unable to do so.
And, as the real 2019 data map in Figure 1 shows, what counts as proficient Grade 4 reading actually varies considerably from state to state.
https://bipps.org/blog/common-core-state-tests-dont-produce-common-results