Taking a closer look at the latest STAAR readability study

The Meadows Center for Preventing Educational Risk at the University of Texas at Austin has released part one of its State of Texas Assessments of Academic Readiness (STAAR) readability study. The independent study was mandated by House Bill 3, the new school finance law, after several reports, including one conducted by Texas A&M researchers, argued that STAAR test items were written above the reading level of the students being tested. Although some media headlines have been quick to suggest that STAAR is in fact written at appropriate grade levels, a closer look at the study itself gives cause for pause.

Reviewers examined a total of 634 test items across 17 of the 2019 STAAR tests, including reading, mathematics, science, social studies and writing in grades 3 through 8.

The study examined three primary questions regarding the appropriateness of the tests:

Alignment: Are the items on the test aligned to the TEKS?

Item Readability: Are the test items (questions and answers) written on or below the grade-appropriate reading level of the students being tested?

Passage Readability: Are the passages written on or below the grade-appropriate reading level of the students being tested?

Here is what the researchers found:

Alignment: Researchers classified an item as aligned if it addressed a student expectation from the tested grade or any grade below. An overwhelming majority of the items were rated as aligned, with just eight questions found not to adequately assess the standards they were meant to assess.

Item Readability: Because there is “little guidance and even less research on evaluating the readability of test items,” the researchers tried several methodologies to see whether any of them could produce reliable results. Each method yielded different results, and the researchers therefore concluded that “analyzing item readability in a reliable manner for this report is not possible.”

Passage Readability: To determine text readability, the authors combined elements of various readability formulas and developed their own “test.” They evaluated passages using three indices: sentence length and difficulty, syntax (the way in which words are arranged) and “narrativity” (vocabulary load). A passage was deemed readable if at least two of the three indices fell within or below the grade level. Although 86 percent of the reading and writing passages met that two-out-of-three criterion and therefore were found to be grade-appropriate, only 31 percent fell within or below the specified grade band for narrativity.
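To see why the overall figure and the narrativity figure can diverge so sharply, it may help to sketch the two-out-of-three rule in code. The following is only an illustration under stated assumptions, not the researchers' actual procedure: the class and function names, the idea of expressing each index as a single grade-level number, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PassageScores:
    """Estimated grade level on each index (hypothetical representation)."""
    sentence: float
    syntax: float
    narrativity: float


def indices_met(scores, grade_band_top):
    """Return the names of the indices that fall within or below the grade band."""
    return [name for name, value in vars(scores).items() if value <= grade_band_top]


def is_grade_appropriate(scores, grade_band_top, required=2):
    """Apply the two-out-of-three rule described in the article."""
    return len(indices_met(scores, grade_band_top)) >= required


# Made-up scores for a grade 5 passage: narrativity is well above grade level,
# yet the passage still counts as grade-appropriate overall.
passage = PassageScores(sentence=4.0, syntax=5.0, narrativity=7.0)
print(is_grade_appropriate(passage, grade_band_top=5))  # True
print(indices_met(passage, 5))  # ['sentence', 'syntax']
```

Under a rule like this, a passage can miss on any single index, including narrativity, and still be counted as readable, which is consistent with 86 percent of passages passing overall while only 31 percent met the narrativity band.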