Metrics Of Success
Short Answer
“When a measure becomes a target, it ceases to be a good measure.”
- Paraphrase attributed to Marilyn Strathern on Goodhart’s Law (1997)
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
- Campbell’s Law (1979)
Being the “best” is simply a title based on a specific situation and circumstances. What is the best one day may no longer be the best the next day.
Being “good” is, pragmatically speaking, simply being better than others.
- An example of this is a staff engineer asking a lower-level engineer for their opinion on what to do
Metrics depend on goals and should be consistent and objective. Test scores are one type of metric to base performance on. Be wary of metrics enabling perverse incentives: incentives that encourage undesirable and/or unexpected results (see “Cobra Effect” below). If a metric doesn’t help answer someone’s question or drive a real decision, it’s probably a useless metric.
- The Cobra Effect: A solution unintentionally makes a problem worse (Siebert, 2001).
When academics are a primary measure of a student’s worth, a system may treat that student like a commodity. If grades are also a metric of success, a teacher/instructor, given their current position and power, can only reasonably grade work they receive. Interpret that as you will.
When money is tied to metrics of success, people do things they normally wouldn’t do to positively affect, or even inflate, those metrics. For example, a lot of people care about money. If they don’t have money, they care about getting more money. If they already have money, they still care about getting money but also want to keep the money they have.
Metrics are like compasses/snapshots indicating failure/success in a system rather than evaluating and defining a distinct problem. Metrics should inform decisions humans make, not replace decisions humans make. A “good” metric is one that’s meaningful, timely (or doesn’t take long to measure), measurable, and understandable. If it lacks one of these aspects, it may not be a good metric.
- Metrics are diagnostic, not prescriptive.
- Correlation does not imply causation.
As an opinion: if you implement ANY new specialized learning program (which is explicitly NOT designed for everyone) for an educational facility, but force every student there to go into it, you’re just replicating General Education again with additional steps and lost funds.
- In general, a program that sells itself as a good fit “for everyone” is a program you should be skeptical of.
- Regardless of benefits or drawbacks, you should be wary of educational programs and what their goals are (hint: it’s almost always money).
Long Answer
There’s a joke that, in education, we traded stamina for engagement.
Sadly, it isn’t a joke. This chapter helps explain why that joke is now reality.
One part of that reality is that most of the data and information you work with and see is affected by bias. That bias comes from decisions about how information is gathered, displayed, and omitted, to name only a few sources.
Why Metrics of Success?
Even if you’re not in business or finance, there are terms from those fields you should know. Metrics are measures to reveal whether your current system performs as intended or requires changes. Metrics also apply to educational systems to determine implementation effectiveness based on outcomes (U.S. Department of Education, 2025).
You want to choose the right metrics to gauge success. A single metric risks becoming a target to aim for. Using multiple metrics reduces that risk while simultaneously monitoring many areas. What those metrics are depends on the goals of the individual and/or organization using them. Goals may include maximizing outcomes (scores and pass rates), cultivating qualitative aspects (engagement or relationships), or encouraging growth (attitudes, perspectives, or character). It could also be a mix of these or just simply making it to the next day.
There are two types of data associated with metrics: quantitative and qualitative. Quantitative data is a specific and objective measure capturing numbers, quantities, and ranges. Qualitative data is based on subjective and explanatory measures of qualities, traits, and characteristics. In short, quantitative is numbers and qualitative is words and images. Quantitative data may work alongside qualitative data, or separately from it, depending on your needs.
It is possible to convert qualitative data into quantitative data through methods like sentiment analysis, natural language processing (NLP), and machine learning (ML). You can convert from quantitative to qualitative with categorizing, bucketing, and labeling.
Details of these techniques are beyond this guide, but awareness helps if you decide to pursue deeper analysis later.
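The mechanics of the simpler direction (quantitative to qualitative) are easy to illustrate, though. Below is a minimal Python sketch of bucketing; the cut points and labels are hypothetical examples, not any standard, so treat it as a shape rather than a recommendation.

```python
# Minimal sketch: converting quantitative scores into qualitative labels
# via bucketing. The thresholds and labels are hypothetical examples,
# not a standard; pick cut points that match your own goals and rubric.

def bucket_score(score: float) -> str:
    """Map a 0-100 score onto a qualitative performance label."""
    if score >= 90:
        return "exceeds expectations"
    if score >= 70:
        return "proficient"
    if score >= 50:
        return "approaching proficiency"
    return "needs support"

scores = [95, 72, 48, 61, 88]
print([bucket_score(s) for s in scores])
# ['exceeds expectations', 'proficient', 'needs support',
#  'approaching proficiency', 'proficient']
```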
Measuring data may allow bias to creep in through various means like sampling choices and demographics. Interpretations and student factors may also introduce bias. Several examples of bias include, but are not limited to, the following (Rogers & Jonker, 2024):
- Confirmation bias
- Observer bias
- Selection bias
- Recall bias
- Reporting bias
- Sampling bias
- Survivorship bias
- Measurement bias
- Social bias
Measures are intended to be objective: not influenced by bias or prejudice. The problem is that bias is unavoidable, unconscious or otherwise, and varies from person to person. Unchecked and unaccounted-for bias will lead to detrimental effects. In the worst case, you may face legal and ethical consequences for inaccurate decisions made using the data collected and metrics produced. You can reduce bias with strict verification, transparency, and representative sampling over selective subsets.
You may see two terms associated with metrics of success: Objectives and Key Results (OKRs) and Key Performance Indicators (KPIs). OKRs track progress and direction toward specific goals while KPIs monitor performance and outcomes over time. KPIs are better for recurring tasks while OKRs are better for one-off tasks. OKRs often use qualitative data while KPIs use quantitative data.
Any metric can be “gamed” or optimized. Despite this, the teacher/instructor goal of ensuring students learn what you teach remains.
Metrics (In General)
One area of quantitative data comes from metrics for student success. Some quantitative examples include (Fields, 2024):
- GPA/Grades (Academic Performance)
- Support Service Utilization (Accommodations)
- Time to Completion
- Graduation Rate
- Retention Rate
- Post-Graduate Employment Rate
- Test Scores (State, ACT, SAT, IB, AP, etc.)
- Attendance & Participation
Generally speaking, a metric deals with performance, growth, and/or development. You may also track metrics like GPA and test scores through other, related metrics such as proficiency rates. This is when students are bucketed based on performance and you want to showcase a high percentage scoring at or above, for example, a “proficient” level.
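Since proficiency rates come up repeatedly in this chapter, here’s a minimal sketch of the arithmetic, assuming a hypothetical cut score of 70; real cut scores come from the assessment’s own scoring guide. Note that the cut score itself is exactly the kind of lever Goodhart’s Law warns about: move it and the rate moves with it.

```python
# Minimal sketch: proficiency rate as the share of students scoring at or
# above a cut score. The cut score (70) is a hypothetical example; real
# cut scores come from the assessment's own scoring guide.

def proficiency_rate(scores: list[float], cut_score: float = 70.0) -> float:
    """Return the fraction of scores at or above the cut score."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= cut_score) / len(scores)

scores = [55, 71, 90, 68, 83, 74]
print(f"{proficiency_rate(scores):.0%}")  # 67%
```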
Most analysis of results should be treated as a snapshot rather than a full performance evaluation. This may be because only a single metric is in focus at a time, there are insufficient growth models, or multiple metrics aren’t used in conjunction for analysis. For example, if you use gap analysis, or even analysis in general, to analyze education results, it may have critical flaws for three reasons:
- Typical experiments test the same group, or same subject, before and after an experiment to isolate effects and ensure only one variable is changed at a time. School environments rarely, if ever, allow for this controlled testing.
- A given grade level educates different groups of students each new academic period.
- A bad score given one year may improve the next year, despite no changes implemented by a school/teacher, due to the students changing.
There are also metrics related not just to students, but also to staff members (and sometimes parents). These may focus on morale, finances, and general resource allocation. Some examples include:
- Parent Engagement
- Teacher Retention
- Morale and Satisfaction
- School Quality (physical conditions primarily)
- Per-Pupil Expenditure
- Teacher Qualifications
- Teacher-to-Student Ratio
- Counselor-to-Student Ratio
There are a whole host of metrics, but there are a few I want to call out because I usually see them as “objectives” rather than performance indicators.
“Lines of Code”
Alternatively: Why Context Matters
In short: context validates metrics.
Here’s an example I want to cover quickly. A metric I’ve seen mentioned here and there in 2022-2025 is “lines of code.”
People phrase it as if the [number of] lines of code is better when it’s a big number and worse when it’s a low number. For someone unaware, it sounds like it makes sense, but we’re actually dealing with perverse incentives here.
It’s improper to treat the metric this way. The context isn’t interpreted correctly and the logic behind the story is misconstrued. It betrays one important fundamental design concept, amongst many others: simplicity (or clarity) trumps complexity.
I’ll put it another way: imagine you have a problem to solve and there are two solutions to handle it. One option contains 10,000,000 lines of code and the other option contains 10,000 lines of code. Both options will perform the job to the user’s needs.
Programming languages have wildly varying quantities of lines of code due to the logic and structure of those languages. It’s like comparing two different spoken languages. “I love you” vs “Ich liebe dich” both mean the same thing, but the latter has more letters. That’s something you cannot reconcile easily between languages, so it dilutes the metric’s veracity.
If something goes wrong while you’re solving a problem, what would you rather troubleshoot/fix: the 10,000,000 lines option or the 10,000 lines option? I’d choose the 10,000 lines option because it’d be way faster and simpler; you probably would too.
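To make the incentive problem concrete, here’s a tiny, hypothetical sketch: two functionally equivalent ways to sum the even numbers in a list, one deliberately padded. Under a “more lines is better” incentive, the padded version “wins” while being strictly harder to read and maintain.

```python
# Two functionally equivalent functions. A "more lines of code is better"
# incentive rewards the padded version even though both return the same
# result for the same input.

def sum_evens_verbose(numbers):
    total = 0
    for n in numbers:
        remainder = n % 2
        if remainder == 0:
            is_even = True
        else:
            is_even = False
        if is_even:
            total = total + n
    return total

def sum_evens_concise(numbers):
    return sum(n for n in numbers if n % 2 == 0)

data = [1, 2, 3, 4, 5, 6]
assert sum_evens_verbose(data) == sum_evens_concise(data) == 12
```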
I could go on and on about other examples. Number of pages, word count, step count, number of clicks etc. Metrics that only sound good on paper to “optimize,” but don’t prove something is effective/efficient, are functionally worthless.
What If Metrics Mean Different Things?
If a metric means something different for multiple people, you’re on the highway to communication errors, which potentially means lower performance and lost revenue.
If you’re in a position to do it and maintain it (because things always change over time), start with a single document or wiki page outlining what the metrics are, what each metric means, and how each metric is measured. This includes methods like filters, source tables, grain (what a “row” represents, like one row = one customer), formulae, and more; a minimal sketch of one such definition appears after the note below. Afterwards, force every report and dashboard across the entire organization to adhere to that “common language” you established.
- The same thinking applies to other types of documents too, like a project charter template, where you define a standard outline for people to adhere to.
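As one hypothetical shape such a shared definition could take (the field names, table name, and formula below are illustrative, not a standard), each metric entry records what it means, its grain, its source, and how it’s calculated:

```python
# Hypothetical sketch of a shared metric definition ("common language").
# Field names, the source table, and the formula are illustrative only;
# the point is that every report reads from one agreed-upon definition.

METRIC_DEFINITIONS = {
    "chronic_absentee_rate": {
        "description": "Share of enrolled students missing 10% or more "
                       "of school days in the reporting period.",
        "grain": "one row = one student per school year",
        "source_table": "attendance_daily",   # assumed table name
        "filters": "enrolled for at least 20 days",
        "formula": "students_missing_10pct_of_days / enrolled_students",
        "owner": "data team",                  # who maintains the definition
    },
}
```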
This method may receive pushback or disagreement, because many people means many opinions, and may take a long time to set up or get implemented correctly. It does, however, mitigate the issue where everyone uses the same metric, but tries to measure a different thing with that metric.
You could also implement a ticket system, which forces people to write down what they actually want or need before requesting or implementing a change. It’s like doing a double take to confirm “do I actually need this change?” before you go in guns blazing.
- Also ensure the ticket system has clear intake standards, like mandatory fields to fill out and write text in, to mitigate communication issues.
- No ticket = no work
Diagnostics with Metrics
Let’s say you’re tracking a particular metric, or a group of metrics, and you notice values that don’t look good.
Before you jump to conclusions and implement new changes, realize one fact: quantitative data (and qualitative to some degree) is good at revealing what is happening, but not why something is happening, how something is happening, or if the metric is even valid to continue measuring in the first place.
- It’s possible to mitigate the why issue through additional metrics, but it may not fully solve the problem of why.
To dive into one metric example: tracking attendance.
Attendance is low this year. It could be from one chronically absent student, absences spread across multiple students, or some other factor or combination of factors.
- From this information alone so far, you can reasonably assume students are not going to school.
- This is your what; the problem.
- You don’t have a why yet.
Do not draw any conclusions from the above; you’re only identifying problems so far.
As for why it’s happening, you’ll need to investigate, collect evidence, and look into other underlying issues. These issues may conflict with each other, seem unrelated to a problem, or not be identified with data alone. Some examples of investigation points may include:
- Are there bullying problems or social conflicts?
- Is there an issue outside of school, at home, or in the environment?
- Were new, external policies implemented affecting attendance?
- Does the student not perceive value in the class or material?
- Are there resources lacking to help with attendance?
- What other metrics might indicate an issue alongside low attendance?
- Is there an issue with the management of a class?
- Do underlying needs indicate a symptom of another problem (e.g. bad behavior as a symptom of boredom)?
- Are there other logistical barriers present with parents, staff, and other peers?
- Is there a change in demographics?
Once you start identifying which issues exist, and whether they’re valid issues (veracity), you can gather data for them, then disaggregate the data to determine how universal or specific a problem is.
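Disaggregation just means breaking the same metric down by group to see whether it’s universal or concentrated. A minimal sketch, using made-up attendance records and hypothetical grade-level groups:

```python
# Minimal sketch: disaggregating an attendance metric by group to see
# whether low attendance is universal or concentrated. All records and
# group labels are made up for illustration.
from collections import defaultdict

records = [
    {"student": "A", "grade": 9,  "attendance_rate": 0.96},
    {"student": "B", "grade": 9,  "attendance_rate": 0.94},
    {"student": "C", "grade": 10, "attendance_rate": 0.71},
    {"student": "D", "grade": 10, "attendance_rate": 0.78},
    {"student": "E", "grade": 11, "attendance_rate": 0.95},
]

by_grade = defaultdict(list)
for r in records:
    by_grade[r["grade"]].append(r["attendance_rate"])

for grade, rates in sorted(by_grade.items()):
    print(f"Grade {grade}: mean attendance {sum(rates) / len(rates):.0%}")
# Here the dip is concentrated in grade 10 rather than being universal.
```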
After data is collected and processed, then you can effectively design whatever solution is needed to solve the problem(s). Keep in mind three things, however:
- A seemingly good solution may actually generate more problems than before.
- Even the perfect solution can still fail due to factors outside of your control.
- Designing solutions for problems and issues which don’t exist is a waste of time, a waste of resources, and a detriment to everyone involved.
The White and Orange Exception
There’s an example I like to talk about where a better result occurs in practice despite established metrics saying it’s worse: white text on orange background in a button.
- There’s even a dedicated case study on this topic from Ericka O’Connor (2019).
Designs need to consider similar accessibility laws. Instead of ESSA and LRE, however, it’s ADA, Section 508, and WCAG (Web Content Accessibility Guidelines). Someone may risk legal action if they aren’t designing with consideration for these rules.
- Aside: Section 508 is specifically for federal agencies/programs, not a universal rule.
There were two main levels of WCAG contrast compliance, AA and AAA, when the case study was written in 2019. Designs should meet at least the AA minimum contrast ratio, or 4.5:1 for small text and 3:1 for large text, though a higher value is generally better. For reference, contrast ratio measures how well text of one color stands out against a background of another color.
- As of writing this (2026), WCAG has since been updated and the above was relevant for WCAG 2.1.
In that same article, the contrast ratio for black text on an orange background was 6.44, passing AA, while the contrast ratio for white text on the same orange was 3.26, passing AA only for large text. Despite the metrics indicating black was better, a human factor was at play: 61% of participants found white on orange easier to read, compared to 39% for black on orange.
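For readers who want to check ratios like these themselves, below is a sketch of the WCAG 2.x contrast-ratio formula: relative luminance of the lighter color plus 0.05, divided by that of the darker plus 0.05, with luminance computed from linearized sRGB channels. The orange hex value here is a hypothetical stand-in, not necessarily the exact color from the case study, so don’t expect it to reproduce 6.44 and 3.26.

```python
# WCAG 2.x contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05), where L
# is relative luminance from linearized sRGB channels. The orange below is
# a hypothetical stand-in, not necessarily the case study's exact color.

def relative_luminance(hex_color: str) -> float:
    hex_color = hex_color.lstrip("#")
    linear = []
    for i in (0, 2, 4):
        c = int(hex_color[i:i + 2], 16) / 255.0
        # Linearize each sRGB channel per the WCAG 2.x definition.
        linear.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(color_a: str, color_b: str) -> float:
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

orange = "#FF8000"  # hypothetical orange
print(round(contrast_ratio("#000000", orange), 2))  # black text on orange
print(round(contrast_ratio("#FFFFFF", orange), 2))  # white text on orange
```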
This exception is here to remind you of two things:
- Metrics are not always perfect and still diagnostic.
- You should still aim to meet some level of standard(s) even if all metrics aren’t perfectly favorable.
How Education is Evaluated (with Examples)
In the United States, as of March 2026, there are multiple ways to evaluate its learning and education systems compared to other nations in the world. These include, but are not limited to, TIMSS (Trends in International Mathematics and Science Study), PISA (Programme for International Student Assessment), and NAEP (National Assessment of Educational Progress). I’ll examine these three in particular to give readers an idea of what they aim to accomplish.
- These particular examples deal with K-12 education instead of tertiary/university education.
TIMSS is an ongoing assessment covering mathematics and science; I’ll reference the 2023 report by von Davier et al. (2024) for details. It’s conducted every 4 years with some variation, evaluates 4th and 8th grade students, and is intentionally designed as a benchmark to compare results against other nations and discover ways to improve education. Specific topics include items like Measurement and Geometry, Data and Probability, Life Science, Physics, and other sciences typical of curricula for those age groups at the time. The evaluation criteria are straightforward: a point system where students earn more points based on their accuracy and precision across a series of questions. This point system neatly compiles all scores, can be filtered down by specific questions or question types, and opens up further analysis into student learning outcomes.
- There are also evaluations based on the environment, such as home environments vs school environments, to provide context and better interpret reasons for any results.
PISA focuses on mathematics, reading, and science to test students’ understanding of those subjects “in and out of schools for their full participation in societies” (OECD, n.d.). Though PISA focuses on those subjects, it alternates which subject gets the greatest focus in a given survey year and includes minor topics, such as financial literacy and creative thinking. In most participating countries, its assessments and testing are done by student age (15-year-old students), as that’s when students are nearing the end of compulsory education. One aspect unique to PISA compared to other assessments is that it resembles how data analysts approach data: finding ways to answer “business” questions. To summarize some examples of what it aims to answer, from OECD’s own website (same OECD source as before, n.d.):
- Are schools preparing young people for adult life?
- Can students properly apply skills and knowledge to problems?
- What are their motivations and beliefs for learning and self-improvement?
NAEP’s purpose is to evaluate educational achievement within the United States, rather than compare the USA against other nations. Since results are based on student samples, IES (Institute of Education Sciences) has some basic rules for interpretation, which I’ll post here verbatim (2025):
- Individual scores are not reported.
- NAEP data cannot support cause-and-effect claims.
- Differences in scores must be statistically significant to be reported as actual differences.
- State and district score rankings imply score differences that may not be statistically significant.
- Scoring NAEP Proficient is not equivalent to achieving a proficient score on other assessments.
- NAEP results cannot be compared across other subjects.
NAEP’s rules better enforce what the metrics stand for and mitigate the risk of misinterpretation by both knowledgeable and unknowledgeable audiences. They also provide transparency about exclusions, such as students who cannot participate even with allowable accommodations. Scores for assessments are reported using point scales from 0-500 or 0-300, where a higher number means a better score. There are a variety of subjects involved, including the Arts, Science, Economics, History, Reading, Mathematics, and so on; essentially the “core” disciplines you may see students encounter in schools.
There are other forms of testing conducted by organizations aside from these. The main thing to watch out for is whether or not educators also get individualized data back. While an average of a large sample, or even a population, is useful, if you cannot drill down into how specific students or groups performed, you risk a distorted view of what’s really happening.
100% Graduation Rate
This is a funny metric to use for success, and the same applies to job placement rate. It’s also a case of why you need multiple metrics of success instead of just one to help verify data authenticity, and of why making a metric a target proves Goodhart’s Law.
100% graduation rate means you’re telling me every student in the history of the school (or specific year) got passing grades on every subject they went through.
If you see this value, be skeptical. Be doubly skeptical if it comes from a school with a large pipeline of students (i.e. a large sample/population size) going through it.
- I’d also be skeptical of >= 95% as well, but that’s my personal opinion.
- You’d also want to reference standardized scores the school achieves for testing it cannot easily manipulate, like SAT and ACT scores.
There are two immediate ways that come to mind to manipulate this.
- Lower the standards for what is considered “graduation”
- Pad grades and/or implement a minimum grade policy across the board
Which of these can you implement right now at zero to low cost?
If you guessed #1 and #2, you are correct. It is disturbingly easy to accomplish these methods as well, even if they’re legally and ethically questionable.
A school system, across its entire journey, should function as an increasingly difficult filter, imposing more responsibilities and accountability as students age. When this metric, and related success metrics, are manipulated to display good results, it undermines the entire intellectual journey and affects the education system as a whole at the expense of every student. If you manipulate what counts as success, lower the standards needed to meet it, or remove accountability for students not meeting educational expectations, you remove the legitimate standards that prepare students for what comes next in their lives.
Average Daily Attendance (ADA)
Generally speaking, you want students to attend class. Despite good intentions, this metric directly links attendance numbers to financial resources like increased budgets. It’s also a metric mostly outside of your control thanks to a few factors:
- The parents (or students themselves if adults) are responsible for ensuring a student gets to school.
- Students could still get to school, but not attend classes of their own volition.
- There’s a strong association between socioeconomic status of students (and their parents) and school attendance (Klein et al., 2020).
- i.e. Lower resources = more absences
Everyone cares about money. If they have enough money, then they care about not losing money. A school (and by extension a business) is no different, as it wants funding to continue operations. If a school cannot perform well, it may receive reduced funding from various sources (federal, state, and local). This reduced funding may occur even when it is objectively impossible to meet attendance goals set by entities outside the facility. A slash in funding may also lead to a downward spiral in performance, which is difficult to recover from.
If this is a success metric, it has real world value often tied to wealth, budgets, and income, so policies may change to try and improve it because money is important. Some policy change examples, which may be good or bad depending on your perspective, include:
- Heavily enforcing truancy laws
- Redefining policies to keep students in classrooms
- This may mean not enforcing consequences which take students out of classrooms as it may affect attendance metrics
- Raising awareness of the effects of excessive absences
- Investing in resources, like improvement plans, to mitigate absences
- Linking number of absences to reduced benefits and privileges available to students
- Potentially faking attendance records
- Altering what counts as excused vs unexcused absences, and what is an “absence” in general
On the other side of the coin, you may focus on chronic absentee rate (or, from a business perspective, “churn rate”). This typically means students who miss 10% or more of school days, for any reason, in a given period of time. That metric may employ similar strategies and tactics like the examples above to make it look good to “investors” and other onlookers.
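A minimal sketch of the chronic-absentee arithmetic, using the 10%-of-days-missed definition above; the student records and the 180-day period are made up for illustration.

```python
# Minimal sketch: chronic absenteeism as "missed 10% or more of school
# days for any reason." The records and the 180-day period are made up.

days_in_period = 180                           # hypothetical school days
absences = {"A": 3, "B": 21, "C": 9, "D": 40}  # days missed per student

chronic = [student for student, missed in absences.items()
           if missed / days_in_period >= 0.10]

print(chronic)                                  # ['B', 'D']
print(f"{len(chronic) / len(absences):.0%}")    # 50% flagged as chronic
```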
Enrollment Metrics
Enrollment is another metric I’ve seen used to boast how effective a class, school, or otherwise is. However, it risks being a vanity metric rather than a meaningful metric, and this problem is more pronounced when it’s measured in isolation.
For anyone familiar with MMORPGs, video games, or other forms of media, you may have seen terms similar to enrollment as a metric, like the following:
- Subscriber Counts
- Total Number of Players/Subscribers
- Concurrent Players (or “Current”)
- Most Played Game
If this is a success metric, you may pursue mass adoption strategies (reaching the maximum audience available to you) to showcase improvement and success. Enrollment, as a metric, generally only cares about the number of students in a given class/school. It does not care about the performance of those students or what those students do while inside the school.
- Video game example: A game can sell 5 million copies, but if 4 million players quit before completing the game, it may indicate only financial success and not overall success.
Though generally you want high enrollment, you may risk pursuing high enrollment at the cost of other metrics such as:
- Engagement
- Performance
- Retention
It’s not to say enrollment is entirely a bad metric. For example, let’s say you track enrollment numbers and there’s a sudden drop in students from one year to the next. That is indicative of a problem warranting investigation into what the cause is.
Test Scores
This is typically used to indicate student understanding of material across various grade levels. It’s often examined alongside other metrics, such as attendance and GPA, and is a metric sometimes outside of the control of a teacher or school.
It’s also possible to “game” this metric by managing the test-taking population itself, such as:
- Over-identification of accommodations for students on tests, such as extended time.
- Getting low-scoring students to not appear on test days
State tests and standardized tests are usually the subjects of these metrics. If, however, these metrics are defined as success criteria, especially with money attached, some things may happen at a school:
- The curriculum shifts to focus on tested subjects.
- Subjects not covered on these tests may be neglected.
- Skills not tested, or harder to assess on tests, may be sidelined in favor of easily definable skills.
- The classroom goes from a place of genuine learning and discovery to a place specifically designed to get high test scores.
- Accountability may be sacrificed if it means increasing the metric.
To maximize this metric, you may risk the well-being of both high-performing and low-performing students, which means a failure in equitable education for all students.
You may also see more instances of cheating, higher pressures on students to do well, and educators altering test scores to showcase higher results than actually earned. Many of these actions have real incentives promoting more desperate behaviors, such as college admissions, scholarships, company and job placements, and more based on test scores and performance.
No Child Left Behind and Every Student Succeeds Act
Normally I’d place these in the Legality chapter, but I believe these acts apply better to this chapter. It’ll also be tough to talk about this without bias, so I apologize in advance and will try my best.
For context, No Child Left Behind doesn’t exist anymore. It was sunsetted and replaced in 2015 with the Every Student Succeeds Act (Hirschfeld Davis, 2015).
If I were to assume any opinions about these acts:
- It’s cutting instruction time to prove instruction is effective (i.e. a focus on testing at the exclusion of other items).
- It also encouraged passing up students into higher grades, despite clear evidence they are not ready to progress in the next step of their learning journey (i.e. grade inflation to prevent failure).
To avoid delving into politics too much, the most practical consequence of not meeting outlined goals and requirements was a loss of funding. Though there was “no child left behind” and “every student succeeds” on paper, students were left behind in practice.
- E.g. As a hypothetical, a 12th grader with a 5th grade reading level eligible for graduation.
These acts are responsible for establishing metrics as a target to meet, which means the education system changes, for better or for worse, towards a metrics-based system. You may also see more staff and more bureaucracy, at the administrative or support level, to hold people responsible for meeting these metrics. You may also see an increase in perverse incentives or initiatives which, despite good intentions, usually backfire.
While these acts provide data for metrics and analysis, doing so at the cost of preventing funding for education is where it goes sour. Even if you did hide the inputs (the test scores and their results), a savvy person could deduce the outputs (funding) over time. Since these standards are tied to money, which a lot of people want and need, deviating from these standards is frowned upon, if not outright barred, in many education systems. This may mean more focus on the test(s) and its results, but perhaps at the cost of any actual learning and thinking.
Misalignment Case: IB vs Implementation
International Baccalaureate (IB) is the example I’ll use here. This isn’t a positive or negative review of said program; only a review of their mission statement vs metrics.
Let’s say the goal of IB is, paraphrased, learning and developing skills for a student’s future (IBO, n.d.). That could imply many things, such as learning through a trial by fire (i.e. an opportunity for educational rigor) or designing specifically for students with higher abilities and self-motivation, but the goal is still learning.
The metrics that administration, or other people responsible for implementing programs like IB into an institution, may care about are grades, graduation rates, student retention rates, and test scores.
This is a case of misaligned goals, not misaligned metrics. It happens a lot even outside this example. Both sides could even be utilizing the same metrics, but placing different weights on those metrics.
One group wants to ensure high-quality learning goals are met while the other group wants to ensure business goals are met. One might attempt a compromise and “meet it halfway” but that is not a great solution and can negatively affect the goals of both sides. Programs like these may also be implemented to create magnet schools or attempt to bring up low-scoring schools to better meet metrics and stay afloat.
The program offered by one group may also not match the existing system it’s trying to fit into. If that occurs, then you’re looking at extra work, cumbersome overlapping, and excessive overhead on the part of staff, teachers, administration, students, etc. Though the system may be excellent or great for whomever it’s catered to, it requires buy-in and compatibility with the current system(s) to mitigate undue issues down the line.
The easiest-to-state, but hardest-to-do, solution is simple: you need full buy-in or no buy-in (i.e. appropriate, QUALITY support/resources and student desire to enter those programs) for initiatives like these, or else you’re setting up the entire institution for failure. A program like this should also be separate, but together, within the institution, like how multiple majors with vastly different curricula all share the same campus at universities.
- To put it like I’m in a negotiation: “Yes, we’ll add this program on the condition we get resource A, B, and C before this stated time. If this is not feasible, we will not do it.”
Open Enrollment
Before continuing, consider the following context: generally speaking, a parent wants what is best for their children.
Let’s say you’re looking at various options. You want the best option you’re able to get, even if it means going to desperate lengths or using alternative funding to acquire it. There are even reviews for all options confirming their value, often through metrics including, but not limited to, the ones discussed here already.
Adding in the context from earlier, there’s no logical leap to make and it makes perfect sense that students will go to the “best” school they can get into.
Open enrollment doesn’t generate bad schools; it merely exposes which schools are considered “bad” based on several metrics, then offers options to go to other schools which may perform better. If poorly performing schools lose enough students to better schools, they may close down. That outcome, however, lowers the number of schools in the pool students can go to, which increases the risk of overcrowding and negates whatever benefits open enrollment wanted to achieve.
- Though not directly related, an article from Josh Bivens talks more about economic inequality through “secular stagnation” (2017) and may establish more parallels.
Though open enrollment permits mobility for many groups, system-wide issues affecting that same policy stop it from generating any real gains.
- For video-game savvy readers, it’s like developers opening access to a popular MMORPG server without adding hardware or capacity to support the sudden influx of players and creating “dead” servers as a byproduct.
Data and Data Issues
Many organizations collect data on all sorts of things. This isn’t limited to education systems either.
The problem arises in three ways:
- There is too much data to process in a reasonable timeframe.
- The data collected is messy and not cleaned properly.
- The collected data isn’t even useful.
#1 is an issue related to your system’s capabilities, rather than any process being wrong, and delves into the cost of collection and analysis. Nevertheless, if you have too much data and aren’t sure what to do with it, the question is: why are you collecting so much data in the first place?
#2 is a process issue concerning hygiene. Data is collected, but it may be in an unrefined state you cannot do anything with. This may be remedied with data professionals and the appropriate tools to extract, transform, and load messy data into “cleaned” data to conduct analysis on. Even if data is cleaned, you still need people to interpret said data and know what metrics they’re measuring (and why they’re measuring it), or you land on a half-baked solution at best.
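As a rough illustration of that “extract, transform, load” idea, here’s a minimal sketch of cleaning a messy table with pandas; the column names, values, and cleaning rules are hypothetical examples, not a prescribed pipeline.

```python
# Minimal sketch of a "transform" step: dropping rows with no ID, removing
# duplicates, and coercing messy score strings into numbers. Column names
# and cleaning rules are hypothetical examples.
import pandas as pd

raw = pd.DataFrame({
    "student_id": ["001", "002", "002", "003", None],
    "test_score": ["85", "  92 ", "  92 ", "n/a", "74"],
})

cleaned = (
    raw.dropna(subset=["student_id"])      # drop rows missing an ID
       .drop_duplicates()                  # remove exact duplicate records
       .assign(test_score=lambda df: pd.to_numeric(
           df["test_score"].str.strip(),   # "n/a" becomes NaN for review
           errors="coerce"))
)
print(cleaned)
```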
#3 is a process issue concerning intent. Collecting data for the sake of collecting data is a fool’s errand and a drain on financial, human, and technological resources. At the same time, this issue may be unavoidable due to legal issues like compliance laws and technological issues like vendor defaults you cannot modify. The solution is simple: refine what data you actually need to collect and distribute, be intentional with what you need vs don’t need, and re-evaluate the scope of any data operations.
Overall, there’s a lot of issues with data even if you do everything right on your end. Of particular note:
- Data (and processes for data) are NOT static, change over time, and may not work in the future.
- From observation, very few people, if anyone, want to govern data (i.e. set data policies).
- Many stakeholders don’t know what they want, are vague, or change expectations frequently.
- Access to data (and technology/tools) you need for your duties may be restricted outside your control.
If you combine the first two points above, you get two new problems: the data “being wrong” and getting multiple reports with multiple numbers. The first problem means there’s no clear definition of what is right or “normal” in the data context. The second problem means multiple people are operating with vague (and often different) requirements, rigor, and contexts, possibly even different data sources, so each creates their own version of what is “right” and scrambles to find a resolution when things go wrong.
The third problem is something to expect in any business across any context. What sounds like a fun idea in the moment risks scope creep and derailing a project if you try to implement it, so getting people to figure out what they actually want is almost its own skill entirely.
Data and technology access restrictions can distort what the analyst, or even the end user, can interpret from the data. Say, for example, you submit students to a standardized test and their results are submitted into the testing system. That system intakes the data and (hopefully) cleans it up, but the data made available to educators is heavily limited. This is a classic black-box problem within a data pipeline. It’s not necessarily bad, but it can hamper what you can do with the data and make of it. You’ve likely no way of really knowing what’s happening, which may lead you to question the data’s importance altogether. Sometimes you can negotiate what’s included in these reports, while other times you’re stuck.
What about if the entire class fails?
Well, that depends. Which one of the two probable scenarios is it below?
- If it isn’t one of these (which is possible!), then select the one it may be closer to.
A: Did the entire class do the assignments, exercises, exams, etc. and still fail?
B: Did the entire class not do assigned work, which is left ungraded and brings down their overall grade to failing?
If it’s scenario A, and it includes students that genuinely did try their best and still failed, it’s likely the teacher’s fault. If it’s scenario B, it’s more likely the students’ fault.
- Notice how I’m saying likely and not answering with certainty. Metrics are diagnostic and there could be another underlying problem present.
Overall, it’s an application of the classic saying: you can lead a horse to water, but you can’t make it drink. If you’re a teacher or instructor reading this, document their failures and be prepared to present a case about it if it happens. That’s about all I can reliably say here.
Bibliography
- Bivens, J. (2017, December 12). Inequality is slowing US economic growth: Faster wage growth for low- and middle-wage workers is the solution. Economic Policy Institute. https://www.epi.org/publication/secular-stagnation/
- Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90. https://doi.org/10.1016/0149-7189(79)90048-X
- von Davier, M., Kennedy, A., Reynolds, K., Fishbein, B., Khorramdel, L., Aldrich, C., Bookbinder, A., Bezirhan, U., & Yin, L. (2024). TIMSS 2023 International Results in Mathematics and Science. Boston College, TIMSS & PIRLS International Study Center. https://doi.org/10.6017/lse.tpisc.timss.rs6460
- Fields, E. (2024, November 15). How do you measure student success? Enrollify. https://www.enrollify.org/blog/how-do-you-measure-student-success
- Goodhart, C. (1975). Problems of monetary management: The UK experience. Papers in Monetary Economics, Vol. 1, pp. 1–20. Sydney: Reserve Bank of Australia. (Paraphrase attributed to Marilyn Strathern, 1997.)
- Hirschfeld Davis, J. (2015, December 11). President Obama signs into law a rewrite of No Child Left Behind. The New York Times. https://www.nytimes.com/2015/12/11/us/politics/president-obama-signs-into-law-a-rewrite-of-no-child-left-behind.html
- Institute of Education Sciences (IES). (2025, July 31). Understanding results - NAEP. National Center for Education Statistics. https://nces.ed.gov/nationsreportcard/guides/
- International Baccalaureate Organization. (n.d.). IBO. https://www.ibo.org/about-the-ib/. Accessed December 17, 2025.
- Klein, M., Sosu, E. M., & Dare, S. (2020). Mapping inequalities in school attendance: The relationship between dimensions of socioeconomic status and forms of school absence. Children and Youth Services Review, 118, 105432. https://doi.org/10.1016/j.childyouth.2020.105432
- O’Connor, E. (2019, March 22). Orange you accessible? A mini case study on color ratio. Bounteous. https://www.bounteous.com/insights/2019/03/22/orange-you-accessible-mini-case-study-color-ratio/
- Organisation for Economic Co-operation and Development (OECD). (n.d.). PISA frequently asked questions (FAQs). Retrieved March 8, 2026, from https://www.oecd.org/en/about/programmes/pisa/pisa-frequently-asked-questions-faqs.html
- Rogers, J., & Jonker, A. (2024, October 4). What is data bias? IBM. https://www.ibm.com/think/topics/data-bias
- Siebert, H. (2001). Der Kobra-Effekt: Wie man Irrwege der Wirtschaftspolitik vermeidet [The cobra effect: How to avoid the wrong turns of economic policy]. Munich: Deutsche Verlags-Anstalt. ISBN 3-421-05562-9.
- U.S. Department of Education, Office of Career, Technical, and Adult Education. (2025, January 15). Performance measures and accountability. https://www.ed.gov/about/ed-offices/octae/performance-measures-and-accountability