Punch Cards Are for Counting Votes
The previous posts in this series established something uncomfortable: AI systems don't need complete data, and 35% of the right signals can be sufficient for accurate recognition. If that's true, it raises a harder historical question — why did we spend sixty years insisting otherwise? The answer: Garbage In, Garbage Out (GIGO).
Where GIGO Came From
Garbage In, Garbage Out was coined sometime between 1957 and 1961, most commonly attributed to George Fuechsel, an IBM programmer and instructor. The context was punch card batch processing — a world where programs were submitted as physical decks of cards, fed into a machine overnight, and retrieved the next morning. If your cards had errors, your output was wrong. The machine did exactly what it was told. Nothing more.
But punch cards didn't start with IBM. Herman Hollerith invented the punched card system for the 1890 US Census — a tabulation problem, not a computation problem. Count the people. Record the counts. Produce the totals. The entire paradigm was built around enumeration: discrete inputs producing discrete outputs, no inference required or permitted. When punch cards migrated into corporate computing decades later, they carried that same assumption. A hole was either punched or it wasn't, and the difference was total.
GIGO was not a theory of data quality. It was an explanation for non-technical people about why deterministic computers produced wrong answers from bad input. It assumed a specific and narrow paradigm: one input, one output, no inference in between. The machine could not guess. It could not generalize. It could not reconstruct meaning from partial signal.
That paradigm held for decades. Mainframes, relational databases, ERP systems, compliance reporting — all of it deterministic. All of it appropriately governed by GIGO. The principle was correct for the machines it described.
What the Data Quality Movement Built on Top
By the 1990s and 2000s, GIGO had evolved from a programmer's quip into a management philosophy. Thomas Redman, writing in MIT Sloan Management Review, formalized it for a generation of data leaders: missing fields were failures, completeness was the standard, and the CDO role that emerged from the 2008 financial crisis was built almost entirely on the defensive logic that bad data produces bad outcomes.
That logic was sound. For the systems it was applied to — structured reporting, regulatory compliance, financial auditing — completeness is genuinely load-bearing. A missing income field in an IPEDS submission is not an inference problem. It is a compliance failure with real consequences.
The problem is not that GIGO was wrong. The problem is that it was never updated when the underlying paradigm changed.
What Changed in 2017
The publication of Attention Is All You Need by Vaswani et al. in 2017 introduced the transformer architecture that underlies every major LLM in production today. The fundamental shift was this: where deterministic systems map inputs to outputs, transformer-based models predict probability distributions over possible continuations of incomplete sequences.
The machine changed. The data standard did not.
A transformer is not asking whether your data is complete. It is asking whether your data contains sufficient signal density to form reliable probability distributions. Those are different questions with different answers, and institutions still measuring AI readiness against completeness benchmarks are applying a 1961 punch card standard to a 2024 inference engine.
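The difference between those two questions can be sketched in a few lines. Below, a dictionary lookup stands in for the deterministic paradigm, and a toy bigram model stands in for the probabilistic one — the bigram counts are a deliberately crude proxy for a transformer (the principle, not the architecture, is the point), and all of the data is invented for illustration.

```python
from collections import Counter, defaultdict

# Deterministic paradigm: one input, one output. A missing or malformed
# key is simply a failure — garbage in, error out.
lookup = {"record-001": "approved", "record-002": "denied"}

def deterministic(key):
    return lookup[key]  # raises KeyError on anything unseen

# Probabilistic paradigm: predict a distribution over possible
# continuations, even when the input sequence is incomplete.
transitions = defaultdict(Counter)
corpus = [["the", "data", "is", "partial"],
          ["the", "data", "is", "noisy"],
          ["the", "model", "is", "partial"]]
for seq in corpus:
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1

def predict(prefix):
    """Return P(next token | last token) for an incomplete sequence."""
    counts = transitions[prefix[-1]]
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

print(predict(["the", "data"]))  # {'is': 1.0}
print(predict(["the"]))          # 'data' at 2/3, 'model' at 1/3
```

Note what the second model never asks: whether the prefix is "complete." It asks only whether the training signal is dense enough to make the distribution reliable.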
Where GIGO Still Holds
To stay rigorous: GIGO remains entirely valid for deterministic systems. Compliance reporting, financial audits, structured regulatory data, IPEDS submissions, Title IV eligibility calculations — all of it runs on deterministic logic where a missing field is a genuine failure, not an inference opportunity. The CDO who abandons completeness discipline for these systems in the name of AI readiness will have a very bad time.
The real challenge is that most institutions now run both kinds of systems simultaneously. Deterministic pipelines feeding compliance outputs sit alongside probabilistic inference engines supporting advising, enrollment, and student success. They have different data requirements, different quality standards, and different definitions of what "good enough" means. GIGO governs one. Signal density governs the other. Conflating them is the mistake.
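One way to make the dual standard concrete is to write each one down as a readiness check. This is a minimal sketch, not an implementation: the field names and the 0.35 threshold are hypothetical (the threshold just echoes the album-cover figure from earlier in the series), and a real signal density measure would weight fields by predictive value rather than count them.

```python
# Hypothetical required schema for a compliance submission.
REQUIRED_FIELDS = {"student_id", "income", "enrollment_status"}

def compliance_ready(record: dict) -> bool:
    """Deterministic standard (GIGO): every required field must be
    present and non-null. One missing field is a failure, full stop."""
    populated = {k for k, v in record.items() if v is not None}
    return REQUIRED_FIELDS <= populated

def inference_ready(record: dict, threshold: float = 0.35) -> bool:
    """Probabilistic standard (signal density): enough non-null signal
    to support reliable inference. Unweighted coverage is a crude
    stand-in for true signal density."""
    signals = [v for v in record.values() if v is not None]
    return len(signals) / max(len(record), 1) >= threshold

record = {"student_id": "S-1", "income": None, "enrollment_status": "active",
          "lms_logins": 41, "advising_notes": "met twice", "gpa": None}

compliance_ready(record)  # False: missing income is a compliance failure
inference_ready(record)   # True: 4 of 6 fields carry signal (~0.67)
```

The same record fails one standard and passes the other — which is exactly the situation a CDO running both system types has to govern.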
The Album Cover
This series started with an AI generating a fake Kenny Loggins album cover that matched the real one at roughly 35%. The model had never seen a complete catalog of Loggins artwork. It had no field-level completeness on his discography. It had signal density — enough statistical pattern from enough distributed sources to approximate the category accurately.
That is not Garbage In, Garbage Out. That is probabilistic inference doing exactly what it was designed to do. The experiment wasn't anomalous. It was the system working correctly under a standard that GIGO was never built to measure.
Why This Matters for Your Institution
The CDO role was architected around a defensive logic that made sense in 2010: clean the data, close the gaps, enforce the standards, and good decisions will follow. That logic is not wrong. It is incomplete.
The next iteration of the CDO role requires holding two data philosophies simultaneously — completeness discipline for deterministic systems, signal density awareness for probabilistic ones — and knowing which standard applies to which decision. GIGO is not obsolete. It is misapplied. And the difference between those two things is the whole problem.
Further Reading
What Kenny Loggins Taught Me About Data Sufficiency →
Garbage In, Garbage Out → (Wikipedia)

An IBM 029 Card Punch, in use at universities through the late 1970s. Photo: Steve Elliott, CC BY-SA 2.0.