“Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.” — Often attributed to Einstein
The academic world loves metrics. Publications per year. H-index at tenure. Median time to PhD. Grant dollars secured. We aggregate, average, and rank, believing these numbers tell us something meaningful about success in mathematics. But what if the very act of aggregation is lying to us? What if, like fish being ranked on tree-climbing ability, we’re measuring ourselves by the rules of the wrong game entirely?
The Aggregation Illusion and Its Discontents
Consider a simple question: What does a successful mathematical career look like? The standard answer involves aggregated statistics: the average mathematician publishes X papers, gets tenure after Y years, supervises Z doctoral students. These numbers feel objective, scientific, meaningful. They’re also deeply misleading.
The aggregation illusion occurs when we combine data across different contexts and treat the resulting average as meaningful for individuals. But mathematics isn’t a monolithic discipline. The “average mathematician” is a statistical phantom, created by mashing together the applied mathematician who publishes six computational papers annually with the algebraic geometer who spends four years on a single breakthrough. The resulting average (say, 2.5 papers per year) describes precisely no one and misleads everyone.
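A toy calculation makes the phantom concrete. This is a minimal sketch; the 2-to-3 mix of the two profiles is invented, chosen only so the mean lands near the figure above:

```python
# The two profiles from the paragraph above: applied mathematicians at
# 6 papers/year, algebraic geometers at one paper every four years
# (0.25/year). The 2-to-3 mix is hypothetical.
rates = [6.0, 6.0, 0.25, 0.25, 0.25]

mean = sum(rates) / len(rates)                 # 2.55 papers/year
print(mean)                                    # close to the "2.5" above
print(any(abs(r - mean) <= 1 for r in rates))  # False: no one is near the mean
```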
This isn’t just about different fields having different norms. Even within a single department, aggregation masks crucial variation. The department might report that faculty average three publications annually, but this “average” emerges from wildly different realities: collaborative researchers in mathematical biology producing numerous short papers, versus solitary number theorists crafting rare monuments. Using this average as a benchmark is like telling fish and birds they should both aim for “average locomotion speed”: the metric assumes a shared context that doesn’t exist.
When Simpson’s Paradox Enters the Picture
The problem gets worse. Sometimes aggregation doesn’t just mislead; it completely reverses the truth. This is Simpson’s Paradox, where a trend present in every subgroup disappears or reverses when the groups are combined.
Imagine a mathematics department examining its graduate admissions. Overall, it admits 40% of male applicants but only 38% of female applicants: apparent bias. But dig deeper:
- In Analysis: 30% of male applicants admitted, 35% of female applicants admitted
- In Algebra: 60% of male applicants admitted, 65% of female applicants admitted
In each subfield, female applicants are admitted at higher rates. The overall reversal happens because women disproportionately apply to the more competitive Analysis program: each group’s overall rate is a weighted average of the program rates, and the two groups carry different weights. The aggregated data tells the opposite story from the granular reality.
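To make the arithmetic concrete, here is a minimal Python sketch; the applicant counts are hypothetical, chosen only so the per-program and overall rates match the figures above:

```python
# Hypothetical applicant counts reproducing the admissions example:
# women lead in every program, yet trail in the aggregate.
admissions = {
    # group: {program: (admitted, applicants)}
    "men":   {"Analysis": (60, 200), "Algebra": (60, 100)},
    "women": {"Analysis": (63, 180), "Algebra": (13, 20)},
}

for group, programs in admissions.items():
    for program, (admitted, applied) in programs.items():
        print(f"{group:>5} / {program}: {admitted / applied:.0%}")
    admitted_total = sum(a for a, _ in programs.values())
    applied_total = sum(n for _, n in programs.values())
    print(f"{group:>5} / overall: {admitted_total / applied_total:.0%}")
```

Running it prints 30% and 60% for men against 35% and 65% for women within the programs, but 40% against 38% overall: the reversal comes entirely from where each group applies.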
This paradox appears throughout mathematical careers. A mathematician might have lower average citations than their peers but consistently higher impact within their specific research community. Papers in combinatorics accumulate citations quickly through applications, while breakthrough work in category theory might take a decade to be appreciated. The aggregate comparison suggests underperformance where none exists.
Why Average Outcomes Systematically Mislead
The distribution of mathematical careers is highly non-normal, making averages particularly uninformative. Consider the job market: the “average” math PhD might get a tenure-track position at a mid-tier research university. But this average emerges from a bimodal reality: graduates either land research positions at R1 universities or leave academia entirely, with relatively few in the supposed “middle.”
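A sketch with invented outcome scores shows why the mean of a bimodal distribution describes nobody:

```python
# Hypothetical "career outcome" scores for ten PhD graduates: research
# posts cluster high, exits from academia cluster low, and the middle
# is empty. The numbers are invented purely to illustrate the shape.
outcomes = [9, 9, 8, 9, 8, 1, 1, 2, 1, 2]

mean = sum(outcomes) / len(outcomes)                    # 5.0
near_mean = [x for x in outcomes if abs(x - mean) <= 1]
print(f"mean = {mean}, graduates near the mean: {near_mean}")
# mean = 5.0, graduates near the mean: []  (the average describes no one)
```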
Andrew Wiles spent seven years in mostly isolated work on Fermat’s Last Theorem. By any aggregated productivity metric, he was failing. Limited papers. Limited grants. Limited visible output. Yet this represents one of the greatest mathematical achievements of the 20th century. The average would have told him to change course.
This isn’t just about exceptional cases. The entire structure of mathematical research resists meaningful aggregation. A “productive” researcher in PDEs might thrive on collaboration and incremental advances. A “productive” researcher in algebraic number theory might work alone on foundational problems. Averaging across these modes is like averaging fish swimming speeds with bird flying speeds: the number you get is mathematically valid but practically meaningless.
Context Isn’t Just Background, It’s Everything
General patterns that seem robust often crumble at the contextual level. “Collaboration increases research output” sounds like good advice until you realize that in certain areas of pure mathematics, the deepest work emerges from solitary contemplation. “Apply for grants early and often” makes sense in computational fields but might be irrelevant in areas requiring only pencil and paper.
The context that matters isn’t just your mathematical field but your entire professional ecosystem:
- Teaching load (2-2 vs. 4-4 fundamentally changes research possibilities)
- Institutional resources (graduate students, travel funding, library access)
- Career stage relative to your specific area’s norms
- Personal constraints (geography, caregiving, health)
- Chosen trade-offs (depth vs. quantity, mentoring vs. personal output)
A mathematician at a liberal arts college with a 3-3 teaching load, focusing on undergraduate mentoring, belongs to a completely different reference class than someone at the Institute for Advanced Study, even if they work in the same mathematical area. Comparing their publication rates is exactly like ranking fish by their tree-climbing: not just unfair, but absurd.
Finding Your True Reference Class
So how do we escape the aggregation illusion? By finding our true reference class, the group of people actually playing the same game with similar constraints and opportunities. Only within this properly defined reference class do metrics become meaningful rather than misleading.
Don’t rank fish by their tree-climbing. Find your water before measuring your speed.
Here’s how to identify your true reference class:
The Constraint Inventory
List your non-negotiable constraints:
- What’s your actual teaching load?
- What resources do you genuinely have access to?
- What are your immovable personal commitments?
Anyone without similar constraints isn’t in your reference class, regardless of job title similarities.
The “Day in the Life” Test
Find people whose actual daily work resembles yours. A theoretician spending mornings on solo proof work has a different reference class than one managing a computational lab, even if they’re in the same department. If your days look different, your metrics should too.
The Opportunity Cost Question
Ask: “What am I explicitly choosing instead of conventional success metrics?”
- Deep problems over quick publications?
- Mentorship over personal research?
- Work-life balance over total output?
Others making similar trade-offs form your true peer group. You’re not “behind” the people who chose differently; you’re in a different race entirely.
The Success Story Audit
Look for people you admire who’ve succeeded from positions similar to yours. If you can’t find any, you’re probably using the wrong reference class. A pure mathematician shouldn’t look for role models among applied mathematicians, regardless of what departmental averages suggest.
The Substitution Test
Ask: “If someone with my exact profile swapped places with my comparison person, would they be expected to produce similar outcomes?” If the answer is no, you’re not in the same reference class. This test quickly reveals when supposedly similar positions are actually incomparable.
The Time Horizon Check
Different reference classes operate on different timescales. Fast experimental fields publish quickly; slow theoretical fields don’t. Early-career hustle differs from late-career consolidation. Make sure you’re comparing over appropriate time windows: a five-year publication record might be meaningful in one field and premature in another.
Why Metrics Work Once You Find Your League
Here’s the counterintuitive truth: metrics become genuinely useful once properly contextualized. Within your true reference class, you can meaningfully assess:
- Whether you’re improving relative to your past trajectory
- How your work compares to others facing similar constraints
- What realistic next steps look like given your specific situation
- Whether your struggles are personal or structural to your context
For instance, a mathematician at a teaching-focused institution can meaningfully compare their research output to others with 4-4 teaching loads, not to someone with a 1-1 load. Suddenly, one paper per year looks productive rather than inadequate. The metrics haven’t changed; the context has made them interpretable.
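To see the shift numerically, here is a small sketch with invented publication counts; the pools and the percentile helper are hypothetical, not real data:

```python
# Invented annual publication counts for two hypothetical pools:
# research-intensive (1-1 teaching load) and teaching-intensive (4-4).
pubs_by_load = {
    "1-1": [4, 5, 6, 7, 8],
    "4-4": [0, 0, 1, 1, 2],
}
my_pubs = 1  # one paper this year, written on a 4-4 load

def percentile(value, pool):
    """Fraction of the pool publishing at or below `value`."""
    return sum(x <= value for x in pool) / len(pool)

everyone = [x for pool in pubs_by_load.values() for x in pool]
print(f"vs. everyone:  {percentile(my_pubs, everyone):.0%}")             # 40%
print(f"vs. 4-4 peers: {percentile(my_pubs, pubs_by_load['4-4']):.0%}")  # 80%
```

Same paper, same year: 40th percentile against the whole population, 80th within the reference class that shares the constraint.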
The key insight: once you’re comparing yourself to people actually playing the same game, metrics transform from sources of anxiety into useful guides. Publication counts, citation metrics, grant success rates: all become informative when restricted to your true peers. You can identify patterns, spot opportunities, and make strategic decisions based on what actually works in your context.
The Liberation in Proper Comparison
Finding your reference class isn’t about lowering standards or making excuses. It’s about measuring what matters in the game you’re actually playing. A fish isn’t a failed squirrel; it’s an excellent fish. Similarly, you’re not a failed version of someone with completely different constraints; you’re potentially excelling within your actual context.
The practical process is straightforward:
- Start narrow: Define your reference class more specifically than feels reasonable (algebraic geometers at liberal arts colleges with young children and no spousal support)
- Test the boundaries: Gradually expand until you find where meaningful comparison breaks down (maybe include some R2 universities but not R1s)
- Lock and measure: Commit to this reference class for at least a year. Now metrics within this class become meaningful guides.
The ultimate test: Does comparing yourself to this group give you actionable insights rather than just anxiety or false confidence? The right reference class should make your successes feel earned and your failures feel instructive, not inevitable.
Remember the modified Einstein: Don’t rank fish by their tree-climbing. The mathematical community is full of different species thriving in different environments. Your job isn’t to climb trees faster; it’s to find your water, find the other fish, and then swim like hell. Only then do the measurements mean anything at all.
The aggregation illusion makes us think there’s a single game being played, when mathematical research is actually thousands of different games with different rules, rewards, and rhythms. Stop believing the average. Find your league. Then, and only then, check the score.