Where the Difficulty Went – Navigating Proof Space

A few weeks ago I had what I thought was a good idea.

There’s a problem in harmonic analysis I’d been turning over for a while; the technical details don’t matter much for the story, but the shape of it does. An existing proof in the literature worked but didn’t feel optimal to me, and I thought I saw a cleaner route: take a certain object in the argument, treat it as a positive operator, and recast a hard one-weight question as a more tractable two-weight one, where good dyadic techniques already exist. I was fairly sure it would go through. I outlined the whole thing for my graduate student.

Then I learned that another mathematician had had essentially the same idea, roughly a week before I did.

That part didn’t surprise me at all. A good approach is often in the air, natural enough that several people standing near the same problem reach for the same handhold within weeks of each other. I want to be generous about this from the start: the other mathematician saw the same natural path I saw. We were two people looking at the same landscape.

Here’s the part I’m still thinking about. I outlined my idea for a student. He outlined his for ChatGPT, and, with the more capable paid tier, got back something that looked like a more-or-less correct proof. Not a finished paper; a draft that would need fleshing out and verifying, using several of the same moves I’d worked out in my head. So we both had the idea. The difference was what happened next. For him, idea-to-draft was a single session. For me, by the old reckoning, it would have been weeks.

That compression is the thing worth writing about.

It’s tempting to read this as a story about being beaten. I don’t think that’s right, and getting the reading right matters, because the wrong one leads to the wrong lessons. We had the same idea, independently, at full quality. The tool didn’t generate the idea. The scarce resource in the whole story, the thing neither of us could outsource, was the idea. That didn’t compress at all. What compressed was everything downstream: the technical execution, the careful writing-out, the weeks in the weeds turning a believed-true sketch into a verifiable argument.

This isn’t the first time a moat like this has drained. When computer algebra systems matured, a class of “carry out this computation by hand” results stopped being theses. SAT solvers, numerical PDE, even just LaTeX and a searchable literature, each drained some execution moat, and each time the profession re-valued upward, toward questions where the hard part was knowing what to ask and whether the answer was right. What’s different now is speed and breadth: many moats draining at once, fast enough to feel inside a single career rather than across a generation.

The practical update is almost embarrassingly simple, and I’d been slow to internalize it: if I hold most of the idea, I should feed it to the tool, with full context, the papers, my notes, the lemmas I expect to need, to get the draft. Not to think for me; to execute the part that is now executable. I’d been treating the writing-out as inseparable from the thinking, because for my whole career it had been. This story pried them apart. (There’s a tooling footnote, yes, the pricier tiers have larger context windows and stronger reasoning, and against an academic salary the price gap is noise; the only way to know if it helps your workflow is to run one real problem through it for a month rather than theorize from the armchair.)

What unsettles me more than any subscription question is this: it was a reasonable research problem, the kind that could anchor part of a thesis, and with the idea in hand it fell in a session. That moves the bar for what counts as a worthwhile problem, but only along one axis. Such a problem used to be hard because you needed a good idea and weeks of execution not everyone could pull off. The execution axis drained. The questions that hold their value are the ones where the idea is the bottleneck: where the right question isn’t obvious, where the result is a new framework rather than a computation inside an existing one.

If you’re a graduate student reading this with your stomach dropping, that last sentence is the move: orient toward problems where the hard part is the question, and treat your developing taste, not your execution speed, as the thing you’re actually in school to build. And if you’re a postdoc, I won’t pretend the bind away: you’re being judged right now, on the market, by the old metrics of output and technical difficulty, even as the ground shifts to reward something else. That tension is real and it’s unfair, and the people who navigate it best will be the ones who can do both, produce on the current terms while visibly building the next-generation skill.

Which brings me to the part I find hardest. I can’t give my students the advice I was given. Part of what made a good thesis problem good was that it was hard enough to occupy a capable person for an extended period of time, and that property is now, sometimes, exactly what’s automatable. The reflex is to hunt for ever-harder execution problems to keep students busy. I think that runs the wrong direction. My working hypothesis, and it is a hypothesis, is that we now have to teach explicitly what used to develop implicitly through grinding: taste, the generation of good questions, the judgment to catch a slick proof that’s quietly wrong. You used to acquire those because you spent six weeks in the weeds and came to know the terrain; when the six weeks compress, that terrain-knowledge doesn’t accumulate the same way. In practice, for me, that’s starting to look like three things I can actually assign: have a student generate questions and practice telling which are worth pursuing; hand them an AI-generated proof with a planted error and have them find it; and make them do the idea-generation under conditions where the tool can’t shortcut it, so the muscle develops. The student who can find the error in a proof that looks right is doing exactly the thing the machine can’t do for itself.

And “stay current with the tools” sounds like a treadmill. If it means tracking every model release, it is one, and the wrong target. The durable skill is a calibrated sense of the boundary: for a given task, knowing whether the tool will help, mislead, or fail, and updating that sense as the tools move. By that definition this whole episode was fluency working, not failing. I was surprised; I noticed; I extracted the lesson and changed how I work. That loop, get surprised, locate the boundary that moved, update, is the skill. The discomfort of feeling not-quite-current isn’t evidence you’re behind. It’s what staying calibrated feels like from the inside.

Two people saw the same path up the same mountain. One climbed it by hand; the other took a faster route up the same face. Neither found the path for the other, that still had to come from a person at the bottom, looking up, knowing which way was worth going. That part hasn’t changed. Everything else, apparently, has.