Every / an AI capability journey
A field guide to the archive of the media company and product studio I write inside: eight turns from the first strange conversations with GPT to agents, compound engineering, and a new argument about what remains human-owned.
Working Overtime follows my changing relationship with AI. This page follows the wider shift that Every has been recording and building through.
Eight turns / late 2022 to now
In three and a half years, AI escaped the chat box and entered the work itself.
Conversation
The early archive approaches GPT as a conversation partner: a journal, a sounding board, a machine that becomes interesting because talking to it reveals something about your own thoughts.
Context
GPT-4 shifts the center of gravity. The question is no longer only what a model knows. It is what it can reason through when people give it the right context.
Management
As AI takes on more cognitive production, the human work begins to move upward: choosing problems, setting direction, supplying context, evaluating results, and deciding what deserves effort.
Action
Agents, computer use, and deep research expand the unit of work. A prompt can now launch investigation or action, and Every begins testing not merely answers but delegated tasks.
Creation
Coding agents make creation available to people who were not previously programmers, while experienced engineers begin opening AI-written pull requests. The archive follows both the possibility and the cost.
Learning systems
Useful AI development becomes less about one lucky prompt and more about feedback loops: failures become instructions, instructions become skills, and each run can improve the next.
Architecture
By early 2026, the agent is no longer an add-on to existing software or work. Context, tools, memory, and organization begin to be designed around what an agent can do.
Human judgment
Once agents can carry more execution, the central question becomes what people supply at the beginning and the end: intention, taste, accountability, and a reason the work should exist.
Vibe Check / January 2025 to now
Vibe Check is Every's running test of new models and AI products. Read in order, its reviews show the move from agents that browse to models judged on software engineering, writing, and professional work.
Jan - Feb 2025
Operator carries out browser tasks. Deep Research searches across sources and returns a finished report.
May 2025
Codex is tested on an existing product codebase. Claude 4 Opus is tested on pull requests, research, and editing.
Sep - Dec 2025
Sonnet 4.5 is evaluated on operational documents and long-context work. GPT-5.2 is evaluated on extended analysis.
Feb - Apr 2026
May 2026
Opus 4.8 tops Every's Senior Engineer benchmark and writing tests in the same review.
Find a starting point
Choose what you want to do. Get an article, a guide, and a tool or working example to start with.
Five doors in
Vibe Check
Tests and interpretations of newly released models and tools.
Guides
Guides that turn a capability into a repeatable workflow.
Cora
Products shaped by the problems the editorial work uncovers.
Source Code
How agents and small teams actually organize work together.
Context Window
Arguments about automation, judgment, and the people inside the loop.