Every / an AI capability journey

How AI moved from a chat window into the shape of work.

A field guide to the archive of the media company and product studio I write inside: eight turns from the first strange conversations with GPT to agents, compound engineering, and a new argument about what remains human-owned.

Working Overtime follows my changing relationship with AI. This page follows the wider shift that Every has been recording and building through.

Eight turns / late 2022 to now

From conversation to infrastructure.

In three and a half years, AI escaped the chat box and entered the work itself.

01 Late 2022 - early 2023

Conversation

First, AI becomes a place to think out loud.

The early archive approaches GPT as a conversation partner: a journal, a sounding board, a machine that becomes interesting because talking to it reveals something about your own thoughts.

GPT-3 Is the Best Journal You've Ever Used

02 Spring 2023

Context

Then the chat box becomes a reasoning engine.

GPT-4 shifts the center of gravity. The question is no longer only what a model knows. It is what it can reason through when people give it the right context.

GPT-4: A Copilot for the Mind

03 2024

Management

Knowing gives way to allocating.

As AI takes on more cognitive production, the human work begins to move upward: choosing problems, setting direction, supplying context, evaluating results, and deciding what deserves effort.

The Allocation Economy

04 Late 2024 - early 2025

Action

Models stop answering and start doing.

Agents, computer use, and deep research expand the unit of work. A prompt can now launch investigation or action, and Every begins testing not merely answers but delegated tasks.

Vibe Check / Source Code

05 Spring - summer 2025

Creation

Building software becomes a conversation too.

Coding agents make creation available to people who were not previously programmers, while experienced engineers begin opening AI-written pull requests. The archive follows both the possibility and the cost.

I Rebuilt Sparkle in 14 Days With AI / Working Overtime

06 Late 2025

Learning systems

The work starts teaching the agent back.

Useful AI development becomes less about one lucky prompt and more about feedback loops: failures become instructions, instructions become skills, and each run can improve the next.

Source Code / Compound Engineering

07 Winter 2025 - spring 2026

Architecture

Agent-native becomes a way to arrange the system.

By early 2026, the agent is no longer an add-on to existing software or work. Context, tools, memory, and organization begin to be designed around what an agent can do.

Agent-native Guides Four AI Agents at Every

08 April 2026 - now

Human judgment

The story returns to the person in the middle.

Once agents can carry more execution, the central question becomes what people supply at the beginning and the end: intention, taste, accountability, and a reason the work should exist.

After Automation / After 'After Automation' / The AI Sandwich

Vibe Check / January 2025 to now

How capability changed, test by test.

Vibe Check is Every's running test of new models and AI products. Read in order, its reviews show the move from agents that browse to models judged on software engineering, writing, and professional work.

Jan - Feb 2025

Agents browse and research.

Operator carries out browser tasks. Deep Research searches across sources and returns a finished report.

Operator / Deep Research

May 2025

Code and editing become real-use tests.

Codex is tested on an existing product codebase. Claude 4 Opus is tested on pull requests, research, and editing.

Codex / Claude 4 Opus

Sep - Dec 2025

Long-running professional work enters the test.

Sonnet 4.5 is evaluated on operational documents and long-context work. GPT-5.2 is evaluated on extended analysis.

Claude Sonnet 4.5 / GPT-5.2

Feb - Apr 2026

Engineering performance gets measured.

Opus 4.6 is tested on difficult coding tasks. GPT-5.5 is tested on Every's Senior Engineer Benchmark.

Opus 4.6 / GPT-5.5

May 2026

One model leads on code and writing.

Opus 4.8 tops both Every's Senior Engineer Benchmark and its writing tests in the same review.

Opus 4.8

Read all Vibe Checks

Find a starting point

Get a practical route through Every.

Choose what you want to do. Get an article, a guide, and a tool or working example to start with.

Five doors in

Different reasons to read. One ecosystem.

Track the frontier

Tests and interpretations of newly released models and tools.

Vibe Check

Use it at work

Guides that turn a capability into a repeatable workflow.

Guides

See what gets built

Products shaped by the problems the editorial work uncovers.

Cora

Study the system

How agents and small teams actually organize work together.

Source Code

Keep the stakes visible

Arguments about automation, judgment, and the people inside the loop.

Context Window