AI Coworker, v0.1

February 2025

In 2025, AI agents are going to take off.

DeepSeek was the first product to show its "thinking." It didn't expose the model's raw chain-of-thought, but the average user doesn't care. Talking to my friends, I heard words like "cute" and "trustworthy" for the first time. A simple UX decision made an AI model from China feel more transparent than our homegrown models.

What happens when the model needs to think for 10 minutes or 2 hours?
Again, model interpretability is key.

OpenAI's new Deep Research tool is impressive. Showing its thinking steps was essential to building my trust in the product. Now, I rarely check the intermediate steps.

But that's the key. I've started to treat Deep Research like a coworker.

But there's a missing component: the validation loop isn't efficient. I either trust the outputs blindly or check every "fact" they contain to ensure correctness. How do we solve this?

Pick problems with closed-loop validation systems.
Build a user experience with checkpoints built-in.

Checkpoints let users visualize the steps the LLM took to create an output. The user can revert to an earlier checkpoint and correct any mistakes that were made. Closed validation loops enable quick human verification, which is essential for any economically viable work.
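The checkpoint-and-revert loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any product's actual implementation; the names (AgentRun, apply_step, revert) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    step: int
    description: str
    state: dict  # snapshot of the agent's working state at this step


class AgentRun:
    """Records a checkpoint after each agent step so a user can
    inspect the trail and revert to any earlier point."""

    def __init__(self, initial_state=None):
        self.state = dict(initial_state or {})
        self.checkpoints = [Checkpoint(0, "start", dict(self.state))]

    def apply_step(self, description, updates):
        # Apply the step's changes, then snapshot the resulting state.
        self.state.update(updates)
        self.checkpoints.append(
            Checkpoint(len(self.checkpoints), description, dict(self.state))
        )

    def revert(self, step):
        # Roll back to a prior checkpoint and discard later ones, so the
        # user can correct a mistake and let the agent continue from there.
        self.state = dict(self.checkpoints[step].state)
        self.checkpoints = self.checkpoints[: step + 1]
```

A user reviewing the run sees the list of checkpoint descriptions, spots the step where the agent went wrong, calls revert on it, and the agent resumes from a known-good state; that is the whole validation loop in miniature.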

If it isn't obvious, AI code assistants are the first version of this new user experience. Checkpoints are already a familiar UX pattern there, and software development benefits from extremely fast validation loops.

My official prediction: by the end of 2025, software engineering will be more about designing systems that LLMs can build, test, deploy, and maintain.

Humans are still needed. Build the system that makes the best professionals better.