I've been building side projects with LLMs for about a year now. Some went well. Most taught me something the hard way. Here's what I've learned.
Lesson 1: Start with the prompt, not the code
The biggest mistake I made early on was jumping straight into API calls. The real work is figuring out what you actually want the model to do. Spend time crafting and testing prompts in the playground before writing a single line of application code.
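One pattern that helped me here: keep the playground-tested prompt as a standalone template, separate from the API plumbing, so you can keep iterating on the words without touching application code. A minimal sketch (the prompt text and variable names are just illustrative):

```python
# A playground-tested prompt kept as a plain template, decoupled from
# the API call. The task and variables here are made up for illustration.
SUMMARIZE_PROMPT = """You are a precise technical summarizer.

Summarize the following text in at most {max_sentences} sentences.
Preserve any numbers exactly as written.

Text:
{text}
"""

def build_prompt(text: str, max_sentences: int = 3) -> str:
    """Fill in the template; the actual API call lives elsewhere."""
    return SUMMARIZE_PROMPT.format(text=text, max_sentences=max_sentences)

prompt = build_prompt("LLMs cost $0.01 per 1K tokens.", max_sentences=2)
```

When the prompt inevitably changes, only the template file changes, and your eval suite (see Lesson 4) tells you whether the new wording is actually better.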
Lesson 2: Claude and GPT have different strengths
After extensive use of both:
- Claude excels at following complex instructions, maintaining context over long conversations, and writing clean code. It's my go-to for coding tasks.
- GPT-4 is stronger at creative tasks, has broader world knowledge, and handles multimodal inputs well.
Pick the right tool for the job instead of defaulting to one.
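In practice this can be as simple as a routing table keyed by task type. A sketch, with model names as examples of the kind of mapping I mean rather than recommendations:

```python
# Hypothetical routing table mapping task types to whichever model
# handled them best in my own testing. The names are illustrative.
MODEL_FOR_TASK = {
    "coding": "claude-3-5-sonnet",   # strong instruction-following, clean code
    "creative": "gpt-4o",            # broader stylistic range
    "vision": "gpt-4o",              # multimodal input
}

def pick_model(task_type: str, default: str = "claude-3-5-sonnet") -> str:
    """Route a request to the model suited for its task type."""
    return MODEL_FOR_TASK.get(task_type, default)
```

The point isn't the specific table; it's that the choice lives in one place you can revisit as models change.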
Lesson 3: Structured output saves hours
Getting LLMs to return JSON or other structured formats reliably was a game-changer. Use function calling or explicit schema instructions. Don't try to parse free-text responses with regex—you'll lose that fight.
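Even with schema instructions, cheaper models occasionally wrap the JSON in stray prose. A small sketch of the defensive parsing I ended up with, assuming the response contains exactly one JSON object:

```python
import json

def extract_json(raw: str) -> dict:
    """Pull a JSON object out of a model response.

    Tolerates prose before and after the object, which models
    sometimes add despite explicit instructions not to.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in response")
    return json.loads(raw[start : end + 1])

# Simulated model output with chatter around the object:
raw = 'Sure! Here is the result: {"sentiment": "positive", "confidence": 0.92} Hope that helps.'
result = extract_json(raw)
```

Function calling makes this fallback mostly unnecessary, but it's a cheap safety net when you're sending plain schema instructions.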
Lesson 4: Evals matter more than vibes
"It seems to work" is not a testing strategy. Build a set of test cases, run them systematically, and track scores over time. This is especially important when you change prompts—what improves one case often breaks another.
Lesson 5: Costs add up fast
Token usage is easy to ignore during development and painful to discover in production. Monitor your usage from day one, cache aggressively, and consider whether you really need GPT-4 or if a smaller model would suffice.
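Two things that paid for themselves immediately: a back-of-envelope cost estimator and a cache for repeated prompts. A sketch, with prices as illustrative per-million-token figures rather than anyone's current rates:

```python
# Illustrative per-million-token prices; check your provider's real rates.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough dollar cost of a single call at the prices above."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# Naive in-memory cache: identical prompts never hit the API twice.
_cache = {}

def cached_call(prompt: str, call_fn):
    if prompt not in _cache:
        _cache[prompt] = call_fn(prompt)
    return _cache[prompt]
```

Even this crude version surfaces the surprise early: at these sample rates, 1,000 input and 500 output tokens is about a cent per call, which adds up fast at scale.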
What's next
I'm currently experimenting with tool use and agentic patterns. The ability for models to call functions, browse the web, and chain actions together feels like the next big unlock. More on that soon.