I've been building side projects with LLMs for about a year now. Some went well. Most taught me something the hard way. Here's what I've learned.
Lesson 1: Start with the prompt, not the code
The biggest mistake I made early on was jumping straight into API calls. The real work is figuring out what you actually want the model to do. Spend time crafting and testing prompts in the playground before writing a single line of application code.
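One pattern that helped me here: keep the playground-tested prompt as a standalone template, separate from the API plumbing, so you can keep iterating on the words without touching application code. A minimal sketch (the prompt text and variable names are just illustrative):

```python
# A playground-tested prompt kept as a plain template, decoupled from
# the API call. The task and variables here are made up for illustration.
SUMMARIZE_PROMPT = """You are a precise technical summarizer.

Summarize the following text in at most {max_sentences} sentences.
Preserve any numbers exactly as written.

Text:
{text}
"""

def build_prompt(text: str, max_sentences: int = 3) -> str:
    """Fill in the template; the actual API call lives elsewhere."""
    return SUMMARIZE_PROMPT.format(text=text, max_sentences=max_sentences)

prompt = build_prompt("LLMs cost $0.01 per 1K tokens.", max_sentences=2)
```

When the prompt inevitably changes, only the template file changes, and your eval suite (see Lesson 4) tells you whether the new wording is actually better.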
Lesson 2: Claude and GPT have different strengths
After extensive use of both:
- Claude excels at following complex instructions, maintaining context over long conversations, and writing clean code. It's my go-to for coding tasks.
- GPT-4 is stronger at creative tasks, has broader world knowledge, and handles multimodal inputs well.
Pick the right tool for the job instead of defaulting to one.
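In practice this can be as simple as a routing table keyed by task type. A sketch, with model names as examples of the kind of mapping I mean rather than recommendations:

```python
# Hypothetical routing table mapping task types to whichever model
# handled them best in my own testing. The names are illustrative.
MODEL_FOR_TASK = {
    "coding": "claude-3-5-sonnet",   # strong instruction-following, clean code
    "creative": "gpt-4o",            # broader stylistic range
    "vision": "gpt-4o",              # multimodal input
}

def pick_model(task_type: str, default: str = "claude-3-5-sonnet") -> str:
    """Route a request to the model suited for its task type."""
    return MODEL_FOR_TASK.get(task_type, default)
```

The point isn't the specific table; it's that the choice lives in one place you can revisit as models change.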
Lesson 3: Structured output saves hours
Getting LLMs to return JSON or other structured formats reliably was a game-changer. Use function calling or explicit schema instructions. Don't try to parse free-text responses with regex—you'll lose that fight.
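Even with schema instructions, cheaper models occasionally wrap the JSON in stray prose. A small sketch of the defensive parsing I ended up with, assuming the response contains exactly one JSON object:

```python
import json

def extract_json(raw: str) -> dict:
    """Pull a JSON object out of a model response.

    Tolerates prose before and after the object, which models
    sometimes add despite explicit instructions not to.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in response")
    return json.loads(raw[start : end + 1])

# Simulated model output with chatter around the object:
raw = 'Sure! Here is the result: {"sentiment": "positive", "confidence": 0.92} Hope that helps.'
result = extract_json(raw)
```

Function calling makes this fallback mostly unnecessary, but it's a cheap safety net when you're sending plain schema instructions.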
Lesson 4: Evals matter more than vibes
"It seems to work" is not a testing strategy. Build a set of test cases, run them systematically, and track scores over time. This is especially important when you change prompts—what improves one case often breaks another.
Lesson 5: Costs add up fast
Token usage is easy to ignore during development and painful to discover in production. Monitor your usage from day one, cache aggressively, and consider whether you really need GPT-4 or if a smaller model would suffice.
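Two things that paid for themselves immediately: a back-of-envelope cost estimator and a cache for repeated prompts. A sketch, with prices as illustrative per-million-token figures rather than anyone's current rates:

```python
# Illustrative per-million-token prices; check your provider's real rates.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough dollar cost of a single call at the prices above."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# Naive in-memory cache: identical prompts never hit the API twice.
_cache = {}

def cached_call(prompt: str, call_fn):
    if prompt not in _cache:
        _cache[prompt] = call_fn(prompt)
    return _cache[prompt]
```

Even this crude version surfaces the surprise early: at these sample rates, 1,000 input and 500 output tokens is about a cent per call, which adds up fast at scale.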
What's next
I'm currently experimenting with tool use and agentic patterns. The ability for models to call functions, browse the web, and chain actions together feels like the next big unlock. More on that soon.