I’ve spent a good part of 2024 in the trenches building Graph, a new agent-based product for engineering teams that leverages generative AI.
Feels good to be back in full-tilt startup mode: Top-notch team, learning new tech, building with the customer! 🚀
With the beta now public, I paused this week to reflect on what I’ve learned: building an AI-first product has challenged almost two decades of product-building habits!
The key insights for product people building AI-first products and designing prompt systems:
Prompts are the new PRDs — editing is more crucial than ever.
Embrace the intrinsic variability of AI, rather than wrestling it.
A product that talks back requires a rethink of user research.
⚠️ Quick note: AI is a tool. The hard product work remains — understanding the user, nailing the job to be done, and bringing the solution to market. AI is not a free pass. Don’t be lazy. The PM fundamentals are still crucial, but the playbook is totally different when building on top of AI.
Let’s dive in.
1. Creation ➡️ Curation
Great product work has always demanded excellent editing. The product manager’s role is to deliver focus, and editing is the primary instrument: surgically dissecting a product, a project, a PRD, a problem to get to the heart of it.
I believe more in the scissors than I do in the pencil. —Truman Capote
But when building a product around generative AI, the scissors are almost the entire product development process. Thanks to pre-trained language models, the PM never starts from nothing. You move from author to editor immediately.
From blueprints to breadcrumbs: Prompts are the new PRDs.
Instead of writing requirements and user stories, you iterate to crystallize what a good result looks like, then curate and direct the AI to understand it and, ultimately, to take action through tools (i.e., what an LLM uses to interact with the world).
As someone with an education, a background, and a deep passion for words (hence Words on Product), this was a new challenge for me: The art of clear communication…with machines! When you’re building around generative AI, a few thoughtful words in a prompt become more powerful than a 5-page product requirements document.
When designing prompt systems, product teams need to distill complex ideas into clear, concise instructions that strike the right balance between specificity and flexibility, giving the AI room to leverage its capabilities while still guiding it toward the desired outcome.
Applying a writer’s mindset to prompt engineering requires extreme clarity of thought and deep understanding of the user’s JTBD (job to be done). Prompt engineering is a creative act!
👉 Try: If you’ve worked with prompt engineering at all, this will resonate — start thinking in much tighter loops. It’s experimentation to the absolute max. Quantifying the impact of prompt changes is the real work here (as much as one can quantify in a non-deterministic system); read about how Gong uses Elo. We’ve had a blast learning to build quantitative checks and balances. Check out the fine folks at LangChain, and if you’re looking for a good educational resource, Anthropic’s Prompt Engineering Tutorial is fantastic.
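To make that concrete, here's a minimal sketch of the kind of Elo-style scoring Gong describes, assuming you already collect pairwise preferences between prompt variants (human raters, or an LLM judge picking the better of two outputs). The variant names and K-factor below are illustrative:

```python
# Minimal Elo scoring for prompt variants, assuming pairwise preferences
# collected from human raters or an LLM judge. Names are illustrative.

K = 32  # update step size; higher means faster-moving ratings

def expected(r_a: float, r_b: float) -> float:
    """Probability that variant A beats variant B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str) -> None:
    """Shift both ratings toward the observed pairwise outcome."""
    e_w = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - e_w)
    ratings[loser] -= K * (1 - e_w)

# Three hypothetical prompt variants and a handful of judged comparisons.
ratings = {"v1_terse": 1000.0, "v2_stepwise": 1000.0, "v3_examples": 1000.0}
comparisons = [("v2_stepwise", "v1_terse"),
               ("v3_examples", "v1_terse"),
               ("v2_stepwise", "v3_examples")]
for winner, loser in comparisons:
    update(ratings, winner, loser)

print(sorted(ratings.items(), key=lambda kv: -kv[1]))  # best variant first
```

Elo is handy here because it turns messy, subjective "A beat B" judgments into a single ranking without asking anyone for absolute scores.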
2. Embrace Variability
In traditional software development, the goal is often predictability and consistency. Every input is expected to produce a specific, predetermined output.
But working with generative AI means embracing a new paradigm where variability is unavoidable. Generative AI models, by their very nature, introduce an element of unpredictability. Given the same prompt, they may produce slightly different outputs each time.
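To see why, here's a toy illustration of the sampling mechanism underneath that variability: the model draws each next token from a probability distribution, and temperature controls how spread out that distribution is. The tokens and logits below are made up.

```python
import math
import random

# Toy model of next-token sampling. Real LLMs draw each token from a
# probability distribution rather than always picking the top choice,
# which is where run-to-run variability comes from. Logits are made up.

logits = {"ship": 2.0, "launch": 1.6, "release": 1.3, "deploy": 0.9}

def sample_next_token(logits: dict, temperature: float) -> str:
    """Softmax over temperature-scaled logits, then one weighted draw."""
    weights = {t: math.exp(v / temperature) for t, v in logits.items()}
    total = sum(weights.values())
    probs = [w / total for w in weights.values()]
    return random.choices(list(weights), probs)[0]

# Low temperature is near-deterministic; higher temperature spreads
# probability across more tokens, so repeated calls diverge.
for temp in (0.1, 1.0):
    print(temp, [sample_next_token(logits, temp) for _ in range(5)])
```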
😅 As a control freak, I had to stop wrestling AI and lean into variability. Here’s how:
Reframed variability as a strength! Started thinking about where inconsistency would be delightful and could create a more engaging experience. When that’s paired with transparency for the user, it can feel magical. Variability is human.
Identified where an LLM is simply the wrong tool — when a use case or JTBD demands accuracy or consistency, lean on good old-fashioned architecture or systems design (a quick sketch follows this list). Often with AI, less is more.
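For that second point, a quick hypothetical sketch of what routing around the LLM can look like: exact questions go to deterministic code, open-ended ones go to the model.

```python
# Hypothetical routing sketch: jobs that demand an exact answer skip the
# LLM entirely. Function names are illustrative, not from any product.

def sum_story_points(tickets: list[dict]) -> int:
    """Deterministic: the same tickets always produce the same total."""
    return sum(t["points"] for t in tickets)

def ask_llm(request: str) -> str:
    # Stand-in for a real model call via your provider's SDK.
    return f"(LLM-generated answer to: {request!r})"

def handle(request: str, tickets: list[dict]) -> str:
    if "how many points" in request.lower():
        return str(sum_story_points(tickets))  # exactness required: no LLM
    return ask_llm(request)  # open-ended: variability is acceptable

print(handle("How many points are in this sprint?",
             [{"points": 3}, {"points": 5}]))  # always "8"
```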
This shift in mindset opens up some wild possibilities. By embracing variability, we can create more dynamic, personalized experiences that adapt to user needs and contexts. It also presents a challenge around designing systems that can handle a range of outputs while still honoring accuracy. The key is to find the sweet spot between leveraging AI's creative potential and maintaining a coherent user experience. And that starts with the user’s job to be done.
👉 Try: Turn the perceived bug (variability) into a strength of your product experience —
Set clear expectations. Communicate that AI outputs may vary, leading to personalized solutions.
Implement feedback loops. Allow users to rate and grade AI outputs for continuous improvement (see the sketch after this list).
Prioritize trust. Even early adopters are skeptical right now. Build your experience around trust (accuracy + relevance + transparency).
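For the feedback-loop item above, a minimal sketch of what capturing ratings might look like: log each rated interaction alongside the prompt version that produced it, so prompt changes can later be compared against real reactions. The schema and field names are assumptions, not from any particular product.

```python
import json
import time

# Minimal feedback loop: log each rated interaction with the prompt
# version that produced it. The schema here is an assumption.

def record_feedback(prompt_version: str, user_input: str,
                    ai_output: str, rating: int,
                    path: str = "feedback.jsonl") -> None:
    """Append one rated interaction as a JSON line."""
    event = {
        "ts": time.time(),
        "prompt_version": prompt_version,  # ties the rating to a variant
        "user_input": user_input,
        "ai_output": ai_output,
        "rating": rating,  # e.g. 1 for thumbs up, -1 for thumbs down
    }
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

record_feedback("v2_stepwise", "summarize this sprint",
                "Here's your sprint summary...", rating=1)
```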
3. Rethinking Research and UX
If you were working in product around 2010 or so, this moment feels like the start of the mobile wave: a new technology unlocked a whole world where end users were still learning, UX patterns weren’t yet codified, and teams had to adapt rapidly.
This time, it’s designing for AI-first experiences, and it’s all about how humans and AI collaborate. It’s largely conversational because dialogue is where collaboration happens.
Think back to previous products and features you’ve built — how did you get early-stage, evaluative feedback from users? Often, the choice is between designing a mocked-up flow and building a quick version. While there are always huge benefits to the latter, the former is usually faster and cheaper.
But when you’re building with LLMs, how do you put together evaluative research when the product experience is intrinsically variable and the AI-first UX is conversational? 🤯
And, as you know, speed and cost aren’t the only considerations when it comes to reducing risk in what you build. You often need to be testing feasibility in the same breath.
Throughout our feedback loops on this new product, I found myself going back in time to the challenges we faced during my time working on HipChat at Atlassian (ancient history, back in 2014!) — evaluative research in a chat-based experience that depends on a UX that moves, breathes, interacts. Funny enough, ten years later, I’m pulling from those same learnings in different ways.
So — How do you test a product that talks back? In the words of our Researcher, Chelsea Davis:
Conversational AI is about researching how people communicate. Everyone communicates and searches for information differently. You’re after the question behind the question, then take that and give the product the context and framing it needs to get to the result sooner for the user, reducing friction. And it starts anticipating what the user needs. You have to be a student of human communication.
As a result, we’ve used mocked-up flows only sparingly — for specific, nuanced interaction testing, or for research so early it feels more generative than evaluative. More often than not, we’ve opted to simply build, because:
Presenting users with a pre-defined, ideal workflow mockup introduces zero variability and doesn’t allow for real user interaction or conversation, which means we miss powerful insights.
Feasibility in the context of “how might we” has been just as important, since we’re working with new technology.
Iterating is fast, and we have the right team that can move with speed for the sake of the user.
By building and giving the user a set of crayons paired with hints, we have learned much faster. Use little nudges, wayfinders, example prompts, and suggestions, or set the stage with phrases like, “Let’s pretend it’s Tuesday before your Engineering Meeting…” to help the user pick up the crayons.
The goal is to understand the types of questions your users have, and then anticipate the action the agent should be taking for that JTBD.
👉 Tip: Take a step back and learn some of the UX patterns emerging on sites like Shape of AI. Then create a basic UI and have a team member act as the AI, responding to user inputs in real time. This approach lets you simulate variability, test conversational flows, and gather rich qualitative feedback without a fully functional AI system. Check out Wizard of Oz testing.
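Here's a bare-bones sketch of that setup: the participant types into what looks like a chat with an AI, while a teammate (the wizard) improvises each reply, and the transcript is logged for analysis. In a real study you'd hide the wizard behind a separate screen; this single-terminal version just shows the shape of it.

```python
import json
import time

# Bare-bones Wizard of Oz loop: the participant believes they're chatting
# with an AI, but a teammate (the "wizard") improvises each reply. The
# transcript is logged for later analysis.

def wizard_of_oz_session(log_path: str = "session.jsonl") -> None:
    print("Assistant ready. Type 'quit' to end the session.")
    with open(log_path, "a") as log:
        while True:
            user_msg = input("participant> ")
            if user_msg.strip().lower() == "quit":
                break
            # The wizard reads the message and types the "AI" response,
            # simulating variability, tone, and latency by hand.
            wizard_msg = input("wizard (hidden)> ")
            print(f"assistant: {wizard_msg}")
            log.write(json.dumps({"ts": time.time(),
                                  "user": user_msg,
                                  "assistant": wizard_msg}) + "\n")

if __name__ == "__main__":
    wizard_of_oz_session()
```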
Here are a few things that we watch for in testing:
How people communicate directly with the LLM — how they type, what context they assume the tool has, etc.
How people react with their body language when the LLM assumes incorrectly — do they try again? Use different words? All of those reactions go back into training.
How people build trust in an LLM — What types of accuracy, relevancy, and transparency play out in the user experience? Do trust symbols like references get old message after message?
Just Do It
Every single day during this process has brought a new wave of learnings, and challenged how I work as a product person. Honestly, it’s revealed a pure form of the product process: JTBD + Editing. It’s channeled the core competencies — adapting, prioritizing, empathizing — but in new ways!
A call to action specifically for senior product leaders: Learning by doing is simply the best. There is no substitute for diving in and doing the work. And, while I don’t know where AI will be in five years, I do know that what we have now is not all it will be. Now is the time to lean in. You cannot learn by watching or directing your team. You must do to learn. It’s moving too quickly. 🚀
Bottom line: Traditional product development methods need significant reimagining for AI products, with emphasis on prompt engineering, accepting AI's variable nature, and adapting user research for conversational interfaces. Ironically, it feels like a more pure form of product, pulling mightily on JTBD!