Do or die, Generative AI
Just weeks after releasing an API for ChatGPT, OpenAI announced GPT-4. Meanwhile, everyone is rushing to integrate GPTs into their products. Here are 5 takeaways we learned.
For once, Microsoft actually beat everyone to it. Agreed, it did cost them a whopping ten billion dollars. But hey, at least Bing was the first search engine to offer such a fluent conversational interface by integrating OpenAI’s ChatGPT. There’ve been similar experiments before, but this feels different. Like Bing, companies such as Notion, Slack and Instacart have been racing to put the power of Large Language Models into their products. Here are 5 takeaways from how they did it and what we can learn from it.
#1 Expect nothing and you’ll never be disappointed
Any good experience starts with setting the right expectations. That’s hard for a technology that’s mainly limited by your imagination. That’s why most integrations offer suggestions right at the start to get you going. This sets the frame and helps explain what the ChatGPT integration is actually for.
Almost every integration’s value proposition focuses on saving time or generating quick ideas. It lets you do tasks that normally take a long time (summarise or rewrite a text) or helps you create new content based on your input. The ChatGPT app for Slack promises you can get up to speed faster on channels or threads through its conversation summaries. Notion not only stresses time-saving but also the creative part: “work faster, write better”.
Is it an assistant, a co-pilot or something else? Quizlet went for “tutor”, which better suits their context of education. Snapchat’s My AI “can offer advice on the perfect gift for your BFF’s birthday, help plan a hiking trip for a long weekend, suggest what to make for dinner”. Ok, makes sense. But they also promise to “connect you more deeply to the people and things you care about most”. Wait, what? That’s way too bold a claim for a large language model.
💡What to do: Properly introduce your AI-powered functionalities and explain to your users how this can be helpful to them in a very concrete way by providing examples.
#2 Push The Button
Integrating GPTs – for now – mostly requires pressing some sort of button. This has everything to do with the non-trivial cost of executing a prompt. Because of this, the border between your work and the AI’s work is still very clear. As a user, you still explicitly ask for assistance. It’s you doing something or the AI doing something, never together. The collaboration isn’t so deeply integrated yet that it feels like you’re writing together with an AI at the same time. Bing is hinting in that direction, calling itself your AI-powered copilot for the web. Although ‘the web’ in this case has a rather narrow definition.
In some products, like Snapchat+, executing a prompt feels rather natural because it works just like chatting with a contact. The quick actions in Notion AI also feel natural; you barely notice the explicit action you have to take. In a tool like Slack, however, this explicit action does feel much more cumbersome and unnecessary. As the cost decreases, I’m sure we’ll see a summary of a thread as soon as you open it, without having to explicitly ask for it.
Another reason why this isn’t happening yet is that it still takes a lot of time to generate the output. No, it’s not a typewriter effect; it’s the actual time it takes to generate a response. And we’re talking about several seconds, not milliseconds. That’s obviously not a great user experience and currently adds a lot of friction. Most tools let you explicitly cancel or stop the current process. OpenAI now has aptly named “Turbo” models that respond faster, but it’s still rather slow compared to a simple Google search.
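The OpenAI API does support streaming responses, which lets you show tokens as they’re generated and makes a stop button straightforward to build. A minimal, self-contained sketch of that render-and-cancel loop; `fake_stream` below is a stand-in for the actual API stream, purely for illustration:

```python
import threading

def fake_stream(tokens):
    """Stands in for a streamed API response, yielding one token at a time."""
    yield from tokens

def render_stream(stream, cancel_event):
    """Append tokens to the UI as they arrive; stop early if the user cancels."""
    rendered = []
    for token in stream:
        if cancel_event.is_set():
            break  # the user pressed "stop"; keep what we have so far
        rendered.append(token)
    return "".join(rendered)

cancel = threading.Event()
text = render_stream(fake_stream(["It", " looks", " like", " rain."]), cancel)
```

Showing partial output this way doesn’t make the model faster, but it turns several seconds of dead waiting into visible progress, which is much of what the typewriter effect is really for.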
💡What to do: Experiment and try to find the best and most natural way to integrate a prompt for your product. Good interaction design can make the waiting time feel less annoying.
#3 Beyond the prompt
In work tools like HubSpot and Notion you’ll see more quick actions that hide the underlying prompt sent to the ChatGPT API. As these integrations mature, I’m sure a lot more of the underlying complexity will be hidden behind a simpler, more intuitive UI.
As prompt engineering becomes a whole new field of expertise, prompts will also get more complex. We’ve seen the same thing with tools like Midjourney. To keep the experience user-friendly, you’ll want to hide most of the prompt’s complexity.
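A sketch of what hiding the prompt behind a quick action can look like in practice; the action names and templates below are made up for illustration, not any product’s actual prompts:

```python
# Hypothetical quick-action prompt templates, hidden behind one-click UI buttons.
QUICK_ACTIONS = {
    "summarise": "Summarise the following text in three sentences:\n\n{text}",
    "fix_grammar": "Fix the spelling and grammar in the following text:\n\n{text}",
    "make_shorter": "Rewrite the following text to be about half as long:\n\n{text}",
}

def build_prompt(action: str, text: str) -> str:
    """Expand a one-click quick action into the full prompt sent to the model."""
    template = QUICK_ACTIONS.get(action)
    if template is None:
        raise ValueError(f"Unknown quick action: {action}")
    return template.format(text=text)
```

The user only ever sees the button label; all the prompt-engineering detail lives in the template, where you can iterate on it without changing the UI.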
💡 What to do: Define the main use cases and make it easier for your users to execute them directly so they don’t need to type the whole prompt each time.
#4 Objects are closer than they appear
Let’s also address the elephant in the room: these models make mistakes. The question is not if but how you deal with failures. Since it’s impossible to imagine upfront all the ways these tools will fail, you need to make sure your users can signal you when something went wrong. The most basic approach here seems to be a thumbs up/thumbs down button. This takes little effort for the user but also doesn’t get you very far, since you don’t know why they were unhappy. If you’re charging users per prompt, it’s also good practice to refund them when you fail. Jasper automatically refunds your credits if you give a thumbs down.
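A sketch of what that feedback-plus-refund loop could look like; the structure and the credit ledger are hypothetical, not Jasper’s actual implementation:

```python
from dataclasses import dataclass

@dataclass
class PromptFeedback:
    """Feedback on a single AI generation (hypothetical structure)."""
    prompt_id: str
    thumbs_up: bool
    reason: str = ""  # optional free text, since a thumbs alone doesn't tell you why

def process_feedback(credits: dict, log: list, user: str, cost: int,
                     feedback: PromptFeedback) -> None:
    """Log the feedback and refund the prompt's credits on a thumbs-down."""
    log.append(feedback)
    if not feedback.thumbs_up:
        credits[user] = credits.get(user, 0) + cost
```

The optional `reason` field is the interesting part: even a one-line free-text box turns a bare thumbs-down into something your team can actually act on.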
But there will also be moments when your model fails and your users won’t realise. So how do you prepare users for this? Bing tries to manage expectations from the start by saying “Bing is powered by AI, so surprises and mistakes are possible”. Their version of ChatGPT also allows them to provide sources for parts of the output so you can go and verify yourself. To me this is still the equivalent of car manufacturers putting “Objects in mirror are closer than they appear” on their mirrors. Notion AI is the prime example, putting under each result: “AI responses can be inaccurate or misleading.” No shit, Sherlock! I think we can and should do better.
Bing already exposes some of ChatGPT’s underlying parameters by allowing users to select their conversation style. More creative or more precise? ChatGPT has a parameter called temperature that controls the randomness – and thus creativity – of the output. It would be good to experiment with your product which settings give the best results in your use case.
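In the OpenAI chat API, temperature is a single request parameter, so exposing a style picker mostly comes down to a mapping. A sketch of that mapping; the numbers and style names are illustrative, not Bing’s actual settings:

```python
# Hypothetical mapping of user-facing conversation styles to temperature values.
# Lower temperature -> more deterministic output; higher -> more creative.
STYLE_TEMPERATURES = {
    "precise": 0.2,
    "balanced": 0.7,
    "creative": 1.0,
}

def request_params(style: str, prompt: str) -> dict:
    """Build the chat-completion request body for the chosen conversation style."""
    return {
        "model": "gpt-3.5-turbo",
        "temperature": STYLE_TEMPERATURES[style],
        "messages": [{"role": "user", "content": prompt}],
    }
```

Which values work best is exactly the kind of thing to A/B test per use case, as the article suggests.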
💡What to do: Provide an easy way for users to give feedback so you can follow up and better understand where your suggestions were helpful and where they’ve failed.
#5 #neverforget @TayandYou
So far I’ve been mostly positive about how ChatGPT is being integrated, but I want to end with a few reasons why this is probably not a good idea in certain contexts. In Instacart’s demo, the user asks for healthy lunch options for kids and gets some suggestions. The answer is clear-cut and feels very confident because of the added explanation at the start. Nothing tells the user that this is not necessarily good advice, as there is no explainability. But think about what’s happening here: you’re asking a language model for dietary advice. That’s a very dangerous idea, especially if you show the results without any hint of doubt. Imagine the cost of being wrong when people search for allergies or other dietary restrictions and get the wrong advice. Instacart’s integration has yet to be released, so I’m curious to see how they’ll mitigate this risk. Using “might” and “possibly” or starting the sentence with “I think” could already signal to a user they shouldn’t just take this as a fact.
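One way to operationalise that hedging: detect sensitive topics and wrap the model’s answer in a verification nudge. The keyword list below is a crude, purely illustrative stand-in for a real sensitive-topic classifier:

```python
# Crude stand-in for a proper sensitive-topic classifier; for illustration only.
SENSITIVE_KEYWORDS = {"allergy", "allergies", "diet", "dietary", "medication"}

def present_answer(question: str, answer: str) -> str:
    """Add hedging and a verification nudge when the question looks sensitive."""
    words = {w.strip(".,?!").lower() for w in question.split()}
    if words & SENSITIVE_KEYWORDS:
        return ("This might help, but please verify it with a qualified source: "
                + answer)
    return answer
```

The point isn’t this particular wording; it’s that the presentation layer, not the model, decides how confident the answer is allowed to sound.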
Another troubling example is how Snapchat’s My AI seems to completely ignore its target audience: young teens. Tristan Harris shared some conversations that will give any parent the chills. Like Tristan says, even if OpenAI or Snap Inc. fixes this specific issue (which it seems they did), they can’t screen for the whole range of possible conversations a 13-year-old may have. The way you deliver this output should include a different tone depending on the subject or underlying probability.
Quizlet’s Q-Chat takes a safer approach by allowing only specific commands that are appropriate for their learning environment. At the same time, their value prop makes it very clear what you can expect from this AI: it’s a tutor to help you study more efficiently. Another example is Shop where it seems they’ve put some pretty strong guardrails in to keep the power of ChatGPT on the leash to help you find products.
Shop’s AI shopping assistant also appears to be pretty sandboxed so you can’t go outside the provided use cases. For any question outside the shopping context, you get a clear “I'm sorry, I'm a shopping assistant and my expertise is in helping you find products. Is there anything you're looking for today?”.
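A sketch of one way such a sandbox can be built: a system prompt instructs the model to emit a sentinel for off-topic requests, and the application swaps that sentinel for a fixed fallback reply. The prompt wording and sentinel are assumptions for illustration, not Shop’s actual implementation:

```python
# Hypothetical system prompt that keeps the assistant inside its use case.
SYSTEM_PROMPT = (
    "You are a shopping assistant. Only answer questions about finding products. "
    "If the user asks about anything else, reply with exactly: OFF_TOPIC"
)

FALLBACK = ("I'm sorry, I'm a shopping assistant and my expertise is in helping "
            "you find products. Is there anything you're looking for today?")

def guard_reply(model_reply: str) -> str:
    """Replace the model's off-topic sentinel with a fixed, friendly fallback."""
    if model_reply.strip() == "OFF_TOPIC":
        return FALLBACK
    return model_reply
```

Handling the refusal in application code rather than trusting the model to phrase it guarantees the out-of-scope answer is always the same, on-brand message.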
To avoid pesky situations like Snapchat’s advice for teenagers in the future, the European Commission proposed its Artificial Intelligence Act (AI Act) to ensure that AI systems are safe and respect existing laws on fundamental rights and European Union values. This could be a step in the right direction from a regulatory point of view. At the same time, concepts like Model cards already allow companies to explain what’s going on inside the black box but haven’t really taken off yet.
💡What to do: Bring your output in a way that is appropriate to the context and according to the underlying probabilities of your output.
The way forward
Of all the GPT integrations, Bing so far feels the most well thought-through, with Notion AI being a close second. This has everything to do with how nicely ChatGPT’s potential is integrated into the product.
As services like ChatGPT get cheaper and faster, we’ll see more and more integrations where GPTs are completely absorbed into a product. It will no longer be “Notion AI” or “Your AI-powered copilot” but just Notion and Bing.
As product teams, we’ll also need to work much more on explainability. To avoid dangerous situations in which errors are taken for facts, we should use the underlying probabilities of the AI’s output to tailor how to bring the content to the user.
Users’ excitement about AI will also start to wear off very quickly, and product teams will need to focus on real added value rather than just boarding the AI hype train. That’s the era I’m most excited about. Ok, the AI winter is finally over, but we’ll need to prove all these shiny new technologies are actually useful and solve our users’ needs. As product teams, we’ll need to make sure we bring actionable insights and provide support during the creative process so our users can reach their full potential.
A friendly summary:
- Properly introduce your AI-powered functionalities and explain to your users how this can be helpful to them in a very concrete way by providing examples.
- Prototype and test with users to find the best and most natural way to integrate a prompt into your product. Good interaction design can make the waiting time feel less annoying.
- Define the main use cases and make it easier for your users to execute them directly so they don’t need to type the whole prompt each time.
- Provide an easy way for users to give feedback so you can follow up and better understand where your suggestions were helpful and where they’ve failed.
- Tailor your output’s content and tone of voice to the context of your users and adapt it according to the underlying probabilities.