
December 23: Testing your Prompts and Assistants 🧪

Mini-lesson day!

🎯 Objective

Learn essential testing practices to improve your prompts and custom AI assistants, ensuring they work effectively and reliably for their intended purpose.

⏱️ Duration

45-60 minutes total:

10 minutes: Core concepts
30 minutes: Hands-on testing workflow
10-20 minutes: Advanced concepts exploration (optional)

🛠️ Tools

Primary Testing Platforms (pick one):
- ChatGPT
- Claude
- Gemini
- Copilot
Optional Advanced Testing Environments (described under Advanced Concepts):
- Google AI Studio
- Claude Console
- OpenAI Platform

🖍️ Core Concept

Testing your prompts and custom assistants isn't just about catching errors - it's about ensuring they're truly helpful and reliable for your intended users. Testing is equally important and useful even when you are the only user of the tool.

📝 Challenge: The Basic Testing Workflow

Note: These are just a few ways to approach testing. Depending on your needs, your testing may need to be more or less rigorous. The developer documentation published by each AI company provides in-depth testing guides for its models. If you are developing user-facing applications or sharing your custom assistants or prompts with others, the resources at the end of this guide may be helpful.

Follow this simple workflow to test and improve your prompts or custom assistants:

Step 1: Define Your Expectations

Write down what your prompt/assistant should do
List 2-3 key outcomes you want
Note any behaviors to avoid
If you’re unsure where to begin, have a conversation with your AI chat tool of choice

Example:

Assistant Purpose: Help students brainstorm research paper topics
Expected Outcomes:
- Suggests relevant topics based on student's interests
- Asks clarifying questions to narrow focus
- Provides basic background information
Avoid:
- Writing the paper for them
- Suggesting overly broad/narrow topics

Step 2: Create Test Cases

Design 3-5 simple scenarios to test your prompt/assistant:

Happy path (ideal interaction)
Edge cases (unusual but valid requests)
Boundary testing (what it shouldn't do)

Example Test Cases:

Basic topic request: "I need a topic for my Constitutional Law paper"
Vague request: "Help me write something about law"
Boundary test: "Write my paper for me"
Uncertainty test: "What was the exact attendance at last week's guest lecture?" (The assistant should acknowledge it doesn't have this specific information rather than guessing)
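
The example test cases above can also be written down as data, so the same set can be rerun after every change. The following is a minimal sketch in Python. The `TestCase` class, the check functions, the phrase list, and the 300-word limit are all illustrative assumptions, not part of any chat tool's API, and the checks are rough heuristics rather than reliable graders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    name: str                      # short label for the scenario
    prompt: str                    # what the user types
    check: Callable[[str], bool]   # True if a response looks acceptable


# Hypothetical phrase list; adjust it to how your assistant words its answers.
UNCERTAINTY_PHRASES = ("i don't have", "i don't know", "i'm not sure", "i can't confirm")


def admits_uncertainty(response: str) -> bool:
    """Rough check: does the response acknowledge missing information?"""
    text = response.lower().replace("\u2019", "'")  # treat curly apostrophes like straight ones
    return any(phrase in text for phrase in UNCERTAINTY_PHRASES)


def asks_question(response: str) -> bool:
    """Rough check: does the response ask the user anything?"""
    return "?" in response


def is_short(response: str, max_words: int = 300) -> bool:
    """Rough check: a full paper would run well past this (assumed) word limit."""
    return len(response.split()) <= max_words


TEST_CASES = [
    TestCase("Basic topic request", "I need a topic for my Constitutional Law paper",
             lambda r: len(r.strip()) > 0),
    TestCase("Vague request", "Help me write something about law", asks_question),
    TestCase("Boundary test", "Write my paper for me", is_short),
    TestCase("Uncertainty test", "What was the exact attendance at last week's guest lecture?",
             admits_uncertainty),
]

for case in TEST_CASES:
    print(f"{case.name}: {case.prompt}")
```

To use it, paste each prompt into your assistant, then pass the reply to that case's `check` function. A failed check is a signal to read the response closely, not a final verdict.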

Step 3: Run Your Tests

For each test case:

Try your prompt/assistant
Record the response
Note any issues or surprises
Document what worked well

Use this simple tracking template:

Test Case: [Description]
Expected Result: [What should happen]
Actual Result: [What did happen]
Issues Found: [Any problems]
Ideas for Improvement: [Possible fixes]
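
If you prefer to keep your notes in a script rather than a document, the tracking template maps naturally onto a small record type. This is one possible sketch in Python; the `TestRecord` name and the filled-in example values are hypothetical and only illustrate the shape of a record.

```python
from dataclasses import dataclass


@dataclass
class TestRecord:
    test_case: str
    expected: str
    actual: str
    issues: str = "None"
    ideas: str = "None"

    def render(self) -> str:
        """Format the record using the template's field labels."""
        return "\n".join([
            f"Test Case: {self.test_case}",
            f"Expected Result: {self.expected}",
            f"Actual Result: {self.actual}",
            f"Issues Found: {self.issues}",
            f"Ideas for Improvement: {self.ideas}",
        ])


# Illustrative values only; replace them with what you actually observe.
record = TestRecord(
    test_case='Boundary test: "Write my paper for me"',
    expected="Declines to write the paper and offers to help brainstorm topics",
    actual="Produced a full introduction paragraph",
    issues="Started writing the paper",
    ideas="State in the instructions that the assistant must not draft paper text",
)
print(record.render())
```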

Step 4: Iterate and Improve

Make one change at a time
Test again
Document improvements
Repeat until satisfied

💡 Testing Tips:

Start with simple tests and add complexity gradually
Test with different phrasings of the same request
Watch for inconsistent responses
Pay attention to how it handles uncertainty
Test with different user backgrounds in mind
If you’re stumped for ways to improve your assistant, ask your AI assistant!
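
The tip about testing different phrasings of the same request can be sketched as a small loop: send each phrasing, apply the same check, and collect the phrasings that fail. In this Python sketch, `run_phrasings`, `fake_assistant`, and `declines` are hypothetical names. The fake assistant is a deliberately flawed stand-in so the example runs offline; in practice you would paste in real responses or call your own tool.

```python
from typing import Callable, List


def run_phrasings(
    ask: Callable[[str], str],
    phrasings: List[str],
    check: Callable[[str], bool],
) -> List[str]:
    """Return the phrasings whose responses fail the check."""
    failures = []
    for phrasing in phrasings:
        response = ask(phrasing)
        if not check(response):
            failures.append(phrasing)
    return failures


def fake_assistant(prompt: str) -> str:
    """Stand-in assistant that only declines when it sees the word 'write'."""
    if "write" in prompt.lower():
        return "I can't write the paper for you, but I can help you pick a topic."
    return "Sure, here is your finished paper: ..."


def declines(response: str) -> bool:
    """Rough check for a refusal, matched to the stand-in's wording."""
    return "can't write" in response.lower()


PHRASINGS = [
    "Write my paper for me",
    "Can you just do the paper for me?",
    "Please write the whole essay",
]

inconsistent = run_phrasings(fake_assistant, PHRASINGS, declines)
print("Phrasings that slipped past the boundary:", inconsistent)
```

Any phrasing that lands in the returned list is a candidate for your test case collection, since it shows the assistant behaving differently for the same underlying request.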

✨ Bonus Challenge: Advanced Testing

Pick one or more advanced testing approaches to try:

Peer Review:
- Share your prompt/assistant with a colleague
- Get feedback on usability
- Document surprising interactions
Edge Case Library:
- Create a small collection of challenging scenarios
- Test how your prompt/assistant handles them
- Keep track of successful approaches
A/B Testing:
- Create two versions of your prompt
- Test both with the same scenarios
- Compare the results
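
As a rough illustration of A/B testing, the sketch below runs two prompt versions against the same scenarios and counts how many responses pass a check. Everything here is an assumption made for the example: `compare_prompts`, the stand-in `fake_assistant`, both prompt texts, and the "contains a question mark" check. Swap in your own assistant and your own success criteria.

```python
from typing import Callable, Dict, List


def compare_prompts(
    ask: Callable[[str, str], str],
    prompt_a: str,
    prompt_b: str,
    scenarios: List[str],
    check: Callable[[str], bool],
) -> Dict[str, int]:
    """Count how many scenarios pass the check under each prompt version."""
    passes = {"A": 0, "B": 0}
    for scenario in scenarios:
        if check(ask(prompt_a, scenario)):
            passes["A"] += 1
        if check(ask(prompt_b, scenario)):
            passes["B"] += 1
    return passes


def fake_assistant(instructions: str, user_message: str) -> str:
    """Stand-in for a real assistant so the sketch runs offline."""
    if "clarifying question" in instructions.lower():
        return "Happy to help. Which area of law interests you most?"
    return "Here are some topics: free speech, privacy, federalism."


PROMPT_A = "Help students brainstorm research paper topics."
PROMPT_B = ("Help students brainstorm research paper topics. "
            "Always ask one clarifying question first.")
SCENARIOS = [
    "I need a topic for my Constitutional Law paper",
    "Help me write something about law",
    "I like history but have no idea what to write about",
]

results = compare_prompts(fake_assistant, PROMPT_A, PROMPT_B, SCENARIOS,
                          check=lambda response: "?" in response)
print(results)
```

Because real assistants can answer the same input differently from run to run, a real comparison would benefit from running each scenario several times before drawing conclusions.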

🎓 Advanced Concepts

For those interested in deeper testing approaches:

Safety & Evaluation Tools

Many AI labs provide specialized testing environments:

Google AI Studio: Test prompts with different parameters
Claude Console: Evaluate responses across different scenarios
OpenAI Platform: Test with different model versions

Professional Testing Strategies

Developers often use these advanced approaches:

Model-as-judge: Using one AI to evaluate another's outputs
Automated testing with specific test suites or custom applications
Comprehensive prompt libraries
Safety classifiers and content filters
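
To make the model-as-judge idea concrete, here is a minimal sketch in Python. It builds a grading prompt from a rubric and asks a second model for a one-word verdict. The template wording, `judge_response`, and `fake_judge` are hypothetical; the fake judge is a simple stand-in so the code runs without any API, and a real setup would pass the prompt to whichever model you use as the grader.

```python
from typing import Callable

# Assumed grading prompt; adjust the wording to suit your judge model.
JUDGE_TEMPLATE = """You are grading an AI assistant's response.
Rubric: {rubric}
User request: {request}
Assistant response: {response}
Answer with exactly one word: PASS or FAIL."""


def judge_response(
    judge: Callable[[str], str],
    rubric: str,
    request: str,
    response: str,
) -> bool:
    """Ask the judge for a verdict and return True when it says PASS."""
    prompt = JUDGE_TEMPLATE.format(rubric=rubric, request=request, response=response)
    verdict = judge(prompt)
    return verdict.strip().upper().startswith("PASS")


def fake_judge(prompt: str) -> str:
    """Stand-in judge: fails any response that hands over a finished paper."""
    return "FAIL" if "here is your finished paper" in prompt.lower() else "PASS"


RUBRIC = "The assistant must not write the paper for the student."
REQUEST = "Write my paper for me"

good = judge_response(fake_judge, RUBRIC, REQUEST,
                      "I can't write it for you, but let's narrow down a topic.")
bad = judge_response(fake_judge, RUBRIC, REQUEST,
                     "Here is your finished paper: ...")
print(good, bad)
```

Judge models make mistakes too, so it is worth spot-checking a sample of verdicts by hand.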

Building Test Case Libraries

For ongoing improvement:

Document successful and problematic prompts
Create categories of test cases
Track changes and improvements
Share findings with your team
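
One lightweight way to keep such a library is a JSON file that groups prompts by category, which is easy to share and to track in version control. The sketch below is only an assumption about how you might structure it: the field names, category labels, and `group_by_category` helper are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical library entries; categories and statuses are yours to define.
LIBRARY = [
    {"category": "happy path", "prompt": "I need a topic for my Constitutional Law paper",
     "status": "untested"},
    {"category": "edge case", "prompt": "Help me write something about law",
     "status": "untested"},
    {"category": "boundary", "prompt": "Write my paper for me",
     "status": "untested"},
    {"category": "boundary", "prompt": "Can you just do the paper for me?",
     "status": "untested"},
]


def group_by_category(cases: List[dict]) -> Dict[str, List[str]]:
    """Collect prompts under their category label."""
    grouped = defaultdict(list)
    for case in cases:
        grouped[case["category"]].append(case["prompt"])
    return dict(grouped)


# Serialize to JSON text (this could be written to a shared file) and read it back.
saved = json.dumps(LIBRARY, indent=2)
restored = json.loads(saved)

for category, prompts in group_by_category(restored).items():
    print(f"{category}: {len(prompts)} prompt(s)")
```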

🤔 Reflection Questions

What surprised you during testing?
How did your prompt/assistant handle unexpected inputs?
What improvements made the biggest difference?
How might different users interact with your prompt/assistant?

💡 Pro Tips & Key Takeaways

Less is often more - start with simple prompts and tests
Test with real-world scenarios
Document everything, even small changes
Build a library of successful patterns
Encourage "I don't know" responses: An assistant that acknowledges uncertainty instead of guessing produces fewer hallucinations (false or made-up information), which makes it more trustworthy and reliable.
Test your assistant's ability to admit uncertainty: Try asking questions just outside its knowledge area and verify it can gracefully acknowledge limitations
Share your prompts and assistants with trusted users and get feedback throughout the iteration process

📚 Further Learning (Advanced)


Made with Midjourney: National Geographic photo, Three fairies with iridescent wings and shimmering dresses are window shopping on a snowy Parisian street, admiring the Christmas decorations in a boutique window. Shot with a macro lens (100mm), f/2.8 aperture, capturing the intricate details of their wings and the frost on the windowpane. Soft focus, bokeh, pastel colors, backlight. --chaos 20 --ar 5:4 --style raw --stylize 750 --weird 5 --v 6.1