Claude Sonnet vs Gemini Pro | AI API Design Showdown
Introduction
In this article, we stage a showdown between two leading AI models: Anthropic's Claude 3.5 Sonnet and Google's Gemini Pro. The challenge? To design the best OpenAPI document for a fictional service called Mini Links. This comparison highlights how each model handles the task, how its output scores, and how much each improves when given feedback.
Setting the Stage
We begin with Anthropic's Claude 3.5 Sonnet, accessed through the Anthropic API Workbench. The setup includes the following parameters:
- Temperature: Set to 1
- Maximum Tokens: raised to allow a comprehensive output
The same system prompt is applied to both language models. We ask each model to create an OpenAPI document for Mini Links, a service slightly more complex than a typical CRUD API; a sketch of what such a document might look like follows.
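The video doesn't reproduce the prompt or the generated spec, so the skeleton below is only a guess at the shape of the task: the service name suggests a link shortener, and every path, field, and URL here is an assumption for illustration, not either model's actual output.

```yaml
# Hypothetical OpenAPI skeleton for the Mini Links task.
# All paths, fields, and URLs are illustrative assumptions,
# not the prompt or either model's actual output.
openapi: 3.0.3
info:
  title: Mini Links API
  description: Create and resolve shortened links.
  version: 1.0.0
servers:
  - url: https://api.minilinks.example.com/v1
paths:
  /links:
    post:
      summary: Create a short link
      operationId: createLink
      responses:
        "201":
          description: Short link created.
  /links/{slug}:
    get:
      summary: Resolve a short link to its destination
      operationId: resolveLink
      parameters:
        - name: slug
          in: path
          required: true
          schema:
            type: string
      responses:
        "302":
          description: Redirect to the original URL.
```

The redirect endpoint is the sort of detail that pushes this task beyond a plain CRUD API.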
Anthropic Claude's Initial Attempt
Upon generating its first draft, Claude provides a detailed OpenAPI document (a representative fragment is sketched after the list), complete with:
- Endpoints
- Security Schemes
- Request and Response Schemas
- Error Responses
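The draft itself isn't shown in the source, but a components section covering the last three bullets typically looks like the fragment below; the scheme and schema names are hypothetical, not Claude's actual output.

```yaml
# Hypothetical components fragment; names are illustrative,
# not Claude's actual output.
components:
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key
  schemas:
    CreateLinkRequest:
      type: object
      required: [url]
      properties:
        url:
          type: string
          format: uri
          description: The destination URL to shorten.
    Link:
      type: object
      properties:
        slug:
          type: string
          description: Short identifier appended to the base URL.
        url:
          type: string
          format: uri
    Error:
      type: object
      properties:
        code:
          type: integer
        message:
          type: string
```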
After copying the generated API document into VS Code, we submit it to the Rate My OpenAPI tool. The initial scores from the analysis are as follows:
- Overall Score: 60
- Documentation: 65
- Completeness: 63
- SDK Generation: 78
- Security: 47
With this feedback, we move forward to enhance Claude's API output.
Revision Based on Feedback
Using the feedback received, Claude makes significant updates to the API document, and we re-submit the revised version to Rate My OpenAPI. The improvements lift the overall score to an impressive 99, with perfect marks of 100 for Documentation and Completeness and an excellent 97 for SDK Generation. The Security score surges from 47 to 100, showcasing Claude's ability to incorporate feedback and produce a high-quality OpenAPI document.
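The source doesn't list Claude's exact edits, but a security score jumping from 47 to 100 usually reflects changes such as declaring an auth scheme, requiring it globally, and documenting auth-failure responses. Below is a hedged sketch of that kind of revision, reusing the hypothetical names from the fragments above; it is not Claude's literal output.

```yaml
# Illustrative security hardening of the kind that typically lifts a
# Rate My OpenAPI security score; not Claude's literal revision.
security:
  - ApiKeyAuth: []  # require the API key on every operation
paths:
  /links:
    post:
      summary: Create a short link
      responses:
        "201":
          description: Short link created.
        "401":
          description: Missing or invalid API key.
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Error"
        "429":
          description: Rate limit exceeded.
```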
Google Gemini Pro's Initial Attempt
Next, we shift focus to Google’s Gemini Pro using Vertex AI Studio. With identical settings (temperature and maximum tokens), we enter the same prompt to assess its capabilities.
Gemini Pro produces an OpenAPI document; however, upon evaluation through Rate My OpenAPI, it scores significantly lower:
- Overall Score: 46
- Documentation, Completeness, and Security: all markedly lower than Claude's
Revision Based on Feedback
As with Claude, we present Gemini with the same feedback to elevate its output quality. The revision process, however, takes longer than Claude's. Once the revised version is submitted to Rate My OpenAPI, it attains an overall score of 93, with notable improvements across the board. While it incorporates the feedback well, it ultimately falls short of Claude's remarkable performance.
Conclusion
In this AI API design showdown, Anthropic's Claude 3.5 Sonnet emerges victorious, adapting to feedback to produce a top-tier OpenAPI document. With an overall score of 99, it dethrones GPT-4 to take the top position, leaving Google Gemini Pro with room to improve.
Be sure to subscribe, as we will continue exploring the capabilities of various language models in future challenges.
Keywords
- Anthropic Claude
- Google Gemini Pro
- AI API design
- OpenAPI
- Rate My OpenAPI
- Feedback implementation
- API documentation
- SDK generation
- Security in APIs
FAQ
Q: What was the specific task given to Claude and Gemini Pro?
A: Both models were asked to design an OpenAPI document for a fictional service called Mini Links, which is slightly more complex than a standard CRUD API.
Q: How did Claude perform on the first attempt?
A: Claude received an initial score of 60, with areas for improvement in documentation, completeness, and security.
Q: What score did Claude achieve after revisions?
A: After revising the document based on the feedback, Claude achieved an overall score of 99.
Q: How did Google Gemini Pro fare in comparison?
A: Gemini Pro initially scored 46 but improved to 93 in its revised version after applying feedback.
Q: What aspects did Claude excel in?
A: Claude received perfect scores of 100 in Documentation and Completeness, and its Security score rose from 47 to 100, showcasing its responsiveness to feedback.