
GitHub Copilot CLI Introduces Rubber Duck: Cross-Model Reviews for Better Code
GitHub Copilot CLI’s new Rubber Duck feature brings cross-model reviews to your workflow, helping catch more errors and improve code quality. Learn how it works and why a second opinion matters.
GitHub Copilot CLI has launched a new experimental feature called Rubber Duck, designed to give developers a second opinion by leveraging a model from a different AI family. This approach aims to catch more errors, challenge assumptions, and improve the quality of code—especially for complex, multi-file, or long-running tasks.
Why a Second Opinion Matters
Coding agents are powerful, but they can fall into the trap of compounding early mistakes. When an agent reviews its own work, it’s limited by its own training data and biases. Rubber Duck addresses this by acting as an independent reviewer, using a model from a complementary family to critique the primary agent’s plans and implementations.
How Rubber Duck Works
- Cross-family review: If you’re using a Claude model as your orchestrator, Rubber Duck will use GPT-5.4 as the reviewer (and vice versa in the future).
- Targeted feedback: Rubber Duck surfaces high-value concerns—missed details, questionable assumptions, and edge cases—at critical checkpoints: after planning, after complex implementations, and after writing tests.
- Proven results: In benchmarks like SWE-Bench Pro, pairing Claude Sonnet 4.6 with Rubber Duck (GPT-5.4) closed nearly 75% of the performance gap between Sonnet and the more advanced Opus model. The biggest gains were seen on the hardest, multi-file problems.
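To make the benchmark claim concrete, here is a minimal sketch of what "closing X% of the performance gap" means as arithmetic. The scores below are hypothetical placeholders, not the actual SWE-Bench Pro numbers:

```python
# Illustration of the "gap closed" metric. All scores are hypothetical,
# chosen only to show the arithmetic behind a ~75% gap closure.
def gap_closed(baseline: float, improved: float, target: float) -> float:
    """Fraction of the baseline-to-target gap recovered by the improved setup."""
    return (improved - baseline) / (target - baseline)

sonnet = 40.0        # hypothetical score: Sonnet alone
sonnet_duck = 47.5   # hypothetical score: Sonnet + Rubber Duck reviewer
opus = 50.0          # hypothetical score: Opus alone

print(round(gap_closed(sonnet, sonnet_duck, opus), 2))  # 0.75
```

With these placeholder numbers, the Sonnet-to-Opus gap is 10 points, and the paired setup recovers 7.5 of them, i.e. 75% of the gap.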
Real-World Impact
Rubber Duck has already caught issues such as:
- Architectural flaws (e.g., schedulers that never run jobs)
- Subtle bugs (e.g., loops overwriting dictionary keys)
- Cross-file conflicts (e.g., breaking dependencies between modules)
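As an illustration of the second bug class above, here is a minimal, self-contained sketch (not taken from the announcement) of a loop that silently overwrites dictionary keys, alongside a fix that accumulates values instead:

```python
# Illustrative example of the "loop overwriting dictionary keys" bug class.
# The data and variable names are hypothetical.
records = [("alice", 1), ("bob", 2), ("alice", 3)]

# Buggy: each assignment replaces the previous value for a repeated key,
# so alice's first score (1) is silently lost.
buggy = {}
for name, score in records:
    buggy[name] = score

# Fixed: accumulate a list of values per key.
fixed = {}
for name, score in records:
    fixed.setdefault(name, []).append(score)

print(buggy)  # {'alice': 3, 'bob': 2}
print(fixed)  # {'alice': [1, 3], 'bob': [2]}
```

Bugs like this are easy to miss in self-review because the code runs without error; an independent reviewer is more likely to question whether overwriting was intended.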
When and How to Use Rubber Duck
Rubber Duck is available in experimental mode in the Copilot CLI. It activates automatically at key moments or whenever you request a critique. To try it:
- Install GitHub Copilot CLI
- Run the /experimental command
- Select a Claude model and ensure access to GPT-5.4
You’ll see critiques after planning, after complex code changes, or before running tests. You can also ask for a critique on demand.
Why This Matters
By combining the strengths of different AI model families, GitHub Copilot CLI helps developers avoid costly mistakes and build more robust software. Rubber Duck is especially valuable for:
- Complex refactors and architectural changes
- High-stakes tasks
- Ensuring comprehensive test coverage
- Getting a second opinion before committing to a plan
Rubber Duck is now available in experimental mode. Try it out and share your feedback with the GitHub team!
Read the original announcement on the GitHub Blog.