The Search for Divergent and Creative Thinking in Deepseek-R1 32B

published 25 days ago

In my previous exploration of competitive agent loops, I focused on the basic setup and potential. Now, after hundreds of agent iterations and deeper analysis, I'm sharing insights into how these loops might help LLMs break free from their tendency to converge on predictable solutions.

My agent architecture for sharing thoughts across agents over n iterations
My initial agent architecture to test if sharing thoughts leads to more divergent, but still valuable, responses.

The Challenge with LLM Inspiration

While LLMs excel at completing thoughts with sophistication, they often struggle to provide truly inspiring or unexpected insights. They tend to converge on safe, well-trodden paths rather than exploring novel directions. This limitation becomes particularly apparent when we're seeking to learn about the aspects of a problem we don't even know we should be considering: the unknown unknowns.

Leveraging Distinct Reasoning Blocks

One of the most exciting aspects of working with Deepseek's model has been the ability to readily extract its reasoning block. Unlike traditional approaches where agents might share direct solutions, I designed a system where agents share their thought processes instead. Think of it like musicians: an acoustic guitarist doesn't want to play the exact song that an electric guitarist writes, but they might find valuable inspiration in how the other thinks about rhythm, progression, or emotional expression.
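In practice, DeepSeek-R1 emits its chain of thought between `<think>` tags, so the reasoning block can be separated from the final answer with a small helper. This is a minimal sketch (the function name is mine; the tag format is R1's default output convention):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a DeepSeek-R1 completion into (reasoning, answer).

    R1 wraps its chain of thought in <think>...</think>; everything
    after the closing tag is the user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    if match is None:
        # No reasoning block found; treat the whole output as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer
```

With this split, it's the `reasoning` string that gets passed to the competing agent, while the `answer` stays in the agent's own history.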

The Experiment and Architecture

My thought process was: as agents go through more iterations, they could be inspired by both their past solutions and their competitors' thoughts, and their later solutions would become increasingly divergent. Obviously this might not always be desirable. However, I would want to leverage a system like this to inject more contrarian thinking into my decision-making. Right now, if I ask an LLM to play devil's advocate, the result is shallow and uninspired, still heavily anchoring its arguments to my initial assumptions (so, working as intended!).

The architecture depicted in the image above is simple: a loop of two agents competing to come up with the best UX design solution for a roadmap interface inspired by the look of luxury watches. Each agent has a different philosophy: one is a minimalist and the other is an expressive designer. For every iteration beyond the first, the agents get to see both their own previous designs and their competitor's previous thoughts.
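The loop itself fits in a few lines. Here `generate` is a hypothetical stand-in for whatever local inference call you use (e.g. a wrapper around DeepSeek-R1 via Ollama); the important part is what each agent sees after the first iteration — its own past solutions plus the competitor's past reasoning, never the competitor's solutions:

```python
def run_competition(generate, personas, task, iterations):
    """Two-agent competitive loop that shares reasoning, not solutions.

    generate(persona, prompt) -> (reasoning, solution) is assumed to be
    supplied by the caller, e.g. a wrapper around a local DeepSeek-R1.
    """
    history = {name: {"solutions": [], "reasoning": []} for name in personas}
    for _ in range(iterations):
        for name, persona in personas.items():
            rival = next(n for n in personas if n != name)
            prompt = task
            if history[name]["solutions"]:  # every iteration beyond the first
                prompt += "\nYour previous designs:\n" + "\n".join(
                    history[name]["solutions"])
                prompt += "\nYour competitor's previous thoughts:\n" + "\n".join(
                    history[rival]["reasoning"])
            reasoning, solution = generate(persona, prompt)
            history[name]["reasoning"].append(reasoning)
            history[name]["solutions"].append(solution)
    return history
```

This sketch updates histories as it goes, so within an iteration the second agent sees slightly fresher rival reasoning than the first; snapshotting histories at the top of each iteration would make the exchange fully symmetric.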

The scripts for running these iterations and the follow-up metric assessments are publicly available here, where you can fully inspect or adapt the methods used.

Performance Metrics

I used sentence-BERT to quantify the divergence in agent responses across time by leveraging semantic embedding similarity. Each response is encoded into a high-dimensional semantic vector, and cosine similarity is computed between these embeddings.

Ultimately, scores closer to 0 mean the responses are more divergent. For the heatmaps, high divergence is represented by a darker color and a cosine similarity score in the range of 0-0.500; medium is 0.501-0.750; low is 0.751-1.

Design Evolution Over Time

Given the rough trend toward greater divergence over an increasing number of iterations, I wanted to look into it a bit more and see how it was changing. If the output is wandering into totally different solution spaces than the original prompt asked for, then there's more work to do to make sure the "creativity" is valuable.

o3-mini-high was used to break down key similarities and differences between the first and last responses of a few iteration sets for the minimalist.

Gut Reactions to the Results

After 90 iterations, the minimalist's latest design ditches the watch inspiration and mentions things like AI integrations and mental health monitoring. After 5 iterations, its latest design's biggest differentiator is the ability to zoom. This is encouraging to me! I generally know what I know and how to achieve what I want, and my current LLM usage speeds those cycles up. What LLMs haven't done well for me yet is offer highly different perspectives or different paths to consider.

Maybe a watch interface is a terrible choice as it limits my ability to add in bells and whistles down the line. Or maybe I don't care about an AI integration or mental health monitoring and a watch interface is a healthy constraint to keep me focused on the most important things. Either way, I find value in getting pitched those alternative ideas so that I can feel more confident in my chosen path or perhaps adopt some of the interesting concepts.

What Next?

These experiments suggest that competitive agent loops, when structured around thought sharing rather than solution sharing, can help LLMs break out of their convergent thinking patterns. The key seems to be in maintaining distinct philosophical approaches while allowing agents to learn from each other's reasoning processes. This creates a kind of creative tension that pushes both agents to explore new territory while staying true to their core principles.

The time it takes to reach greater divergence (~7 hours on my M1 with 32GB RAM) is my least favorite part. I'm going to explore the results more and try some other prompt structures to see if the cycles can be cut down further.

You can check out part 3 of this series here.

To locally-powered divergent thinking,
James