Supercharging LLM Brainstorming via Breadth-First Agent Architecture
LLMs are typically used to complete your thoughts. But this autocomplete-style interaction often leads to diminishing returns. For me, after enough uses on a given task, they start to feel more like they're getting in the way than accelerating the work. So I've been exploring how to transform them from thought-completers into tools for creative exploration.
This is the latest of several experiments using locally run LLMs like Deepseek R1 32B and Mistral NeMo for creative generation. The results suggest that shifting away from iterative refinement toward parallel idea generation is a better way for these models to help you find unknown unknowns.

The Previous Approach: Depth-First Architecture
In my previous experiment, designing a pipeline for LLMs to generate pixel art sprites, I used an architecture where each iteration refines the solution and thoughts from the previous one. There are two things I don't like about this approach: 1. if the initial proposal has flaws, the 500th iteration will still be exploring in suboptimal directions, and 2. it doesn't naturally seek insights in potentially valuable adjacent solution spaces.
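To make the contrast concrete, here's roughly what that depth-first loop looks like. This is a minimal sketch rather than the actual pipeline code, and it assumes the model is served locally through Ollama's Python client; the prompts are stand-ins.

```python
import ollama  # assumes the model runs locally behind Ollama


def generate(prompt: str, model: str = "deepseek-r1:32b") -> str:
    """One completion from a locally running model."""
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]


def depth_first(task: str, iterations: int = 500) -> str:
    """Each iteration refines the previous proposal, so early flaws keep steering the search."""
    proposal = generate(f"Propose a solution for: {task}")
    for _ in range(iterations):
        proposal = generate(f"Task: {task}\n\nRefine this proposal:\n{proposal}")
    return proposal
```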

The New Approach: Breadth-First Architecture
To address these weaknesses, I wanted to try a new architecture that prioritizes breadth over depth (all code available here on GitHub). Here's how it works:
- 1. Generate New Idea: The system starts with a creator agent reasoning about and generating a single proposal.
- 2A. Extract Valuable Thoughts: For each generation, a critic agent extracts the most valuable insight from the creator's reasoning process.
- 2B. Extract Design Elements: A separate critic identifies the most valuable elements from the creator's proposed solution.
- 3. Semantic Filtering: These key insights and design elements are only stored if they're sufficiently different from previously stored items, ensuring we maintain diversity.
- 4. Iterative Exploration: The process then repeats, occasionally allowing for one progressive refinement of an idea instead of starting fresh (a minimal code sketch of the loop follows below).
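In code, the loop is roughly the following. This is a simplified sketch rather than the implementation in the repo; the prompts, the Ollama client, and the refine probability are all stand-ins.

```python
import random

import ollama  # assumes the models run locally behind Ollama


def ask(prompt: str, model: str = "deepseek-r1:32b") -> str:
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]


def is_novel(candidate: str, stored: list[str]) -> bool:
    # Placeholder: the real filter compares embeddings against a similarity
    # threshold (sketched in the results section below).
    return candidate not in stored


def breadth_first(task: str, iterations: int = 500, refine_chance: float = 0.1):
    thoughts: list[str] = []   # valuable insights from the reasoning blocks
    elements: list[str] = []   # valuable pieces of the proposed solutions
    last_proposal = None

    for _ in range(iterations):
        # 1. Creator: usually start fresh, occasionally refine the last proposal.
        if last_proposal is not None and random.random() < refine_chance:
            proposal = ask(f"Task: {task}\n\nRefine this proposal:\n{last_proposal}")
        else:
            proposal = ask(f"Propose a novel approach to: {task}")
        last_proposal = proposal

        # 2A. Critic: extract the most valuable insight from the reasoning.
        insight = ask(f"What is the single most valuable insight in this reasoning?\n{proposal}")
        # 2B. Critic: extract the most valuable element of the proposed design.
        element = ask(f"What is the single most valuable design element in this proposal?\n{proposal}")

        # 3. Semantic filtering: only store items that are sufficiently different.
        if is_novel(insight, thoughts):
            thoughts.append(insight)
        if is_novel(element, elements):
            elements.append(element)

    return thoughts, elements
```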
The Results of the Experiment
I tasked the creator with designing an algorithm for generating a pixel art sword sprite. This creator-critic pipeline ran for 500 iterations with a similarity threshold of 0.7 for filtering out ideas and solutions that were too semantically similar. By the end of the run, 99 differentiated thoughts had been extracted from the reasoning blocks and 131 unique design elements for the algorithm had been found.
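I won't claim this is exactly how the filtering works in the repo, but the semantic check behind that 0.7 threshold could look something like the sketch below, using sentence-transformers embeddings and cosine similarity; the choice of embedding model is my assumption.

```python
from sentence_transformers import SentenceTransformer, util

# Embedding model is a stand-in; any sentence-embedding model would work.
embedder = SentenceTransformer("all-MiniLM-L6-v2")


def is_novel(candidate: str, stored: list[str], threshold: float = 0.7) -> bool:
    """Store a candidate only if nothing already stored is too similar to it."""
    if not stored:
        return True
    candidate_vec = embedder.encode(candidate, convert_to_tensor=True)
    stored_vecs = embedder.encode(stored, convert_to_tensor=True)
    max_similarity = util.cos_sim(candidate_vec, stored_vecs).max().item()
    return max_similarity < threshold
```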
I built the Sift interface to help me review these results more easily. Upon review, the results were more inspiring than those of the depth-first approach. Whereas the depth-first designs feel take-it-or-leave-it, the breadth-first outputs feel like components that can be mixed and matched into something much more novel. I won't pretend to be an expert on Bresenham's circle algorithm, cellular automata, or Perlin noise, but I'm intrigued to learn how they could work together to generate pixel art.
Is Too Many Ideas a Good Problem?
Two good ideas don't automatically combine into one great idea, and that's even less true for one hundred ideas. This is an immediately obvious hurdle with this approach: the number of possible idea combinations explodes. With enough guardrails, though, I think there's a decent chance of automating the extraction of value from each idea in a way that's greater than the sum of its parts. At a minimum, these results are promising as a way to help find things you didn't know you were looking for.
End Game
I started all these agent architecture experiments a few weeks ago, once I saw how capable Deepseek R1 was as a local reasoning model. My goal has slowly evolved into getting as close to an infinite workforce as possible. There are still many steps to get there:
- Leveraging the Think Blocks: the reasoning traces from Deepseek R1 are still untapped potential. I often find these to be more valuable than the outputs themselves, but I haven't really tried to leverage them yet. One quick test could be to discard the outputs entirely and use the thoughts to dynamically prompt other, more capable models to do something like write code (see the sketch after this list).
- Combining the Architectures: From adversarial agents to creator-critic (depth/breadth variants) relationships, there are a lot of different ways to combine these agent workflows. I'm curious to see how connecting bits of these architectures together might produce more valuable results.
- More Model Evaluations: The Hugging Face leaderboard is constantly shifting. I think it's worth re-evaluating the usage of things like Mistral-NeMo and seeing if Qwen2.5 makes a difference for any of the architectures.
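As a taste of that first item, here's a rough sketch of what discarding the output and reusing only the think block might look like. The regex relies on R1's <think> tags, the downstream coding model is just a hypothetical pick, and the prompt wording is mine.

```python
import re

import ollama  # assumes both models run locally behind Ollama


def extract_think_block(response: str) -> str:
    """Deepseek R1 wraps its reasoning in <think>...</think> tags."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    return match.group(1).strip() if match else ""


def reason_then_code(task: str) -> str:
    # 1. Let the reasoning model think, then throw away its final answer.
    raw = ollama.chat(
        model="deepseek-r1:32b",
        messages=[{"role": "user", "content": task}],
    )["message"]["content"]
    thoughts = extract_think_block(raw)

    # 2. Hand only the reasoning to a different model and ask it for code.
    prompt = f"Using these design notes, write the code:\n\n{thoughts}"
    return ollama.chat(
        model="qwen2.5-coder:32b",
        messages=[{"role": "user", "content": prompt}],
    )["message"]["content"]
```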
Whichever path I explore next, I'll be sure to post the results here. At a minimum, this approach seems valuable for getting a better sense of the solution space before diving deep into any particular direction.