Using ComfyUI & Pixel-Art-XL for Pixel Art Practice

published today
The interface of ComfyUI with this workflow loaded
While AI-generated pixel art won't replace hand-crafted sprites, it can be a tool for learning pixel art techniques.

This guide walks you through setting up a locally running pipeline that lets you go from typing a prompt like "a cute happy cat" to getting a sprite you can load into Aseprite to practice your techniques.

The point of this isn't to generate perfect pixel art to ship as is into your game, but to learn new approaches to things like shading, palette selection, composition, and ultimately practice with art that's exciting to you.

The Role of AI in Pixel Art Creation

Before diving into the technical setup, it's important to set realistic expectations. Out-of-the-box generated pixel art will likely never achieve the cohesive look needed for a polished game. We're even further away from reliable animated sprite generation. However, just as traditional artists use transfer paper to practice stroke techniques, we can use AI-generated pixel art to study shading, shapes, and color choices. I think of it as a beginner's tool for learning pixel art techniques rather than a replacement for hand-crafted sprites.

Setting Up Your Environment

This guide focuses on macOS setup, but I'll provide links to each tool's source, which contains instructions for Windows and Linux users.

0. Links

1. Install System Dependencies

If not on macOS, check the equivalent commands within ComfyUI's and ImageMagick's documentation above.

brew install python@3.11 git imagemagick

You'll also need PyTorch with Metal support for Apple Silicon acceleration:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
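Before moving on, it can be worth confirming which device PyTorch will actually use. The sketch below is my own quick check, not part of ComfyUI: the function name `pick_device` is made up for illustration, and it only relies on PyTorch's `mps` (Metal) and `cuda` availability checks.

```python
import importlib.util


def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what PyTorch reports."""
    # If torch is not installed in this environment, say so instead of crashing.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"

    import torch

    # The MPS backend is PyTorch's Metal-based backend for Apple Silicon.
    # Older PyTorch builds may not have it at all, hence the getattr guard.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch device: {pick_device()}")
```

If this prints `cpu` on an Apple Silicon Mac, generation should still work, just more slowly.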

2. Set Up ComfyUI

Clone the repository and install requirements:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

Downloading Required Models

1. SDXL Base Model

Download the SDXL Base 1.0 model (click `Files and versions` and then the download button for the `sd_xl_base_1.0.safetensors` file) and place it in the checkpoints folder:

mv ~/Downloads/sd_xl_base_1.0.safetensors ~/ComfyUI/models/checkpoints/

2. SDXL VAE

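Download the `sd_xl_base_1.0_0.9vae.safetensors` file and place it in the checkpoints folder as well. The workflow below loads it with a second checkpoint loader and uses only its VAE output:

mv ~/Downloads/sd_xl_base_1.0_0.9vae.safetensors ~/ComfyUI/models/checkpoints/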

3. Pixel-Art-XL LoRA

Download the Pixel-Art-XL LoRA and move it to the LoRA folder:

mv ~/Downloads/pixel-art-xl-v1.1.safetensors ~/ComfyUI/models/loras/
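The workflow refers to these three files by name, so a misplaced or renamed download tends to show up later as a confusing loader error. Here is a minimal sketch of a pre-flight check, assuming ComfyUI lives at `~/ComfyUI` as in the commands above. The helper name `missing_models` is mine, not something ComfyUI provides.

```python
from pathlib import Path

# File names and folders exactly as used in this guide.
EXPECTED_MODELS = {
    "models/checkpoints": [
        "sd_xl_base_1.0.safetensors",
        "sd_xl_base_1.0_0.9vae.safetensors",
    ],
    "models/loras": ["pixel-art-xl-v1.1.safetensors"],
}


def missing_models(comfy_root: Path) -> list[Path]:
    """Return the expected model paths that do not exist under comfy_root."""
    missing = []
    for folder, names in EXPECTED_MODELS.items():
        for name in names:
            path = comfy_root / folder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Assumes the repository was cloned into the home directory.
    for path in missing_models(Path.home() / "ComfyUI"):
        print(f"missing: {path}")
```

An empty result means all three files are where the workflow expects them.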

Generating and Refining Pixel Art

1. Start ComfyUI

Start ComfyUI by running this command from the root of the ComfyUI directory.

python main.py
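ComfyUI can take a little while to load on first start. If you want to script around that, a small readiness check is enough. This is only a sketch: `comfyui_is_up` is a hypothetical helper of mine, and it assumes the default address `http://localhost:8188`.

```python
import urllib.error
import urllib.request


def comfyui_is_up(base_url: str = "http://localhost:8188", timeout: float = 2.0) -> bool:
    """Return True if something answers with HTTP 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: treat all as "not ready".
        return False


if __name__ == "__main__":
    print("ComfyUI reachable:", comfyui_is_up())
```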

2. Import the Workflow

Navigate in a browser to http://localhost:8188. Copy the JSON below and put it into a file on your machine. Click Workflow, then Open, and select the file you just made.

{
  "last_node_id": 23,
  "last_link_id": 35,
  "nodes": [
    {
      "id": 17,
      "type": "CheckpointLoaderSimple",
      "pos": [
        -35.73746871948242,
        -764.6565551757812
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            29
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            28,
            32,
            33
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0.safetensors"
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        310.84417724609375,
        -502.0832824707031
      ],
      "size": [
        400,
        200
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 28
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            6
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark, blurry, deformed, depth of field, realistic, 3d render, frame"
      ]
    },
    {
      "id": 19,
      "type": "LoraLoader",
      "pos": [
        -27.009868621826172,
        -609.7158203125
      ],
      "size": [
        315,
        126
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 29
        },
        {
          "name": "clip",
          "type": "CLIP",
          "link": 33
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            30
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": null
        }
      ],
      "properties": {
        "Node name for S&R": "LoraLoader"
      },
      "widgets_values": [
        "pixel-art-xl-v1.1.safetensors",
        1.2,
        1
      ]
    },
    {
      "id": 7,
      "type": "EmptyLatentImage",
      "pos": [
        351.8023681640625,
        -252.6610870361328
      ],
      "size": [
        315,
        106
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyLatentImage"
      },
      "widgets_values": [
        512,
        512,
        1
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        742.7706298828125,
        -761.0106201171875
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 30
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 3
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 7
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            4
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        533951669449467,
        "randomize",
        8,
        1.5,
        "lcm",
        "normal",
        1
      ]
    },
    {
      "id": 4,
      "type": "VAEDecode",
      "pos": [
        32.28488540649414,
        -210.8775634765625
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 4
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 35
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            5
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 5,
      "type": "SaveImage",
      "pos": [
        739.55224609375,
        -422.5530700683594
      ],
      "size": [
        315,
        270
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 5
        }
      ],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "pixel_art_output"
      ]
    },
    {
      "id": 2,
      "type": "CLIPTextEncode",
      "pos": [
        312.794189453125,
        -760.7067260742188
      ],
      "size": [
        400,
        200
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 32
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            3
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "a cute happy cat, (flat shading:1.2), (minimalist:1.4)"
      ]
    },
    {
      "id": 23,
      "type": "CheckpointLoaderSimple",
      "pos": [
        -35.68680953979492,
        -394.07562255859375
      ],
      "size": [
        315,
        98
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": null
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": null
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            35
          ],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_base_1.0_0.9vae.safetensors"
      ]
    }
  ],
  "links": [
    [
      3,
      2,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      4,
      3,
      0,
      4,
      0,
      "LATENT"
    ],
    [
      5,
      4,
      0,
      5,
      0,
      "IMAGE"
    ],
    [
      6,
      6,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      7,
      7,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      28,
      17,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      29,
      17,
      0,
      19,
      0,
      "MODEL"
    ],
    [
      30,
      19,
      0,
      3,
      0,
      "MODEL"
    ],
    [
      32,
      17,
      1,
      2,
      0,
      "CLIP"
    ],
    [
      33,
      17,
      1,
      19,
      1,
      "CLIP"
    ],
    [
      35,
      23,
      2,
      4,
      1,
      "VAE"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 0.8140274938684002,
      "offset": [
        758.9137024103433,
        1114.7928641006629
      ]
    }
  },
  "version": 0.4
}

At this point, you should have a UI in your browser that looks like this:

The interface of ComfyUI with this workflow loaded
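If the workflow loads with missing connections or red nodes, it can help to sanity-check the JSON file itself. The sketch below reflects my reading of the workflow format used above, where each entry in `links` is `[link_id, from_node, from_slot, to_node, to_slot, type]`. The function names and the `workflow.json` file name are placeholders of mine.

```python
import json
from pathlib import Path


def load_workflow(path: str) -> dict:
    """Read a workflow file saved from the JSON in this guide."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def check_workflow(workflow: dict) -> list[str]:
    """Return human-readable problems: links that point at nodes that do not exist."""
    node_ids = {node["id"] for node in workflow["nodes"]}
    problems = []
    for link_id, src, _src_slot, dst, _dst_slot, kind in workflow["links"]:
        for node_id in (src, dst):
            if node_id not in node_ids:
                problems.append(f"link {link_id} ({kind}) references missing node {node_id}")
    return problems


def referenced_model_files(workflow: dict) -> list[str]:
    """Collect the .safetensors file names mentioned in any node's widgets_values."""
    files = []
    for node in workflow["nodes"]:
        for value in node.get("widgets_values") or []:
            if isinstance(value, str) and value.endswith(".safetensors"):
                files.append(value)
    return files
```

Calling `check_workflow(load_workflow("workflow.json"))` on an intact copy should return an empty list, and `referenced_model_files` should list the three model files downloaded earlier.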

3. Generate Initial Art

You can now start generating pixel art images using the Queue button towards the bottom. By default, the workflow generates 512x512 images, which serve as your base for further refinement. I won't go into the details in this post, but note the various settings within the KSampler and CLIP Text Encode boxes. These are where you'll want to tweak values like the positive prompt, steps, cfg, etc. to better match your style.
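If you'd rather keep a few variants of the workflow file around than retype values in the UI, the settings can also be edited in the JSON. This is a rough sketch based on how the KSampler node's `widgets_values` appear in the workflow above (seed, seed control, steps, cfg, sampler, scheduler, denoise). The positions are my reading of that file rather than a documented contract, and `with_sampler_settings` is a made-up helper.

```python
import copy

# Positions inside the KSampler node's "widgets_values", as they appear in the
# workflow JSON in this guide. Treat these as assumptions, not a stable API.
STEPS_INDEX = 2
CFG_INDEX = 3


def with_sampler_settings(workflow: dict, steps: int, cfg: float) -> dict:
    """Return a copy of the workflow with new steps and cfg on every KSampler node."""
    updated = copy.deepcopy(workflow)  # leave the original workflow untouched
    for node in updated["nodes"]:
        if node["type"] == "KSampler":
            node["widgets_values"][STEPS_INDEX] = steps
            node["widgets_values"][CFG_INDEX] = cfg
    return updated
```

Saving the returned dictionary with `json.dump` gives you a second file you can open from the Workflow menu and compare side by side with the original.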

4. Downscale to Pixel Art Dimensions

Once you find an image you're happy with, you can use ImageMagick with nearest-neighbor resampling to achieve proper pixelation (choose your desired size simply by modifying the dimensions):

# generated_image.png stands in for whichever file ComfyUI saved to its output directory
# -filter must come before -resize, otherwise the point (nearest-neighbor) filter is not applied
magick ~/ComfyUI/output/generated_image.png -filter point -resize 128x128\! ~/ComfyUI/output/sprite_128x128.png
magick ~/ComfyUI/output/generated_image.png -filter point -resize 64x64\! ~/ComfyUI/output/sprite_64x64.png
magick ~/ComfyUI/output/generated_image.png -filter point -resize 32x32\! ~/ComfyUI/output/sprite_32x32.png
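To see why nearest-neighbor resampling matters here, it helps to look at what it does: every output pixel is a copy of exactly one source pixel, so no blended in-between colors are created. The toy version below works on a plain grid of values. It is an illustration of the idea only, and it does not claim to match ImageMagick's exact sampling positions.

```python
def downscale_nearest(pixels, new_width, new_height):
    """Downscale a 2D grid of pixels by copying one source pixel per output pixel."""
    old_height = len(pixels)
    old_width = len(pixels[0])
    result = []
    for y in range(new_height):
        # Map the output row back to a source row using integer math (no blending).
        src_y = y * old_height // new_height
        row = []
        for x in range(new_width):
            src_x = x * old_width // new_width
            row.append(pixels[src_y][src_x])
        result.append(row)
    return result


if __name__ == "__main__":
    grid = [
        ["A", "A", "B", "B"],
        ["A", "A", "B", "B"],
        ["C", "C", "D", "D"],
        ["C", "C", "D", "D"],
    ]
    print(downscale_nearest(grid, 2, 2))
```

A smoothing filter would instead average neighboring pixels, which is what produces the muddy edges and extra colors you don't want in a sprite.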

5. Final Touches in Aseprite

Open these PNG sprite files in Aseprite and convert each one from a background to a layer. Study how the AI handled different aspects of pixel art:

  • Color choices and palette limitations
  • Shading techniques and pixel placement
  • Shape simplification for small dimensions
  • Edge handling and anti-aliasing

As an extra challenge, try animating the sprite and seeing how well the initial structure lends itself to dynamic movement.

Learning from AI-Generated Pixel Art

The real value of this setup isn't in the final output but in what you can learn from it. Here are some ways I use the AI-generated pixel art for learning:

  • Study Color Relationships: Analyze how the AI handles color transitions and shading in limited palettes. I'm often surprised by how many distinct colors are needed to get some of the more textured looks.
  • Observe Shape Simplification: See how complex forms are reduced to essential pixels while maintaining recognizability.
  • Practice Refinement: Try improving the AI-generated art manually, focusing on areas where the machine struggles.
  • Experiment with Styles: Generate various art styles to understand different approaches to pixel art techniques.
  • Get Inspired: The randomness of the AI can be useful for getting unstuck. There are a lot of ways to interpret a cybernetic-bear-owl-snake, and this can be a great source of inspiration.
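For the color study in particular, counting is a quick way to put a number on "how many colors is this sprite really using?" The sketch below assumes you already have the sprite's pixels as a flat list of RGB tuples, loaded with whatever imaging library you prefer. `palette_summary` is a name I made up for this example.

```python
from collections import Counter


def palette_summary(pixels, top=8):
    """Return (number of distinct colors, the `top` most common colors with counts)."""
    counts = Counter(pixels)
    return len(counts), counts.most_common(top)


if __name__ == "__main__":
    # A tiny stand-in for real sprite data: three blacks, two whites, one red.
    sample = [(0, 0, 0)] * 3 + [(255, 255, 255)] * 2 + [(255, 0, 0)]
    distinct, most_common = palette_summary(sample, top=2)
    print(distinct, most_common)
```

Running the same count on a hand-made sprite you admire makes for a useful comparison, since generated sprites often carry many near-duplicate shades that you can practice merging by hand.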

Finding Your Artistic Voice

While AI can teach you the technical aspects like how to make something look round, how to apply shading effectively, or how to approach drawing something you've never attempted before, I think it's crucial to view these as starting points rather than final outputs.

There's something noticeably missing in AI-generated pixel art. Those creative decisions that don't quite follow the rules and the personal flair that comes from your unique perspective are the things that will help make your game uniquely yours. Don't sell yourself short and settle for the first thing that comes out of the AI.

To practicing pixel art,
James