bigASP v2.5

A highly experimental model trained on over 13 MILLION images for 150 million training samples. It is based roughly on the SDXL architecture, but with Flow Matching for improved quality and dynamic range.

WARNING: This is NOT a typical SDXL model! It will not work by default!

Currently, this model only works in ComfyUI. An example workflow is included in the image above, which you should be able to drop into your ComfyUI workspace to load. If that does not work for some reason, then you can manually build a workflow by:

  • Start with a basic SDXL workflow, but add a ModelSamplingSD3 node onto the model. i.e.

    • Load Checkpoint -> ModelSamplingSD3 -> KSampler

  • Everything else is the usual SDXL workflow with two clip encoders (one for positive and one for negative), empty latent, VAE Decoder after the sampler, etc.
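
To make the wiring concrete, here's a minimal sketch of that chain in ComfyUI's API "prompt" format, written as a Python dict. The node class names are standard ComfyUI nodes; the checkpoint filename, prompts, and sampler settings are placeholders you'd swap for your own.

```python
# Minimal sketch of the workflow above in ComfyUI API format:
# node id -> {class_type, inputs}; connections are ["source_id", output_index].
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "bigASP_v2_5.safetensors"}},  # placeholder name
    # The one difference from stock SDXL: ModelSamplingSD3 sits between the
    # checkpoint loader and the sampler and carries the shift value.
    "2": {"class_type": "ModelSamplingSD3",
          "inputs": {"model": ["1", 0], "shift": 1.0}},
    "3": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"clip": ["1", 1],
                     "text": "A high quality photograph of ..."}},
    "4": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"clip": ["1", 1], "text": "low quality"}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 832, "height": 1216, "batch_size": 1}},
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["2", 0], "positive": ["3", 0],
                     "negative": ["4", 0], "latent_image": ["5", 0],
                     "seed": 0, "steps": 40, "cfg": 4.0,
                     "sampler_name": "euler", "scheduler": "beta",
                     "denoise": 1.0}},
    "7": {"class_type": "VAEDecode",
          "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    "8": {"class_type": "SaveImage",
          "inputs": {"images": ["7", 0], "filename_prefix": "bigASP_v2_5"}},
}
```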

Resolution

Supported resolutions are listed below, sorted roughly from best to worst supported. Resolutions smaller or larger than these are very unlikely to work.

832x1216 1216x832 832x1152 1152x896 896x1152 1344x768 768x1344 1024x1024 1152x832 1280x768 768x1280 896x1088 1344x704 704x1344 704x1472 960x1088 1088x896 1472x704 960x1024 1088x960 1536x640 1024x960 704x1408 1408x704 1600x640 1728x576 1664x576 640x1536 640x1600 576x1664 576x1728
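
If you're scripting generations, a small helper like the one below (purely hypothetical, not part of the model or ComfyUI) can snap an arbitrary aspect ratio to the nearest bucket from the list above, so the empty latent always matches a trained resolution.

```python
# Hypothetical helper: pick the supported resolution whose aspect ratio is
# closest to the requested width/height. The list mirrors the buckets above.
SUPPORTED = [
    (832, 1216), (1216, 832), (832, 1152), (1152, 896), (896, 1152),
    (1344, 768), (768, 1344), (1024, 1024), (1152, 832), (1280, 768),
    (768, 1280), (896, 1088), (1344, 704), (704, 1344), (704, 1472),
    (960, 1088), (1088, 896), (1472, 704), (960, 1024), (1088, 960),
    (1536, 640), (1024, 960), (704, 1408), (1408, 704), (1600, 640),
    (1728, 576), (1664, 576), (640, 1536), (640, 1600), (576, 1664),
    (576, 1728),
]

def closest_bucket(width: int, height: int) -> tuple[int, int]:
    """Return the supported resolution closest in aspect ratio to width/height."""
    target = width / height
    return min(SUPPORTED, key=lambda wh: abs(wh[0] / wh[1] - target))

print(closest_bucket(1920, 1080))  # -> (1344, 768)
```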

Sampler Config

First, unlike normal SDXL generation, you now have another parameter: shift.

Shift is a parameter on the ModelSamplingSD3 node, and it bends the noise schedule. Set to 1.0, it does nothing. When set higher than 1.0 it makes the sampler spend more time in the high noise part of the schedule. That makes the sampler spend more effort on the structure of the image and less on the details.

This model is very sensitive to sampler and scheduler, and it benefits greatly from spending at least a little extra time on the high noise part of the schedule. This is unlike normal SDXL, where most schedules are designed to spend less time there.
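
For intuition, here's a small sketch of the timestep shift I believe ModelSamplingSD3 applies (the standard SD3-style shift): as shift rises above 1.0, mid-schedule sigmas get pushed toward the high-noise end, which is exactly the "more time on structure" behavior described above.

```python
# Sketch of the SD3-style time shift (assumed to be what ModelSamplingSD3 uses):
# sigma' = shift * sigma / (1 + (shift - 1) * sigma)
def shift_sigma(sigma: float, shift: float) -> float:
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

# A point halfway through the schedule:
print(shift_sigma(0.5, 1.0))  # 0.5    -- shift = 1.0 changes nothing
print(shift_sigma(0.5, 3.0))  # 0.75   -- pushed toward high noise
print(shift_sigma(0.5, 6.0))  # ~0.857 -- pushed even further
```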

I have, so far, found the following setups to work best for me. You should tweak and experiment on your own, but expect many failures.

  • Shift=1, Sampler=Euler, Schedule=Beta

  • Shift=6, Sampler=Euler, Schedule=Normal

  • Shift=3, Sampler=Euler, Schedule=Normal

I have not had much success with samplers other than Euler. UniPC does work, but generally doesn't perform as well. Most of the others fail outright or produce worse results. But my testing is very limited so far. It's possible the other samplers could work but are misinterpreting this model (since it's a freak).

Beta schedule is the best general-purpose option and doesn't need the shift parameter tweaked. Beta schedule forms an "S" in the noise schedule, with the first half spending more time than usual in high noise and the latter half spending more time in low noise. This provides a good balance between the quality of image structure and the quality of details.

Normal schedule generally requires shift to be greater than 1, with values between 3 and 6 working best. I found no benefit from going higher than 6. This setup results in the sampler spending most of its time on the image structure, which means image details can suffer.

Which setup you use can vary depending on your preferences and the specific generation you're going for. When I say "image structure quality" I mean things like ensuring the overall shape of objects is correct, and placement of objects is correct. When image structure is not properly formed, you'll tend to see lots of mangled messes, extra limbs, etc. If you're doing close-ups, structure is less important and you should be able to tweak your setup so it spends more time on details. If you're medium shot or further out, structure becomes increasingly important.

CFG and PAG

In my very limited testing I've found CFG values between 3.0 and 6.0 to work best. As always, CFG trades off quality against diversity: lower CFGs produce a greater variety of images at lower quality, and vice versa. That said, quality at 2.0 and below tends to be so low as to be unusable.

I highly recommend using a PerturbedAttentionGuidance node, which should be placed after ModelSamplingSD3 and before KSampler. This has a scale parameter which you can adjust. I tend to keep it hovered around 2.0. When using PAG, you'll usually want to decrease CFG. When I have PAG enabled I'll keep my CFG between 2.0 and 5.0.
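
In terms of the earlier workflow sketch, that just means one more node on the model path (PerturbedAttentionGuidance is a standard ComfyUI node; the scale value here is only an example):

```python
# Extending the earlier sketch: PAG goes after ModelSamplingSD3 on the model
# path, and the KSampler samples through it instead.
workflow["9"] = {"class_type": "PerturbedAttentionGuidance",
                 "inputs": {"model": ["2", 0], "scale": 2.0}}
workflow["6"]["inputs"]["model"] = ["9", 0]
```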

The exact values for CFG and PAG can vary depending on personal preference and what you're trying to generate. If you're not overly familiar with them, set them to the middle of the recommended ranges and then adjust up and down to get a feel for how they behave in your setup.

PAG can help considerably with image quality and reliability. However, it can also tend to make images punchier and more contrasty, which you may not want depending on what you're going for. Like many things it's a balancing act, and it can be disabled if it's overcooking your gens.

Steps

I dunno, 28 - 50? I usually hover around 40, but I'm a weirdo.

Negative Prompt

So far the best negative I've found is simply "low quality". A blank negative works as well, as do more complicated negatives. But "low quality" alone provides a significant boost in generation quality, and other additions like "deformed", "lowres", etc. didn't seem to help much for me.

Positive Prompt

I do not have many recommendations here, since I have not played with the model enough to know for sure how best to prompt it. At the very least you should know that the model was trained with the following quality keywords:

  • worst quality

  • low quality

  • normal quality

  • high quality

  • best quality

  • masterpiece quality

These were injected into the tag strings and captions during training. The model generally shouldn't care where in the prompt you put the quality keyword, but placing it closer to the beginning will have the greatest effect. You do not need to include multiple quality keywords; a single instance is fine. I also haven't found the need to weight the keyword.

You do not have to include a quality keyword in your prompt at all; it is totally optional.

I do not recommend using "masterpiece quality" as it causes the model to tend toward producing illustrations/drawings instead of photos. I've found "high quality" to be sufficient for most uses, and I just start most of my prompts with "A high quality photograph of" blah blah.

The model was trained with a variety of captioning styles, thanks to JoyCaption Beta One, along with tag strings. This should, in theory, enable you to use any prompting style you like. However, in my limited testing so far I've found natural language captions to perform best overall, occasionally with tag string puke thrown at the end to tweak things. Your favorite chatbot can help write these, or you can use my custom prompt enhancer/writer (https://huggingface.co/spaces/fancyfeast/llama-bigasp-prompt-enhancer).
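
As a purely illustrative example (my own wording, not taken from the training data), a prompt pair following the advice above might look like:

```python
# Illustrative prompt pair: quality keyword up front, natural-language body,
# optional tag-string tail, and the simple negative recommended above.
positive = (
    "A high quality photograph of a woman in a yellow raincoat standing on a "
    "rain-soaked city street at night, neon signs reflecting off the wet "
    "pavement, shallow depth of field, candid, 35mm, film grain"
)
negative = "low quality"
```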

If you're prompting for mature subjects, I would advise trying the kind of neutral wording that chatbots like to use for various body parts and activities. The model should understand slang, but so far in my attempts slang terms make the gens a little worse.

The model was trained on a wide variety of images, so concept coverage should be fairly good, but not yet on the level of internet-wide models like Flux.

Support

If you want to help support dumb experiments like this, JoyCaption, and (hopefully) v3: https://ko-fi.com/fpgaminer


Project Permissions

    Use Permissions

  • Use in TENSOR Online

  • As a online training base model on TENSOR

  • Use without crediting me

  • Share merges of this model

  • Use different permissions on merges

    Commercial Use

  • Sell generated contents

  • Use on generation services

  • Sell this model or merges
