V.King's Articles

ControlNet: Canny adapter

This article explores how to use the Canny ControlNet adapter. If you have no idea what it means, refer to my introductory article.Canny adapter uses the Canny edge detector, an algorithm for finding edges in an image. It is widely used in computer vision due to its simplicity and efficiency. Any picture can be a source for Canny adapter. Screenshot, photo, illustration - if you like something about it you might be able to replicate it, iterate on it, improve and alter in a specific way you want.To create a Canny control file (I will call them "CCF"), you can use the built-in adapter preprocessor, like we did in the intro article. It works fairly well, but there’s a catch: the output resolution is pretty low. The preprocessor scales the image so that its shorter side is 512 pixels before generating the control file.It can work remarkably well, but it limits fine detail. A representation of a simple black outline requires at least 3 pixels as there are 2 color transitions. The smaller the detail, the more likely it is that the preprocessor will miss or mangle it.Fortunately, the Canny adapter accepts external files — and these can have much higher resolutions. Unfortunately, it can't be easily used to create high resolution CCFs. Also, the detector parameters are hard-coded. It can be a problem. Look at these two pictures:The left one is created by the preprocessor, the other - by an external program. The left one has more details but harder to read and edit. A lot of details there are not essential for image generation and likely to be in the way if you want to change something.To work around these two problems I used ImageMagick, a free open-source cross-platform software suite for image manipulation. I highly recommend it. It is a command line tool, so brace yourself if you are not into that. Currently I use Windows which is heavily GUI oriented. Far Manager greatly simplifies work with command line.Here is my batch file for ImageMagick Canny detector, canny.bat:<ImageMagick binaries path>\convert.exe %1 -canny 0x1+%2%%+%3%% %1.canny-%2-%3.pngThe 1st parameter is a file name, the 2nd and 3rd are percentile numbers (0 to 100) used by ImageMagick Canny edge detector. I will not try to explain what they are, ImageMagick documentation is not great, but they control the sensitivity of the edge detector. Experiment with the numbers, and you will see. It usually took me about 3-4 tries to get a CCF I liked.About two months ago I heard that LLMs had become proficient in programming and asked ChatGPT to write a Java program for making CCFs with interactive visual control. It worked like charm. I will think of a way to share it. I could probably put it on GitHub. It would require "a very particular set of skills" to download, configure, and run though, so I recommend to use ImageMagic for now.A more accessible option would be an online generator. I found this one: https://randomtools.io/image-tools/edge-detection-online-toolIt has fixed Canny parameters, but it still can be useful. I didn't look too hard though, personally I don't need it. Again, if you find a better one, leave a comment.Editing CCFs is tricky - they weren’t really meant to be edited manually, and there is no rulebook for it. Topology is convoluted. A single pixel in wrong place can lead to color spill into the wrong area of the image or complete reinvisioning of parts of the scene by AI - body parts merging with environment, stuff like that. Some changes are safe - see the bear replacement in the previous article. "60% of the time, it works every time".In many cases it is easier to fix issues in CF than in "real" picture - you are just altering white lines on black field, mostly just erasing them. You can use splines to make really smooth lines with little effort, demon tails and body curves for example. Removing shadows is usually trivial. Tracing can be used to great effect (hello, layers). It gets easier with practice.Here is an example of placing multiple objects into CCF:Most of this cocktail glass was created in a CCF using ellipses and splines, I couldn't find a good reference for it. The fairy was originally sitting in a bathtub, she was generated in high resolution with a CCF ensuring AI makes the bathtub in shape and angle fitting for the glass I made. Strawberry, lime and wine bottle are from Bing image search, 3 different pictures. This seems to be the final CCF I used:Note the non-intersecting objects. Lines representing the tablecloth do not touch strawberries, yet in resulting pictures tablecloth often covers entire table. I had quite a few troubles sticking the fairy and liquid into the cocktail glass, almost given up. Now I would act differently - make a collage and produce CCF from it. As this operation loses a lot of information (such as colors) you can get away with very sloppy work and yet get workable CCFs.So, you can either edit CCF file directly or edit a "real" picture and make a CCF based on it. Neither approach is obviously superior, I choose which one to use based on what I have and what I want to achieve. Often I use both in the same project. The process is iterative. You gradually improve your CCF: erase what doesn’t work, add what does. Eventually, you should end up with a control file that consistently gives good results.To remove an object from a picture you can either paint the corresponding area of CCF black or smooth it down in real picture to the point Canny filter stops detecting the edges identifying the object.Frequent special case - I am happy with the centerpiece but don't like the background. This one is easily handled in CCF, just take a large black brush and go nuts. It is hard to mess up. Then ask for new background using prompt.I see 3 ways to place a new object into picture:- put its Canny representation into your CCF- make a rough collage of the picture you want and make a new CCF based on it- paint the part of CCF where you want changes black and ask AI to generate themIf you chose CCF alteration, it helps to have black border around the new object to reduce probability of two CF images merging incorrectly, 2-3 pixels should work. It is not precise science, but you start to feel what works and what not with practice.It is difficult to merge complex objects in direct contact when you edit CF - topology of line connections is trivial to mess up. For this reason you should generally avoid trying to merge areas covered by hair, connecting two images correctly is extremely tedious except for most trivial cases.CCFs can be scaled to some extent. There is a limit to it naturally, if lines blur together they stop being useful. Upscaling is safer. Lines don't have to be pure white, gray works. They don't have to be absolutely crisp, antialiased CCFs still work. It definitely doesn't improve results though so I wouldn't do it.To state the obvious, resolution of CCF may differ from target resolution. It may have different aspect ratio, adapter scales CCF to cover entire picture which usually means that part of CCF goes out of the frame.It is important to remember that Canny filter doesn't see anything, it's just math. Gradual transition of color is not an edge, objects that are obviously distinct to our eye are not necessarily same for the detector. It also may not detect low-intensity color transition as an edge. Dark elements adjacent to each other may not be properly separated. The same problem with clothes folds and seams marked by black lines over dark filler - they can easily be lost. The easy way to check for lost boundaries is to overlay CCF over source picture (or vice versa) and make the top one semi-opaque. Preprocessing the image before the detector run, e.g. changing brightness/contrast of the problematic area also often works. The detector doesn't care if your character looks like a clown, it cares about color transitions.Canny detects edges. Edge is a border between two areas of different colors, so it includes both borders between objects which define geometry of the scene and the less impactful borders such as shadows, cloth patterns, etc. The former are more important and if AI gets confused it is often useful to make a cleanup and simplify the scene. Running Canny detector with different settings often helps.There is a side benefit to this adapter. Generation at high resolutions, like 1536x1536, is prone to randomly fail due to instability, presumably because models were not trained with this resolution. The typical failures are doubles in the scene or hilariously malformed bodies. The glitch can be useful, the base image for twin sisters was generated this way. Canny adapter seems to help avoid this particular problem completely. It keeps the model in bounds. The majority of pictures in my profile are made in this resolution.Another benefit is, even when things fail, results stay very close to each other in geometry. Let's say I am making a picture with an angel and a demon. If I have one picture with good angel and one picture with good demon, I can "easily" move part of the picture from one to another, drop-in replacement. That's what most of those "generate by workflow" images in my gallery are about - instead of waiting for gacha to smile on me and deliver a perfect picture I just mix and match good parts of failed tries."And that's all I have to say about" Canny adapter. If you find it somewhat messy, it is. It is new experience for me and I am still learning new tricks. Maybe posting a few step by step examples of developing images would be more instructive. I will think about it.If you have specific questions, "I am here if you want to talk".Related articles: Introduction to ControlNet, ControlNet: Depth adapter.

V.King

Introduction to ControlNet.

This article explains what ControlNet is and how you can use it. It includes an example with simple instructions that you can run yourself right now, no prerequisites. It should take about 10 minutes. Just read it (trust me bro). If you want to make pictures first, you can skip the general info below and look for the pictures.Not many people on tensor.art use ControlNet. Pictures are generated in a gacha game style - one writes a prompt and hopes for the best. If the result is not satisfactory, it's either another try with the same parameters or prompt update. There are also checkpoints, LoRAs and other parameters of generation, a lot of knobs that affect generation in some peculiar and largely unpredictable way. There is some control over the process indeed, but it heavily relies on luck. There's gacha-esque fun in this process. AI generation is like a box of chocolates.Sometimes a picture turns out almost perfect. It can be awesome in every way except for having 6 fingers on someone's hand. There is no way to fix it with a prompt. ArsTechnica reported that the latest OpenAI image generator allows prompt iteration on a picture, so such problems may eventually get resolved. For example, you may be able to generate something and then ask for corrections by saying something like "good, but let it be sunset and I want the second girl from the left to be blond, go". Eventually prompts may become the only tool an AI artist really needs to build a scene. For now, prompts are rather limited.ControlNet doesn't fix it; it's another knob to use. But it allows you to control many aspects of image generation directly, spatially, like "the sword is right here, not somewhere in the picture". You can actually imagine a final picture in your head and work toward it. If you can make a rough sketch, you are halfway there. You can iterate, keep the parts you like and correct those you don't. It is still a gacha game, but your chances of getting an SSR are much higher.It allows you to shoot for much more complex scenes. There is absolutely nothing wrong with generating hundreds of pretty women portraits if it makes you happy. And I mean it; fun is precious, it is never a waste of time. But if you get bored with it, there are options.ControlNet uses an image as an additional or, in a few cases, the only prompt - sketch, pose diagram, edge/depth/normal map. "A picture is worth a thousand words". A simple doodle can be more efficient in conveying desired composition than any prompt. Also, models don't follow prompts all that well and perfectly crafted prompts fail most of the time.ControlNet works by attaching small, specialized neural networks called "adapters" to a pre-trained diffusion model. These adapters are trained to interpret specific types of visual input and influence the generation process accordingly, without retraining the whole model. This allows the base model to remain flexible and powerful, while giving users a way to “steer” the output using visual cues rather than just words.ControlNet is an open source project based on open research publications. The main contributors to both seem to be from China. Kudos to China. It was initially developed for Stable Diffusion 1.5, then adapted for SDXL and works for derived models and checkpoints. There is no ControlNet for SD 3.0 as far as I know. Tensor.art has ControlNet interface for FLUX.1 but results were consistently dismal when I tried to use it.Personally, I use mostly Pony derivatives and sometimes Illustrious checkpoints.Using ControlNet requires persistence, iterations are the whole idea. Basic skills with a graphic editor are necessary to make changes to control files used by ControlNet. If you are experienced in image editing it will help you a lot but you don't have to be a classical artist. I have zero art education beyond lessons in secondary school and I was okay-ish at best. It helps if you find joy in image editing. The ability to use layers is a great bonus.Personally I use Gimp but there are lots of good editors, including free options. Krita seems to be very good. Paint.NET is simple yet capable.Below I will use Canny and Depth adapters because these are the two I find the most useful and use frequently. There will be a separate in-depth article about them later. I will also give a brief overview of other adapters available on tensor.art in another article, there is a rather harsh article size limit here.Remixing a picture using ControlNet.Let's try using ControlNet. Here is what we will be working with:Click this link and press "remix". It will set generation parameters. Run it and be amused by the utter failure. Or just skip it, here is what I got:Not too bad. I like perspective distortion. A couple of anatomical problems, very fixable. There is no bear though. A failure.We got all the parameters right. The missing ingredients are ControlNet control files. Let's add them.Download the picture we are trying to remix and remember the location. Click on "Add ControlNet" button in "Model" section, choose "Canny" (3rd option), click on the square area in the lower left corner of the new dialogue window and pick the picture you just downloaded. Here's how it should look:Repeat the same actions one more time but choose the "Depth" adapter this time (4th option).Set weights for both at 0.5. If you did everything right it should look like this:Run it. Here is what I got this time:The clothing colors are different — which is expected, since the prompt doesn’t specify them. It is a very good picture, on par with the original one.We successfully remixed the picture without even touching the control files themselves. Let's look at them though.Click on the garbage bin icon to remove Canny adapter and add it again. Here is what it looks like before you confirm your choice:Click on the right picture, the one in black and white. You will be presented with control file created by Canny adapter preprocessor. You can save this picture:Now you can edit it and use edited version instead of the one created by the preprocessor. To do so, you just need to press "Control Image" button in the dialogue above, it will suggest you to upload your control file.Let's say we don't like the bear. No wonder, I got it from a quick Bing image search, it was a cartoonish sketch. The bear sucks. Let's paint this area black:And here is what I got using the new version of control file:That’s a much better-looking bear — more natural and fitting. Every time I run the generation with these parameters I will get a new bear. The bear is drawn there because the prompt asks for it and the control file doesn't leave any other option for its location. Also, the depth adapter still indicates to AI generator presence of large body there:Once I am happy with the bear, I can fix it in the control file and change other aspects of the generation. I can remove flowers, add a cat, make the woman run toward the bear, make her wear jeans or nothing, make her a demoness, make bear run away from her. The sky is the limit now that you can work on specific aspects of the picture with intent.As an unexpected bonus, the girl's skirt is see-through now and she seems to be going commando. Not intended and can be inappropriate. Let's fix it. I add to prompt: "elaborate blue dress, orange jacket". Here is what I got:Nice jacket. The claw is bad and fingers are wonky. Well, you know what to do. Pull the lever, let it roll. :)Neither the Canny nor the Depth adapter has anything to do with color, just geometry, so your hands are free here. Also, you can switch between checkpoints supporting ControlNet freely now, the scene will generally persist. There are multiple examples of that in my pictures.That's it. ControlNet is that simple. People really should use it more.A few clarifications. It might be obvious, but better be safe than vague. When we supply the original image the Canny preprocessor analyzes it and automatically creates a control file, an edge map (the black and white line drawing). Which we can download and reuse/abuse. The weight controls how strongly the ControlNet influences the generation, same as for LoRAs. Higher values stick closer to the control image; lower values give the AI more freedom. At high values (0.7 and above) undesirable effects are very likely.The method we used above would work for every picture on tensor.art, albeit with different degrees of success. All you need is the prompt and the picture itself, you don't necessarily need to use the same tools and LoRAs as the original author. It works for an arbitrary image too, like anime screenshot, you just need to write a prompt adding details that Canny and Depth adapters miss, like colors, lighting details, etc. That's what I do for almost every single picture I have published.That's it for now, I plan to publish a few more articles on this topic, it was an introduction.Questions and comments are welcome.Related articles: ControlNet: Canny adapter, ControlNet: Depth adapter, ControlNet: Openpose adapter, ControlNet: QR Code adapter.

V.King

172