V.King

Having a blast with image generators, a perfect mix of tech and art for me. This place is glorious.
ControlNet: Openpose adapter

This article introduces the OpenPose adapter for ControlNet. If you're new to ControlNet, I recommend checking out my introductory article first.

I don't use OpenPose much myself, since I find the Canny + Depth combination more convenient. But I did some experiments specifically for this article, so consider this a first look rather than a deep dive.

The OpenPose adapter lets you copy the pose of humanoid characters from one image to another. Like other ControlNet adapters, it uses a preprocessor that takes an input image and generates a control file - in this case a stick figure representing the positions of key joints and limbs. This stick figure then guides the image generation process. Here is an example:

Left - original picture by Chicken, center - stick figure generated by the OpenPose preprocessor, right - stick figure overlaid on the original image.

As you can see, the stick figure isn't a full skeleton; it marks key joints as dots connected by lines. The colors aren't random - they follow a color code for different bones and joints. A Bing search gives this reference for them. Here is the list of joints and bones with the color scheme:

Looking at the example above, I don't think the preprocessor did a good job this time: a few joints are quite a way off the mark and the legs are missing entirely. The picture is somewhat non-trivial - a close top-down view with perspective distortion - but the preprocessor should have been able to handle it. Fortunately, it seems easy enough to alter the stick figures or even make them from scratch.

OpenPose seems to be sensitive to scheduler and sampler settings. Unlike Canny and Depth, it refused to work with the karras/dpm_adaptive combination I normally use, so I switched to normal/euler, 20 steps.

Here are the settings:

And here are the results:

As you can see, the pose and head position are copied to some extent.

I used the default preprocessor here; there are more:

Here is the stick figure for openpose_full:

It includes fingers. A single white dot represents the face - I guess the preprocessor simply failed here. The fingers are nowhere to be seen in the results:

It seems the preprocessor and the main model are out of sync.

The dw_openpose_full stick figure looks promising:

It includes markings for the face, eye and mouth contours. The results, though, are disappointing: the extra detail seems to be completely ignored. I think the dw_openpose_full preprocessor is not compatible with the adapter model.

So, yeah, quite disappointing. It is not a complete loss - it does work to some extent and can be useful - it is just difficult to get excited about this one.

I should point out that I'm talking specifically about the ControlNet OpenPose adapter for the SDXL-based model on tensor.art; these conclusions are in no way representative of other implementations of this adapter. Also, it is possible that I am "holding it wrong". These things can be tricky and my experience with it is very limited.

If I am missing something here, feel free to drop a comment.
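P.S. Since it does seem easy enough to build stick figures from scratch, here is a minimal Python sketch of what that looks like in code: drawing an OpenPose-style figure from a list of keypoints with PIL. The 14-point layout, the coordinates and the limb colors are illustrative assumptions, not the exact color code the tensor.art preprocessor uses - check the reference chart above for the real one.

```python
# A minimal sketch: draw an OpenPose-style stick figure from keypoints.
# The keypoint layout and colors are illustrative assumptions; the real
# preprocessor has its own fixed color code (see the chart above).
from PIL import Image, ImageDraw

# (x, y) pixel coordinates for a hypothetical pose; None = joint not detected
keypoints = {
    "nose": (256, 80), "neck": (256, 140),
    "r_shoulder": (210, 145), "r_elbow": (190, 210), "r_wrist": (185, 270),
    "l_shoulder": (302, 145), "l_elbow": (322, 210), "l_wrist": (327, 270),
    "r_hip": (230, 290), "r_knee": (228, 380), "r_ankle": (226, 460),
    "l_hip": (282, 290), "l_knee": (284, 380), "l_ankle": (286, 460),
}

# Bones as (joint_a, joint_b, RGB color) - one distinct color per limb.
bones = [
    ("neck", "nose", (0, 0, 255)),
    ("neck", "r_shoulder", (255, 0, 0)), ("r_shoulder", "r_elbow", (255, 85, 0)),
    ("r_elbow", "r_wrist", (255, 170, 0)),
    ("neck", "l_shoulder", (0, 255, 0)), ("l_shoulder", "l_elbow", (85, 255, 0)),
    ("l_elbow", "l_wrist", (170, 255, 0)),
    ("neck", "r_hip", (0, 255, 255)), ("r_hip", "r_knee", (0, 170, 255)),
    ("r_knee", "r_ankle", (0, 85, 255)),
    ("neck", "l_hip", (255, 0, 255)), ("l_hip", "l_knee", (255, 0, 170)),
    ("l_knee", "l_ankle", (255, 0, 85)),
]

canvas = Image.new("RGB", (512, 512), "black")  # control files use a black background
draw = ImageDraw.Draw(canvas)

for a, b, color in bones:
    if keypoints.get(a) and keypoints.get(b):
        draw.line([keypoints[a], keypoints[b]], fill=color, width=6)

for point in keypoints.values():
    if point:
        x, y = point
        draw.ellipse([x - 5, y - 5, x + 5, y + 5], fill=(255, 255, 255))

canvas.save("openpose_control.png")  # upload this as the control image
```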
ControlNet: Depth adapter

This article covers usage of the Depth ControlNet adapter. If you don't know what that means, refer to my introductory article. Reading about the Canny adapter would also be useful, as it is referenced here a lot.

Humans are naturally good at judging depth in images. We learn to do it our whole lives and we don't even think about it - the recognition is automatic. Looking at a flat picture, we pick up on dozens of cues and just know that "this cliff is very far away".

The AI builds a new picture. At best it has a Canny control file, a very rough outline of the 2D geometry. It is trained on billions of pictures though, and often gets things right just as we do, without thinking. For example, I can give the AI a CCF with only a rider on a mount, and it will understand from what angle I am looking at it and what perspective distortion there is, and will generate the surrounding street accordingly. It has a kind of spatial intelligence.

But it fails a lot too, especially if you are doing something unusual. In my first image here, a witch is flying high over a medieval town far below. One of the problems I had was the AI's attempts to make a giantess out of her: her foot would touch the ground near a building instead of hovering in the air far above it, immediately throwing perspective and scale out the window. I lost a lot of generation attempts to this particular issue.

Vice versa, you may want to make a giant, and the AI may try hard to keep the "sane" perspective you never intended. Its spatial intelligence kicks in at the wrong moment.

That's where the Depth adapter comes to the rescue. It is similar to the Canny adapter in many ways and has an identical interface. The control file is different though: there are no lines, there are areas. Lighter areas are closer to the camera; darker areas are farther away. Here is an example from the introductory article:

It is very clear that the woman is in front, the bear is behind her, and even further away are the trees.

This is a depth map. I will call them Depth control files, DCF.

The Depth preprocessor does the same trick with resolution as Canny's, downscaling the image to 512 pixels. Unlike for Canny, there is a very good online tool for depth generation. It is not completely reliable, but it solves a lot of problems even without editing control files. At the very least you have a solid base to work with.

The tool may fail to capture finer details of the background if there is an object close to the camera - they are just considered to be "far away". Sometimes it is better to make a separate run for the background and then merge it with the map for the character in front.
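That merge is easy to script. Here is a minimal sketch in Python, assuming you already have both maps as grayscale images of the same size; the filenames and the darkening offset are placeholders. Since lighter means closer, a per-pixel maximum (a lighten blend) keeps whichever surface is nearer.

```python
# A minimal sketch: merge a character depth map with a separately generated
# background depth map. Filenames are placeholders and both maps are assumed
# to have the same resolution. Lighter = closer, so a per-pixel maximum
# keeps whichever surface is nearer to the camera.
import numpy as np
from PIL import Image

character = np.array(Image.open("depth_character.png").convert("L"))
background = np.array(Image.open("depth_background.png").convert("L"))

# Optionally push the whole background a bit further away so the character
# always reads as the closest object (clip keeps values in 0..255).
background = np.clip(background.astype(np.int16) - 40, 0, 255).astype(np.uint8)

merged = np.maximum(character, background)        # lighten blend
Image.fromarray(merged).save("depth_merged.png")  # upload as the Depth control image
```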
Unlike the Canny edge detector, a Depth preprocessor is an AI model trained to "guesstimate" 3D structure from 2D images. This means it has some level of understanding of the objects in a scene. As a result, it can detect elements that Canny might miss - for example, a dark boot in a shadowy corner might go unnoticed by Canny but still be recognized by the Depth model based on the foot's position. However, because the depth estimation is ultimately a prediction, it can also introduce its own inaccuracies.

In short, depth preprocessors mostly work fine but can't be completely trusted.

The depth preprocessor of tensor.art's ControlNet uses a model called MiDaS. The online tool is based on Depth Anything, a different model made by a Chinese team. Technically, we are mixing parts of different ControlNet implementations here. Well, it works.

Editing depth map files is tricky in a different way than CCFs: you have to be able to select areas and change their brightness according to their location in space. The key tools are layers, magic wand and lasso selection, brightness adjustment, smoothing, Gaussian blur and gradients. It gets easier as you get used to it.

Depth also allows you to highlight elements of the picture. You can greatly improve the chances of getting good hands if you alter the depth file to match the Canny file in that area. Some objects are not obvious to the AI from an outline alone, regardless of the prompt; showing it that this is a distinct object by slightly changing the brightness in the control file is priceless.

Here is an example of the AI consistently drawing a "gorget" - the metal badge-like thingy nazi police were wearing during WW2 - as a piece of cloth:

No amount of pleading in the prompt could make it reconsider, even though it drew the steel chain the thing was hanging on. Then I edited the DCF and it started to work:

My guess is that the AI had no strong association with the word "gorget". It doesn't occur in photos all that often, and when it does, the caption generally has other things to point out than the dorky remnant of medieval armor.

The Depth adapter is indispensable when objects partially obscure each other and there are small gaps between them. Here is another example of a depth map:

The most problematic part is the left wing, because it is obscured by the body and by the right wing, which is also nearly indistinguishable from it - same color, same structure. The entire left side of the body - shoulder, elbow, breast, knee - has a high risk of merging with the grass.

Initially the AI generated only the right wing; explaining to it that there must be another one below it is an almost completely hopeless task. So I made a collage from the current "good" picture - copied the right wing, made some transformations on it, and cut out the parts that should be obscured:

Note that I changed the hue of the wing so that it became yellow - it helped both the Canny and Depth preprocessors understand that these are two different wings, without impacting the final result.

Here is the original picture I started with, a random "bad" picture from the previous project:

Here is the list of iterations the picture went through:
- fixing the broken tail - editing
- making the demoness sleep - prompt
- removing the extra pair of horns - editing
- fixing the left elbow - editing
- adding the right wing - clearing the flowers out of that part of the CCF and strongly suggesting in the prompt that the demoness has wings
- adding the left wing - editing

The animals look way cuter in the original. But making the AI render a bunch of small adorable animals and the demoness in one go is basically hopeless. I may make a version with separately rendered animals eventually. It's not difficult, merely tedious.

I am reasonably confident that only the difference in intensity between directly adjacent areas of the picture matters, not the absolute intensity itself. So don't worry about perfectly calibrating your brightness levels - just make sure nearby areas have the right contrast to convey the depth relationships. It doesn't have to make complete sense. A depth map is a hint, not a strict order.
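Because only the relative difference matters, the "make this object distinct" trick from the gorget example comes down to a small brightness nudge over a masked area. A minimal sketch in Python, assuming you have painted a rough mask of the object in any editor; the filenames and the offset value are placeholders:

```python
# A minimal sketch: nudge the brightness of one object in a depth map so the
# AI reads it as a distinct thing. Filenames and the +15 offset are
# illustrative; small nudges relative to the surroundings are usually enough.
import numpy as np
from PIL import Image, ImageFilter

depth = np.array(Image.open("depth_map.png").convert("L"), dtype=np.int16)

# Mask: white where the object is, black elsewhere (painted in any editor).
mask = Image.open("object_mask.png").convert("L")
mask = mask.filter(ImageFilter.GaussianBlur(2))   # soften the seam a little
weight = np.array(mask, dtype=np.float32) / 255.0

nudged = depth + 15 * weight                      # push the object slightly closer
nudged = np.clip(nudged, 0, 255).astype(np.uint8)
Image.fromarray(nudged).save("depth_map_nudged.png")
```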
There are 3D modelling tools that can generate depth maps, and this is likely to become an integral part of 3D modelling. I have used only one so far, PoseMy.Art:

Note the "Export Depth", "Export Canny" and "Export OpenPose" buttons. This tool is specifically intended to be used with ControlNet-like systems:

Explaining the spatial relations of objects in a prompt is a major pain in the ass and mostly doesn't work. This adapter allows you to avoid that. Together, Canny and Depth let you describe and fix the geometry of a scene pretty well.

That's it about this adapter. Questions?

Related articles: Introduction to ControlNet, ControlNet: Canny adapter.
ControlNet: Canny adapter

This article explores how to use the Canny ControlNet adapter. If you have no idea what that means, refer to my introductory article.

The Canny adapter uses the Canny edge detector, an algorithm for finding edges in an image. It is widely used in computer vision due to its simplicity and efficiency. Any picture can be a source for the Canny adapter. Screenshot, photo, illustration - if you like something about it, you might be able to replicate it, iterate on it, improve it and alter it in the specific way you want.

To create a Canny control file (I will call them "CCF"), you can use the built-in adapter preprocessor, like we did in the intro article. It works fairly well, but there's a catch: the output resolution is pretty low. The preprocessor scales the image so that its shorter side is 512 pixels before generating the control file. It can work remarkably well, but it limits fine detail. Representing even a simple black outline requires at least 3 pixels, since there are 2 color transitions. The smaller the detail, the more likely it is that the preprocessor will miss or mangle it.

Fortunately, the Canny adapter accepts external files, and these can have much higher resolutions. Unfortunately, the built-in preprocessor can't easily be used to create high-resolution CCFs. Also, its detector parameters are hard-coded, which can be a problem. Look at these two pictures:

The left one was created by the preprocessor, the other by an external program. The left one has more detail but is harder to read and edit. A lot of the detail there is not essential for image generation and is likely to be in the way if you want to change something.

To work around these two problems I used ImageMagick, a free open-source cross-platform software suite for image manipulation. I highly recommend it. It is a command line tool, so brace yourself if you are not into that. Currently I use Windows, which is heavily GUI oriented; Far Manager greatly simplifies work with the command line.

Here is my batch file for the ImageMagick Canny detector, canny.bat:

<ImageMagick binaries path>\convert.exe %1 -canny 0x1+%2%%+%3%% %1.canny-%2-%3.png

The 1st parameter is a file name, the 2nd and 3rd are percentile numbers (0 to 100) used by the ImageMagick Canny edge detector. I will not try to explain what they are - the ImageMagick documentation is not great - but they control the sensitivity of the edge detector. Experiment with the numbers and you will see. It usually took me about 3-4 tries to get a CCF I liked.

About two months ago I heard that LLMs had become proficient in programming and asked ChatGPT to write a Java program for making CCFs with interactive visual control. It worked like a charm. I will think of a way to share it; I could probably put it on GitHub. It would require "a very particular set of skills" to download, configure and run, though, so I recommend using ImageMagick for now.

A more accessible option would be an online generator. I found this one: https://randomtools.io/image-tools/edge-detection-online-tool

It has fixed Canny parameters, but it still can be useful. I didn't look too hard though; personally I don't need it. Again, if you find a better one, leave a comment.
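If a bit of Python is more your thing than batch files, OpenCV's Canny implementation does the same job and keeps the source resolution. A minimal sketch, with placeholder file names; note that OpenCV's thresholds are absolute gradient values (0-255), not the percentiles the ImageMagick command takes:

```python
# A minimal sketch: make a Canny control file (CCF) at the source resolution
# instead of the preprocessor's 512px limit. File names are placeholders.
# OpenCV thresholds are absolute gradient values, not ImageMagick percentiles;
# lower values mean more detected edges.
import cv2

src = cv2.imread("source.png", cv2.IMREAD_GRAYSCALE)
src = cv2.GaussianBlur(src, (3, 3), 0)   # mild blur suppresses noisy edges

low, high = 80, 160                      # sensitivity knobs - experiment per image
edges = cv2.Canny(src, low, high)        # white edges on black, like the preprocessor output

cv2.imwrite(f"source.canny-{low}-{high}.png", edges)
```

As with the batch file, expect a few tries with different thresholds before the CCF looks right.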
"60% of the time, it works every time".In many cases it is easier to fix issues in CF than in "real" picture - you are just altering white lines on black field, mostly just erasing them. You can use splines to make really smooth lines with little effort, demon tails and body curves for example. Removing shadows is usually trivial. Tracing can be used to great effect (hello, layers). It gets easier with practice.Here is an example of placing multiple objects into CCF:Most of this cocktail glass was created in a CCF using ellipses and splines, I couldn't find a good reference for it. The fairy was originally sitting in a bathtub, she was generated in high resolution with a CCF ensuring AI makes the bathtub in shape and angle fitting for the glass I made. Strawberry, lime and wine bottle are from Bing image search, 3 different pictures. This seems to be the final CCF I used:Note the non-intersecting objects. Lines representing the tablecloth do not touch strawberries, yet in resulting pictures tablecloth often covers entire table. I had quite a few troubles sticking the fairy and liquid into the cocktail glass, almost given up. Now I would act differently - make a collage and produce CCF from it. As this operation loses a lot of information (such as colors) you can get away with very sloppy work and yet get workable CCFs.So, you can either edit CCF file directly or edit a "real" picture and make a CCF based on it. Neither approach is obviously superior, I choose which one to use based on what I have and what I want to achieve. Often I use both in the same project. The process is iterative. You gradually improve your CCF: erase what doesn’t work, add what does. Eventually, you should end up with a control file that consistently gives good results.To remove an object from a picture you can either paint the corresponding area of CCF black or smooth it down in real picture to the point Canny filter stops detecting the edges identifying the object.Frequent special case - I am happy with the centerpiece but don't like the background. This one is easily handled in CCF, just take a large black brush and go nuts. It is hard to mess up. Then ask for new background using prompt.I see 3 ways to place a new object into picture:- put its Canny representation into your CCF- make a rough collage of the picture you want and make a new CCF based on it- paint the part of CCF where you want changes black and ask AI to generate themIf you chose CCF alteration, it helps to have black border around the new object to reduce probability of two CF images merging incorrectly, 2-3 pixels should work. It is not precise science, but you start to feel what works and what not with practice.It is difficult to merge complex objects in direct contact when you edit CF - topology of line connections is trivial to mess up. For this reason you should generally avoid trying to merge areas covered by hair, connecting two images correctly is extremely tedious except for most trivial cases.CCFs can be scaled to some extent. There is a limit to it naturally, if lines blur together they stop being useful. Upscaling is safer. Lines don't have to be pure white, gray works. They don't have to be absolutely crisp, antialiased CCFs still work. It definitely doesn't improve results though so I wouldn't do it.To state the obvious, resolution of CCF may differ from target resolution. 
It is difficult to merge complex objects that are in direct contact when you edit a CF - the topology of the line connections is trivially easy to mess up. For this reason you should generally avoid trying to merge areas covered by hair; connecting two images correctly there is extremely tedious except in the most trivial cases.

CCFs can be scaled to some extent. There is a limit to it, naturally - if lines blur together they stop being useful. Upscaling is safer. Lines don't have to be pure white, gray works. They don't have to be absolutely crisp either, antialiased CCFs still work. It definitely doesn't improve results though, so I wouldn't do it.

To state the obvious, the resolution of a CCF may differ from the target resolution. It may also have a different aspect ratio; the adapter scales the CCF to cover the entire picture, which usually means that part of the CCF goes out of the frame.

It is important to remember that the Canny filter doesn't see anything - it's just math. A gradual transition of color is not an edge, and objects that are obviously distinct to our eye are not necessarily distinct for the detector. It also may not detect a low-intensity color transition as an edge. Dark elements adjacent to each other may not be properly separated. The same problem applies to clothing folds and seams marked by black lines over dark filler - they can easily be lost. The easy way to check for lost boundaries is to overlay the CCF over the source picture (or vice versa) and make the top one semi-opaque. Preprocessing the image before the detector run, e.g. changing the brightness/contrast of the problematic area, also often works. The detector doesn't care if your character ends up looking like a clown, it cares about color transitions.

Canny detects edges. An edge is a border between two areas of different color, so that includes both the borders between objects, which define the geometry of the scene, and less impactful borders such as shadows, cloth patterns, etc. The former are more important, and if the AI gets confused it is often useful to do a cleanup and simplify the scene. Running the Canny detector with different settings often helps too.

There is a side benefit to this adapter. Generation at high resolutions, like 1536x1536, is prone to random failures due to instability, presumably because the models were not trained at this resolution. The typical failures are doubles in the scene or hilariously malformed bodies. The glitch can be useful - the base image for the twin sisters was generated this way. The Canny adapter seems to help avoid this particular problem completely; it keeps the model in bounds. The majority of pictures in my profile are made at this resolution.

Another benefit is that even when things fail, the results stay very close to each other in geometry. Let's say I am making a picture with an angel and a demon. If I have one picture with a good angel and one picture with a good demon, I can "easily" move part of the picture from one to the other, a drop-in replacement. That's what most of those "generate by workflow" images in my gallery are about - instead of waiting for the gacha to smile on me and deliver a perfect picture, I just mix and match the good parts of failed tries.

"And that's all I have to say about" the Canny adapter. If you find this article somewhat messy, it is. This is a new experience for me and I am still learning new tricks. Maybe posting a few step-by-step examples of developing images would be more instructive. I will think about it.

If you have specific questions, "I am here if you want to talk".

Related articles: Introduction to ControlNet, ControlNet: Depth adapter.
Introduction to ControlNet.

This article explains what ControlNet is and how you can use it. It includes an example with simple instructions that you can run yourself right now, no prerequisites. It should take about 10 minutes. Just read it (trust me bro). If you want to make pictures first, you can skip the general info below and look for the pictures.

Not many people on tensor.art use ControlNet. Pictures are generated in a gacha game style - one writes a prompt and hopes for the best. If the result is not satisfactory, it's either another try with the same parameters or a prompt update. There are also checkpoints, LoRAs and other generation parameters - a lot of knobs that affect generation in peculiar and largely unpredictable ways. There is some control over the process indeed, but it heavily relies on luck. There's gacha-esque fun in this process. AI generation is like a box of chocolates.

Sometimes a picture turns out almost perfect. It can be awesome in every way except for having 6 fingers on someone's hand. There is no way to fix it with a prompt. Ars Technica reported that the latest OpenAI image generator allows prompt iteration on a picture, so such problems may eventually get resolved. For example, you may be able to generate something and then ask for corrections by saying something like "good, but let it be sunset and I want the second girl from the left to be blond, go". Eventually prompts may become the only tool an AI artist really needs to build a scene. For now, prompts are rather limited.

ControlNet doesn't fix that; it's another knob to use. But it allows you to control many aspects of image generation directly, spatially - "the sword is right here, not just somewhere in the picture". You can actually imagine a final picture in your head and work toward it. If you can make a rough sketch, you are halfway there. You can iterate, keep the parts you like and correct those you don't. It is still a gacha game, but your chances of getting an SSR are much higher.

It also allows you to shoot for much more complex scenes. There is absolutely nothing wrong with generating hundreds of pretty women portraits if it makes you happy. And I mean it; fun is precious, it is never a waste of time. But if you get bored with it, there are options.

ControlNet uses an image as an additional or, in a few cases, the only prompt - a sketch, a pose diagram, an edge/depth/normal map. "A picture is worth a thousand words". A simple doodle can be more efficient at conveying the desired composition than any prompt. Also, models don't follow prompts all that well, and even perfectly crafted prompts fail most of the time.

ControlNet works by attaching small, specialized neural networks called "adapters" to a pre-trained diffusion model. These adapters are trained to interpret specific types of visual input and influence the generation process accordingly, without retraining the whole model. This allows the base model to remain flexible and powerful, while giving users a way to "steer" the output using visual cues rather than just words.

ControlNet is an open source project based on open research publications. The main contributors to both seem to be from China. Kudos to China. It was initially developed for Stable Diffusion 1.5, then adapted for SDXL, and it works for derived models and checkpoints. There is no ControlNet for SD 3.0 or FLUX.1 as far as I know.
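tensor.art hides all of this behind its UI, but if you are curious what "attaching an adapter" looks like in code, here is a rough sketch using the open source diffusers library with an SDXL checkpoint and a Canny adapter. The model IDs and file names are examples for illustration, not what tensor.art actually runs:

```python
# A rough sketch of ControlNet with the diffusers library: the adapter is a
# separate small model plugged into the base SDXL pipeline. Model IDs and
# file names are examples, not what tensor.art runs.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny_image = load_image("ccf.png")  # the white-on-black edge map

image = pipe(
    "a woman meeting a bear in a flower meadow",
    image=canny_image,
    controlnet_conditioning_scale=0.5,  # the "weight" knob in the UI
).images[0]
image.save("result.png")
```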
Personally, I mostly use Pony derivatives and sometimes Illustrious checkpoints.

Using ControlNet requires persistence - iteration is the whole idea. Basic skills with a graphic editor are necessary to make changes to the control files ControlNet uses. Experience with image editing will help you a lot, but you don't have to be a classical artist. I have zero art education beyond lessons in secondary school, and I was okay-ish at best. It helps if you find joy in image editing. The ability to use layers is a great bonus.

Personally I use Gimp, but there are lots of good editors, including free options. Krita seems to be very good. Paint.NET is simple yet capable.

Below I will use the Canny and Depth adapters, because these are the two I find the most useful and use frequently. There will be a separate in-depth article about them later. I will also give a brief overview of the other adapters available on tensor.art in another article - there is a rather harsh article size limit here.

Remixing a picture using ControlNet.

Let's try using ControlNet. Here is what we will be working with:

Click this link and press "remix". It will set the generation parameters. Run it and be amused by the utter failure. Or just skip it, here is what I got:

Not too bad. I like the perspective distortion. A couple of anatomical problems, very fixable. There is no bear though. A failure.

We got all the parameters right. The missing ingredients are the ControlNet control files. Let's add them.

Download the picture we are trying to remix and remember the location. Click the "Add ControlNet" button in the "Model" section, choose "Canny" (3rd option), click the square area in the lower left corner of the new dialogue window and pick the picture you just downloaded. Here's how it should look:

Repeat the same actions one more time, but choose the "Depth" adapter this time (4th option).

Set the weights for both at 0.5. If you did everything right, it should look like this:

Run it. Here is what I got this time:

The clothing colors are different, which is expected since the prompt doesn't specify them. It is a very good picture, on par with the original one.

We successfully remixed the picture without even touching the control files themselves. Let's look at them though.

Click on the garbage bin icon to remove the Canny adapter, then add it again. Here is what it looks like before you confirm your choice:

Click on the right picture, the one in black and white. You will be presented with the control file created by the Canny adapter's preprocessor. You can save this picture:

Now you can edit it and use the edited version instead of the one created by the preprocessor. To do so, you just need to press the "Control Image" button in the dialogue above; it will prompt you to upload your control file.

Let's say we don't like the bear. No wonder - I got it from a quick Bing image search and it was a cartoonish sketch. The bear sucks. Let's paint this area black:

And here is what I got using the new version of the control file:

That's a much better-looking bear - more natural and fitting. Every time I run the generation with these parameters I will get a new bear. The bear is drawn there because the prompt asks for it and the control file doesn't leave any other option for its location. Also, the Depth adapter still indicates to the AI the presence of a large body there:

Once I am happy with the bear, I can fix it in the control file and change other aspects of the generation. I can remove the flowers, add a cat, make the woman run toward the bear, make her wear jeans or nothing, make her a demoness, make the bear run away from her. The sky is the limit now that you can work on specific aspects of the picture with intent.
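If you'd rather script that kind of control file edit than open a graphic editor, blacking out an area is a few lines of Python. A minimal sketch; the filename and rectangle coordinates are made up for illustration - use whatever region you want the AI to reinvent:

```python
# A minimal sketch: black out part of a Canny control file (e.g. the bear
# area) so the AI is free to redraw it. Filename and coordinates are
# placeholders for illustration.
from PIL import Image, ImageDraw

ccf = Image.open("canny_control.png")
draw = ImageDraw.Draw(ccf)

# (left, top, right, bottom) of the area to clear
draw.rectangle((620, 180, 980, 560), fill="black")

ccf.save("canny_control_no_bear.png")  # upload via the "Control Image" button
```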
As an unexpected bonus, the girl's skirt is see-through now and she seems to be going commando. Not intended, and it can be inappropriate. Let's fix it. I add to the prompt: "elaborate blue dress, orange jacket". Here is what I got:

Nice jacket. The claw is bad and the fingers are wonky. Well, you know what to do. Pull the lever, let it roll. :)

Neither the Canny nor the Depth adapter has anything to do with color, just geometry, so your hands are free here. Also, you can now switch freely between checkpoints that support ControlNet, and the scene will generally persist. There are multiple examples of that in my pictures.

That's it. ControlNet is that simple. People really should use it more.

A few clarifications. It might be obvious, but better safe than vague. When we supply the original image, the Canny preprocessor analyzes it and automatically creates a control file - an edge map, the black and white line drawing - which we can download and reuse/abuse. The weight controls how strongly the ControlNet influences the generation, same as for LoRAs. Higher values stick closer to the control image; lower values give the AI more freedom. At high values (0.7 and above) undesirable effects are very likely.

The method we used above will work for every picture on tensor.art, albeit with different degrees of success. All you need is the prompt and the picture itself; you don't necessarily need to use the same tools and LoRAs as the original author. It works for an arbitrary image too, like an anime screenshot - you just need to write a prompt adding the details that the Canny and Depth adapters miss, like colors, lighting, etc. That's what I do for almost every single picture I publish.

That's it for now. This was an introduction; I plan to publish a few more articles on this topic.

Questions and comments are welcome.

Related articles: ControlNet: Canny adapter, ControlNet: Depth adapter, ControlNet: Openpose adapter.