TCO

10 Followers · 47 Following · 2.4K Runs · 3 Downloads · 151 Likes · 10 Stars
Articles

From Prompt to Picture: Comparing Text, Image, and ControlNet Generation Methods

This article showcases the differences between Text-to-Image, Image-to-Image, and ControlNet for the same prompt. This image of Jane Lane from Daria (MTV, 1997) will be used as the source image for ControlNet and image-to-image.

While crafting a prompt, one option is to feed your image into a free online resource such as Hugging Face's CLIP Interrogator, which extracts a text prompt to match the image. The following prompt was used for every example in this article:

Prompt: "Jane Lane, with short black bob haircut and multiple earrings, holding a paintbrush, gothic, woman, painting at easel, The background is simple and uses flat colors, evoking the distinct animation style of late 1990s adult cartoons. bold outlines, minimal shading, line drawing, She wears a red shirt with rolled sleeves and a white cloth draped over her shoulder. overall mood is quirky and creative, emphasizing alternative and artistic vibes"

While this text prompt won't drastically change the image, it gives the AI model a general outline for recreating it. The first example runs the prompt through text-to-image, with no source image, on Flux Kontext Alpha at a guidance scale of 7. Using ControlNet canny allows precise transfer of the outline and design: while Jane's eyes vary from the original, her hair shape is the same, as are her pose and the painting's. Elements like the eyes can be changed with further text refinement and settings adjustments, or with the more advanced inpaint.

The next image was generated using image-to-image with a denoise strength of 0.63. Closer to 0 matches the original image more closely; closer to 1 generates images further from the source, with more AI creativity. In my experience, 0.5-0.8 is the range where the source stays recognizable.

There is no single right or wrong way to create images; each method has tradeoffs. Text-to-image gives variance but does not match the source material. ControlNet allows precision, but will reproduce the same posture in every generation unless its strength is lowered in settings. Image-to-image with denoise strength allows more adjustment of the image, the prompt, and the strength settings. Once you understand the benefits of each method, you can decide which is best suited to each image you generate. The end goal for every user is the best result at the least cost, and understanding the full set of model tools gives you the ability to achieve it.

This is a simple example with simple results; you can extrapolate these foundational settings to more advanced image creation. There are many ways to alter your images further: remixing other AI images, using ADetailer, adding one or more LoRAs, using the ControlNet IP-Adapter, or inpainting. These methods cost more credits and can be challenging, so it is worth researching them first to get the results you want.

Text-to-image, ControlNet, and image-to-image should cover a large share of use cases. Text-to-image lets you create freely, unconstrained by any prior source; ControlNet offers the precision to match shapes, vehicles, poses, animals, and so on; and image-to-image spans everything from near-photocopy subtle alterations to unrecognizable remixes, with every step in between. In my opinion, a source image is a kind of shorthand: a picture is worth a thousand words, and it conveys that detail to the AI without a long, verbose prompt.
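To make the ControlNet step concrete outside the Tensor.Art UI, here is a minimal sketch of a canny-guided generation with Hugging Face's diffusers library. The article's examples were made on Flux in Tensor's web interface; this sketch substitutes a Stable Diffusion 1.5 canny ControlNet as a widely available stand-in, and the image file names are placeholders:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Canny ControlNet for Stable Diffusion 1.5 (a stand-in for Tensor's Flux canny).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Build the edge map that ControlNet will trace ("jane_lane.png" is a placeholder).
src = np.array(load_image("jane_lane.png"))
edges = cv2.Canny(src, 100, 200)
control = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 1-channel -> 3-channel

image = pipe(
    prompt="Jane Lane, short black bob haircut, painting at easel, bold outlines",
    image=control,
    controlnet_conditioning_scale=1.0,  # lower this to loosen the pose/outline lock
    num_inference_steps=30,
).images[0]
image.save("controlnet_canny.png")
```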
The AI reads the source image and applies its visual information to your new generations according to the strength settings. This leaves you free to write a prompt unrelated to the source image if you choose: juxtapositions, conflicting elements, complementary prompting; it is entirely up to your desired result. With dedication, creativity, and a passion for your efforts, you should quickly be making the kinds of images you want.
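For readers who prefer code to sliders, the denoise strength described above corresponds to the `strength` parameter in diffusers' image-to-image pipelines. A minimal sketch, assuming a Flux checkpoint and placeholder file names:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Flux image-to-image; the checkpoint id and file names are placeholders.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

source = load_image("jane_lane.png")

image = pipe(
    prompt="Jane Lane, short black bob haircut, painting at easel, bold outlines",
    image=source,
    strength=0.63,        # denoise strength: near 0 ~ photocopy, near 1 ~ mostly new
    guidance_scale=7.0,
    num_inference_steps=30,
).images[0]
image.save("img2img_063.png")
```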
Leveraging img2img with LoRAs and prompting

There are numerous guides on Tensor and across the internet that detail proper prompting for the different image-generation models. This guide covers the Flux model with an image-to-image source and denoise, combined with text-to-image prompting and added LoRAs. This combination gives the end user multiple access points for drastic or subtle variance in image output.

Your choice of source image affects numerous aesthetic qualities of the final output. The medium, whether realism (photo), painting, crayon, or line drawing, carries a distinct core transference into the result. The img2img AI model detects color, shapes, depth, objects, people, animals, and so on, and applies as much of that to the final output as you allow through settings such as denoise strength, prompt CFG scale, and LoRA strengths. Below are some examples of those adjustments.

The same prompt was used while LoRA strength was increased to get image variance with a cinematic horror LoRA. Using a Flux model with Schnell allows LoRA strengths beyond 2, which can oversaturate your images with a particular LoRA aesthetic. The first remixed image of Xena above shows the heavy line work that the Jeanne LoRA was trained on; the second stays within the standard scale of 2 for changes such as a more detailed face. Using all of these elements in tandem allows precise strength adjustments.

Denoise strength closer to 0 matches the original image, similar to a photocopy. The closer the scale gets to 1, the more creativity the Flux model adds to the image from its own training data. Adding one or more LoRAs gives you further aesthetic control over the medium and art style of the final output. For example, if you are making origami prompts and the art style isn't translating, an origami LoRA will add origami to every image. The tradeoff is that the style of origami (modular, kirigami, etc.) may differ from the type you are looking for. A LoRA's training data is much more honed and pronounced in the output than Flux's own training data, and this should factor into the LoRA strength you use.

LoRA strength can be set between 0.7 and 1.0 for regular results. For less influence you can go as low as 0.01-0.1, and depending on the model (Schnell, Dev) as high as 2-10, though that much strength typically produces incoherent noise images. Finding the right balance across all settings is the key to consistently making images in the style you want. Tensor offers numerous LoRAs; your LoRA selection, your source image, and your prompt all coalesce into the image you intend to make.

The above screengrab from FernGully was remixed through The Iron Giant backgrounds LoRA to turn the original style into another distinct yet similar setting. Changing the denoise strength, the CFG scale for the prompt, or the LoRA strength will all affect the output.

This method can be used for daily challenge posts, events, and more. It gives you a foundation of visual information which, along with your crafted prompt and LoRA selection, lets you remix source images. Low denoise strength yields similar results; higher denoise strength keeps some elements of the original (such as colors and mood) while drastically changing everything else.
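As a rough sketch of how these scales fit together in code, here is diffusers' PEFT-backed LoRA loading applied to an img2img pass. Tensor.Art exposes all of this as sliders; the checkpoint, LoRA file, and source image below are placeholders:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder LoRA file standing in for a style LoRA picked on Tensor.
pipe.load_lora_weights("cinematic_horror_lora.safetensors", adapter_name="horror")
pipe.set_adapters(["horror"], adapter_weights=[0.8])  # 0.7-1.0 is the normal range

source = load_image("xena_frame.png")  # placeholder source image
image = pipe(
    prompt="cinematic horror still, heavy line work",
    image=source,
    strength=0.6,           # denoise strength against the source image
    num_inference_steps=4,  # Schnell is a few-step distilled model
).images[0]
image.save("lora_remix.png")
```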
With time and practice, you can use this method as a shortcut to reach your final output faster, with fewer generations. One approach is to run a lower-cost model first to get a source image closer to what you want, then feed that new source image, at a lower denoise strength, into a higher-cost model. You can use one LoRA at a time and swap it for another once you generate a better image, or add multiple LoRAs at once, each with its own strength setting, and have them all change the image in a single generation (see the sketch below).

There is a lot of trial and error in image generation, and it may take some time and credits before you find exactly what works for you. Sticking to standard scale settings at first is recommended, to avoid the unintended outputs that come from the far ends of the strength scale. Decide early whether you want low denoise strength for similar images or high denoise strength for creative variance. You can add a low-strength LoRA to a low-denoise image (0.5-0.7) to make subtle changes in a specific style.
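Stacking multiple LoRAs, each with its own strength, looks like this in a diffusers sketch; the file names, adapter names, and weights are all illustrative:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# Stack two LoRAs, each with its own strength, so both change the image in a
# single generation. File names, adapter names, and weights are illustrative.
pipe.load_lora_weights("iron_giant_backgrounds.safetensors", adapter_name="background")
pipe.load_lora_weights("jeanne_linework.safetensors", adapter_name="linework")
pipe.set_adapters(["background", "linework"], adapter_weights=[0.9, 0.4])

image = pipe(
    prompt="lush rainforest canopy at dusk, painted animation background",
    image=load_image("ferngully_frame.png"),  # placeholder screengrab
    strength=0.6,            # lower keeps the result close to the source
    num_inference_steps=4,
).images[0]
image.save("multi_lora_remix.png")
```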
