
[Feature Request]: Paint with words #4406

Closed
1 task done
nagolinc opened this issue Nov 7, 2022 · 34 comments
Assignees
Labels
enhancement New feature or request

Comments

@nagolinc

nagolinc commented Nov 7, 2022

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Implement Paint with words

Proposed workflow

  1. Go to (img2img)
  2. Upload mask
  3. Add labels
  4. Get result

Additional information

No response

@dfaker dfaker self-assigned this Nov 7, 2022
@cmp-nct

cmp-nct commented Nov 7, 2022

That looks very interesting. It basically allows you to compose an entire image with colored masks.
Something similar is possible with inpainting, but you'd need to do it in many steps and with much lower performance.

@dfaker dfaker added the enhancement New feature or request label Nov 7, 2022
@dfaker
Collaborator

dfaker commented Nov 7, 2022

The examples look like absolute dogshit; I'm going to be very disappointed if that turns out to be a good demo of a bad system rather than the opposite. Interesting, though.

@nagolinc
Author

nagolinc commented Nov 7, 2022

The results from the paper look much better.
Screenshot 2022-11-06 234612

@dfaker
Collaborator

dfaker commented Nov 7, 2022

Yes, so drastically better and more cohesive that it makes me doubt a version against SD is going to be effective.

@litevex

litevex commented Nov 7, 2022

Even if those 2 results look slightly weird, it still works somewhat (even comparably to make-a-scene). However, there's the question of how you would draw a mask assigned to words for this in Gradio.

@dfaker
Collaborator

dfaker commented Nov 7, 2022

There's a colour canvas drawing facility in Gradio; it's just disabled by default because it was breaking layouts and generally misbehaving. From there you can get the unique colours and ask for text tagging. The required new callback hooks into the model are going to need very convincing and powerful results, though.
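The colour-extraction step could look something like this (a minimal sketch with a hypothetical helper, not part of webui or Gradio): collect the distinct colours from the painted mask so each one can be offered to the user for text tagging.

```python
from collections import Counter

def unique_mask_colors(pixels, max_colors=32):
    """Return the distinct RGB colors in a painted mask, largest
    painted area first, so each zone can be offered to the user
    for text tagging (color -> prompt phrase)."""
    counts = Counter(pixels)
    return [rgb for rgb, _ in counts.most_common(max_colors)]
```

With Pillow, `pixels` could come from `Image.open(path).convert("RGB").getdata()`; each returned colour would then get a text field in the UI.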

@CookiePPP

CookiePPP commented Nov 7, 2022

The current implementation is definitely not working how we'd expect it to (check "Comparison" section at the bottom of this comment).

I've uploaded some results if anyone wants to experiment with me.

labeled and unlabeled are input images.
w0.4_log1p(sigma)_maxnorm is the above repo with stock settings.

The implementation in the paper was found empirically, so it's likely we can also find a good configuration by simply playing around.


Comparison

Input
A dramatic oil painting of a road
paint-with-words-sd with Stock settings (prompt and image)
0
paint-with-words-sd with only prompt (no image)
0

Image Colors/Tokens
EXAMPLE_SETTING_1 = {
    "color_context": {
        ( 48, 167,  26): "purple trees,1.0",
        (115, 232, 103): "abandoned city,1.0",
        (100, 121, 135): "road,1.0",
        (133,  94, 253): "grass,1.0",
        (  1,  47,  71): "magical portal,1.0",
        ( 38, 192, 212): "starry night,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A dramatic oil painting of a road.png",
    "input_prompt": "A dramatic oil painting of a road from a magical portal to an abandoned city with purple trees and grass in a starry night.",
    "output_dir_path": "benchmark/example_1",
}

EXAMPLE_SETTING_2 = {
    "color_context": {
        (161, 160, 173): "A large red moon,1.0",
        ( 79,  18,  96): "Bats,1.0",
        ( 82, 170,  20): "sky,1.0",
        (  0, 232, 126): "an evil pumpkin,1.0",
        (180,   0, 137): "zombies,1.0",
        (129,  65,   0): "tombs,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A Halloween scene of an evil pumpkin.png",
    "input_prompt": "A Halloween scene of an evil pumpkin. A large red moon in the sky. Bats are flying and zombies are walking out of tombs. Highly detailed fantasy art.",
    "output_dir_path": "benchmark/example_2",
}

EXAMPLE_SETTING_3 = {
    "color_context": {
        (  7, 192, 152): "dark cellar,1.0",
        ( 81,  31,  97): "monster,1.0",
        ( 71, 132,   2): "teddy bear,1.0",
        ( 32, 115, 189): "table,1.0",
        ( 70,  53, 108): "dungeons and dragons,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A monster and a teddy brear playing dungeons and dragons.png",
    "input_prompt": "A monster and a teddy bear playing dungeons and dragons around a table in a dark cellar. High quality fantasy art.",
    "output_dir_path": "benchmark/example_3",
}

EXAMPLE_SETTING_4 = {
    "color_context": {
        (138,  48,  39): "rabbit mage,1.0",
        ( 50,  32, 211): "fire ball,1.0",
        (126, 200, 100): "clouds,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A rabbit mage standing on clouds casting a fireball.png",
    "input_prompt": "A highly detailed digital art of a rabbit mage standing on clouds casting a fire ball.",
    "output_dir_path": "benchmark/example_4",
}

EXAMPLE_SETTING_5 = {
    "color_context": {
        (157, 187, 242): "rainbow beams,1.0",
        ( 27, 165, 234): "forest,1.0",
        ( 57, 244,  30): "A red Ferrari car,1.0",
        (151, 138,  41): "gravel road,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A red Ferrari car driving on a gravel road.png",
    "input_prompt": "A red Ferrari car driving on a gravel road in a forest with rainbow beams in the distance.",
    "output_dir_path": "benchmark/example_5",
}

EXAMPLE_SETTING_6 = {
    "color_context": {
        (123, 141, 146): "bar,1.0",
        ( 90, 119,  35): "red boxing gloves,1.0",
        ( 48, 167,  26): "blue boxing gloves,1.0",
        ( 10, 216, 129): "A squirrel,1.0",
        ( 72,  38,  31): "a squirrel,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A squirrel and a squirrel with boxing gloves fighting in a bar.png",
    "input_prompt": "A squirrel with red boxing gloves and a squirrel with blue boxing gloves fighting in a bar.",
    "output_dir_path": "benchmark/example_6",
}

@CookiePPP

@nagolinc

Here's a more direct comparison. (3rd column is stable-diffusion+paint_with_words, 4th column is stable-diffusion)
This is the best of 4 attempts for each image.


image


I can definitely see merit in adding this feature.

  • In row 1, stable-diffusion forgets the purple trees and the portal without guidance.
  • In row 2, none of the images produced zombies and only 1 image had a pumpkin in the centre.
  • In row 3, stable-diffusion failed to produce a separate monster and teddy bear; it would always be one or the other, or some weird hybrid creature.
  • In row 4, stable-diffusion failed to generate the clouds in all attempts.
  • In row 5, all of the stable-diffusion rainbows were vertical for some reason? I guess horizontal rainbows don't happen in real life at such a low height.
  • In row 6, stable-diffusion consistently forgot about the bar in the background, and in one attempt was missing a squirrel. Both stable-diffusion methods failed to produce a blue boxing glove, however, which is surprising. I suspect it's an implementation issue, since I don't see anything to separate squirrel A and squirrel B in the code.

(Also, these samples were done without any weighting at all for prompt tokens or paint-with-words tokens. If you reduce the weight of background elements and increase the weight of unique elements I'm sure it'll work even better.)

@mykeehu
Contributor

mykeehu commented Nov 7, 2022

Here is a solution, but it would need to be rewritten for integration or as an extension.

@152334H

152334H commented Nov 8, 2022

@mykeehu It was linked in the initial issue description.

@cloneofsimo

cloneofsimo commented Nov 8, 2022

Wow @CookiePPP thank you so much for the comparisons! I would be so happy if you let me use your benchmarks on my paint-with-words repo as well, do you mind if I do?

@CookiePPP

@cloneofsimo
Feel free to use anything I posted on this thread.

@cloneofsimo

@cloneofsimo Feel free to use anything I posted on this thread.

Thank you so much! I will certainly credit you when I add some of these materials!
And another question: were these generated with the default (0.4 * log(1 + sigma) * max(QK)) values?

@CookiePPP

CookiePPP commented Nov 8, 2022

@cloneofsimo
These were generated with 2.0 * log(1 + sigma) * std(QK).
I've only tested 14 combinations, so be aware this configuration is probably still not the best that can be found.
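For reference, the weight term being discussed can be sketched as follows (a hypothetical scalar version; in the actual repo the statistic is computed over the cross-attention score tensor, and the bias is added per masked region before softmax):

```python
import math

def pww_weight(sigma, qk_scores, w_scale=2.0):
    """w = w_scale * log(1 + sigma) * std(QK): the bias added to the
    cross-attention scores inside a masked region before softmax.
    It decays as sigma shrinks, so the mask guides the early, noisy
    steps strongly and barely touches the final denoising steps."""
    mean = sum(qk_scores) / len(qk_scores)
    std = math.sqrt(sum((q - mean) ** 2 for q in qk_scores) / len(qk_scores))
    return w_scale * math.log1p(sigma) * std
```

Swapping `std` for `max(qk_scores)` and `w_scale` for 0.4 would give the stock setting mentioned earlier in the thread.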

@cloneofsimo

Awesome! Thank you so much for sharing that information as well! I agree 100% that a much better configuration is certainly possible, since the model structure differs from eDiffi's.

@cloneofsimo

@CookiePPP Based on your findings, I've added a user-defined weight scaling function, as well as some of these findings, to my repo. I hope this feature gets added to A1111's repo as well

compare_std

@mykeehu
Contributor

mykeehu commented Nov 8, 2022

I would also be happy if someone wrote it as an extension, because with so many models, label sets, and styles, it would really expand the possibilities of SD. I suppose the interesting part would be how to build the color label table, because it has to be passed to the generator; a simple prompt would not be enough. I also wonder how much can be solved with scripts only, as in the case of multi-prompt scripts. Unfortunately I'm not a programmer who could rewrite it; I'm just looking forward to it and brainstorming.

@cmp-nct

cmp-nct commented Nov 8, 2022

The eDiffi ones are painfully good. Nvidia simply has an advantage: they have near-unlimited GPU processing power, and it only costs them a bit of energy to use it.

@cmp-nct

cmp-nct commented Nov 8, 2022

One idea for improvement:
What if we generated unique noise for each of the word zones? Then, when re-rolling, we could choose which words we want to re-roll and which should stay.

So if the D&D table or the blue gloves are wrong -> re-roll only the noise in that area and keep the other noise zones identical. Of course, that would mean quite a few adaptations in the UI, as we'd need a flexible number of noise inputs mapped to all the paint-words, with checkboxes for "re-roll".
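The per-zone re-roll idea could be sketched like this (purely illustrative: a flat noise list and index masks stand in for the latent tensor; nothing like this exists in any of the repos yet):

```python
import random

def reroll_zones(noise, zone_indices, reroll, seed=None):
    """Re-sample gaussian noise only inside the selected zones.

    noise        -- flat list of floats (stand-in for the latent noise)
    zone_indices -- dict: zone label -> list of positions in `noise`
    reroll       -- labels of the zones to re-sample; all other zones
                    keep their values, so their content stays
                    (roughly) stable across re-rolls.
    """
    rng = random.Random(seed)
    out = list(noise)
    for label in reroll:
        for i in zone_indices[label]:
            out[i] = rng.gauss(0.0, 1.0)
    return out
```

In practice the kept regions would only be approximately stable, since the diffusion steps mix information across the whole latent.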

@cloneofsimo

Well, eDiffi literally uses three different conditionings and something like 5 times more parameters, so...

Just in case anyone is interested, there is a LOT of room for improvement in my implementation:

  1. as @CookiePPP mentioned above, there is no separation within the same words
  2. I don't think these hard-coded 0-1 weights are good, because some words are only a tiny bit different yet get entirely ignored; using something like nn-based word similarity could build better, continuous cross-attention maps

If you are going to rebuild this feature into A1111's repo, there are probably better ways to construct cross-attention weights than mine. (I will be working to get it right eventually, though; my repo got way more attention than it deserves lol)

@cloneofsimo

cloneofsimo commented Nov 8, 2022

One idea for improvement: What if we generated unique noise for each of the word zones? Then, when re-rolling, we could choose which words we want to re-roll and which should stay.

So if the D&D table or the blue gloves are wrong -> re-roll only the noise in that area and keep the other noise zones identical. Of course, that would mean quite a few adaptations in the UI, as we'd need a flexible number of noise inputs mapped to all the paint-words, with checkboxes for "re-roll".

I think this is a very good idea. It might work and be easy to implement. If it works, I'll add this feature in the future as well.

@ArcticFaded
Collaborator

I can help with implementing an extension; I just need instructions on which methods would need to be changed. I have some ML literacy, so just knowing where the interaction occurs would be enough for me to try an implementation with LDM.

@mykeehu
Contributor

mykeehu commented Dec 31, 2022

I'd love to know what the status of the development of this extension is, because I'm really looking forward to it!

@nistvan86

nistvan86 commented Jan 6, 2023

I've tried to collect what needs to be done here, but I'm still trying to understand how webUI and all the related tech work, so this is probably just a vague draft. Can someone fix or extend this, please?

  • The current version of the paint-with-words-sd code doesn't handle ckpt files natively, only after pre-conversion. Something needs to be done here so it can work with ckpt directly. Is this complicated?

  • The Gradio UI doesn't have a good canvas editor with labeling. It's possible to create a new component for Gradio using the Svelte framework; I've actually found a canvas drawing example here which could serve as a basis. But in another Reddit thread someone mentioned a library called konvajs, which is mostly dependency-free and can be integrated with Svelte rather easily. I've found an example here which allows drawing arbitrarily shaped labels over an image; this is close to what we need.

  • The whole thing can be implemented as an extension and appear as a whole new tab in webui, like e.g. the image browser. Probably some part of the img2img code can be reused for this, if I'm not mistaken.

@mykeehu
Contributor

mykeehu commented Jan 7, 2023

@nistvan86 an alternative solution is what the openOutpaint extension does: it uses a standalone interface through the API, so it is not tied to Gradio.
However, starting from img2img in Gradio could be a good base, but you'd need a textbox or a tabular interface to enter the color code, with the prompt next to it line by line, to build an array for input.
I'm not a programmer, just trying to find a solution for this through existing stuff.

@nistvan86

nistvan86 commented Jan 7, 2023

@mykeehu an even better solution would be to avoid typing the same prompt elements multiple times and instead attach the color codes to sections of the prompt with a special syntax, similar to how you can currently emphasize things with (emphasize things:1.2).

But I'm not sure how well that could be implemented UX-wise. One solution could be, for example, to select a part of the prompt and then drag & drop it onto a colored shape on the canvas.
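One purely hypothetical shape for such a syntax (nothing in webui parses this; the `@rrggbb` marker and the weight suffix are invented here for illustration): `(purple trees@30a71a:1.0)` would attach the phrase to mask colour #30a71a with weight 1.0. A sketch of the parser:

```python
import re

# Invented syntax: "(phrase@rrggbb)" or "(phrase@rrggbb:weight)".
TOKEN = re.compile(r"\(([^@()]+)@([0-9a-fA-F]{6})(?::([\d.]+))?\)")

def parse_pww_prompt(prompt):
    """Split an annotated prompt into the clean prompt plus the
    color_context dict used by paint-with-words-sd."""
    color_context = {}

    def strip(match):
        phrase, hexcol, weight = match.group(1), match.group(2), match.group(3) or "1.0"
        rgb = tuple(int(hexcol[i:i + 2], 16) for i in (0, 2, 4))
        color_context[rgb] = f"{phrase},{weight}"
        return phrase  # leave the bare phrase behind in the prompt

    return TOKEN.sub(strip, prompt), color_context
```

This would avoid typing each phrase twice, at the cost of prompts that are harder to read; the drag & drop idea trades that off the other way.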

@mykeehu
Contributor

mykeehu commented Jan 7, 2023

Or, when you choose a colour and start painting with it, the colour is added to the list. But how can the colour be changed afterwards? Or deleted if it's no longer needed, so you can manage layers?

@nistvan86

nistvan86 commented Jan 7, 2023

The more I think about it, a vector-graphics-based editor would probably serve the needs better here. An SVG, for example, can be shown in the browser easily, its DOM can be interacted with, and the SVG can be saved next to the image at a rather small size (or even embedded into the resulting PNG, so it can be restored the same way you can load back a config from an output).
It can also be altered afterwards: polygons can be moved or the draw order changed.
It can also be rendered at whatever size and in whatever format is needed on the Python side, with no need to upscale a raster image.
(e.g. svg.draw.js might work, with CairoSVG on the backend)
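To illustrate the SVG idea, here is a sketch (assuming each region is an SVG `<polygon>` carrying its fill colour plus a hypothetical `data-label` attribute for the prompt phrase) of mapping such a document straight to a paint-with-words color_context; the rasterization itself would be delegated to something like CairoSVG:

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def svg_to_color_context(svg_text, weight="1.0"):
    """Collect (r, g, b) -> "label,weight" from labeled SVG polygons.
    Expects fill="#rrggbb" and data-label="prompt phrase" attributes
    on each polygon (the data-label attribute is an assumption of
    this sketch, not an SVG standard)."""
    ctx = {}
    for poly in ET.fromstring(svg_text).iter(SVG_NS + "polygon"):
        fill = poly.get("fill", "#000000")
        rgb = tuple(int(fill[i:i + 2], 16) for i in (1, 3, 5))
        ctx[rgb] = poly.get("data-label", "") + "," + weight
    return ctx
```

Keeping the labels in the SVG itself means the mask, the colours, and the phrases travel together in one editable file.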

@nistvan86

nistvan86 commented Jan 7, 2023

Maybe worth noting there's a fork of the paint-with-words repository which uses transformer pipelines: paint-with-words-pipelines.

I've created a minimal example on top of it which can be run on Windows in a venv. (see steps.txt included)
paint-with-words-2.zip

@gsgoldma

gsgoldma commented Feb 19, 2023

Maybe worth noting there's a fork of the paint-with-words repository which uses transformer pipelines: paint-with-words-pipelines.

I've created a minimal example on top of it which can be run on Windows in a venv. (see steps.txt included) paint-with-words-2.zip

I got this to work with 6 GB of VRAM.
Weirdly enough, the original paint-with-words didn't work on my PC, and neither did the test.py in the zip.
But when I made the venv in the old paint folder, the runner.py file worked!

@lwchen6309

I've updated the paint-with-words extension at Paint with Word, combining ControlNet and Paint with Words (PwW).

See the results below
cn_pww_turtle

One can also use pure PwW by setting the weight of ControlNet to 0.

@mykeehu
Contributor

mykeehu commented Mar 19, 2023

@lwchen6309 please check your version; I have a conflict with ControlNet. My bug report is here.

@lwchen6309

lwchen6309 commented Mar 19, 2023

@mykeehu thanks for reporting this bug. I've raised the issue here for further discussion.

@catboxanon
Collaborator

catboxanon commented Aug 7, 2023

Closing as the extension mentioned above has been available for quite some time. https://github.com/lwchen6309/paint-with-words-sd
