
[Feature Request]: Paint with words #4406

Closed
1 task done
nagolinc opened this issue Nov 7, 2022 · 34 comments
Assignees
Labels
enhancement New feature or request

Comments

@nagolinc

nagolinc commented Nov 7, 2022

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Implement Paint with words

Proposed workflow

  1. Go to (img2img)
  2. Upload mask
  3. Add labels
  4. Get result

Additional information

No response

@dfaker dfaker self-assigned this Nov 7, 2022
@cmp-nct

cmp-nct commented Nov 7, 2022

That looks very interesting. It basically allows you to compose an entire image with colored masks.
Something similar is possible with inpainting, but you'd need to do it in many steps and with much lower performance.

@dfaker dfaker added the enhancement New feature or request label Nov 7, 2022
@dfaker
Collaborator

dfaker commented Nov 7, 2022

The examples look like absolute dogshit; I'm going to be very disappointed if that turns out to be a good demo of a bad system rather than the opposite. Interesting, though.

@nagolinc
Author

nagolinc commented Nov 7, 2022

The results from the paper look much better.
Screenshot 2022-11-06 234612

@dfaker
Collaborator

dfaker commented Nov 7, 2022

Yes, so drastically better and more cohesive that it makes me doubt a version against SD is going to be effective.

@litevex

litevex commented Nov 7, 2022

Even if those 2 results look slightly weird, it still works somewhat (even comparably to make-a-scene). However, there's the question of how you would draw a mask assigned to words for this in Gradio.

@dfaker
Collaborator

dfaker commented Nov 7, 2022

There's a colour canvas drawing facility in Gradio; it's just disabled by default because it was breaking layouts and generally misbehaving. From there you can get the unique colours and ask for text tagging. The required new callback hooks into the model are going to need very convincing and powerful results, though.
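The colour-extraction step could look something like this (a minimal sketch with a hypothetical helper, not part of webui or Gradio): collect the distinct colours from the painted mask so each one can be offered to the user for text tagging.

```python
from collections import Counter

def unique_mask_colors(pixels, max_colors=32):
    """Return the distinct RGB colors in a painted mask, largest
    painted area first, so each zone can be offered to the user
    for text tagging (color -> prompt phrase)."""
    counts = Counter(pixels)
    return [rgb for rgb, _ in counts.most_common(max_colors)]
```

With Pillow, `pixels` could come from `Image.open(path).convert("RGB").getdata()`; each returned colour would then get a text field in the UI.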

@CookiePPP

CookiePPP commented Nov 7, 2022

The current implementation is definitely not working how we'd expect it to (check "Comparison" section at the bottom of this comment).

I've uploaded some results if anyone wants to experiment with me.

labeled and unlabeled are input images.
w0.4_log1p(sigma)_maxnorm is the above repo with stock settings.

The implementation in the paper was found empirically, so it's likely we can also find a good configuration by simply playing around.


Comparison

Input
A dramatic oil painting of a road
paint-with-words-sd with Stock settings (prompt and image)
0
paint-with-words-sd with only prompt (no image)
0

Image Colors/Tokens
EXAMPLE_SETTING_1 = {
    "color_context": {
        ( 48, 167,  26): "purple trees,1.0",
        (115, 232, 103): "abandoned city,1.0",
        (100, 121, 135): "road,1.0",
        (133,  94, 253): "grass,1.0",
        (  1,  47,  71): "magical portal,1.0",
        ( 38, 192, 212): "starry night,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A dramatic oil painting of a road.png",
    "input_prompt": "A dramatic oil painting of a road from a magical portal to an abandoned city with purple trees and grass in a starry night.",
    "output_dir_path": "benchmark/example_1",
}

EXAMPLE_SETTING_2 = {
    "color_context": {
        (161, 160, 173): "A large red moon,1.0",
        ( 79,  18,  96): "Bats,1.0",
        ( 82, 170,  20): "sky,1.0",
        (  0, 232, 126): "an evil pumpkin,1.0",
        (180,   0, 137): "zombies,1.0",
        (129,  65,   0): "tombs,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A Halloween scene of an evil pumpkin.png",
    "input_prompt": "A Halloween scene of an evil pumpkin. A large red moon in the sky. Bats are flying and zombies are walking out of tombs. Highly detailed fantasy art.",
    "output_dir_path": "benchmark/example_2",
}

EXAMPLE_SETTING_3 = {
    "color_context": {
        (  7, 192, 152): "dark cellar,1.0",
        ( 81,  31,  97): "monster,1.0",
        ( 71, 132,   2): "teddy bear,1.0",
        ( 32, 115, 189): "table,1.0",
        ( 70,  53, 108): "dungeons and dragons,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A monster and a teddy brear playing dungeons and dragons.png",
    "input_prompt": "A monster and a teddy bear playing dungeons and dragons around a table in a dark cellar. High quality fantasy art.",
    "output_dir_path": "benchmark/example_3",
}

EXAMPLE_SETTING_4 = {
    "color_context": {
        (138,  48,  39): "rabbit mage,1.0",
        ( 50,  32, 211): "fire ball,1.0",
        (126, 200, 100): "clouds,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A rabbit mage standing on clouds casting a fireball.png",
    "input_prompt": "A highly detailed digital art of a rabbit mage standing on clouds casting a fire ball.",
    "output_dir_path": "benchmark/example_4",
}

EXAMPLE_SETTING_5 = {
    "color_context": {
        (157, 187, 242): "rainbow beams,1.0",
        ( 27, 165, 234): "forest,1.0",
        ( 57, 244,  30): "A red Ferrari car,1.0",
        (151, 138,  41): "gravel road,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A red Ferrari car driving on a gravel road.png",
    "input_prompt": "A red Ferrari car driving on a gravel road in a forest with rainbow beams in the distance.",
    "output_dir_path": "benchmark/example_5",
}

EXAMPLE_SETTING_6 = {
    "color_context": {
        (123, 141, 146): "bar,1.0",
        ( 90, 119,  35): "red boxing gloves,1.0",
        ( 48, 167,  26): "blue boxing gloves,1.0",
        ( 10, 216, 129): "A squirrel,1.0",
        ( 72,  38,  31): "a squirrel,1.0",
    },
    "color_map_img_path": "benchmark/unlabeled/A squirrel and a squirrel with boxing gloves fighting in a bar.png",
    "input_prompt": "A squirrel with red boxing gloves and a squirrel with blue boxing gloves fighting in a bar.",
    "output_dir_path": "benchmark/example_6",
}

@CookiePPP

@nagolinc

Here's a more direct comparison. (3rd column is stable-diffusion+paint_with_words, 4th column is stable-diffusion)
This is the best of 4 attempts for each image.


image


I can definitely see merit in adding this feature.

  • In row 1, stable-diffusion forgets the purple trees and the portal without guidance.
  • In row 2, none of the images produced zombies and only 1 image had a pumpkin in the centre.
  • In row 3, stable-diffusion failed to produce a separate monster and teddy bear; it would always be one or the other, or some weird hybrid creature.
  • In row 4, stable-diffusion failed to generate the clouds in all attempts.
  • In row 5, all of the stable-diffusion rainbows were vertical for some reason? I guess horizontal rainbows don't happen in real life at such a low height.
  • In row 6, stable-diffusion consistently forgot about the bar in the background, and in one attempt was missing a squirrel. Both stable-diffusion methods failed to produce a blue boxing glove, however, which is surprising. I suspect it's an implementation issue, since I don't see anything to separate squirrel A and squirrel B in the code.

(Also, these samples were done without any weighting at all for prompt tokens or paint-with-words tokens. If you reduce the weight of background elements and increase the weight of unique elements I'm sure it'll work even better.)

@mykeehu
Contributor

mykeehu commented Nov 7, 2022

Here is a solution, but it would need to be rewritten for integration or as an extension.

@152334H

152334H commented Nov 8, 2022

@mykeehu It was linked in the initial issue description.

@cloneofsimo

cloneofsimo commented Nov 8, 2022

Wow @CookiePPP thank you so much for the comparisons! I would be so happy if you let me use your benchmarks on my paint-with-words repo as well, do you mind if I do?

@CookiePPP

@cloneofsimo
Feel free to use anything I posted on this thread.

@cloneofsimo

@cloneofsimo Feel free to use anything I posted on this thread.

Thank you so much! I will certainly credit you when I add some of these materials!
And another question: were these generated with the default (0.4 * log(1 + sigma) * max(QK)) values?

@CookiePPP

CookiePPP commented Nov 8, 2022

@cloneofsimo
These were generated with 2.0 * log(1 + sigma) * std(QK).
I've only tested 14 combinations, so be aware this configuration is probably still not the best that can be found.
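For reference, the weight term being discussed can be sketched as follows (a hypothetical scalar version; in the actual repo the statistic is computed over the cross-attention score tensor, and the bias is added per masked region before softmax):

```python
import math

def pww_weight(sigma, qk_scores, w_scale=2.0):
    """w = w_scale * log(1 + sigma) * std(QK): the bias added to the
    cross-attention scores inside a masked region before softmax.
    It decays as sigma shrinks, so the mask guides the early, noisy
    steps strongly and barely touches the final denoising steps."""
    mean = sum(qk_scores) / len(qk_scores)
    std = math.sqrt(sum((q - mean) ** 2 for q in qk_scores) / len(qk_scores))
    return w_scale * math.log1p(sigma) * std
```

Swapping `std` for `max(qk_scores)` and `w_scale` for 0.4 would give the stock setting mentioned earlier in the thread.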

@cloneofsimo

Awesome! Thank you so much for sharing that information as well! I agree 100% that a much better configuration is certainly possible, since the model structure differs from eDiffi's.

@cloneofsimo

@CookiePPP Based on your findings, I've added a user-defined weight scaling function, as well as some of these findings, to my repo. I hope this feature gets added to A1111's repo as well

compare_std

@mykeehu
Contributor

mykeehu commented Nov 8, 2022

I would also be happy if someone wrote it as an extension, because with so many models, label sets, and styles, it would really expand the possibilities of SD. I suppose the interesting part would be how to build the color label table, because it has to be passed to the generator; a simple prompt would not be enough. I also wonder how much can be solved with scripts only, as in the case of multi-prompt scripts. Unfortunately I'm not a programmer who could rewrite it; I'm just looking forward to it and brainstorming.

@cmp-nct

cmp-nct commented Nov 8, 2022

The eDiffi ones are painfully good. Nvidia simply has an advantage: they have near-unlimited GPU processing power, and it only costs them a bit of energy to use it.

@cmp-nct

cmp-nct commented Nov 8, 2022

One idea for improvement:
What if we generated unique noise for each of the word zones? Then, when re-rolling, we could choose which words we want to re-roll and which should stay.

So if the D&D table or the blue gloves are wrong -> re-roll only the noise in that area and keep the other noise zones identical. Of course, that would mean quite a few adaptations in the UI, as we'd need a flexible number of noise inputs mapped to all the paint-words, with checkboxes for "re-roll".
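The per-zone re-roll idea could be sketched like this (purely illustrative: a flat noise list and index masks stand in for the latent tensor; nothing like this exists in any of the repos yet):

```python
import random

def reroll_zones(noise, zone_indices, reroll, seed=None):
    """Re-sample gaussian noise only inside the selected zones.

    noise        -- flat list of floats (stand-in for the latent noise)
    zone_indices -- dict: zone label -> list of positions in `noise`
    reroll       -- labels of the zones to re-sample; all other zones
                    keep their values, so their content stays
                    (roughly) stable across re-rolls.
    """
    rng = random.Random(seed)
    out = list(noise)
    for label in reroll:
        for i in zone_indices[label]:
            out[i] = rng.gauss(0.0, 1.0)
    return out
```

In practice the kept regions would only be approximately stable, since the diffusion steps mix information across the whole latent.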

@cloneofsimo

Well, eDiffi literally uses three different conditionings and something like 5 times more parameters, so...

Just in case anyone is interested, there is a LOT of room for improvement in my implementation:

  1. as @CookiePPP mentioned above, there is no separation within the same words
  2. I don't think these hard-coded 0-1 weights are good, because some words are only a tiny bit different yet get entirely ignored; using something like nn-based word similarity could build better, continuous cross-attention maps

If you are going to rebuild this feature into A1111's repo, there are probably better ways to construct cross-attention weights than mine. (I will be working to get it right eventually, though; my repo got way more attention than it deserves lol)

@cloneofsimo

cloneofsimo commented Nov 8, 2022

One idea for improvement: What if we generated unique noise for each of the word zones? Then, when re-rolling, we could choose which words we want to re-roll and which should stay.

So if the D&D table or the blue gloves are wrong -> re-roll only the noise in that area and keep the other noise zones identical. Of course, that would mean quite a few adaptations in the UI, as we'd need a flexible number of noise inputs mapped to all the paint-words, with checkboxes for "re-roll".

I think this is a very good idea. It might work and be easy to implement. If it works, I'll add this feature in the future as well.

@ArcticFaded
Collaborator

I can help with implementing an extension; I just need instructions on which methods would need to be changed. I have some ML literacy, so just knowing where the interaction occurs would be enough for me to try an implementation with LDM.

@mykeehu
Contributor

mykeehu commented Dec 31, 2022

I'd love to know what the status of the development of this extension is, because I'm really looking forward to it!

@nistvan86

nistvan86 commented Jan 6, 2023

I've tried to collect what needs to be done here, but I'm still trying to understand how webUI and all the related tech work, so this is probably just a vague draft. Can someone fix or extend this, please?

  • The current version of the paint-with-words-sd code doesn't handle ckpt files natively, only after pre-conversion. Something needs to be done here so it can work with ckpt directly. Is this complicated?

  • The Gradio UI doesn't have a good canvas editor with labeling. It's possible to create a new component for Gradio using the Svelte framework; I've actually found a canvas drawing example here which could serve as a basis. But in another Reddit thread someone mentioned a library called konvajs, which is mostly dependency-free and can be integrated with Svelte rather easily. I've found an example here which allows drawing arbitrarily shaped labels over an image; this is close to what we need.

  • The whole thing can be implemented as an extension and appear as a whole new tab in webui, like e.g. the image browser. Probably some part of the img2img code can be reused for this, if I'm not mistaken.

@mykeehu
Contributor

mykeehu commented Jan 7, 2023

@nistvan86 an alternative solution is what the openOutpaint extension does: it uses a standalone interface through the API, so it is not tied to Gradio.
However, starting from img2img in Gradio could be a good base, but you'd need a textbox or a tabular interface to enter the color code, with the prompt next to it line by line, to build an array for input.
I'm not a programmer, just trying to find a solution for this through existing stuff.

@nistvan86

nistvan86 commented Jan 7, 2023

@mykeehu an even better solution would be to avoid typing the same prompt elements multiple times and instead attach the color codes to sections of the prompt with a special syntax, similar to how you can currently emphasize things with (emphasize things:1.2).

But I'm not sure how well that could be implemented UX-wise. One solution could be, for example, to select a part of the prompt and then drag & drop it onto a colored shape on the canvas.
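One purely hypothetical shape for such a syntax (nothing in webui parses this; the `@rrggbb` marker and the weight suffix are invented here for illustration): `(purple trees@30a71a:1.0)` would attach the phrase to mask colour #30a71a with weight 1.0. A sketch of the parser:

```python
import re

# Invented syntax: "(phrase@rrggbb)" or "(phrase@rrggbb:weight)".
TOKEN = re.compile(r"\(([^@()]+)@([0-9a-fA-F]{6})(?::([\d.]+))?\)")

def parse_pww_prompt(prompt):
    """Split an annotated prompt into the clean prompt plus the
    color_context dict used by paint-with-words-sd."""
    color_context = {}

    def strip(match):
        phrase, hexcol, weight = match.group(1), match.group(2), match.group(3) or "1.0"
        rgb = tuple(int(hexcol[i:i + 2], 16) for i in (0, 2, 4))
        color_context[rgb] = f"{phrase},{weight}"
        return phrase  # leave the bare phrase behind in the prompt

    return TOKEN.sub(strip, prompt), color_context
```

This would avoid typing each phrase twice, at the cost of prompts that are harder to read; the drag & drop idea trades that off the other way.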

@mykeehu
Contributor

mykeehu commented Jan 7, 2023

Or, when you choose a colour and start painting with it, the colour is added to the list. But how can the colour be changed afterwards? Or deleted if it's no longer needed, so you can manage layers?

@nistvan86

nistvan86 commented Jan 7, 2023

The more I think about it, a vector-graphics-based editor would probably serve the needs better here. An SVG, for example, can be shown in the browser easily, its DOM can be interacted with, and the SVG can be saved next to the image at a rather small size (or even embedded into the resulting PNG, so it can be restored the same way you can load back a config from an output).
It can also be altered afterwards: polygons can be moved or the draw order changed.
It can also be rendered at whatever size and in whatever format is needed on the Python side, with no need to upscale a raster image.
(e.g. svg.draw.js might work, with CairoSVG on the backend)
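To illustrate the SVG idea, here is a sketch (assuming each region is an SVG `<polygon>` carrying its fill colour plus a hypothetical `data-label` attribute for the prompt phrase) of mapping such a document straight to a paint-with-words color_context; the rasterization itself would be delegated to something like CairoSVG:

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def svg_to_color_context(svg_text, weight="1.0"):
    """Collect (r, g, b) -> "label,weight" from labeled SVG polygons.
    Expects fill="#rrggbb" and data-label="prompt phrase" attributes
    on each polygon (the data-label attribute is an assumption of
    this sketch, not an SVG standard)."""
    ctx = {}
    for poly in ET.fromstring(svg_text).iter(SVG_NS + "polygon"):
        fill = poly.get("fill", "#000000")
        rgb = tuple(int(fill[i:i + 2], 16) for i in (1, 3, 5))
        ctx[rgb] = poly.get("data-label", "") + "," + weight
    return ctx
```

Keeping the labels in the SVG itself means the mask, the colours, and the phrases travel together in one editable file.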

@nistvan86

nistvan86 commented Jan 7, 2023

Maybe worth noting there's a fork of the paint-with-words repository which uses transformer pipelines: paint-with-words-pipelines.

I've created a minimal example on top of it which can be run on Windows in a venv. (see steps.txt included)
paint-with-words-2.zip

@gsgoldma

gsgoldma commented Feb 19, 2023

Maybe worth noting there's a fork of the paint-with-words repository which uses transformer pipelines: paint-with-words-pipelines.

I've created a minimal example on top of it which can be run on Windows in a venv. (see steps.txt included) paint-with-words-2.zip

I got this to work with 6 GB of VRAM.
Weirdly enough, the original paint-with-words didn't work on my PC, and neither did the test.py in the zip.
But when I made the venv in the old paint folder, the runner.py file worked!

@lwchen6309

I've updated the paint-with-words extension at Paint with Word, combining ControlNet and Paint with Words (PwW).

See the results below
cn_pww_turtle

One can also use pure PwW by setting the weight of ControlNet to 0.

@mykeehu
Contributor

mykeehu commented Mar 19, 2023

@lwchen6309 please check your version; I have a conflict with ControlNet. My bug report is here.

@lwchen6309

lwchen6309 commented Mar 19, 2023

@mykeehu thanks for reporting this bug. I've raised the issue here for further discussion.

@catboxanon
Collaborator

catboxanon commented Aug 7, 2023

Closing as the extension mentioned above has been available for quite some time. https://github.com/lwchen6309/paint-with-words-sd
