
The pre-trained checkpoint generates very short output #38

Open

Richar-Du opened this issue Jul 4, 2023 · 7 comments

Comments

@Richar-Du

Thanks for your awesome work!

I want to use the model to generate the HTML of an image, so I chose the pre-trained checkpoint without fine-tuning. However, the generated output is very short. For example, the following code only generates <img_src=image> without any detailed structure.

from PIL import Image
import torch
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

device = torch.device("cuda")
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-large")
model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-large").to(device)

img_path = 'biography.png'
image = Image.open(img_path)
# disable the VQA preprocessing path so the processor accepts an image with no text prompt
processor.image_processor.is_vqa = False

inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(**inputs, max_length=1000)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)

My transformers version is 4.28.0. Do you know how to solve this problem? Thanks in advance :)

@nbroad1881

You should probably upgrade transformers; see huggingface/transformers#22903.
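If it helps, here is a minimal sketch for checking whether a local install predates a given release. Note the 4.31.0 threshold below is only a placeholder assumption; this thread never confirms which release, if any, actually fixes the issue.

```python
# Hypothetical helper: decide whether an installed transformers version
# is older than a minimum version, comparing numeric components only.
# The default minimum (4.31.0) is a placeholder, not a confirmed fix version.
def needs_upgrade(installed: str, minimum: str = "4.31.0") -> bool:
    """Return True if `installed` is older than `minimum`."""
    def parts(v: str):
        # take the first three dot-separated numeric components
        return tuple(int(p) for p in v.split(".")[:3])
    return parts(installed) < parts(minimum)

print(needs_upgrade("4.28.0"))  # the reporter's original version -> True
print(needs_upgrade("4.31.0"))  # at the placeholder threshold -> False
```

This only compares plain numeric versions; for pre-release tags you would want `packaging.version` instead.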

@Richar-Du
Author

Richar-Du commented Jul 8, 2023

I have updated transformers to 4.30.2, but the problem persists. The input to the processor is the attached image, and I want to use pix2struct-large to generate its corresponding HTML. However, the generated text is now just: '<>'

@nbroad1881 @younesbelkada

@HeimingX

Hi, I encountered the same problem. I took a screenshot of the left subfigure of Figure 1 in the Pix2Struct paper, and the pix2struct-large model only outputs the same '<>'. This is severely inconsistent with expectations, and I am quite confused. I am eagerly awaiting a response from the authors. Thanks a lot.

PS: my transformers version is 4.31.0.

@ChenDelong1999

+1

@nbroad1881

@kentonl, is there a prompt for pretraining?

@luukvankooten

+1

@Alexwangziyu

+1
