Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force ASCII_8BIT encoding in POST request bodies (UDCSL #671) #5

Merged
merged 3 commits into from
Jan 11, 2023

Conversation

blms
Copy link
Contributor

@blms blms commented Dec 19, 2022

In this PR

Per https://github.com/performant-software/urban-design-csl/issues/136:

@blms blms added the v0.1.6 Issues in v0.1.6 label Dec 19, 2022
@blms
Copy link
Contributor Author

blms commented Dec 20, 2022

@NickLaiacona @camdendotlol Let me know if this seems like a reasonable way to handle Unicode filenames.

If so, once this is merged I'll go ahead and make a new release version, so that you can update your projects with the new version, and get us closer to merging and releasing performant-software/iiif-cloud#25.

Copy link

@camdendotlol camdendotlol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I haven't run into this with NBU but the fix seems reasonable.

Copy link
Member

@NickLaiacona NickLaiacona left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this drop accent characters on resourceable.name ? Looks like most of content would stay in UTF-8 except the original filename, is that right?

@blms
Copy link
Contributor Author

blms commented Dec 20, 2022

@NickLaiacona

Would this drop accent characters on resourceable.name ?

I believe it wouldn't drop them, but force encoding, so it would look like this in the request body:

Medellín.pdf --> Medelli\xCC\x81n.pdf

It seems to be converted back to UTF-8 in IIIF Cloud and UDCSL; I'm not really sure how/when that happens, but it appears correctly both in the frontend and in the Postgres DB.

Looks like most of content would stay in UTF-8 except the original filename, is that right?

Digging into it a bit further, the problem seems to be that a different part of the content is in ASCII-8BIT. The error occurs here, when the request body is initialized:

https://github.com/jnunemaker/httparty/blob/c27adec455bc6f3e8556b51d711d99ab37c49468/lib/httparty/request/body.rb#L38-L49

The issue is that the multipart body starts getting written as UTF-8 because of the Unicode in the filename, but content_body(value) always returns an ASCII_8BIT string when the actual ActionDispatch::Http::UploadedFile is passed as the value param (when it gets .read and then .to_s called on it).

I can't exactly tell if this is a bug with HTTParty, a problem with the file itself, or a bug with our code. If I modify HTTParty the following way (replacing line 47 in the above snippet), I can get it to work fine:

if memo.encoding == content_body(value).encoding
  memo << content_body(value)
elsif !value.nil?
  memo << content_body(value).force_encoding(memo.encoding)
end

Alternatively, in our code, changing the body we pass along to HTTParty as I did in this branch to force ASCII_8BIT seems to work just as well. And FWIW, the change I made here is also what's suggested in the HTTParty issue I linked above. Maybe it makes sense to keep this change for now but post my findings in the HTTParty issue and hope for a patch release?

@NickLaiacona NickLaiacona self-requested a review December 20, 2022 18:38
@NickLaiacona
Copy link
Member

It seems to be converted back to UTF-8 in IIIF Cloud and UDCSL; I'm not really sure how/when that happens, but it appears correctly both in the frontend and in the Postgres DB.

@blms OK, so the data rests as UTF-8, it just gets transported as ASCII8. 👍 I think hacking it on our side is better than forking HTTParty.

@blms blms merged commit 39b7650 into master Jan 11, 2023
@blms blms deleted the feature/udcsl671_encodings branch January 11, 2023 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v0.1.6 Issues in v0.1.6
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants