Force ASCII_8BIT encoding in POST request bodies (UDCSL #671) #5
@NickLaiacona @camdendotlol Let me know if this seems like a reasonable way to handle Unicode filenames. If so, once this is merged I'll make a new release version so that you can update your projects, which gets us closer to merging and releasing performant-software/iiif-cloud#25.
Interesting, I haven't run into this with NBU but the fix seems reasonable.
Would this drop accented characters in resourceable.name? It looks like most of the content would stay in UTF-8 except the original filename, is that right?
I believe it wouldn't drop them, but force encoding, so it would look like this in the request body:
It seems to be converted back to UTF-8 in IIIF Cloud and UDCSL; I'm not really sure how/when that happens, but it appears correctly both in the frontend and in the Postgres DB.
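To illustrate why nothing is dropped, here is a minimal sketch in plain Ruby. It assumes a hypothetical filename `café.jpg` (not taken from the actual data) and shows that `force_encoding` only relabels the string's bytes, so tagging them as UTF-8 again restores the original text.

```ruby
# Hypothetical filename containing an accented character (assumption for illustration).
NAME = "caf\u00E9.jpg"

# force_encoding changes only the encoding label; the underlying bytes are untouched.
BINARY_NAME = NAME.dup.force_encoding(Encoding::ASCII_8BIT)

# Relabeling the same bytes as UTF-8 yields the original string again.
RESTORED_NAME = BINARY_NAME.dup.force_encoding(Encoding::UTF_8)

puts BINARY_NAME.inspect   # "caf\xC3\xA9.jpg"
puts RESTORED_NAME == NAME # true
```

This is presumably why the name still appears correctly downstream: the receiving side only has to interpret the same bytes as UTF-8.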
Digging into it a bit further, the problem seems to be that a different part of the content is in ASCII-8BIT. The error occurs here, when the request body is initialized:

The issue is that the multipart body starts getting written as UTF-8 because of the Unicode in the filename, but I can't tell whether this is a bug in HTTParty, a problem with the file itself, or a bug in our code. If I modify HTTParty the following way (replacing line 47 in the above snippet), I can get it to work fine:

```ruby
if memo.encoding == content_body(value).encoding
  memo << content_body(value)
elsif !value.nil?
  memo << content_body(value).force_encoding(memo.encoding)
end
```

Alternatively, in our code, changing the body we pass along to HTTParty as I did in this branch to force
@blms OK, so the data rests as UTF-8; it just gets transported as ASCII-8BIT. 👍 I think hacking it on our side is better than forking HTTParty.
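For anyone following along, the failure mode discussed above can be reproduced without HTTParty. The sketch below is an assumption-laden illustration, not the code in this branch: `HEADER`, `FILE_BYTES`, `naive_concat`, and `binary_concat` are hypothetical names standing in for a multipart header line with a Unicode filename and the raw bytes of an uploaded file.

```ruby
# Hypothetical multipart header line with a Unicode filename (UTF-8).
HEADER = "Content-Disposition: form-data; name=\"file\"; filename=\"caf\u00E9.jpg\"\r\n"

# Hypothetical raw file content: binary (ASCII-8BIT) with non-ASCII bytes.
FILE_BYTES = "\xFF\xD8\xFF\xE0".b

# Appending binary data to a buffer that already holds non-ASCII UTF-8 text
# raises Encoding::CompatibilityError.
def naive_concat(header, file_bytes)
  body = String.new(encoding: Encoding::UTF_8)
  body << header
  body << file_bytes
  body
end

# Relabeling the text part as ASCII-8BIT first keeps every piece binary,
# so the concatenation succeeds and no bytes are altered.
def binary_concat(header, file_bytes)
  body = String.new(encoding: Encoding::ASCII_8BIT)
  body << header.dup.force_encoding(Encoding::ASCII_8BIT)
  body << file_bytes
  body
end

begin
  naive_concat(HEADER, FILE_BYTES)
rescue Encoding::CompatibilityError => e
  puts "naive concat failed: #{e.class}"
end

puts binary_concat(HEADER, FILE_BYTES).encoding
```

Forcing the encoding on our side keeps the workaround local to the request payload, which fits the preference for not forking HTTParty.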
In this PR

Per https://github.com/performant-software/urban-design-csl/issues/136:

- Force ASCII_8BIT encoding of all POST request payloads