Attach document image to tweets #6

Irio · 2017-07-03T08:43:14Z

Following @talespaiva's suggestion in a tweet and recent replies to
@RosieDaSerenata.

Docs for Twitter API: https://python-twitter.readthedocs.io/en/latest/_modules/twitter/api.html?highlight=%22def%20PostUpdate%22

cuducos · 2017-07-03T10:41:47Z

A possible roadmap:

convert the receipt from PDF to PNG
crop the white paper areas (sometimes small receipts are in an A4 page size scanned image)
upload the PNG to someplace like Imgur
add the PNG URL to the tweet

paulocezar · 2017-10-05T14:54:18Z

I'll give this one a try.

silviodc · 2017-10-07T09:49:51Z

Hi @paulocezar

To convert the PDF to an image, I suggest you take a look at the notebooks from okfn-brasil/serenata-de-amor#238

This way, you can speed up your work by focusing on cropping the image and uploading it. The necessary libraries (OpenCV, Wand...) are at the end of the Dockerfile. Try looking into segmentation techniques to crop the receipt :)
Looking forward to seeing your PR ;)

murilobsd · 2017-10-19T12:45:35Z

Hi,
I don't know how this issue is progressing, but maybe the trim function http://docs.wand-py.org/en/0.3-maintenance/wand/image.html#wand.image.Image.trim might help!

silviodc · 2017-10-19T16:24:00Z

Thanks for sharing it with us, @murilobsd ;)

In addition, I suggest that whoever implements this take a look at these libraries for uploading the images :)

https://pypi.python.org/pypi/python-tumblpy/1.1.4
https://github.com/Imgur/imgurpython

CauanCabral · 2017-11-09T15:12:55Z

Why not upload the image to Twitter? That would keep Twitter as the only external service dependency.

https://developer.twitter.com/en/docs/media/upload-media/api-reference/post-media-upload

cuducos · 2017-11-09T15:16:50Z

> Why not upload image to twitter?

I think this is better/easier/simpler than using a third-party service for hosting images (unless @silviodc has other uses for the storage in mind). What's needed, IMHO, is an implementation such as:

1. Get the reimbursement data needed to build the receipt URL (applicant_id, year and document_id) and concatenate the strings to get the receipt URL
2. Try to fetch the PDF
3. If that succeeds, convert it to PNG
4. Crop it
5. Add it to the tweet with the API @CauanCabral linked
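
The first four steps above could be wired together roughly as follows. This is only a sketch under stated assumptions: the URL template is hypothetical (the real receipt URL format is not given in this thread), and `convert` and `crop` are placeholder callables for the PDF-to-PNG and cropping steps, which are discussed further down.

```python
from urllib.error import URLError
from urllib.request import urlretrieve

# Hypothetical template: the real receipt URL format is not stated in this thread.
RECEIPT_URL = "https://example.org/receipts/{applicant_id}/{year}/{document_id}.pdf"


def receipt_url(applicant_id, year, document_id):
    """Step 1: build the receipt URL from the reimbursement data."""
    return RECEIPT_URL.format(
        applicant_id=applicant_id, year=year, document_id=document_id
    )


def fetch_pdf(url, target="receipt.pdf", retrieve=urlretrieve):
    """Step 2: try to download the PDF; return its path, or None on failure."""
    try:
        retrieve(url, target)
    except (URLError, OSError):
        return None
    return target


def receipt_image(reimbursement, convert, crop, fetch=fetch_pdf):
    """Steps 1 to 4: return the path of the image to attach, or None.

    `convert` and `crop` are placeholders (assumptions) that take a file
    path and return the path of the converted or cropped file.
    """
    url = receipt_url(
        reimbursement["applicant_id"],
        reimbursement["year"],
        reimbursement["document_id"],
    )
    pdf_path = fetch(url)
    if pdf_path is None:
        return None  # no receipt available: the tweet goes out without an image
    return crop(convert(pdf_path))
```

Returning None when the download fails lets the caller fall back to the current text-only tweet instead of failing.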

rodolfolottin · 2018-01-24T12:24:08Z

Hi! Does anyone need some help with this issue?

cuducos · 2018-01-24T13:01:25Z

AFAIK there's no one working on that, @rodolfolottin – make yourself at home : )

rodolfolottin · 2018-01-24T13:23:54Z

Ok @cuducos . I'll give it a try.

rodolfolottin · 2018-01-29T14:37:37Z

So, I have some questions about how to test this functionality. The first one is: how can I get some data, given that the tests use mocks? I know I can get the PDF directly from the Câmara's website, but I also want to see the data for each reimbursement tuple.

Another one is: what should my tests cover? I get that I should test the tweet content that is going to be posted, but what about the fetched PDF? And how can/should I test the blank area that I have to crop? Should I use an example PDF as a fixture?

Many thanks!

cuducos · 2018-01-29T15:08:28Z

Hi @rodolfolottin, let me recap the roadmap drafted above:

1. Get the reimbursement data needed to build the receipt URL (applicant_id, year and document_id) and concatenate the strings to get the receipt URL
2. Try to fetch the PDF
3. If that succeeds, convert it to PNG
4. Crop it
5. Add it to the tweet with the API @CauanCabral linked

Given these steps, this is my 2c:

In steps 1 and 5 we're responsible for generating the right calls to external services, but not for managing the calls themselves. What I mean is that:

In step 1 we must assert we're generating the proper URL to download the PDF and passing it to the download function (for example, urllib.request.urlretrieve)
In step 5 we must assert we're properly calling the Twitter API with the image attached

That said, I'd say that in step 1 we can mock the download method and:

Assert it's called with the proper URL
Use a fixture as its response, so we have a real PDF file to test steps 2, 3 and 4

Then we must mock the Twitter API call and assert that we're calling it with the image as an attachment.

Does that make sense to you all?
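
A minimal sketch of that testing approach, using `unittest.mock`. The names here are assumptions for illustration only: `tweet_receipt`, the URL template and the fixture path are hypothetical, and `fake_convert` stands in for the real conversion, which in an actual test would run against the PDF fixture. The `media` argument follows python-twitter's `PostUpdate`, but the API object is a mock, so nothing is sent.

```python
from unittest import mock

# Hypothetical template: the real receipt URL format is not stated in this thread.
RECEIPT_URL = "https://example.org/receipts/{applicant_id}/{year}/{document_id}.pdf"


def tweet_receipt(api, download, convert, reimbursement, status):
    """Download the receipt, convert it and post the tweet with it attached."""
    url = RECEIPT_URL.format(**reimbursement)
    pdf_path = download(url)
    image_path = convert(pdf_path)
    api.PostUpdate(status, media=image_path)


def fake_convert(pdf_path):
    """Stand-in for the PDF-to-PNG step; only renames the path."""
    return pdf_path.replace(".pdf", ".png")


def test_tweet_receipt():
    reimbursement = {"applicant_id": 1, "year": 2018, "document_id": 3}
    # The mocked download "returns" a fixture instead of hitting the network.
    download = mock.Mock(return_value="tests/fixtures/receipt.pdf")
    api = mock.Mock()

    tweet_receipt(api, download, fake_convert, reimbursement, "Suspicious expense")

    # Step 1: the download is requested with the proper URL.
    download.assert_called_once_with("https://example.org/receipts/1/2018/3.pdf")
    # Step 5: the Twitter API is called with the image attached.
    api.PostUpdate.assert_called_once_with(
        "Suspicious expense", media="tests/fixtures/receipt.png"
    )
```

Passing `download` and `api` in as arguments keeps the test free of patching; `mock.patch` on the real download function would work just as well.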

rodolfolottin · 2018-02-04T23:03:37Z

Sorry for the late answer.

Yeah, @cuducos . Thanks again!

Now I'm working on cropping the image using Wand, but it hasn't been easy. My first approach was to try to crop the image based on its background color. As the image is itself a scan, the whole background has the same white color. I've been looking for related problems, but the ones I've found generally involve two different colors, which makes it easier to differentiate the image.

Edit: just thinking here, but maybe, IDK, I could parse the rows and columns of the image and crop the pixels based on the presence of a color other than white.
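
The row-and-column idea could look something like the sketch below. It is not Wand code: it assumes the page has already been loaded as a list of rows of grayscale values (0 is black, 255 is white), and `white` and `tolerance` are illustrative parameters. Any stray dark pixel left by the scanner would still widen the resulting box.

```python
def content_bounds(pixels, white=255, tolerance=10):
    """Return (top, bottom, left, right) of the non-white area, or None.

    `pixels` is a list of rows of grayscale values (0 = black, 255 = white).
    `bottom` and `right` are exclusive, so they can be used directly in slices.
    """
    def is_ink(value):
        # A pixel counts as content when it is darker than "almost white".
        return white - value > tolerance

    rows = [i for i, row in enumerate(pixels) if any(is_ink(v) for v in row)]
    if not rows:
        return None  # blank page: nothing to crop to
    cols = [
        j for j in range(len(pixels[0]))
        if any(is_ink(row[j]) for row in pixels)
    ]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def crop(pixels, **kwargs):
    """Crop `pixels` to the bounding box of its non-white content."""
    bounds = content_bounds(pixels, **kwargs)
    if bounds is None:
        return pixels
    top, bottom, left, right = bounds
    return [row[left:right] for row in pixels[top:bottom]]


page = [
    [255, 255, 255, 255, 255],
    [255, 250,   0, 255, 255],
    [255, 255,  30,  40, 255],
    [255, 255, 255, 255, 255],
]
cropped = crop(page)  # keeps rows 1-2 and columns 2-3
```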

cuducos · 2018-02-05T13:20:41Z

Hi @rodolfolottin,

I see two non-exclusive possibilities here:

Ask people in the Telegram group if they have any experience with automatically cropping scanned images (scanning always leaves some stray pixels here and there, so I think a simple approach based on color won't work)
Baby steps: we put the image in production without cropping and add cropping as a feature later ; )

silviodc · 2018-02-05T13:38:13Z

Hi guys,

One question about cropping the images.

Doesn't the function mentioned by @murilobsd (trim) work?

trim(color=None, fuzz=0)
Parameters:
color (Color) – the border color to remove. If it's omitted, the top left pixel is used by default.
fuzz (numbers.Integral) – defines how much tolerance is acceptable to consider two colors as the same.

PS: In your case, you would use it without color and choose a fuzz value empirically.

rodolfolottin · 2018-02-05T23:52:09Z

Hi @silviodc and @cuducos . Thanks for your help.

@cuducos, since cropping the image was the hard part for me, I decided to go for it and run some tests. Because of that, I don't have the other part done yet, but I can work on finishing it.

@silviodc, in my tests with this function I was using the white color and I didn't get what I expected. For this image, the more I increase the fuzz value, the more of the image (the invoice) is cropped. In both cases, with and without the white color, I got the same results. And as the invoice is sometimes not in the center of the scanned image, I didn't feel confident going on with this approach. As an example, I am attaching a cropped image with a fuzz value of 50%.

Here it is.

I'm taking @cuducos's advice of taking baby steps, and I will worry about the cropping function later.

silviodc · 2018-02-05T23:59:39Z

Hi @rodolfolottin
Thanks for the feedback. Maybe this weekend I will try to combine some edge detection and crop... I will let you know if it works.

CauanCabral · 2018-02-06T00:16:56Z

Hey, today I asked a friend who works with image processing at his job for help, and he suggested using OpenCV for that.

The response: https://twitter.com/begnini/status/960547129264615425
StackOverflow related link: https://pt.stackoverflow.com/a/265916

Both are in Portuguese.

begnini · 2018-02-06T01:11:04Z

Hi,

my friend @CauanCabral pointed me to this issue and I played a little with the documents. I work with digitized documents and I know some are hard to manipulate, so what I made is good, but not pixel perfect.

As a proof of concept, I downloaded 100 PDFs from the jarbas.serenata.ai home page and extracted all images from them with pdfimages. After that, I processed these images.

You can see the results here: https://github.com/begnini/document_crop/blob/master/crop.md. The code is in the same repository, too (https://github.com/begnini/document_crop/blob/master/crop.py).

I'll improve the documentation later, but if you have any questions, feel free to ask me.

cuducos mentioned this issue Jul 3, 2017

Add receipt image to tweets #7

Closed

jtemporal added the enhancement label Jul 5, 2017

anaschwendler added the hacktoberfest label Oct 2, 2017

rodolfolottin mentioned this issue Apr 8, 2018

Attach document image to tweet #29

Open
