Skip to content
This repository has been archived by the owner on Sep 18, 2018. It is now read-only.

Attach document image to tweet #29

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
d8c8e46
Update Dockerfile with imagemagick packages
rodolfolottin Feb 3, 2018
d082972
Rename text() method and create a method to get all the tweet data
rodolfolottin Feb 19, 2018
5ad5a34
Implements method to build the camara image url
rodolfolottin Feb 19, 2018
1b054ae
Add wand in requirements.txt
rodolfolottin Feb 24, 2018
7822531
Update Dockerfile with opencv and ghostscript
rodolfolottin Mar 14, 2018
768c1bc
Crop a given png file in order to remove its white areas
rodolfolottin Mar 14, 2018
653275f
Add image pdf and png to fixtures
rodolfolottin Mar 15, 2018
3bccd45
Change camara url
rodolfolottin Apr 8, 2018
eb0d8cd
Add tweet_image call in publish method
rodolfolottin Apr 8, 2018
e5a53dc
Implement method to retrieve the reimbursement pdf and call the crop …
rodolfolottin Apr 8, 2018
f0b16cb
Add the image to tweet_data
rodolfolottin Apr 8, 2018
1f5fd5f
Resolve merge conflicts
rodolfolottin Apr 8, 2018
38c2b5d
Add opencv to requirements.txt file
rodolfolottin Apr 8, 2018
32be9f4
Sort imports descending
rodolfolottin Apr 15, 2018
5f0cae7
Remove unused var
rodolfolottin Apr 15, 2018
b0dc505
Specify HTTPError except when trying to retrieve the reimbursement image
rodolfolottin Apr 15, 2018
c841e32
Start using anaconda docker image instead of the simple alpine python…
rodolfolottin Apr 22, 2018
e88e8f2
Add apt-get command to install libmagickwand-dev
rodolfolottin Apr 23, 2018
fa7a805
Change adduser parameter to add a system user
rodolfolottin Apr 23, 2018
622268a
Remove code from try block
rodolfolottin Apr 23, 2018
60d87be
Remove unused fixtures
rodolfolottin Apr 28, 2018
23f997b
Refactor tweet_image and tests
rodolfolottin Apr 28, 2018
2be5a44
Update Dockerfile with ghostscript related packages
rodolfolottin Apr 28, 2018
0c065d8
Update Dockerfile with imagemagick package
rodolfolottin Apr 28, 2018
3c4d8e0
Add apt-get update to Dockerfile script
rodolfolottin May 4, 2018
1a5d446
Update Dockerfile and travis.yml with Wand required packages
rodolfolottin May 5, 2018
0fccf2e
Update .travis.yml dist and scripts
rodolfolottin May 5, 2018
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 62 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,9 @@ FROM python:3.6.3-alpine
RUN apk add --no-cache --virtual build-base \
&& apk add --no-cache --virtual libxml2-dev \
&& apk add --no-cache --virtual libxslt-dev \
&& apk add --no-cache --virtual imagemagick \
&& apk add --no-cache --virtual imagemagick-dev \
&& apk add --no-cache --virtual ghostscript \
&& mkdir -p /usr/include/libxml \
&& ln -s /usr/include/libxml2/libxml/xmlexports.h /usr/include/libxml/xmlexports.h \
&& ln -s /usr/include/libxml2/libxml/xmlversion.h /usr/include/libxml/xmlversion.h
Expand All @@ -11,6 +14,65 @@ RUN apk add --no-cache --virtual build-base \
# COPY rosie/config.ini.example ./config.ini
# COPY rosie/requirements.txt ./rosie
# RUN pip install -r rosie/requirements.txt
RUN echo -e '@edgunity http://nl.alpinelinux.org/alpine/edge/community\n\
@edge http://nl.alpinelinux.org/alpine/edge/main\n\
@testing http://nl.alpinelinux.org/alpine/edge/testing\n\
@community http://dl-cdn.alpinelinux.org/alpine/edge/community'\
>> /etc/apk/repositories

RUN apk add --no-cache --virtual openblas-dev \
&& apk add --no-cache --virtual unzip \
&& apk add --no-cache --virtual wget \
&& apk add --no-cache --virtual cmake \
&& apk add --no-cache --virtual libtbb@testing \
&& apk add --no-cache --virtual libtbb-dev@testing \
&& apk add --no-cache --virtual libjpeg \
&& apk add --no-cache --virtual libjpeg-turbo-dev \
&& apk add --no-cache --virtual libpng-dev \
&& apk add --no-cache --virtual jasper-dev \
&& apk add --no-cache --virtual tiff-dev \
&& apk add --no-cache --virtual libwebp-dev \
&& apk add --no-cache --virtual clang-dev \
&& apk add --no-cache --virtual linux-headers \
&& apk add --no-cache --virtual clang \
&& pip install numpy

ENV CC /usr/bin/clang
ENV CXX /usr/bin/clang++

ENV OPENCV_VERSION=3.1.0

RUN mkdir /opt && cd /opt && \
wget https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip && \
unzip ${OPENCV_VERSION}.zip && \
rm -rf ${OPENCV_VERSION}.zip

RUN mkdir -p /opt/opencv-${OPENCV_VERSION}/build && \
cd /opt/opencv-${OPENCV_VERSION}/build && \
cmake \
-D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D WITH_FFMPEG=NO \
-D WITH_IPP=NO \
-D WITH_OPENEXR=NO \
-D WITH_TBB=YES \
-D BUILD_EXAMPLES=NO \
-D BUILD_ANDROID_EXAMPLES=NO \
-D INSTALL_PYTHON_EXAMPLES=NO \
-D BUILD_DOCS=NO \
-D BUILD_opencv_python2=NO \
-D BUILD_opencv_python3=ON \
-D PYTHON3_EXECUTABLE=/usr/local/bin/python \
-D PYTHON3_INCLUDE_DIR=/usr/local/include/python3.6m/ \
-D PYTHON3_LIBRARY=/usr/local/lib/libpython3.so \
-D PYTHON_LIBRARY=/usr/local/lib/libpython3.so \
-D PYTHON3_PACKAGES_PATH=/usr/local/lib/python3.6/site-packages/ \
-D PYTHON3_NUMPY_INCLUDE_DIRS=/usr/local/lib/python3.6/site-packages/numpy/core/include/ \
.. && \
make VERBOSE=1 && \
make && \
make install && \
rm -rf /opt/opencv-${OPENCV_VERSION}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this a lot of code just to install OpenCV: I see two alternatives here.

The first one is to avoid adding too many layers (too many separated RUNcommands). This makes the image grows big and makes it difficult to maintain (after a couple of weeks/months people might wonder which lines to remove if we don't need openCV anymore). This solution might be as simple as:

RUN apk add --no-cache --virtual build-base \
    && apk add --no-cache --virtual libxml2-dev \
    && apk add --no-cache --virtual libxslt-dev \
    && apk add --no-cache --virtual imagemagick \
    && apk add --no-cache --virtual imagemagick-dev \
    && apk add --no-cache --virtual ghostscript \
    && mkdir -p /usr/include/libxml \
    && ln -s /usr/include/libxml2/libxml/xmlexports.h /usr/include/libxml/xmlexports.h \
    && ln -s /usr/include/libxml2/libxml/xmlversion.h /usr/include/libxml/xmlversion.h

ENV CC /usr/bin/clang
ENV CXX /usr/bin/clang++
ENV OPENCV_VERSION=3.1.0
RUN <one single run command with all required commands chained>

…

Another alternative is to use an image that already has Python 3.6, OpenCV and the data science kit (NumPy, Pandas, scikit-learn) instead of the simple python:3.6.3-alpine. Not sure if it exists though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


WORKDIR /usr/src/app
COPY requirements.txt ./
Expand Down
2 changes: 2 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
amqp==2.2.2
celery==4.1.0
ipdb==0.10.3
opencv-python==3.4.0.12
pandas==0.21.0
pymongo==3.5.1
python-twitter==3.3
requests==2.18.4
serenata-toolbox==12.2.2
wand==0.4.4
Binary file added tests/fixtures/10.pdf
Binary file not shown.
Binary file added tests/fixtures/10.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
40 changes: 36 additions & 4 deletions tests/targets/test_twitter.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import datetime
from unittest import TestCase, mock
from io import BytesIO, BufferedReader
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a matter of style, would you mind putting sorting this? I mean, from io… coming before from unittest….


import pandas as pd
from twitter import TwitterError
Expand Down Expand Up @@ -108,6 +109,8 @@ def setUp(self):
self.reimbursement = {
'congressperson_name': 'Eduardo Cunha',
'document_id': 10,
'applicant_id': 10,
'year': 2015,
'state': 'RJ',
'twitter_profile': 'DepEduardoCunha',
}
Expand All @@ -117,19 +120,48 @@ def setUp(self):

def test_publish(self):
self.subject.publish()
self.api.PostUpdate.assert_called_once_with(self.subject.text())
text, reimbursement_image = self.subject.tweet_data()
self.api.PostUpdate.assert_called_once_with(
media=reimbursement_image, status=text)
dict_representation = dict(self.subject)
self.database.posts.insert_one.assert_called_once_with(
dict_representation)

def test_text(self):
def test_tweet_data(self):
message = (
'🚨Gasto suspeito de Dep. @DepEduardoCunha (RJ). '
'Você pode me ajudar a verificar? '
'https://jarbas.serenata.ai/layers/#/documentId/10 '
'#SerenataDeAmor na @CamaraDeputados'
)
self.assertEqual(message, self.subject.text())
reimbursement_image = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I could see you defined reimbursement_image but never uses it. Is that correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a mistake, I'll remove it.

I had some doubts about how to test the methods that call the tweet_image method. What do you think? Should I test for the success case (a properly file-like object is returned) in this method?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd mock the tweet_image and assert that it was called:

@patch.object(Post, 'tweet_image')
def test_something_else(self, tweet_image_mock):
    # do magic
    tweet_image_mock.assert_called_once_with()

self.assertEqual(
(message, None), self.subject.tweet_data())
self.reimbursement['twitter_profile'] = None
with self.assertRaises(ValueError):
self.subject.text()
self.subject.tweet_data()

def test_tweet_text(self):
message = (
'🚨Gasto suspeito de Dep. @DepEduardoCunha (RJ). '
'Você pode me ajudar a verificar? '
'https://jarbas.serenata.ai/layers/#/documentId/10 '
'#SerenataDeAmor na @CamaraDeputados'
)
self.assertEqual(message, self.subject.tweet_text())

def test_camara_image_url(self):
url = 'http://www.camara.gov.br/cota-parlamentar/documentos/publ/10/2015/10.pdf'
self.assertEqual(url, self.subject.camara_image_url())

@mock.patch('whistleblower.targets.twitter.urllib.request.urlopen')
def test_tweet_image(self, urlopen_mock):
with open('tests/fixtures/10.pdf', 'rb') as pdf_fixture:
mock_response = pdf_fixture
mock_response_read = BytesIO(pdf_fixture.read())
urlopen_mock.return_value = mock_response_read
self.assertIsInstance(
self.subject.tweet_image(), BufferedReader)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the impression that this block could be simpler:

with open('tests/fixtures/10.pdf', 'rb') as pdf_fixture:
    mock_response = pdf_fixture
    mock_response_read = BytesIO(pdf_fixture.read())
    urlopen_mock.return_value = mock_response_read
self.assertIsInstance(self.subject.tweet_image(), BufferedReader)

You create pdf_fixture, something that already has .read() method, to rename it and add it to a BytesIO to read it. What about this, does it looks like it's gonna work?

with open('tests/fixtures/10.pdf', 'rb') as mock_response:
   urlopen_mock.return_value = mock_response
   self.assertIsInstance(self.subject.tweet_image(), BufferedReader)

Copy link
Author

@rodolfolottin rodolfolottin Apr 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this error when trying to just assign this variable to the urlopen_mock.return_value:

...Exception ignored in: <bound method Resource.__del__ of <wand.image.Image: (empty)>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/wand/resource.py", line 232, in __del__
    self.destroy()
  File "/usr/local/lib/python3.6/site-packages/wand/image.py", line 2767, in destroy
    for i in range(0, len(self.sequence)):
TypeError: object of type 'NoneType' has no len()
F..........
======================================================================
FAIL: test_tweet_image (targets.test_twitter.TestPost)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/unittest/mock.py", line 1179, in patched
    return func(*args, **keywargs)
  File "/usr/src/app/tests/targets/test_twitter.py", line 163, in test_tweet_image
    self.assertIsInstance(self.subject.tweet_image(), BufferedReader)
AssertionError: None is not an instance of <class '_io.BufferedReader'>

----------------------------------------------------------------------

When debugging this code I came to the conclusion that in this way the urlopen isn't correctly mocked.

I could simplify the test writing it like down below, but I would still have to read the content of the opened file file and cast it to a BytesIO instance. What do you think about it?

@mock.patch('whistleblower.targets.twitter.urllib.request.urlopen')
def test_tweet_image(self, urlopen_mock):
    with open('tests/fixtures/10.pdf', 'rb') as mock_response:
        urlopen_mock.return_value = BytesIO(mock_response.read())
        self.assertIsInstance(self.subject.tweet_image(), BufferedReader)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great : ) Way better!


urlopen_mock.side_effect = Exception()
self.assertIsNone(self.subject.tweet_image())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a new test. The first assert would be test_tweet_image_success and this one test_tweet_image_error, for example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I did in this way because the other tests have both the success and error cases in the same method.

92 changes: 92 additions & 0 deletions whistleblower/helpers/crop.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
import sys

import cv2
import numpy

TEXT_MIN_WIDTH = 35
TEXT_MIN_HEIGHT = 10

DEFAULT_WIDTH = 850
DEFAULT_HEIGHT = 1100

KERNEL_WIDTH = 25
KERNEL_HEIGHT = 15
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the meaning of these constants?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, these are the distance (in pixels) between letters in the document to be considered the same line. They are used in the mathematical operations which glue the letters of the same line together. I'll add these comments.



def remove_borders(image, threshold, max_width, max_height):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding doctrings to explain a bit more about these arguments?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is mine. I'll improve them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks, @begnini : )

height, width = image.shape[:2]

for i in range(max_width):
total = image[:, i].sum() / 255
if total > threshold:
image[:, i] = numpy.ones(height) * 255

total = image[:, width - i - 1].sum() / 255
if total > threshold:
image[:, i - 1] = numpy.ones(height) * 255
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could extract this logic into a helper function _remove_border(img, size, max_size, threshold) in the DRY philosophy. So remove_borders could be simpler and we could document further what happens in _remove_border:

def remove_borders(image, threshold, max_width, max_height):
    height. width, *_ = image.shape
    image = _remove_border(image, width, max_width, threshold)
    image = _remove_border(image, height, max_height, threshold)
    return image

And in _remove_border you could document the rational underneath it ; )

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indexes is different. The first loop look at columns, the other looks at lines. To break in one function, the code should rotate the image and this will be a lot weird. I can refactor to create a remove_column_borders and remove_line_borders ou something like that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True story! I missed that, my bad. Just ignore my comment then : )


for i in range(max_height):
total = image[i, :].sum() / 255
if total > threshold:
image[i, :] = numpy.ones(width) * 255

total = image[height - i - 1, :].sum()
if total > threshold:
image[height - i - 1, :] = numpy.ones(width) * 255

return image


def crop(numpy_array, filename):
image = cv2.imdecode(numpy_array, cv2.IMREAD_COLOR)
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

gray = remove_borders(gray, 0.8, 15, 15)

adjusted_width = image.shape[1] / DEFAULT_WIDTH
adjusted_height = image.shape[0] / DEFAULT_HEIGHT

kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (KERNEL_WIDTH, KERNEL_HEIGHT))
eroded = cv2.erode(gray, kernel)

_, bw = cv2.threshold(eroded, 127, 255, cv2.THRESH_BINARY_INV)

total, markers = cv2.connectedComponents(bw)

images = [numpy.uint8(markers==i) * 255 for i in range(total) if numpy.uint8(markers==i).sum() > 10]

rectangles = []

for label in images:
countours = cv2.findContours(label, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

(x,y,w,h) = cv2.boundingRect(countours[0])

rectangles.append((x, y, w, h, label.sum() / 255.0))

rectangles = sorted(rectangles, key=lambda x:x[4], reverse=True)

rectangles = rectangles[1:]

expanded = [sys.maxsize, sys.maxsize, -sys.maxsize, -sys.maxsize]

for rect in rectangles:

x0, y0, w0, h0 = expanded
x1, y1, w1, h1, _ = rect

if w1 <= (TEXT_MIN_WIDTH * adjusted_width):
continue

if h1 <= (TEXT_MIN_HEIGHT * adjusted_height):
continue

x = min(x0, x1)
y = min(y0, y1)

w = max(x0 + w0, x1 + w1) - x
h = max(y0 + h0, y1 + h1) - y

expanded = [x, y, w, h]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roughly from line 50 and on you started to always use a blank line between instructions. I think a better use of blank likes could enhance the readability of this code making it more meaningful ; )

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are grouped by similarity . For example, computing width and height, are always grouped, x0 and y0 too (is the same computing in different variables). I think if we add more blank lines will loose a little of reading.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean adding more spaces, but removing them : )

For example, grouping by similarity isn't clear in this snnipet:

rectangles = sorted(rectangles, key=lambda x:x[4], reverse=True)
 
rectangles = rectangles[1:]

They are defining the same variable, so I expected these lines to be grouped, not separated.

Anyway: those are one of the minor comments. Feel free to ignore them anyway and altogether : )


cv2.imwrite(filename, image[y:y+h, x:x+w])
71 changes: 60 additions & 11 deletions whistleblower/targets/twitter.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,16 @@
import os
import re
import urllib.request
from tempfile import NamedTemporaryFile

import numpy as np
import pandas as pd
from pymongo import MongoClient
import twitter
from wand.image import Image

from whistleblower.suspicions import Suspicions
from whistleblower.helpers.crop import crop

ACCESS_TOKEN_KEY = os.environ['TWITTER_ACCESS_TOKEN_KEY']
ACCESS_TOKEN_SECRET = os.environ['TWITTER_ACCESS_TOKEN_SECRET']
Expand Down Expand Up @@ -138,27 +141,73 @@ def __iter__(self):
yield 'text', self.status.text
yield 'document_id', self.reimbursement['document_id']

def text(self):
def tweet_data(self):
"""
Proper tweet message for the given reimbursement.
Proper tweet data for the given reimbursement.
"""
profile = self.reimbursement['twitter_profile']
if profile:
link = 'https://jarbas.serenata.ai/layers/#/documentId/{}'.format(
self.reimbursement['document_id'])
message = (
'🚨Gasto suspeito de Dep. @{} ({}). '
'Você pode me ajudar a verificar? '
'{} #SerenataDeAmor na @CamaraDeputados'
).format(profile, self.reimbursement['state'], link)
return message
return self.tweet_text(), self.tweet_image()
else:
raise ValueError(
'Congressperson does not have a registered Twitter account.')


def tweet_text(self):
link = 'https://jarbas.serenata.ai/layers/#/documentId/{}'.format(
self.reimbursement['document_id'])
message = (
'🚨Gasto suspeito de Dep. @{} ({}). '
'Você pode me ajudar a verificar? '
'{} #SerenataDeAmor na @CamaraDeputados'
).format(
self.reimbursement['twitter_profile'],
self.reimbursement['state'],
link
)
return message

def camara_image_url(self):
"""
Proper image url for the given reimbursement.
"""
url = (
'http://www.camara.gov.br/cota-parlamentar/documentos/publ/'
'{}/{}/{}.pdf'.format(
self.reimbursement['applicant_id'],
self.reimbursement['year'],
self.reimbursement['document_id'])
)

return url

def tweet_image(self):
"""
Download, crop and open the image for the given reimbursement.
"""
try:
response = urllib.request.urlopen(self.camara_image_url())

image_bin = Image(file=response).make_blob('png')
numpy_array = np.frombuffer(image_bin, np.uint8)

with NamedTemporaryFile(suffix='.png') as temp:
crop(numpy_array, temp.name)

with open(temp.name, 'rb') as cropped_file:
cropped_image = cropped_file
except:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare except: is not recommended because it blurs different erros that might occur. Long blocks of code in the try: is also not recommended for the same reason.

Can you refactor these lines (188-202) using shorter blocks of code (usually a single line, but not strictly necessary) in the try blocks and specifying the errors expected in the opening of except block?

return None

return cropped_image
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this code work? I mean, when I exit a context manager created with NamedTemporaryFile I expect the file do be deleted. It might work because the cropped_image is still in memory, but I think this design is not neat. Does the `crop function takes a BytesIO instead? This method could return an image as bytes, for example.

Alternatively you could write your own context manager to have more control and to be more explicit about the temporary file deletion:

from contextlib import contextmanagerclass Tweet:

    …

    @contextmanager
    def receipt_pdf_as_png(self, pdf_response):
        image_bin = Image(file= pdf_response).make_blob('png')
        numpy_array = np.frombuffer(image_bin, np.uint8)

        # write PNG to a temp file
        temp = NamedTemporaryFile(delete=False, suffix='.png')
        with open(temp.name, 'wb') as fobj:
            crop(numpy_array, temp.name)

        # return PNG as a file object
        with open(temp.name, 'rb') as fobj:
            yield fobj

        # delete temporary file
        os.remove(temp.name)

     def tweet_image(self):
         try:
             response = urllib.request.urlopen(self.camara_image_url())
         except urllib.error.HTTPError:
             return None

        with self.receipt_pdf_as_png(response) as image:
             return image


def publish(self):
"""
Post the update to Twitter's timeline.
"""
self.status = self.api.PostUpdate(self.text())
text, reimbursement_image = self.tweet_data()

self.status = self.api.PostUpdate(
status=text,
media=reimbursement_image)
self.database.posts.insert_one(dict(self))