# ocrobin

Automatic binarization using deep learning.

This implements a grayscale-to-binary, pixel-for-pixel transformation. The models it is typically used with perform some denoising and deblurring, but they are small enough not to encode any significant shape priors. The 2D LSTMs in the binarization model allow it to capture some global noise and intensity properties.
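For comparison, the simplest pixel-for-pixel binarizer is a fixed global threshold. The sketch below (plain NumPy, not part of ocrobin) shows that baseline and the input/output contract the learned model also follows: a 2D grayscale array in, a same-shaped binary array out.

```python
import numpy as np

def threshold_binarize(gray, t=0.5):
    """Baseline pixel-for-pixel binarization with a fixed global threshold.

    gray: 2D float array in [0, 1] (0 = black ink, 1 = white paper).
    Returns an array of the same shape containing only 0.0 and 1.0.
    The learned model replaces this fixed rule with a small network that
    adapts the decision to local and global noise and intensity.
    """
    return (np.asarray(gray, dtype=float) > t).astype(float)
```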

## Inference

```python
%pylab inline
rc("image", cmap="gray", interpolation="bicubic")
```

    Populating the interactive namespace from numpy and matplotlib
```python
import ocrobin
bm = ocrobin.Binarizer("bin-000000046-005393.pt")
bm.model
```
    Sequential(
      (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True)
      (2): ReLU()
      (3): LSTM2(
        (hlstm): RowwiseLSTM(
          (lstm): LSTM(8, 4, bidirectional=1)
        )
        (vlstm): RowwiseLSTM(
          (lstm): LSTM(8, 4, bidirectional=1)
        )
      )
      (4): Conv2d(8, 1, kernel_size=(1, 1), stride=(1, 1))
      (5): Sigmoid()
    )
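The `LSTM2` block above scans the feature map horizontally (`hlstm`) and then vertically (`vlstm`) with bidirectional LSTMs, which is what gives the model its view of row- and column-wide intensity trends. Ocrobin's own implementation is not reproduced here; the following is a rough, hypothetical PyTorch sketch of the idea (the names `RowLSTM` and `TwoDLSTM` are made up for illustration).

```python
import torch
from torch import nn

class RowLSTM(nn.Module):
    """Run a bidirectional 1D LSTM along the width of every row (illustrative)."""
    def __init__(self, nin, nhidden):
        super().__init__()
        self.lstm = nn.LSTM(nin, nhidden, bidirectional=True)

    def forward(self, x):                                    # x: (N, C, H, W)
        n, c, h, w = x.shape
        seq = x.permute(3, 0, 2, 1).reshape(w, n * h, c)     # one sequence per image row
        out, _ = self.lstm(seq)                              # (W, N*H, 2*nhidden)
        return out.reshape(w, n, h, -1).permute(1, 3, 2, 0)  # back to (N, 2*nhidden, H, W)

class TwoDLSTM(nn.Module):
    """Horizontal pass followed by a vertical pass (swap H and W in between)."""
    def __init__(self, nin, nhidden):
        super().__init__()
        self.h = RowLSTM(nin, nhidden)          # scans rows left/right
        self.v = RowLSTM(2 * nhidden, nhidden)  # scans columns top/bottom

    def forward(self, x):
        x = self.h(x)
        return self.v(x.transpose(2, 3)).transpose(2, 3)

# TwoDLSTM(8, 4)(torch.zeros(2, 8, 64, 64)).shape -> torch.Size([2, 8, 64, 64])
```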
```python
figsize(10, 10)
image = mean(imread("testdata/sample.png")[:, :, :3], 2)
binary = bm.binarize(image)
subplot(121); imshow(image)
subplot(122); imshow(binary)
```

    <matplotlib.image.AxesImage at 0x7fdca6651790>

*(output figure: the grayscale page next to its binarization)*

```python
subplot(121); imshow(image[400:600, 400:600])
subplot(122); imshow(1-binary[400:600, 400:600])
```

    <matplotlib.image.AxesImage at 0x7fdca6576910>

*(output figure: a 200×200 crop of the page next to its inverted binarization)*

## Training

Training data for `ocrobin-train` is stored in tar files containing binary images and their corresponding grayscale images.

```bash
%%bash
tar -ztvf testdata/bindata.tgz | sed 5q
```

    drwxrwxr-x tmb/tmb           0 2018-04-17 10:27 ./
    -rw-rw-r-- tmb/tmb      391766 2018-04-10 09:35 ./A001BIN.bin.png
    -rw-rw-r-- tmb/tmb     6021129 2018-04-10 09:35 ./A001BIN.gray.png
    -rw-rw-r-- tmb/tmb      226629 2018-04-10 09:36 ./A002BIN.bin.png
    -rw-rw-r-- tmb/tmb     2685607 2018-04-10 09:36 ./A002BIN.gray.png

    tar: write error

(The `tar: write error` is harmless: `sed 5q` closes the pipe after five lines, so `tar` reports a failed write.)
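Each page contributes a `key.bin.png` / `key.gray.png` pair sharing a common key, which is how the samples are grouped when the archive is read back. A dataset in this layout can be assembled with the standard library alone; the helper below is a hypothetical sketch, not an ocrobin tool.

```python
import tarfile

def pack_pairs(pairs, out="bindata.tgz"):
    """Pack (gray_path, bin_path, key) triples into a training tarball.

    Each key ends up as key.gray.png and key.bin.png inside the archive,
    matching the pairing convention shown in the listing above.
    """
    with tarfile.open(out, "w:gz") as tf:
        for gray_path, bin_path, key in pairs:
            tf.add(gray_path, arcname=key + ".gray.png")
            tf.add(bin_path, arcname=key + ".bin.png")

# pack_pairs([("A001.gray.png", "A001.bin.png", "A001BIN")])
```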

The training data is actually generated artificially; for this kind of training, synthetic document image degradation simulates real data quite well.
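As a rough illustration of what such degradation can look like (this is not the pipeline that produced `testdata/bindata.tgz`), one can start from a clean binary page, blur it, vary the contrast, and add noise to obtain the grayscale input, keeping the clean page as the target:

```python
import numpy as np
import scipy.ndimage as ndi

def degrade(page, blur=1.0, noise=0.1, seed=0):
    """Turn a clean binary page (0 = ink, 1 = paper) into a noisy grayscale
    image; the untouched page serves as the training target."""
    rng = np.random.RandomState(seed)
    gray = ndi.gaussian_filter(page.astype(float), blur)             # soften edges
    gray = gray * (0.7 + 0.3 * rng.uniform())                        # random contrast
    gray = np.clip(gray + rng.normal(0.0, noise, page.shape), 0, 1)  # additive noise
    return gray, page.astype(float)
```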

```python
from dlinputs import tarrecords
sample = tarrecords.tariterator(open("testdata/bindata.tgz")).next()
sample["__key__"]
subplot(121); imshow(sample["gray.png"])
subplot(122); imshow(sample["bin.png"])
```

    <matplotlib.image.AxesImage at 0x7fdc386ec390>

You can use the `ocrobin-train` command to carry out the training.

```bash
%%bash
./ocrobin-train -d testdata/bindata.tgz -o temp
```