# ocrobin

Automatic binarization using deep learning.

This implements a grayscale-to-binary, pixel-for-pixel transformation. The models it is typically used with perform some denoising and deblurring, but they are small enough not to encode any significant shape priors. The 2D LSTMs in the binarization model allow it to capture some global noise and intensity properties.
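For comparison, the simplest pixel-for-pixel binarizer is a fixed global threshold. The sketch below (plain NumPy, not part of ocrobin) shows that baseline and the input/output contract the learned model also follows: a 2D grayscale array in, a same-shaped binary array out.

```python
import numpy as np

def threshold_binarize(gray, t=0.5):
    """Baseline pixel-for-pixel binarization with a fixed global threshold.

    gray: 2D float array in [0, 1] (0 = black ink, 1 = white paper).
    Returns an array of the same shape containing only 0.0 and 1.0.
    The learned model replaces this fixed rule with a small network that
    adapts the decision to local and global noise and intensity.
    """
    return (np.asarray(gray, dtype=float) > t).astype(float)
```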

## Inference

```python
%pylab inline
rc("image", cmap="gray", interpolation="bicubic")
```

    Populating the interactive namespace from numpy and matplotlib
```python
import ocrobin
bm = ocrobin.Binarizer("bin-000000046-005393.pt")
bm.model
```
    Sequential(
      (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True)
      (2): ReLU()
      (3): LSTM2(
        (hlstm): RowwiseLSTM(
          (lstm): LSTM(8, 4, bidirectional=1)
        )
        (vlstm): RowwiseLSTM(
          (lstm): LSTM(8, 4, bidirectional=1)
        )
      )
      (4): Conv2d(8, 1, kernel_size=(1, 1), stride=(1, 1))
      (5): Sigmoid()
    )
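The `LSTM2` block above scans the feature map horizontally (`hlstm`) and then vertically (`vlstm`) with bidirectional LSTMs, which is what gives the model its view of row- and column-wide intensity trends. Ocrobin's own implementation is not reproduced here; the following is a rough, hypothetical PyTorch sketch of the idea (the names `RowLSTM` and `TwoDLSTM` are made up for illustration).

```python
import torch
from torch import nn

class RowLSTM(nn.Module):
    """Run a bidirectional 1D LSTM along the width of every row (illustrative)."""
    def __init__(self, nin, nhidden):
        super().__init__()
        self.lstm = nn.LSTM(nin, nhidden, bidirectional=True)

    def forward(self, x):                                    # x: (N, C, H, W)
        n, c, h, w = x.shape
        seq = x.permute(3, 0, 2, 1).reshape(w, n * h, c)     # one sequence per image row
        out, _ = self.lstm(seq)                              # (W, N*H, 2*nhidden)
        return out.reshape(w, n, h, -1).permute(1, 3, 2, 0)  # back to (N, 2*nhidden, H, W)

class TwoDLSTM(nn.Module):
    """Horizontal pass followed by a vertical pass (swap H and W in between)."""
    def __init__(self, nin, nhidden):
        super().__init__()
        self.h = RowLSTM(nin, nhidden)          # scans rows left/right
        self.v = RowLSTM(2 * nhidden, nhidden)  # scans columns top/bottom

    def forward(self, x):
        x = self.h(x)
        return self.v(x.transpose(2, 3)).transpose(2, 3)

# TwoDLSTM(8, 4)(torch.zeros(2, 8, 64, 64)).shape -> torch.Size([2, 8, 64, 64])
```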
```python
figsize(10, 10)
image = mean(imread("testdata/sample.png")[:, :, :3], 2)
binary = bm.binarize(image)
subplot(121); imshow(image)
subplot(122); imshow(binary)
```

    <matplotlib.image.AxesImage at 0x7fdca6651790>

*(output figure: the grayscale page next to its binarization)*

```python
subplot(121); imshow(image[400:600, 400:600])
subplot(122); imshow(1-binary[400:600, 400:600])
```

    <matplotlib.image.AxesImage at 0x7fdca6576910>

*(output figure: a 200×200 crop of the page next to its inverted binarization)*

## Training

Training data for `ocrobin-train` is stored in tar files containing binary images and their corresponding grayscale images.

```bash
%%bash
tar -ztvf testdata/bindata.tgz | sed 5q
```

    drwxrwxr-x tmb/tmb           0 2018-04-17 10:27 ./
    -rw-rw-r-- tmb/tmb      391766 2018-04-10 09:35 ./A001BIN.bin.png
    -rw-rw-r-- tmb/tmb     6021129 2018-04-10 09:35 ./A001BIN.gray.png
    -rw-rw-r-- tmb/tmb      226629 2018-04-10 09:36 ./A002BIN.bin.png
    -rw-rw-r-- tmb/tmb     2685607 2018-04-10 09:36 ./A002BIN.gray.png

    tar: write error

(The `tar: write error` is harmless: `sed 5q` closes the pipe after five lines, so `tar` reports a failed write.)
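Each page contributes a `key.bin.png` / `key.gray.png` pair sharing a common key, which is how the samples are grouped when the archive is read back. A dataset in this layout can be assembled with the standard library alone; the helper below is a hypothetical sketch, not an ocrobin tool.

```python
import tarfile

def pack_pairs(pairs, out="bindata.tgz"):
    """Pack (gray_path, bin_path, key) triples into a training tarball.

    Each key ends up as key.gray.png and key.bin.png inside the archive,
    matching the pairing convention shown in the listing above.
    """
    with tarfile.open(out, "w:gz") as tf:
        for gray_path, bin_path, key in pairs:
            tf.add(gray_path, arcname=key + ".gray.png")
            tf.add(bin_path, arcname=key + ".bin.png")

# pack_pairs([("A001.gray.png", "A001.bin.png", "A001BIN")])
```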

The training data is actually generated artificially; for this kind of training, synthetic document image degradation simulates real data quite well.
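As a rough illustration of what such degradation can look like (this is not the pipeline that produced `testdata/bindata.tgz`), one can start from a clean binary page, blur it, vary the contrast, and add noise to obtain the grayscale input, keeping the clean page as the target:

```python
import numpy as np
import scipy.ndimage as ndi

def degrade(page, blur=1.0, noise=0.1, seed=0):
    """Turn a clean binary page (0 = ink, 1 = paper) into a noisy grayscale
    image; the untouched page serves as the training target."""
    rng = np.random.RandomState(seed)
    gray = ndi.gaussian_filter(page.astype(float), blur)             # soften edges
    gray = gray * (0.7 + 0.3 * rng.uniform())                        # random contrast
    gray = np.clip(gray + rng.normal(0.0, noise, page.shape), 0, 1)  # additive noise
    return gray, page.astype(float)
```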

```python
from dlinputs import tarrecords
sample = tarrecords.tariterator(open("testdata/bindata.tgz")).next()
sample["__key__"]
subplot(121); imshow(sample["gray.png"])
subplot(122); imshow(sample["bin.png"])
```

    <matplotlib.image.AxesImage at 0x7fdc386ec390>

You can use the `ocrobin-train` command to carry out the training.

```bash
%%bash
./ocrobin-train -d testdata/bindata.tgz -o temp
```