Text-line detection & extraction [segmenting pages into lines] #18

Closed
ghost opened this issue Sep 18, 2018 · 7 comments

Comments


ghost commented Sep 18, 2018

@ChWick Thank you for your hard work.

  • Are there any updates on integrating your page segmentation FCN into Calamari?

  • Will you include a CPU mode for training and prediction?

Member

ChWick commented Sep 20, 2018

@mrocr

  • Unfortunately, there are no updates on the FCN yet.
  • The CPU is used by default for training and prediction if no valid GPU is available.

ghost closed this as completed Sep 20, 2018

Author

ghost commented Sep 20, 2018

We still hope that you will release a text-line detection and extraction implementation in Calamari in the future.
Thank you.

Author

ghost commented Sep 20, 2018

@ChWick Is there any tool on GitHub that you would recommend for training text-line detection and extraction?

Member

ChWick commented Sep 25, 2018

@mrocr Maybe the line segmenter of T. Breuel https://github.com/NVlabs/ocroseg is what you are looking for.

Author

ghost commented Sep 25, 2018

The developer of Kraken told me that ocroseg doesn't converge, even when trained on the same UW3 dataset that T. Breuel used.
Source of conversation: mittagessen/seg#4

@madisonmay

Hey there @ChWick. The README states that you intend to eventually move the ocropy line segmentation step into calamari. Is there a branch where this work is in progress that I could potentially build off of?

Contributor

kba commented Jan 31, 2019

@madisonmay You can also use kraken's segmentation, which handily produces a JSON file of bounding boxes (https://github.com/mittagessen/kraken/blob/master/docs/advanced.rst#page-segmentation-and-script-detection). Ocropy has two PRs (ocropus-archive/DUP-ocropy#281 and ocropus-archive/DUP-ocropy#283) that also provide such coordinates as JSON.
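
For anyone who wants to feed such bounding-box JSON into a line recognizer, here is a minimal sketch of turning it into crop rectangles. It assumes the JSON has a top-level `"boxes"` list of `[x0, y0, x1, y1]` pixel coordinates; that layout is an assumption and should be checked against the output of your segmenter version. The function name `load_line_boxes` is made up for illustration and is not part of kraken, ocropy, or Calamari.

```python
import json


def load_line_boxes(segmentation_json, page_width, page_height):
    """Return line boxes as (x0, y0, x1, y1) tuples clipped to the page.

    Assumes a top-level "boxes" list of [x0, y0, x1, y1] entries
    (an assumption about the segmenter output, not a documented API).
    """
    data = json.loads(segmentation_json)
    boxes = []
    for x0, y0, x1, y1 in data.get("boxes", []):
        # Normalise corner order, then clip to the page bounds.
        left, right = max(0, min(x0, x1)), min(page_width, max(x0, x1))
        top, bottom = max(0, min(y0, y1)), min(page_height, max(y0, y1))
        # Skip degenerate boxes that would produce an empty line image.
        if right > left and bottom > top:
            boxes.append((left, top, right, bottom))
    # The input order is kept, since the segmenter may already emit reading order.
    return boxes
```

Each returned tuple has the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the line images can be cut out of the page and then passed on to the recognizer.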

This issue was closed.