Text-line detection & extraction [segmenting pages into lines] #18
We are still hoping that you will release a text-line detection and extraction implementation in Calamari in the future.
@ChWick Is there any tool on GitHub that you would recommend for training text-line detection and extraction?
@mrocr Maybe the line segmenter of T. Breuel, https://github.com/NVlabs/ocroseg, is what you are looking for.
The developer of Kraken told me that ocroseg doesn't converge, even when trained on the same UW3 dataset that T. Breuel used.
Hey there @ChWick. The README states that you intend to eventually move the ocropy line segmentation step into Calamari. Is there a branch where this work is in progress that I could build on?
@madisonmay You can also use kraken's segmentation, which conveniently produces a JSON file of bounding boxes (https://github.com/mittagessen/kraken/blob/master/docs/advanced.rst#page-segmentation-and-script-detection). Ocropy has two PRs (ocropus-archive/DUP-ocropy#281 and ocropus-archive/DUP-ocropy#283) that also provide such coordinate JSON.
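A minimal sketch of consuming such output to cut out line images. It assumes the JSON has a top-level `boxes` list of `[x0, y0, x1, y1]` pixel rectangles (the layout older kraken releases wrote; check the docs for your version) and treats `x1`/`y1` as exclusive, which is also an assumption. `load_boxes` and `crop_lines` are hypothetical helpers, and the page is a plain row-major grayscale array so the example runs with the standard library only.

```python
import json
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1) in pixels; x1/y1 assumed exclusive.
Box = Tuple[int, int, int, int]


def load_boxes(segmentation_json: str) -> List[Box]:
    """Read line bounding boxes from a segmentation JSON string.

    Assumes a top-level "boxes" key holding [x0, y0, x1, y1] lists;
    adjust the key for other tools or versions.
    """
    data = json.loads(segmentation_json)
    return [tuple(int(v) for v in box) for box in data["boxes"]]


def crop_lines(image: Sequence[Sequence[int]], boxes: List[Box]) -> List[List[List[int]]]:
    """Cut one line image per box out of a row-major grayscale page."""
    height = len(image)
    width = len(image[0]) if height else 0
    lines = []
    for x0, y0, x1, y1 in boxes:
        # Clamp to the page so slightly out-of-bounds boxes do not fail.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        lines.append([list(row[x0:x1]) for row in image[y0:y1]])
    return lines


# Usage: a tiny 4x6 "page" and two boxes.
page = [[c + 10 * r for c in range(6)] for r in range(4)]
seg = '{"boxes": [[0, 0, 3, 2], [2, 2, 6, 4]]}'
lines = crop_lines(page, load_boxes(seg))
```

The cropped lines could then be saved as individual images and fed to a line recognizer such as Calamari.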
@ChWick Thank you for your hard work.
Are there any updates on your progress in integrating your page segmentation FCN into Calamari?
Will you include a CPU mode for training and prediction?