[Feature Request] Differentiate actual paragraphs and LUTE's page splitting #475

Open
lef-est opened this issue Aug 30, 2024 · 1 comment


lef-est commented Aug 30, 2024

Is your feature request related to a problem? Please describe.

Where to break a paragraph is a stylistic choice by the author and may aid in storytelling. This information is lost because LUTE splits paragraphs to meet the token count per page.

Describe the solution you'd like

Indent original paragraphs, while leaving the "new paragraphs" that result from LUTE's splitting unindented, the same way they are treated now.
If possible, I'd like LUTE to not remove empty lines (or all whitespaces) for the same reason. Many authors use 2 or 3 lines for sub-chapter breaks. Some even differentiate the use of 2 lines or 3 lines.

Describe alternatives you've considered

During book creation, give users the option of "don't split paragraphs" (unchecked by default). Not the best solution performance-wise and probably takes more work.

Additional context

With LUTE we already lose a lot of information such as bold, italic, underline, images, tables. I understand the difficulty of incorporating them, which entails fundamental changes. But paragraph breaks and empty lines are probably the easiest ones to handle and deserve a chance.

@lef-est lef-est changed the title Differentiate actual paragraphs and LUTE's page splitting [Feature Request] Differentiate actual paragraphs and LUTE's page splitting Aug 30, 2024
Collaborator

jzohrab commented Aug 30, 2024

Lute splits by sentence to keep the token count per page reasonable: some paragraphs can get really long, and the formatting can get weird when people copy/paste text from different places. It's hard to say what the best solution is here. I still feel that splitting paragraphs up is necessary, and I don't have a great answer for this ... unless I do something like this: first try to group by paragraphs, then check the page size, and only split a paragraph if the page would be, say, 50% longer than the maximum size. It's a bit convoluted and tricky, but maybe that would suffice. Does that seem reasonable?
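
To make the idea concrete, here is a rough sketch of the "group by paragraphs first, split only past a threshold" logic. It is not Lute's actual code: `paginate`, `count_tokens`, `max_tokens`, and `overflow` are hypothetical names, tokens are approximated by whitespace splitting, and the input is assumed to be already broken into paragraphs and sentences.

```python
def count_tokens(sentence):
    # Assumption: whitespace-separated words stand in for real parser tokens.
    return len(sentence.split())


def paginate(paragraphs, max_tokens=250, overflow=1.5):
    """Group paragraphs (lists of sentence strings) into pages.

    Returns a list of pages; each page is a list of paragraph fragments,
    and each fragment is a list of sentences.
    """
    hard_limit = max_tokens * overflow  # e.g. 50% over the normal page size
    pages, page, used = [], [], 0

    for para in paragraphs:
        size = sum(count_tokens(s) for s in para)

        # Start a new page if this paragraph would push a non-empty page past the max.
        if page and used + size > max_tokens:
            pages.append(page)
            page, used = [], 0

        # Keep the paragraph whole if the page stays within the hard limit.
        if used + size <= hard_limit:
            page.append(list(para))
            used += size
            continue

        # Paragraph alone exceeds the hard limit (the page is empty at this point):
        # fall back to splitting it by sentence.
        fragment = []
        for sentence in para:
            n = count_tokens(sentence)
            if fragment and used + n > max_tokens:
                page.append(fragment)
                pages.append(page)
                page, fragment, used = [], [], 0
            fragment.append(sentence)
            used += n
        if fragment:
            page.append(fragment)

    if page:
        pages.append(page)
    return pages
```

With this shape, a paragraph is only ever broken up when it cannot fit on a page by itself even with the 50% allowance, so most original paragraph boundaries would survive pagination. The trade-off is that page sizes become less uniform.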

> If possible, I'd like LUTE to not remove empty lines (or all whitespaces) for the same reason.

Yes this sounds reasonable and I'm not sure why I did this in the first place. :-)

> bold, italic, underline, images, tables.

Yeah that's so tough! Would be nice though, I agree. There's an issue to allow for markdown on import of text files (though tables still wouldn't be possible, b/c it's bananas)

Let me know re the paragraph grouping and page size threshold idea ... it could be tricky-ish.
