BUG: Missing spaces in extract_text() method (#1328) #2868

ssjkamei · 2024-09-24T04:17:57Z

This addresses a pattern where the font width fails to match only because of a difference after the decimal point. The fractional part is a small value, so I don't think the change will have much of an impact.
I do not know whether rounding is performed when font widths and similar values are calculated.

I have almost never made pull requests, so please point out anything I got wrong.

codecov · 2024-09-24T04:26:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.00%. Comparing base (d974d5c) to head (7597704).
Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2868   +/-   ##
=======================================
  Coverage   96.00%   96.00%           
=======================================
  Files          51       51           
  Lines        8539     8540    +1     
  Branches     1700     1700           
=======================================
+ Hits         8198     8199    +1     
  Misses        200      200           
  Partials      141      141


pubpub-zz · 2024-09-24T04:57:58Z

Instead of `ceil`, I would prefer looking into using a ratio (95%?). But I wonder whether this has already been introduced earlier.

ssjkamei · 2024-09-24T05:10:18Z

> Instead of `ceil`, I would prefer looking into using a ratio (95%?). But I wonder whether this has already been introduced earlier.

Thank you.
I will try implementing it with a 95% ratio. This only affects the decision of whether to insert a space and probably has no other effect.
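To illustrate the idea discussed here, the following is a minimal sketch of a ratio-based space check. It is not the code from this pull request: the function name `is_space_gap`, its parameters, and the sample values are all hypothetical. It only shows why a strict comparison can miss a space when the gap falls short of the space width by a fraction, and how a 95% threshold tolerates that.

```python
def is_space_gap(gap: float, space_width: float, ratio: float = 0.95) -> bool:
    """Return True when a horizontal gap is wide enough to count as a space.

    Hypothetical helper for illustration only; not part of pypdf.
    """
    # Accept gaps that reach at least `ratio` of the nominal space width, so a
    # gap that is short only by a fraction of a unit still counts as a space.
    return abs(gap) >= space_width * ratio


# Assumed sample values: the gap is slightly narrower than the space width.
space_width = 250.4
gap = 250.0

print(gap >= space_width)              # strict comparison
print(is_space_gap(gap, space_width))  # ratio-based comparison
```

With these assumed values the strict comparison rejects the gap while the ratio-based check accepts it. A much narrower gap, such as ordinary kerning, is still rejected by both.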

ssjkamei · 2024-09-24T05:33:57Z

I have made the changes, but if I have misunderstood the content, please let me know.

## Version 5.0.1, 2024-09-29

### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (py-pdf#2865)

### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (py-pdf#2859)
- Cope with unbalanced delimiters in dictionary object (py-pdf#2878)
- Cope with encoding with too many differences (py-pdf#2873)
- Missing spaces in extract_text() method (py-pdf#1328) (py-pdf#2868)
- Tolerate truncated files and no warning when jumping startxref (py-pdf#2855)

### Robustness (ROB)
- Repair PDF with invalid Root object (py-pdf#2880)
- Continue parsing dictionary object when error is detected (py-pdf#2872)
- Merge documents with invalid pages in named destinations (py-pdf#2857)
- Tolerate comments(%) in arrays (py-pdf#2856)

### Documentation (DOC), Testing (TST), Code Style (STY), Developer Experience (DEV), Maintenance (MAINT)
- (py-pdf#2844), (py-pdf#2862), (py-pdf#2863), (py-pdf#2847), (py-pdf#2860), (py-pdf#2867), (py-pdf#2874), (py-pdf#2879)

[Full Changelog](py-pdf/pypdf@5.0.0...5.0.1)

## Version 5.0.1, 2024-09-29

### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (#2865)

### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (#2859)
- Cope with unbalanced delimiters in dictionary object (#2878)
- Cope with encoding with too many differences (#2873)
- Missing spaces in extract_text() method (#1328) (#2868)
- Tolerate truncated files and no warning when jumping startxref (#2855)

### Robustness (ROB)
- Repair PDF with invalid Root object (#2880)
- Continue parsing dictionary object when error is detected (#2872)
- Merge documents with invalid pages in named destinations (#2857)
- Tolerate comments in arrays (#2856)

### Developer Experience (DEV)
- Use latest Python version for benchmarking (#2879)

### Maintenance (MAINT)
- Add tests to source distributions (#2874)
- Refactor _update_field_annotation (#2862)

[Full Changelog](5.0.0...5.0.1)

BUG: Missing spaces in extract_text() method (py-pdf#1328)

5400f5a

ssjkamei added 3 commits September 24, 2024 13:42

Revert "BUG: Missing spaces in extract_text() method (py-pdf#1328)"

aac0436

This reverts commit 5400f5a.

BUG: Missing spaces in extract_text() method (py-pdf#1328)

64b1c92

BUG: Missing spaces in extract_text() method (py-pdf#1328) add test

70e9b38

ssjkamei added 2 commits September 24, 2024 13:59

Revert "BUG: Missing spaces in extract_text() method (py-pdf#1328)"

65224e1

This reverts commit 5400f5a. BUG: Missing spaces in extract_text() method (py-pdf#1328) BUG: Missing spaces in extract_text() method (py-pdf#1328) add test

Merge branch 'main' of https://github.com/ssjkamei/pypdf

788d56d

BUG: Missing spaces in extract_text() method (py-pdf#1328) Convert font size comparison to ratio

f6dcb43

stefan6419846 reviewed Sep 24, 2024

tests/test_text_extraction.py (outdated, resolved)

stefan6419846 reviewed Sep 24, 2024

tests/test_text_extraction.py (outdated, resolved)

stefan6419846 reviewed Sep 24, 2024

pypdf/_page.py (outdated, resolved)

ssjkamei and others added 3 commits September 24, 2024 18:39

Correction to new file URL.

fd1c489

Co-authored-by: Stefan <[email protected]>

BUG: Missing spaces in extract_text() method (py-pdf#1328) calculation efficiency

2873b9e

BUG: Missing spaces in extract_text() method (py-pdf#1328) Simplify the assertion process

7597704

pubpub-zz approved these changes Sep 24, 2024

pubpub-zz merged commit 635a7c1 into py-pdf:main Sep 24, 2024
16 checks passed

pubpub-zz mentioned this pull request Sep 29, 2024

REL: 5.0.1 #2884

Merged
