BUG: Missing spaces in extract_text() method (#1328) #2868

Merged 10 commits on Sep 24, 2024

ssjkamei
Contributor

Close #1328.

This handles cases where the computed width and the font width differ only after the decimal point. The fractional difference is small, so I don't think the change will have much impact.
I don't know whether rounding is applied when font widths and similar values are calculated.

I have rarely opened pull requests, so please point out anything I got wrong.
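
To make the description above concrete, here is a minimal sketch of how a small fractional mismatch can suppress a space. It is not pypdf's code: `needs_space_strict`, `needs_space_ceil`, and the numeric widths are hypothetical, and the way `ceil` is applied is an assumption about the approach under discussion.

```python
import math


def needs_space_strict(gap: float, space_width: float) -> bool:
    """Hypothetical check: insert a space only if the gap reaches the space width."""
    return gap >= space_width


def needs_space_ceil(gap: float, space_width: float) -> bool:
    """Hypothetical variant: round the gap up before comparing (assumed use of ceil)."""
    return math.ceil(gap) >= space_width


# Illustrative numbers only: the gap falls short of the space width
# by a fraction, for example because of rounding in the font metrics.
gap, space_width = 277.6, 278.0

print(needs_space_strict(gap, space_width))  # False: the space is dropped
print(needs_space_ceil(gap, space_width))    # True: ceil(277.6) == 278
```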

codecov bot commented Sep 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.00%. Comparing base (d974d5c) to head (7597704).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2868   +/-   ##
=======================================
  Coverage   96.00%   96.00%           
=======================================
  Files          51       51           
  Lines        8539     8540    +1     
  Branches     1700     1700           
=======================================
+ Hits         8198     8199    +1     
  Misses        200      200           
  Partials      141      141           


@pubpub-zz
Collaborator

Instead of `ceil`, I would prefer looking into using a ratio (95%?). But I wonder whether this was already introduced earlier.

This reverts commit 5400f5a.

BUG: Missing spaces in extract_text() method (py-pdf#1328)

BUG: Missing spaces in extract_text() method (py-pdf#1328) add test
@ssjkamei
Contributor Author

> Instead of `ceil`, I would prefer looking into using a ratio (95%?). But I wonder whether this was already introduced earlier.

Thank you.
I will try implementing it with a 95% ratio. It only affects the decision of whether to insert a space, so it probably has no other effect.
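
The ratio idea can be sketched as follows. This is a hypothetical illustration, not the merged change: `SPACE_RATIO`, `needs_space_ratio`, and the sample widths are assumptions, and only the 95% figure comes from the review suggestion.

```python
SPACE_RATIO = 0.95  # the 95% figure from the review; the constant name is hypothetical


def needs_space_ratio(gap: float, space_width: float, ratio: float = SPACE_RATIO) -> bool:
    """Hypothetical check: treat a gap as a space once it reaches `ratio` of the space width."""
    return gap >= space_width * ratio


# Illustrative widths, not taken from a real font.
print(needs_space_ratio(277.6, 278.0))  # True: 277.6 is above roughly 264.1
print(needs_space_ratio(250.0, 278.0))  # False: 250.0 is below roughly 264.1
print(needs_space_ratio(0.49, 0.5))     # True: 0.49 is above 0.475
```

A ratio scales the tolerance with the space width, so the check behaves the same whether widths are expressed in large or small units.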

@ssjkamei
Contributor Author

I have made the changes, but if I have misunderstood the content, please let me know.

pypdf/_page.py: review thread (outdated, resolved)
@pubpub-zz pubpub-zz merged commit 635a7c1 into py-pdf:main Sep 24, 2024
16 checks passed
pubpub-zz added a commit to pubpub-zz/pypdf that referenced this pull request Sep 29, 2024
## Version 5.0.1, 2024-09-29

### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (py-pdf#2865)

### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (py-pdf#2859)
- Cope with unbalanced delimiters in dictionary object (py-pdf#2878)
- Cope with encoding with too many differences (py-pdf#2873)
- Missing spaces in extract_text() method (py-pdf#1328) (py-pdf#2868)
- Tolerate truncated files and no warning when jumping startxref (py-pdf#2855)

### Robustness (ROB)
- Repair PDF with invalid Root object (py-pdf#2880)
- Continue parsing dictionary object when error is detected (py-pdf#2872)
- Merge documents with invalid pages in named destinations (py-pdf#2857)
- Tolerate comments(%) in arrays (py-pdf#2856)

### Documentation (DOC), Testing (TST), Code Style (STY), Developer Experience (DEV), Maintenance (MAINT)

- (py-pdf#2844), (py-pdf#2862), (py-pdf#2863), (py-pdf#2847), (py-pdf#2860), (py-pdf#2867), (py-pdf#2874), (py-pdf#2879)

[Full Changelog](py-pdf/pypdf@5.0.0...5.0.1)
@pubpub-zz pubpub-zz mentioned this pull request Sep 29, 2024
pubpub-zz added a commit that referenced this pull request Sep 29, 2024
## Version 5.0.1, 2024-09-29

### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (#2865)

### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (#2859)
- Cope with unbalanced delimiters in dictionary object (#2878)
- Cope with encoding with too many differences (#2873)
- Missing spaces in extract_text() method (#1328) (#2868)
- Tolerate truncated files and no warning when jumping startxref (#2855)

### Robustness (ROB)
- Repair PDF with invalid Root object (#2880)
- Continue parsing dictionary object when error is detected (#2872)
- Merge documents with invalid pages in named destinations (#2857)
- Tolerate comments in arrays (#2856)

### Developer Experience (DEV)
- Use latest Python version for benchmarking (#2879)

### Maintenance (MAINT)
- Add tests to source distributions (#2874)
- Refactor _update_field_annotation (#2862)

[Full Changelog](5.0.0...5.0.1)