MAINT: New LZW decoding implementation #2887

Open
wants to merge 10 commits into
base: main

Member

@MartinThoma MartinThoma commented Sep 30, 2024

The basis for this implementation is https://github.com/empira/PDFsharp/blob/master/src/foundation/src/PDFsharp/src/PdfSharp/Pdf.Filters/LzwDecode.cs (MIT licensed)

Because this removes the LZWDecode class from a public module, releasing this change requires a major release.


codecov bot commented Sep 30, 2024

Codecov Report

Attention: Patch coverage is 97.14286% with 2 lines in your changes missing coverage. Please review.

Project coverage is 96.21%. Comparing base (8e1799e) to head (7c4df04).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pypdf/_codecs/_codecs.py | 98.50% | 0 Missing and 1 partial ⚠️ |
| pypdf/filters.py | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2887      +/-   ##
==========================================
- Coverage   96.27%   96.21%   -0.06%     
==========================================
  Files          52       52              
  Lines        8689     8703      +14     
  Branches     1733     1733              
==========================================
+ Hits         8365     8374       +9     
- Misses        187      192       +5     
  Partials      137      137              


@@ -365,131 +364,6 @@ def decode(
return b"".join(lst)


class LZWDecode:
Collaborator

AFAIK our deprecation policy requires us to keep this compatibility layer until we release version 7.0.

Member Author

@MartinThoma MartinThoma Oct 1, 2024

I've added the class back, but it now uses the new implementation from _codecs.py :-)

Collaborator

Thanks. I have just removed some outdated comments. I guess we are going to keep the old class for now without any deprecation process?

@@ -41,12 +41,12 @@
from io import BytesIO
from typing import Any, Dict, List, Optional, Tuple, Union, cast

from ._codecs._codecs import LzwCodec
Collaborator

This indirectly exposes both the internal and the external implementation through a public module, including the encoder. Do we really want to do this? I would either make this import local or make it private by adding `as _LzwCodec`.
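
To illustrate the concern, the sketch below builds two throwaway in-memory modules and compares their public names. The module names (`fake_codecs`, `filters_plain`, `filters_aliased`) and the helper functions are hypothetical and exist only for this demonstration; nothing here imports pypdf.

```python
import sys
import types

# Hypothetical stand-in for the internal codec module.
codecs_mod = types.ModuleType("fake_codecs")
exec("class LzwCodec:\n    pass\n", codecs_mod.__dict__)
sys.modules["fake_codecs"] = codecs_mod


def make_module(name: str, source: str) -> types.ModuleType:
    """Build an in-memory module by executing `source` in its namespace."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def public_names(module: types.ModuleType) -> list:
    """Return the module attributes that do not start with an underscore."""
    return sorted(n for n in vars(module) if not n.startswith("_"))


# A plain import binds the codec under a public-looking name.
plain = make_module("filters_plain", "from fake_codecs import LzwCodec")
# An aliased import binds it under a name that is private by convention.
aliased = make_module(
    "filters_aliased", "from fake_codecs import LzwCodec as _LzwCodec"
)

print(public_names(plain))    # ['LzwCodec']
print(public_names(aliased))  # []
```

The alias does not prevent access (the object is still reachable as `filters_aliased._LzwCodec`), but the leading underscore signals that it is not part of the module's public interface and keeps it out of `from module import *` when no `__all__` is defined.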
