Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #249 - Add strip to get() and getall() #260

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 14 additions & 4 deletions parsel/selector.py
Original file line number Diff line number Diff line change
Expand Up @@ -200,12 +200,15 @@ def re_first(
return el
return default

def getall(self) -> List[str]:
def getall(self, strip: bool = False) -> List[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s make it keyword-only.

Suggested change
def getall(self, strip: bool = False) -> List[str]:
def getall(self, *, strip: bool = False) -> List[str]:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same on get.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we apply this to typing.overload of get as well?

Copy link
Member Author

@felipeboffnunes felipeboffnunes Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding it to get but then this would require some type check on default entry to validate it is a str, either way the behavior looks fine without it for get

"""
Call the ``.get()`` method for each element is this list and return
their results flattened, as a list of strings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new option needs to be mentioned in the docstring.

"""
return [x.get() for x in self]
data = [x.get() for x in self]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just pass strip to get() instead of essentially reimplementing the new code from get() here?
(granted, the code is almost trivial, but still)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x is a Selector, at that point neither default nor strip exist anymore
Only SelectorList recognizes those two fields

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About adding the strip argument to the other overloads, wouldn't that remove the purpose of the overload on those?
Wouldn't having an overloaded default + strip be just going back to the non-overloaded method signature?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x is a Selector, at that point neither default nor strip exist anymore

Aren’t you also adding strip to Selector? Can’t you use data = [x.get(strip=True) for x in self]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That ended up looking harder than it should when I tried.
Also as far as I could see, default also is not on the Selector, so for standardization, I believe adding it would also be required. And in that case I don't understand why it wasn't added up until now, it gives me the feeling it is not a trivial task given the current implementation of get on Selector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are changing Selector.get() to have strip and it already has default so I think there shouldn't be any problems with using it, and it indeed doesn't look too complicated to me :)

Also if you have issues with typing I can hep, I'm working on improving parsel typing right now.

this_doesnt_work = sel.xpath("//ul/li[position()>1]")[0].get(default="6") # Selector doesnt have default field

sel.xpath(...).get(default=...) works in normal parsel without your changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About adding the strip argument to the other overloads, wouldn't that remove the purpose of the overload on those?

No, the purpose of those overloads is to show that the type of the return value depends on whether default is None or not.

Wouldn't having an overloaded default + strip be just going back to the non-overloaded method signature?

I believe just adding strip to the existing overloads is enough, no new overloads are needed.

Copy link
Member Author

@felipeboffnunes felipeboffnunes Oct 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sel.xpath(...).get(default=...) uses SelectorList, not Selector. The problem is that by indexing [0] then you go to a Selector, and it does not have default .
but either way I still have to give it a look

Also, it seems a little unintuitive to be using the [0] when doing a get operation, as you are already asking the SelectorList to get the first instance it found. I used it on my tests because I saw some of the tests use this kind of indexing to prove Selector behaves correctly, and in this case, I believe you should be able to do both default and strip directly on a Selector, but you can't for a Selector at the moment so maybe this needed further discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, you are right. Selector.get() is much simpler than SelectorList.get(). Well, I think it makes sense and is easy to add strip= to Selector.get().

it seems a little unintuitive to be using the [0] when doing a get operation, as you are already asking the SelectorList to get the first instance it found

Yeah, it's probably unusual to write <SelectorList>[0].get(). It's OK to have a Selector and call a get) on it though.

I believe you should be able to do both default and strip directly on a Selector, but you can't for a Selector at the moment so maybe this needed further discussion.

Not so sure about default=, but maybe there are use cases where it's useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep this as is, strip for Selector.get() is less useful and can be implemented separately.

if strip:
return [x.strip() if x else x for x in data]
return data

extract = getall

Expand All @@ -217,13 +220,20 @@ def get(self, default: None = None) -> Optional[str]:
def get(self, default: str) -> str:
pass

def get(self, default: Optional[str] = None) -> Optional[str]:
@typing.overload
def get(self, strip: bool) -> str:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? (I am not saying it is not)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is, I just saw the overloads above this line and thought it was required

@typing.overload
def get(self, default: None = None) -> Optional[str]:
    pass

@typing.overload
def get(self, default: str) -> str:
    pass

I removed it on the last push. I don't understand the whole overload completely, but I feel those 2 are also unnecessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is related to typing, when parameter types affect output type, but I do not think it applies here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removing it broke the pipeline for type checks.

No overload variant of "get" of "SelectorList" matches argument type "bool"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you added a new argument to the method you need to add them to all overloads too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And those two overloads are needed because the method either can return None if default wasn't passed or it returns the default instead, so this describes the typing info more clearly.


def get(
self, default: Optional[str] = None, strip: Optional[bool] = False
Gallaecio marked this conversation as resolved.
Show resolved Hide resolved
) -> Optional[str]:
"""
Return the result of ``.get()`` for the first element in this list.
If the list is empty, return the default value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new option needs to be mentioned in the docstring

"""
for x in self:
return x.get()
value = x.get()
return value.strip() if strip and value else value
return default

extract_first = get
Expand Down
49 changes: 48 additions & 1 deletion tests/test_selector.py
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ def test_simple_selection(self) -> None:
def test_simple_selection_with_variables(self) -> None:
"""Using XPath variables"""
body = "<p><input name='a' value='1'/><input name='b' value='2'/></p>"
sel = self.sscls(text=body)
sel = self.sscls(text=body) # Selector doesnt have default field
Gallaecio marked this conversation as resolved.
Show resolved Hide resolved

self.assertEqual(
[
Expand Down Expand Up @@ -323,6 +323,53 @@ def test_selectorlist_get_alias(self) -> None:
self.assertEqual(sel.xpath("//ul/li").get(), '<li id="1">1</li>')
self.assertEqual(sel.xpath("//ul/li/text()").get(), "1")

def test_selector_get_strip(self) -> None:
body = '<ul><li id="1">1</li><li id="2"> 2 </li><li id="3">3</li></ul>'
sel = self.sscls(text=body)

self.assertEqual(
sel.xpath("//ul/li[position()>1]").get(), '<li id="2"> 2 </li>'
)
self.assertEqual(
sel.xpath("//ul/li[position()>1]").get(strip=True),
'<li id="2"> 2 </li>',
)
self.assertEqual(
sel.xpath("//ul/li[position()>1]/text()").get(), " 2 "
)
self.assertEqual(
sel.xpath("//ul/li[position()>1]/text()").get(strip=True), "2"
)

def test_selector_getall_strip(self) -> None:
body = (
'<ul><li id="1">1</li><li id="2"> 2 </li><li id="3"> 3</li></ul>'
)
sel = self.sscls(text=body)

self.assertEqual(
sel.xpath("//ul/li").getall(),
[
'<li id="1">1</li>',
'<li id="2"> 2 </li>',
'<li id="3"> 3</li>',
],
)
self.assertEqual(
sel.xpath("//ul/li").getall(strip=True),
[
'<li id="1">1</li>',
'<li id="2"> 2 </li>',
'<li id="3"> 3</li>',
],
)
self.assertEqual(
sel.xpath("//ul/li/text()").getall(), ["1", " 2 ", " 3"]
)
self.assertEqual(
sel.xpath("//ul/li/text()").getall(strip=True), ["1", "2", "3"]
)

def test_re_first(self) -> None:
"""Test if re_first() returns first matched element"""
body = '<ul><li id="1">1</li><li id="2">2</li></ul>'
Expand Down