Some older feeds showing up after a cache clear #124
Examples include: [http://www.sdjournal.com/archives/categories/languages/python/rss.xml] [http://online.effbot.org/rss.xml] [http://www.artima.com/weblogs/feeds/bloggers/micheles.rss]
We should create a script to validate each feed URL against the W3C feed validator, so that we can remove the invalid feeds. We also need to store the email address of the person responsible for each feed, so that we can notify them about such issues. What do you think? I can write a script to validate each feed and remove the invalid ones from config.ini.
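A minimal sketch of the first step, assuming the Planet-style config.ini layout in which each feed is a section named by its URL (the bracketed URLs in the examples above look like such section headers) and non-URL sections such as [Planet] hold settings. It lists the feed URLs and writes a CSV with a link to the W3C Feed Validation Service for each; no request is sent, and the helper names are hypothetical.

```python
import configparser
import csv
import io
from urllib.parse import urlencode

# Assumption: feeds are config.ini sections whose names are feed URLs;
# any section not starting with http:// or https:// (e.g. [Planet]) is a setting.
W3C_CHECK = "https://validator.w3.org/feed/check.cgi"


def feed_urls_from_config(text: str) -> list[str]:
    """Return the feed URLs (section names) found in a config.ini string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return [s for s in parser.sections() if s.startswith(("http://", "https://"))]


def w3c_check_url(feed_url: str) -> str:
    """Build a link to the W3C Feed Validation Service for one feed."""
    return f"{W3C_CHECK}?{urlencode({'url': feed_url})}"


def report_csv(feed_urls: list[str]) -> str:
    """Produce a CSV report (feed, validator link) that can be posted on the issue."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["feed", "validator"])
    for url in feed_urls:
        writer.writerow([url, w3c_check_url(url)])
    return out.getvalue()


if __name__ == "__main__":
    sample = """
[Planet]
name = Planet Example

[http://online.effbot.org/rss.xml]
name = effbot
"""
    urls = feed_urls_from_config(sample)
    print(urls)  # ['http://online.effbot.org/rss.xml']
    print(report_csv(urls))
```

Generating links rather than calling the validator keeps the sketch offline; fetching and parsing the validator's results would be a separate step.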
Please do run up a validator script; that would be great. If you were to generate some output (text, an HTML table, CSV, whatever) and post it on this issue, we could all have a look and see how big the problem is.
I had also thought of collecting an email address for each feed and including it in the config.ini file. Perhaps we could add it to the issue template?
It's possible (though I haven't checked) that we will still have an issue with technically valid feeds, but let's first see how many invalid feeds there are.
I checked the two examples you mentioned above and both are valid according to the RSS validator, so a validator script will not help with this issue. However, I will write a script anyway to validate the feeds and check for required fields such as update_date.
After a trawl through the code, it really comes down to two things (I think):
The first is, I think, why we're only seeing one item for those older feeds that have shown up. The second is why we're not excluding all of those items as being too old. @rochacbruno, if you're building a validator, I'd add a check that the entries have one of those date elements and, ideally, that the feed itself has an "updated" element.
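A minimal sketch of such a check, using only the standard library rather than feedparser. The specific date elements meant aren't named here, so the tag lists are an assumption: RSS 2.0 `pubDate` or Dublin Core `dc:date` for items, Atom `updated` or `published` for entries, and `lastBuildDate`, `pubDate`, `dc:date` or Atom `updated` at feed level. `check_dates` is a hypothetical helper, and only RSS 2.0 and Atom are handled.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Assumption: these are the date elements that matter; adjust them to
# whatever the planet / feedparser code actually reads.
ITEM_DATES = ("pubDate", DC + "date")
ENTRY_DATES = (ATOM + "updated", ATOM + "published")
FEED_DATES = ("lastBuildDate", "pubDate", DC + "date", ATOM + "updated")


def _has_any(elem: ET.Element, tags: tuple[str, ...]) -> bool:
    """True if elem has a non-empty direct child with one of the given tags."""
    return any((elem.findtext(tag) or "").strip() for tag in tags)


def check_dates(xml_text: str) -> dict:
    """Report undated entries (by index) and whether the feed has a feed-level date."""
    root = ET.fromstring(xml_text)
    if root.tag == ATOM + "feed":  # Atom: entries are children of <feed>
        feed, entries, entry_tags = root, root.findall(ATOM + "entry"), ENTRY_DATES
    else:  # assume RSS 2.0: <rss><channel><item>...
        feed = root.find("channel")
        if feed is None:
            raise ValueError("no <channel> element; not an RSS 2.0 feed?")
        entries, entry_tags = feed.findall("item"), ITEM_DATES
    undated = [i for i, e in enumerate(entries) if not _has_any(e, entry_tags)]
    return {
        "entries": len(entries),
        "undated_entries": undated,
        "feed_has_date": _has_any(feed, FEED_DATES),
    }


if __name__ == "__main__":
    rss = """<rss version="2.0"><channel><title>t</title>
<item><title>a</title><pubDate>Mon, 06 Jan 2003 10:00:00 GMT</pubDate></item>
<item><title>b</title></item>
</channel></rss>"""
    print(check_dates(rss))
    # {'entries': 2, 'undated_entries': [1], 'feed_has_date': False}
```

Returning entry indices rather than a pass/fail flag makes it easy to fold the result into a per-feed report row.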
Problem
After we cleared the cache to address another issue, some very old posts showed up.
Details
A quick investigation suggests that the feeds/posts don't provide the date fields that the planet/feedparser software is looking for, so the code falls back to a default value such as today's date.
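To make the suspected failure mode concrete, here is a simplified model (not planet's or feedparser's actual code) of a missing date falling back to the current time: the undated entry then passes any "too old" cutoff. The strict variant, one possible alternative, keeps a missing date as `None` so undated entries can be dropped or flagged instead. The function names, the RFC 822 date format and the 30-day cutoff are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def entry_date(raw: Optional[str], now: datetime) -> datetime:
    """Lenient behaviour: a missing date silently becomes 'now'."""
    return parsedate_to_datetime(raw) if raw else now


def entry_date_strict(raw: Optional[str]) -> Optional[datetime]:
    """Strict variant: a missing date stays missing (None)."""
    return parsedate_to_datetime(raw) if raw else None


def recent(dates: list[Optional[datetime]], now: datetime, max_age: timedelta) -> list[datetime]:
    """Keep dates no older than now - max_age; undated (None) entries are dropped."""
    cutoff = now - max_age
    return [d for d in dates if d is not None and d >= cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    raw_dates = ["Mon, 06 Jan 2003 10:00:00 +0000", None]  # second entry has no date
    lenient = [entry_date(r, now) for r in raw_dates]
    strict = [entry_date_strict(r) for r in raw_dates]
    print(len(recent(lenient, now, timedelta(days=30))))  # 1: undated entry looks new
    print(len(recent(strict, now, timedelta(days=30))))   # 0
```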