www.cocomanhua.com cannot be downloaded, please fix it #306

Open
2837linlinlin opened this issue Apr 29, 2021 · 4 comments

@2837linlinlin

Start analyzing https://www.cocomanhua.com/15554/
Analyzing success!
Start downloading 最强的魔导士,膝盖中了一箭之后成为乡下的卫兵
total 12 episode.
Downloading ep 第1话 最强的魔导士隐居起来
Traceback (most recent call last):
  File "C:\Users\gao\PycharmProjects\mh_jiaoben\venv\lib\site-packages\comiccrawler\crawler.py", line 338, in error_loop
    process()
  File "C:\Users\gao\PycharmProjects\mh_jiaoben\venv\lib\site-packages\comiccrawler\crawler.py", line 289, in download
    crawler.init()
  File "C:\Users\gao\PycharmProjects\mh_jiaoben\venv\lib\site-packages\comiccrawler\crawler.py", line 51, in init
    self.init_images(self.ep.current_page - 1)
  File "C:\Users\gao\PycharmProjects\mh_jiaoben\venv\lib\site-packages\comiccrawler\crawler.py", line 58, in init_images
    self.get_images()
  File "C:\Users\gao\PycharmProjects\mh_jiaoben\venv\lib\site-packages\comiccrawler\crawler.py", line 188, in get_images
    self.ep.current_url
  File "C:\Users\gao\PycharmProjects\mh_jiaoben\venv\lib\site-packages\comiccrawler\mods\oh.py", line 115, in get_images
    imgs = eval(code)
  File "C:\Users\gao\PycharmProjects\mh_jiaoben\venv\lib\site-packages\node_vm2\__init__.py", line 28, in eval
    return vm.run(code)
  File "C:\Users\gao\PycharmProjects\mh_jiaoben\venv\lib\site-packages\node_vm2\__init__.py", line 131, in run
    return self.communicate({"action": "run", "code": code})
  File "C:\Users\gao\PycharmProjects\mh_jiaoben\venv\lib\site-packages\node_vm2\__init__.py", line 101, in communicate
    raise VMError(data["error"])
node_vm2.VMError: setInterval is not defined

@2837linlinlin
Author

node_vm2.VMError: setInterval is not defined. Has the site changed its rules?

@eight04
Owner

eight04 commented Aug 18, 2021

I now get a 403 error (2021/8/18); it looks like the request is being blocked by Cloudflare.

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://www.cocomanhua.com/15554/
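
One way to tell a Cloudflare block from an ordinary 403 is to inspect the response headers. The sketch below is only an illustration and is not part of comiccrawler: `looks_like_cloudflare_block` is a hypothetical helper, and it assumes that responses served through Cloudflare carry a `Server: cloudflare` header or a `CF-RAY` header, which is typical but not guaranteed.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic: True if a response looks like a block issued by Cloudflare."""
    # Header names are case-insensitive, so normalise them before comparing.
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = (
        lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    )
    # Treating 403 and 503 as "blocked" is an assumption of this sketch.
    return status_code in (403, 503) and served_by_cloudflare
```

With requests, the arguments would be `response.status_code` and `response.headers` of the failed response.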

@eight04
Owner

eight04 commented Aug 18, 2021

A reference fix for the setInterval error: #294 (comment)
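
The linked comment is not reproduced here, so the following is only a sketch of one possible workaround and not necessarily what #294 suggests. The error indicates that the sandbox has no timer functions, so one option is to prepend no-op stubs to the script before it is evaluated. `TIMER_STUBS` and `with_timer_stubs` are hypothetical names, and whether the site's script still yields usable image data once its timers are stubbed out is untested.

```python
# No-op replacements for the browser timer functions the sandbox lacks.
TIMER_STUBS = (
    "var setInterval = function () { return 0; };\n"
    "var setTimeout = function () { return 0; };\n"
    "var clearInterval = function () {};\n"
    "var clearTimeout = function () {};\n"
)


def with_timer_stubs(code):
    """Return the JavaScript source with the timer stubs placed in front."""
    return TIMER_STUBS + code
```

In oh.py this would wrap the string passed to eval, for example `imgs = eval(with_timer_stubs(code))`.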

@rickchen16

rickchen16 commented Aug 28, 2021

I have not found a way to get requests past Cloudflare.

Using BeautifulSoup + selenium, Python can access cocomanga
and fetch the HTML of the table-of-contents page, e.g. https://www.cocomanhua.com/15335/
However, the HTML has changed, so get_episodes in oh.py needs a small change.
Even then, get_images still fails at imgs = eval(code).
Even when the image URL is assembled correctly, accessing it directly is still blocked, and I don't yet know how to fetch the image URLs directly.
ex:
https://img.cocomanga.com/comic/15335/RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0=/0001.jpg
https://img.cocomanga.com/comic/15335/RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0=/0003.jpg
https://img.cocomanga.com/comic/15335/RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0=/0005.jpg
RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0
This part looks like it is generated dynamically.
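
To make the structure of those example URLs explicit, here is a minimal sketch that splits each path into its parts. `split_image_url` is a hypothetical helper, and the `/comic/<comic_id>/<token>/<filename>` layout is inferred only from the three URLs above.

```python
from urllib.parse import urlparse

TOKEN = "RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0="
urls = [
    f"https://img.cocomanga.com/comic/15335/{TOKEN}/{page}.jpg"
    for page in ("0001", "0003", "0005")
]


def split_image_url(url):
    """Split an image URL path into (comic_id, token, filename)."""
    # The path is assumed to look like /comic/<comic_id>/<token>/<filename>.
    _, _, comic_id, token, filename = urlparse(url).path.split("/")
    return comic_id, token, filename


parts = [split_image_url(u) for u in urls]
tokens = {token for _, token, _ in parts}
pages = [filename for _, _, filename in parts]
print(len(tokens), pages)
```

Within one chapter the token is shared and only the page file name changes, so the token is the piece a crawler would have to obtain from the page's script.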

Getting the HTML content of "https://www.cocomanhua.com/15335/"

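from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.cocomanhua.com/15335/"
# Load the page in a real Chrome instance instead of requests.
# Assumes chromedriver is on PATH (Selenium 4.6+ can locate it automatically).
browser = webdriver.Chrome()
try:
    browser.get(url)
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    html_content = str(soup)
finally:
    browser.quit()  # always shut the browser down, even on errors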

Change to get_episodes in oh.py

for match in re.finditer(r'href="([^"]+)" title="([^"]+)', html):
    ep_url, title = match.groups()
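
For context, here is a self-contained sketch of how that pattern could be used to collect episodes. `find_episodes` is a hypothetical stand-in for get_episodes (it returns plain tuples rather than comiccrawler's episode objects), and the sample markup is invented only to exercise the regex; the real page may differ.

```python
import re
from urllib.parse import urljoin


def find_episodes(html, base_url):
    """Return (title, absolute_url) pairs for links that carry a title attribute."""
    episodes = []
    for match in re.finditer(r'href="([^"]+)" title="([^"]+)', html):
        ep_url, title = match.groups()
        # Links on the page may be relative, so resolve them against the page URL.
        episodes.append((title, urljoin(base_url, ep_url)))
    return episodes


# Hypothetical markup, not copied from the site.
sample = (
    '<a href="/15335/1/1.html" title="Chapter 1">Chapter 1</a>'
    '<a href="/15335/1/2.html" title="Chapter 2">Chapter 2</a>'
)
for title, url in find_episodes(sample, "https://www.cocomanhua.com/15335/"):
    print(title, url)
```

The pattern requires `title` to follow `href` directly, so it would miss links whose attributes appear in a different order.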
