Python crawler encountered 403 error

For a while now I have really wanted to crawl the P site. I had tried before, but every attempt failed. Once I tried with PhantomJS, but got stuck on the login page, which was too complicated. Another time I got stuck on the encoding and couldn't fetch the page at all. This time I got stuck on a 403. I had obtained the link to the image, but when I went to download it, the 403 error came up, and the image wouldn't open in the browser either. I assumed the site had blocked my IP, so I gave up. Later an expert told me that sometimes you have to send the server some request headers before it will let you download anything (this was the first time I had heard that urllib can also add request headers). That was a real eye-opener. After some exploring of my own, I found that to download the high-resolution originals from the P site you need to send a "Referer" header, which tells the server which page you followed the image link from. Also, the P site's illustration page does not load a separate JSON file; its JSON data is all embedded in the HTML itself, which is really strange (I had thought that part of the source was garbled). The P site is quite something: it took me about two days to figure all this out.
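
As a side note on the JSON-inside-HTML point, here is a minimal sketch of pulling embedded JSON out of a page's source. Everything about the sample page is an assumption made up for illustration: the script tag, its id "page-data" and the field names are hypothetical and are not the P site's actual markup, so the pattern would need adapting to whatever the real page contains.

```python
import json
import re

# Hypothetical page source: the markup and field names are invented for illustration.
SAMPLE_HTML = """
<html><head>
<script id="page-data" type="application/json">
{"illust_id": 60541651, "pages": 1}
</script>
</head><body></body></html>
"""

def extract_embedded_json(html, script_id):
    """Return the parsed JSON held in <script id="script_id">, or None if absent."""
    pattern = r'<script[^>]*\bid="' + re.escape(script_id) + r'"[^>]*>(.*?)</script>'
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return None
    # json.loads tolerates the whitespace around the JSON text
    return json.loads(match.group(1))

data = extract_embedded_json(SAMPLE_HTML, "page-data")
print(data["illust_id"])
```

The idea is simply that no second request is needed: the data is already in the HTML that was downloaded, so it only has to be located and parsed.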

Code:

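import urllib.request
import ssl

# Skip HTTPS certificate verification (insecure; only for a quick experiment)
ssl._create_default_https_context = ssl._create_unverified_context

# Link to the original-size image
url = "/img-original/img/2016/12/25/05/10/36/60541651_p0.jpg"

# Build an opener that sends a Referer header naming the illustration page
opener = urllib.request.build_opener()
opener.addheaders = [('Referer', "/member_illust.php?mode=medium&illust_id=60541651")]
# Install it globally so that urlretrieve uses it as well
urllib.request.install_opener(opener)
urllib.request.urlretrieve(url, "E://")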
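
A possible variation, offered only as a sketch: instead of installing a global opener, the Referer header can be attached to a single request with urllib.request.Request, and the 403 can be caught explicitly. The function names build_image_request and download_image are hypothetical helpers, and the example.com URLs are placeholders, not the P site's real addresses.

```python
import urllib.error
import urllib.request

def build_image_request(image_url, referer):
    """Create a Request that carries a Referer header for this one download."""
    return urllib.request.Request(image_url, headers={"Referer": referer})

def download_image(image_url, referer, dest_path):
    """Download image_url to dest_path; return True on success, False on HTTP 403."""
    request = build_image_request(image_url, referer)
    try:
        with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
            out.write(response.read())
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # The server refused the request, e.g. because the Referer was not accepted
            return False
        raise
    return True

# Placeholder URLs for illustration only; no network access happens here
req = build_image_request("https://example.com/image.jpg", "https://example.com/page")
print(req.get_header("Referer"))
```

This keeps the header local to the one download, so other requests made by the same program are not affected by it.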