urllib_get: requesting the first ten pages of Douban Movies

#https://movie.douban.com/j/chart/top_list?type=17&interval_id=100%3A90&action=&start=0&limit=20
#https://movie.douban.com/j/chart/top_list?type=17&interval_id=100%3A90&action=&start=60&limit=20
#start=(page-1)*20

# Download the data for the first ten pages of Douban Movies
import urllib.request
import json


def download_page(url):
    # Send a browser-like User-Agent header with the request.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    req = urllib.request.Request(url=url, headers=headers)
    with urllib.request.urlopen(req) as response:
        return response.read().decode('utf-8')


def parse_page(html, i):
    # Save the raw JSON text of page i to its own file.
    # 'w' overwrites on re-run, so each file holds one JSON document.
    with open('douban_' + str(i) + '.json', 'w', encoding='utf-8') as f:
        f.write(html)


def main():
    for i in range(10):
        # i is 0-based here, so start = i * 20 matches start = (page - 1) * 20.
        url = 'https://movie.douban.com/j/chart/top_list?type=17&interval_id=100%3A90&action=&start=' + str(i * 20) + '&limit=20'
        html = download_page(url)
        parse_page(html, i)


if __name__ == '__main__':
    main()
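
The query string above is assembled by hand. As a minimal sketch of an alternative, the same URL can be built from a page number with `urllib.parse.urlencode`, which also handles the `%3A` escaping of `100:90`. The names `BASE_URL` and `build_url` are hypothetical helpers introduced only for this illustration; they are not part of the original script.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs in the comments above.
BASE_URL = 'https://movie.douban.com/j/chart/top_list'


def build_url(page, page_size=20):
    """Return the chart URL for a 1-based page number (hypothetical helper)."""
    params = {
        'type': 17,
        'interval_id': '100:90',  # urlencode escapes ':' as %3A
        'action': '',
        'start': (page - 1) * page_size,  # start = (page - 1) * 20
        'limit': page_size,
    }
    return BASE_URL + '?' + urlencode(params)


print(build_url(4))
```

Keeping the parameters in a dict makes it easier to change `type` or `limit` later without editing a long string.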


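The script imports `json` but only saves the raw response text. As a rough sketch of a next step, the saved text could be parsed with `json.loads`. This assumes each page is a JSON array of objects that carry a `title` key, which the original post does not show; the function `extract_titles` and the sample data are made up for illustration.

```python
import json


def extract_titles(text):
    """Return the 'title' of each entry in a JSON array string.

    Assumption: the page is a JSON array of objects with a 'title' key.
    Entries without that key yield None.
    """
    movies = json.loads(text)
    return [m.get('title') for m in movies if isinstance(m, dict)]


# Made-up sample in the assumed shape, not real Douban data.
sample = '[{"title": "Movie A", "score": "9.7"}, {"title": "Movie B", "score": "9.6"}]'
print(extract_titles(sample))
```

To apply this to a saved file, read `douban_0.json` with `open(..., encoding='utf-8')` and pass its contents to `extract_titles`.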

https://ianwusb.blog/2024/07/26/urllib_get请求豆瓣电影前十页/
Author
Ianwusb
Published on
July 26, 2024
License