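[Python] Scraping the letter URL IDs from the citizen mailbox of 首都之窗 (the Beijing government portal), 2020.2.13
Someone reminded me that I forgot to post how the URL IDs are scraped. A letter's detail page looks like this: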
http://www.beijing.gov.cn/hudong/hdjl/com.web.consult.consultDetail.flow?originalId=AH20021300174
The value AH20021300174 (the originalId parameter) is what we want to scrape.
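To make the relationship between the ID and the detail URL concrete, here is a minimal sketch using only the standard library. The helper names detail_url and extract_id are hypothetical and are not part of the original script; the base URL and the originalId parameter are taken from the example URL above.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base of the detail page, taken from the example URL in this post
DETAIL_URL = "http://www.beijing.gov.cn/hudong/hdjl/com.web.consult.consultDetail.flow"


def detail_url(original_id):
    """Build the detail-page URL for a letter ID (hypothetical helper)."""
    return DETAIL_URL + "?" + urlencode({"originalId": original_id})


def extract_id(url):
    """Return the originalId query parameter of a detail URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("originalId")
    return values[0] if values else None


if __name__ == "__main__":
    link = detail_url("AH20021300174")
    print(link)
    print(extract_id(link))
```

Once the IDs have been collected, a helper like detail_url would be one way to turn each of them back into a page that can be fetched.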
The current code is as follows:
import json
import requests

# Endpoint that returns the letter list as JSON
url = "http://www.beijing.gov.cn/hudong/hdjl/com.web.search.mailList.mailList.biz.ext"

# Request headers copied from the browser's request.
# Content-Length is left out: requests computes it from the actual body.
kv = {
    'Host': 'www.beijing.gov.cn',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'text/json',
    'X-Requested-With': 'XMLHttpRequest',
    'Origin': 'http://www.beijing.gov.cn',
    'Connection': 'keep-alive',
    'Referer': 'http://www.beijing.gov.cn/hudong/hdjl/',
}


def page(begin):
    """Request the page of letters starting at offset `begin` and print each original_id."""
    query = {
        'PageCond/begin': begin,
        'PageCond/isCount': 'true',
        'PageCond/length': 6,
    }
    datas = json.dumps(query)
    r = requests.post(url, data=datas, headers=kv)
    print(r.status_code)
    print(r.text)
    js = json.loads(r.text)
    for j in js["mailList"]:
        print(j)
        print(j.get("original_id"))


def href():
    # Step through the offsets six at a time, matching 'PageCond/length';
    # 5584 is the upper bound used when this post was written.
    for i in range(0, 5584, 6):
        page(i)


if __name__ == "__main__":
    href()
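The script above only prints the IDs. As a minimal sketch of how they could be collected instead, the following separates the paging arithmetic and the JSON parsing from the network call so that both can be checked offline. The names page_offsets and extract_ids are hypothetical, and the sample response is made up for illustration; it only mirrors the mailList and original_id keys that the script reads, so the real response may contain other fields.

```python
import json


def page_offsets(total, page_size=6):
    """Return the 'PageCond/begin' offsets needed to cover `total` records."""
    return list(range(0, total, page_size))


def extract_ids(response_text):
    """Parse a list response and return the original_id of every letter that has one."""
    data = json.loads(response_text)
    ids = []
    for mail in data.get("mailList", []):
        original_id = mail.get("original_id")
        if original_id:
            ids.append(original_id)
    return ids


if __name__ == "__main__":
    # Made-up sample response, assuming only the two keys used by the script above
    sample = json.dumps({"mailList": [{"original_id": "AH20021300174"}, {"title": "no id"}]})
    print(extract_ids(sample))
    print(len(page_offsets(5584)))
```

In the real script, r.text from each request would be passed to extract_ids and the results appended to one list or written to a file.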