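The weather isn't great today. I was just checking the forecast when I noticed that this site has pretty good data, so today we're going to scrape the whole thing! Link: http://www.pm25.com/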
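The data sits right in the page source, so the plan is simple: take the response for the home page, parse it with lxml into an etree tree, use XPath to extract the link to each city, request every link, and then parse the detail page that comes back. For example, the Beijing detail page is http://www.pm25.com/beijing.html. Finally, we save the data to a CSV file. On some pages an XPath query matches nothing, and indexing the empty result raises an error, so every extraction is wrapped in try. The code is as follows: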

    # -*- coding: utf-8 -*-
    # @Time : 2021/1/23 3:27
    # @FileName: Pm2.5.py
    # @Software: PyCharm

    import csv

    import requests
    from lxml import etree


    class Weather:
        def __init__(self):
            # Home page that lists the links to every city page
            self.url = 'http://www.pm25.com/'
            self.headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                              "Chrome/87.0.4280.88 Safari/537.36 "
            }

        # Send a GET request and return the response
        def get_data(self, url):
            return requests.get(url=url, headers=self.headers, timeout=10)

        # Parse the home page, then request and parse each city page
        def parse_data(self, response):
            html = etree.HTML(response.content)
            link_list = html.xpath('//*[@id="scrollbar1"]/div[3]/div/div[3]/div/dl/dd/a/@href')
            for link in link_list:
                detail = self.get_data('http://www.pm25.com' + link)
                html = etree.HTML(detail.content)
                # Start every city from empty values so that a failed lookup
                # cannot raise NameError or reuse the previous city's data
                city_name = qua = aqi_num = pm = add_weather = ''
                # xpath() returns an empty list when nothing matches, so
                # indexing it raises IndexError; each field is tried separately
                try:
                    city_name = html.xpath("/html/body/div[6]/div/div[1]/h2/text()")[0]
                except IndexError:
                    pass
                try:
                    qua = html.xpath("/html/body/div[6]/div/div[3]/div[1]/p/span[1]/text()")[0]
                except IndexError:
                    pass
                try:
                    aqi_num = html.xpath("/html/body/div[6]/div/div[3]/div[1]/a/text()")[0]
                except IndexError:
                    pass
                try:
                    # Append the unit (micrograms per cubic meter)
                    pm = html.xpath("/html/body/div[6]/div/div[3]/div[2]/p[1]/span/text()")[0] + ' ug/m3'
                except IndexError:
                    pass
                try:
                    wea = html.xpath("/html/body/div[6]/div/div[4]/div/p/span/text()")[0]
                    temp = html.xpath("/html/body/div[6]/div/div[4]/div/p/text()")[1]
                    add_weather = wea + temp
                except IndexError:
                    pass

                data = (f"City: {city_name}, Air quality: {qua}, AQI: {aqi_num}, "
                        f"PM2.5: {pm}, Weather: {add_weather}")
                print(data)
                # Write the row directly with the module-level csv_writer created
                # in the __main__ block below, instead of returning the data to
                # a separate save function
                csv_writer.writerow([city_name, qua, aqi_num, pm, add_weather])

        # Entry point: fetch the home page, then parse it
        def run(self):
            response = self.get_data(self.url)
            self.parse_data(response)


    if __name__ == '__main__':
        # Open the file and write the header row once, before scraping starts
        with open('info.csv', 'a', newline='') as f:
            csv_writer = csv.writer(f)
            csv_writer.writerow(["City", "Air quality", "AQI", "PM2.5", "Weather"])
            weather = Weather()
            weather.run()

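The repeated try blocks all guard against the same thing: an XPath query that matches nothing and hands back an empty list. As a rough sketch of another way to handle it (not what the script above does), the lookup could go through one small helper. The name `pick` and the `"N/A"` default are my own assumptions, and plain lists stand in here for the lists that `html.xpath(...)` returns.

```python
def pick(matches, index=0, default="N/A"):
    """Return matches[index] with surrounding whitespace stripped.

    `matches` is expected to be a list such as the one html.xpath(...)
    returns. If the list is too short, `default` is returned instead of
    raising IndexError. (Hypothetical helper, not part of the script.)
    """
    try:
        return str(matches[index]).strip()
    except IndexError:
        return default


# Plain lists stand in for XPath results so the sketch runs without a network.
city_name = pick(["  Beijing "])            # the query matched one text node
aqi_num = pick([])                          # the query matched nothing
temp = pick(["", " 3 to 10 C "], index=1)   # second text node, like [1] above

print(city_name, aqi_num, temp)
```

With a helper like this, each field becomes a single line such as `qua = pick(html.xpath("..."))`, and a missing value shows up as the default in the CSV instead of being silently skipped.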
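To keep things simple I didn't define a separate save function. The output looks like this:
The CSV file we saved looks like this:
If you enjoyed this, please give it a like!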