1. Overview
A note on using BeautifulSoup to fetch weather information.
Example archive (external link): https://wwm.lanzouv.com/b0cb0vs2f (password: 2gsf)
2. Result
Comparison with the original web page:
3. Source file
GetWeather.py
    #!/usr/bin/env python3
    import requests
    from bs4 import BeautifulSoup


    # Fetch the text content of a web page; return "" on failure
    def getHTMLText(url):
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
        }
        try:
            # A timeout keeps the script from hanging on an unresponsive server
            r = requests.get(url=url, headers=headers, timeout=10)
            r.raise_for_status()
            # Use the encoding detected from the response body
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException as e:
            print('fail:', e)
            return ""


    # Fetch and print the weather information
    def getWeather():
        url = 'https://www.tianqi.com/beijing/15'
        html = getHTMLText(url)
        if not html:
            print('getHTMLText fail')
            return

        # Parse the page content
        soup = BeautifulSoup(html, 'html.parser')

        # Get the current temperature; find() returns None if the element is missing
        weather_div = soup.find('div', class_='weaone_ba')
        if weather_div is None:
            print('weather element not found')
            return
        print(weather_div.get_text())


    # Entry point
    if __name__ == '__main__':
        getWeather()
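The parsing step in getWeather() can be tried offline on a small HTML string, which makes it easier to see what soup.find('div', class_='weaone_ba') returns. The snippet below is only a minimal sketch: the sample HTML and the helper name extract_weather_text are made up for illustration and do not reflect the real structure of the tianqi.com page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page is larger and may differ.
SAMPLE_HTML = """
<html><body>
  <div class="header">Beijing</div>
  <div class="weaone_ba"> Sunny 5C </div>
</body></html>
"""


def extract_weather_text(html, css_class="weaone_ba"):
    """Return the stripped text of the first <div> with the given class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find("div", class_=css_class)
    if node is None:
        # find() returns None when nothing matches, so check before get_text()
        return None
    return node.get_text(strip=True)


print(extract_weather_text(SAMPLE_HTML))
print(extract_weather_text("<p>no weather here</p>"))
```

Returning None for a missing element lets the caller decide how to report the problem instead of failing with an AttributeError.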
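4. Summary
4.1 Set the URL: url = 'https://www.tianqi.com/beijing/15'
4.2 Locate the element that contains the weather information: weather_info = soup.find('div', class_='weaone_ba').get_text()
4.3 Update the browser User-Agent (optional; some sites require a specific browser version)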
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
    }
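Open any web page in a browser and copy the User-Agent value from the corresponding HTTP request headers (for example, in the Network panel of the developer tools).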
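If several pages are fetched, one way to avoid repeating the headers dictionary is to set the User-Agent once on a requests.Session. The sketch below is an assumption about how this could be organised, not part of the original script; the helper name build_session is hypothetical. It only prepares a request, so nothing is sent over the network.

```python
import requests

# Example User-Agent string copied from a browser's request headers.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
)


def build_session(user_agent=USER_AGENT):
    """Create a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = build_session()
# Prepare (but do not send) a request to inspect the headers that would go out.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.tianqi.com/beijing/15")
)
print(prepared.headers["User-Agent"])
```

Calling session.get(url, timeout=10) on the same session would then send this User-Agent with every request.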
4.4 Dynamic web pages (content rendered by JavaScript) can be scraped with selenium
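As a rough illustration of the selenium route, the sketch below loads a page in a browser and returns the rendered HTML. It assumes selenium 4 and a recent Chrome are installed; the function names render_page and fetch_with_chrome are hypothetical, and whether a given page actually needs this is not something the original script establishes.

```python
def render_page(driver, url):
    """Load url in the given WebDriver and return the rendered HTML.

    The driver is always closed afterwards, even if loading fails.
    """
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def fetch_with_chrome(url):
    """Render url in headless Chrome (assumes selenium 4 and Chrome are installed)."""
    # Imported here so render_page() can be exercised without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # Headless flag for recent Chrome versions; older ones may need "--headless".
    options.add_argument("--headless=new")
    return render_page(webdriver.Chrome(options=options), url)
```

The HTML returned by fetch_with_chrome(url) can be passed to BeautifulSoup(html, 'html.parser') in the same way as the text from getHTMLText(). Pages that load their data after the initial render may additionally need an explicit wait, for example with selenium's WebDriverWait.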