1. Overview
Notes on using BeautifulSoup to fetch weather information.
Example package (external link): https://wwm.lanzouv.com/b0cb0vs2f Password: 2gsf
2. Result
Compare the script's output with the original web page:
3. Source file
GetWeather.py
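#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup


# Fetch the text content of a web page; returns "" on failure
def getHTMLText(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
    }
    try:
        # A timeout keeps the script from hanging if the server never answers
        r = requests.get(url=url, headers=headers, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as e:
        print('fail:', e)
        return ""


# Fetch and print the weather information
def getWeather():
    url = 'https://www.tianqi.com/beijing/15'
    # print(url)
    html = getHTMLText(url)
    if not html:
        print('getHTMLText fail')
        return
    # print(html)
    # Parse the page content
    soup = BeautifulSoup(html, 'html.parser')
    # Locate the element that holds the current weather
    weather_div = soup.find('div', class_='weaone_ba')
    if weather_div is None:
        print('weather element not found')
        return
    weather_info = weather_div.get_text()
    print(weather_info)


# Entry point
if __name__ == '__main__':
    getWeather()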
4. Summary
4.1 Get the URL: url = 'https://www.tianqi.com/beijing/15'
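If other cities or forecast lengths are needed, the URL could be assembled from its parts. The sketch below assumes the site follows a <base>/<city>/<days> pattern; only beijing/15 appears in this post, so the pattern itself and the helper name build_forecast_url are assumptions, not something confirmed by the site.

```python
from urllib.parse import quote

BASE_URL = "https://www.tianqi.com"


def build_forecast_url(city, days=15):
    """Build a forecast URL from a city slug and a number of days.

    Assumption: the site uses <base>/<city>/<days>. Only "beijing/15"
    is taken from the original post; other values are untested.
    """
    # quote() escapes characters that are not safe in a URL path segment
    return f"{BASE_URL}/{quote(city)}/{days}"


print(build_forecast_url("beijing"))
```

For "beijing" with the default of 15 days this reproduces the URL used in GetWeather.py.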
4.2 Locate the element that holds the weather information: soup.find('div', class_='weaone_ba'), then read its text with get_text()
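One thing to watch for: soup.find() returns None when no matching element exists, so chaining .get_text() directly raises AttributeError if the page layout changes. The following is a minimal sketch of a guarded lookup. The HTML snippet is a made-up stand-in for the real page (whose markup may differ), and extract_text is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <div class="weaone_ba">
    <span>Beijing</span>
    <span>12</span>
  </div>
</body></html>
"""


def extract_text(html, class_name="weaone_ba"):
    """Return the text of the first <div> with the given class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find("div", class_=class_name)
    if node is None:
        # No such element: report it to the caller instead of crashing
        return None
    # Join the text fragments with one space and strip surrounding whitespace
    return node.get_text(" ", strip=True)


print(extract_text(SAMPLE_HTML))
print(extract_text("<p>no weather here</p>"))
```

Passing a separator and strip=True to get_text() also removes the blank lines and indentation that a plain get_text() call carries over from the page source.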
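4.3 Update the browser User-Agent (optional; some sites require a particular browser version)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}
Open any web page in a browser and copy the User-Agent value from the corresponding HTTP request headers.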
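When several pages are fetched, the header can be set once on a requests.Session instead of being passed to every call. The sketch below only prepares a request and inspects its headers, so it sends nothing over the network; the constant name USER_AGENT is an assumption made for illustration.

```python
import requests

# Same User-Agent string as in GetWeather.py
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
)

session = requests.Session()
# Headers set on the session are merged into every request it makes
session.headers.update({"User-Agent": USER_AGENT})

# Build the request without sending it, to see which headers would go out
request = requests.Request("GET", "https://www.tianqi.com/beijing/15")
prepared = session.prepare_request(request)
print(prepared.headers["User-Agent"])
```

Calling session.get(url, timeout=10) afterwards would send the request with the same header.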
4.4 Dynamic pages, whose content is rendered by JavaScript after the initial load, can be scraped with selenium.