Hands-on: Scraping Listing Data from Taiwu (太屋网)
https://www.taiwu.com/ershoufang/
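Inspecting the page with XPath shows that each listing sits in its own div element. The path that matches these divs is: //div[@class="er-list"]/div
The XPath for a listing's title is: //div[@class="er-list"]/div/div/div[2]/div/a/text()
The title path is the listing path followed by /div/div[2]/div/a/text(), so once the listing divs have been selected, each title can be read with the relative path ./div/div[2]/div/a/text().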
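To see how the two paths fit together, here is a small sketch run against a hypothetical HTML fragment. The markup is made up to mirror the nesting the paths imply and is not copied from taiwu.com. As an assumption for portability, it uses the standard library's xml.etree.ElementTree instead of lxml. ElementTree supports only a subset of XPath, so the search starts with `.//` and the title is read through `.text` rather than `text()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two listings, each nested the way the paths imply.
SAMPLE = """
<html><body>
<div class="er-list">
  <div>
    <div>
      <div>photo</div>
      <div>
        <div><a href="/a">Listing A</a></div>
      </div>
    </div>
  </div>
  <div>
    <div>
      <div>photo</div>
      <div>
        <div><a href="/b">Listing B</a></div>
      </div>
    </div>
  </div>
</div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# Step 1: select one element per listing (the container path).
listings = root.findall(".//div[@class='er-list']/div")

# Step 2: from each listing, follow the relative path down to the title link.
titles = []
for listing in listings:
    link = listing.find("./div/div[2]/div/a")
    if link is not None and link.text:
        titles.append(link.text.strip())

print(titles)
```

Selecting the listing elements first and then applying a relative path keeps each title tied to its own listing, which makes it easier to pull further fields from the same block later.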
The complete script:
    # -*- encoding: utf-8 -*-
    """
    @File : 爬取58二手房.py
    @Time : 2022/3/20 17:31
    @Author : simon
    @Email : 294168604@qq.com
    @Software: PyCharm
    """
    import requests
    from lxml import etree

    # Goal: scrape the second-hand listing titles from taiwu.com
    if __name__ == "__main__":
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
        }
        # Fetch the page source
        url = 'https://www.taiwu.com/ershoufang/'
        response = requests.get(url=url, headers=headers, timeout=10)
        response.raise_for_status()  # stop early on an HTTP error status
        page_text = response.text

        # Parse the HTML
        tree = etree.HTML(page_text)
        # Each item in div_list is the div element of one listing
        div_list = tree.xpath('//div[@class="er-list"]/div')
        # The with-block closes the file even if parsing fails midway
        with open('taiwu.txt', 'w', encoding='utf-8') as fp:
            for div in div_list:
                # Relative XPath, evaluated from the current listing div
                titles = div.xpath('./div/div[2]/div/a/text()')
                if not titles:
                    continue  # skip blocks without a title link
                title = titles[0].strip()
                print(title)
                fp.write(title + '\n')

Result: