response.replace(body=response.text.replace('\xa0', '')): modifying the response in Scrapy when a scraped page contains \r, \t, \n, or \xa0

Date: 12-07 | Views: 17

How to strip \r\n\t from values extracted with XPath: https://www.cdsy.xyz/computer/programme/Python/241207/cd64830.html

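When a scraped page contains \r, \t, or \n, normalize-space() usually cleans them up, but it does not always succeed when special characters are present. For example, the extracted value may still be ['商家 \xa0厦门有限公司'], where '\xa0' corresponds to '&nbsp;' in the page source. Two workarounds are described below.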
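The reason normalize-space() leaves '\xa0' in place is that XPath 1.0 treats only space, tab, carriage return, and line feed as whitespace, and U+00A0 is not one of them. The following is a minimal sketch that mimics that rule in plain Python; `xpath_normalize_space` is a hypothetical helper written for illustration and is not part of Scrapy or any XPath library.

```python
import re

# Whitespace as defined by XPath 1.0: space, tab, carriage return, line feed.
XPATH_WHITESPACE = " \t\r\n"


def xpath_normalize_space(value: str) -> str:
    """Approximate XPath normalize-space(): collapse and trim XPath whitespace only."""
    collapsed = re.sub(f"[{XPATH_WHITESPACE}]+", " ", value)
    return collapsed.strip(XPATH_WHITESPACE)


# Sample value modelled on the example in the text.
sample = "\r\n\t商家 \xa0厦门有限公司 \n"
result = xpath_normalize_space(sample)
print(repr(result))  # the '\xa0' is still present after normalization
```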

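Method 1: modify the response. This approach changes the data in the page source itself ('\xa0' is '&nbsp;' in the source). In my view, because it operates on the whole page rather than on the data that has already been selected, the replacement takes longer.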

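def parse(self, response):
    # Strip &nbsp; from the page source before running any XPath query
    response = response.replace(body=response.text.replace('&nbsp;', ''))
    order_company = response.xpath('normalize-space(//*[@id="to"]/tbody/tr/td[3]/a/text())').extract()
    item = {}  # assumption: the original snippet does not show where item is created
    item['order_company'] = order_company[0].strip()
    yield item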
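The link between '&nbsp;' in the source and '\xa0' in the extracted text can be checked with the standard library alone. This is only a sketch of the idea behind Method 1, using `html.unescape` in place of a real HTML parser; the markup in `raw_source` is a made-up sample, not taken from the scraped page.

```python
import html

# Made-up fragment of page source containing the &nbsp; entity.
raw_source = '<a>商家&nbsp;厦门有限公司</a>'

# Decoding the entity yields U+00A0, which Python displays as '\xa0'.
decoded = html.unescape(raw_source)

# Removing the entity from the source first means no '\xa0' is ever produced.
cleaned_source = raw_source.replace('&nbsp;', '')
cleaned = html.unescape(cleaned_source)

print(repr(decoded))
print(repr(cleaned))
```

Replacing '&nbsp;' with an empty string joins the neighbouring words; replacing it with a regular space instead may be preferable when the entity separates two words.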

Method 2: replace the character directly when passing the selected data into the item

item['order_company'] = order_company[0].replace("\xa0", "").strip()
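When several fields need the same cleanup, the replacement in Method 2 could be wrapped in a small helper. The sketch below is one possible variant, not the author's code: `clean_text` is a hypothetical name, and it relies on the fact that Python's `str.split()` with no arguments treats '\xa0' as whitespace along with \r, \t, and \n.

```python
def clean_text(value: str) -> str:
    # str.split() with no arguments splits on any Unicode whitespace,
    # including U+00A0, so joining the pieces collapses every run to one space.
    return " ".join(value.split())


# Sample value modelled on the example in the text.
raw_value = "\r\n\t商家 \xa0厦门有限公司 \n"
cleaned_value = clean_text(raw_value)
print(repr(cleaned_value))
```

Unlike a plain replace("\xa0", ""), this variant keeps a single space wherever any whitespace run occurred, so words separated only by '\xa0' are not glued together.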
