Python web scraping: parsel, Scrapy's built-in page-parsing library, for parsing XML and HTML with CSS and XPath
Documentation
Installation
pip install parsel
Code example
from parsel import Selector
selector = Selector(text="""<html>
<body>
<h1>Hello, Parsel!</h1>
<ul>
<li><a href="http://example.com">Link 1</a></li>
<li><a href="http://scrapy.org">Link 2</a></li>
</ul>
</body>
</html>""")
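# CSS selector: text of the <h1> element
print(selector.css('h1::text').get())
# Hello, Parsel!

# XPath plus a regular expression: split the <h1> text into words
print(selector.xpath('//h1/text()').re(r'\w+'))
# ['Hello', 'Parsel']

# Loop over each <li> and read the href attribute of the link inside it
for li in selector.css('ul > li'):
    print(li.xpath('.//@href').get())
# http://example.com
# http://scrapy.org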