Python web scraping: parsel, Scrapy's built-in parsing library, for parsing XML and HTML with CSS and XPath
Documentation
Installation
```shell
pip install parsel
```
Code example
```python
>>> from parsel import Selector
>>> selector = Selector(text="""<html>
...     <body>
...         <h1>Hello, Parsel!</h1>
...         <ul>
...             <li><a href="http://example.com">Link 1</a></li>
...             <li><a href="http://scrapy.org">Link 2</a></li>
...         </ul>
...     </body>
...     </html>""")
>>> selector.css('h1::text').get()  # CSS: text node of the <h1>
'Hello, Parsel!'
>>> selector.xpath('//h1/text()').re(r'\w+')  # XPath, then a regex over the matched text
['Hello', 'Parsel']
>>> for li in selector.css('ul > li'):  # each <li> is itself a Selector
...     print(li.xpath('.//@href').get())  # relative XPath: the href inside this <li>
...
http://example.com
http://scrapy.org
```