Python Open-Source Crawler Framework: Grab

Grab is an open-source web crawling framework for Python. It provides many practical methods for crawling websites and processing the scraped content:

Automatic cookie (session) support
HTTP and SOCKS proxies, with or without authorization
Keep-Alive support
IDN (internationalized domain name) support
Tools to work with web forms
Easy multipart file uploading
Flexible customization of HTTP requests
Automatic charset detection
Powerful API for extracting information from HTML documents with XPath queries
Asynchronous API for making thousands of simultaneous queries. This part of the library is called Spider, and it has too many features to list here.
Python 3 ready
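
The list above only says that Spider makes thousands of simultaneous queries asynchronously. The sketch below is not Grab's Spider API and does not describe how Spider is implemented; it is a standard-library asyncio analogue of the same idea. The `fake_fetch` coroutine, `crawl` function and `MAX_IN_FLIGHT` limit are hypothetical, and `fake_fetch` sleeps instead of making a real HTTP request so the example runs offline. A semaphore caps how many requests are in flight at once.

```python
import asyncio

MAX_IN_FLIGHT = 3  # hypothetical limit on simultaneous requests


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET: sleeps briefly instead of touching the network."""
    await asyncio.sleep(0.01)
    return "<html><title>%s</title></html>" % url


async def crawl(urls, limit=MAX_IN_FLIGHT):
    """Fetch every URL concurrently with at most `limit` fetches in flight.

    Returns a dict of bodies keyed by URL and the peak number of
    fetches that were running at the same time.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:  # wait here while `limit` fetches are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                body = await fake_fetch(url)
            finally:
                in_flight -= 1
        return url, body

    # Schedule all workers at once; the semaphore, not gather, enforces the limit.
    results = await asyncio.gather(*(worker(u) for u in urls))
    return dict(results), peak


if __name__ == "__main__":
    urls = ["https://example.com/page/%d" % i for i in range(10)]
    pages, peak = asyncio.run(crawl(urls))
    print(len(pages), "pages fetched; peak concurrency:", peak)
```

A semaphore is the simplest way to bound concurrency; a real crawler would also need timeouts, retries and per-host rate limits, which this sketch leaves out.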

Grab Example

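from grab import Grab
import logging

# Print Grab's request/response debug log to the console
logging.basicConfig(level=logging.DEBUG)

g = Grab()
# Open the GitHub login page, fill in the login form and submit it
g.go('https://github.com/login')
g.set_input('login', '***')
g.set_input('password', '***')
g.submit()
# Save the page returned after login for inspection
g.doc.save('/tmp/x.html')

# Check that a sign-out icon is present, i.e. the login succeeded
g.doc('//span[contains(@class, "octicon-sign-out")]').assert_exists()
# Read the link to the user's profile page from the header navigation
home_url = g.doc('//a[contains(@class, "header-nav-link name")]/@href').text()
repo_url = home_url + '?tab=repositories'

# Open the repositories tab and print each repository name with its absolute URL
g.go(repo_url)
for elem in g.doc.select('//h3[@class="repo-list-name"]/a'):
    print('%s: %s' % (elem.text(),
                      g.make_url_absolute(elem.attr('href'))))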
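
The last loop in the example selects repository links with an XPath query and turns each relative `href` into an absolute URL with `make_url_absolute`. The sketch below reproduces that step with only the standard library so it runs offline: `html.parser` stands in for Grab's XPath selector and `urllib.parse.urljoin` stands in for `make_url_absolute`. This is not how Grab works internally. The `RepoLinkParser` class, the `repo_links` helper, the sample HTML and the base URL are all hypothetical and do not reflect GitHub's real markup.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class RepoLinkParser(HTMLParser):
    """Collect (text, href) for <a> tags inside <h3 class="repo-list-name">.

    Approximates the XPath //h3[@class="repo-list-name"]/a (it also accepts
    links nested deeper than a direct child).
    """

    def __init__(self) -> None:
        super().__init__()
        self.in_repo_heading = False
        self.current_href: Optional[str] = None
        self.current_text: List[str] = []
        self.links: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and attrs.get("class") == "repo-list-name":
            self.in_repo_heading = True
        elif tag == "a" and self.in_repo_heading:
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            self.links.append(("".join(self.current_text).strip(), self.current_href))
            self.current_href = None
        elif tag == "h3":
            self.in_repo_heading = False


def repo_links(html: str, base_url: str) -> List[Tuple[str, str]]:
    """Return (link text, absolute URL) pairs, like the loop in the Grab example."""
    parser = RepoLinkParser()
    parser.feed(html)
    parser.close()
    return [(text, urljoin(base_url, href)) for text, href in parser.links]


# Hypothetical sample page; the second heading has a different class and is skipped
SAMPLE_HTML = """
<h3 class="repo-list-name"><a href="/alice/spider">spider</a></h3>
<h3 class="other"><a href="/alice/ignored">ignored</a></h3>
<h3 class="repo-list-name"><a href="/alice/grab-demo"> grab-demo </a></h3>
"""

if __name__ == "__main__":
    for text, url in repo_links(SAMPLE_HTML, "https://github.com/alice?tab=repositories"):
        print("%s: %s" % (text, url))
```

`urljoin` resolves a root-relative `href` such as `/alice/spider` against the page URL, dropping the page's own path and query string, which is the behavior the example needs when it turns repository links into absolute URLs.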

Project homepage: http://www.open-open.com/lib/view/home/1440858338263

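This article was uploaded and shared by user jopen for learning and exchange only. Copyright belongs to the original author; if your rights have been infringed, please contact the administrator.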

Article URL: https://www.open-open.com/lib/view/open1440858338263.html