jopen
Posted 10 years ago

PHP-Spider: a configurable, extensible PHP web spider

PHP-Spider is a configurable, extensible web spider for PHP.

PHP-Spider Features

  • supports two traversal algorithms: breadth-first and depth-first
  • supports depth limiting and queue size limiting
  • supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
  • comes with a useful set of URI filters, such as Domain limiting
  • supports custom URI filters, both prefetch (URI) and postfetch (Resource content)
  • supports custom request handling logic
  • comes with a useful set of persistence handlers (memory and file, with Redis soon to follow)
  • supports custom persistence handlers
  • collects statistics about the crawl for reporting
  • dispatches useful events, allowing developers to add even more custom behavior
  • supports a politeness policy
  • will soon come with many default discoverers: RSS, Atom, RDF, etc.
  • will soon support multiple queueing mechanisms (file, memcache, redis)
  • will eventually support distributed spidering with a central queue
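
To make the first few features concrete, here is a minimal, library-agnostic sketch of how breadth-first and depth-first traversal, depth limiting, queue size limiting, and a prefetch domain filter can fit together in one crawl loop. It is written in Python for brevity and does not use PHP-Spider's actual API. The names `crawl_order`, `links`, `max_queue`, and `allowed_host` are hypothetical, and the `links` dictionary stands in for real fetching and URI discovery.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(links, seed, strategy="bfs", max_depth=2,
                max_queue=100, allowed_host=None):
    """Return the order in which URIs would be visited.

    links is a dict mapping a URI to the URIs discovered on it; it is a
    stand-in (assumption) for fetching a page and running discoverers.
    """
    frontier = deque([(seed, 0)])  # (uri, depth) pairs waiting to be visited
    seen = {seed}
    visited = []
    while frontier:
        # Breadth-first takes the oldest entry (FIFO);
        # depth-first takes the newest entry (LIFO).
        if strategy == "bfs":
            uri, depth = frontier.popleft()
        else:
            uri, depth = frontier.pop()
        visited.append(uri)
        if depth >= max_depth:
            continue  # depth limit: visit this URI but do not expand it
        for found in links.get(uri, []):
            if found in seen:
                continue
            # Prefetch filter: drop URIs outside the allowed host.
            if allowed_host is not None and urlparse(found).hostname != allowed_host:
                continue
            if len(frontier) >= max_queue:
                break  # queue size limit: stop enqueueing from this page
            seen.add(found)
            frontier.append((found, depth + 1))
    return visited


# A tiny hypothetical link graph used in place of real HTTP requests.
LINKS = {
    "http://example.com/": [
        "http://example.com/a",
        "http://example.com/b",
        "http://other.test/x",
    ],
    "http://example.com/a": ["http://example.com/a1"],
    "http://example.com/b": ["http://example.com/b1"],
}

bfs = crawl_order(LINKS, "http://example.com/", "bfs", allowed_host="example.com")
dfs = crawl_order(LINKS, "http://example.com/", "dfs", allowed_host="example.com")
```

In this sketch the only difference between the two strategies is which end of the frontier the next URI is taken from. A postfetch filter, by contrast, would have to run after the resource has been fetched, since it inspects the content rather than the URI.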

Project homepage: http://www.open-open.com/lib/view/home/1399025018796

Article URL: https://www.open-open.com/lib/view/open1399025018796.html
Tags: web crawler, PHP-Spider