PHP-Spider: A Configurable, Extensible PHP Web Spider
PHP-Spider is a configurable, extensible web spider written in PHP.
PHP-Spider Features
- supports two traversal algorithms: breadth-first and depth-first
 - supports depth limiting and queue size limiting
 - supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
 - comes with a useful set of URI filters, such as domain limiting
 - supports custom URI filters, both prefetch (URI) and postfetch (Resource content)
 - supports custom request handling logic
 - comes with a useful set of persistence handlers (memory and file, with Redis soon to follow)
 - supports custom persistence handlers
 - collects statistics about the crawl for reporting
 - dispatches useful events, allowing developers to add even more custom behavior
 - supports a politeness policy
 - will soon come with many default discoverers: RSS, Atom, RDF, etc.
 - will soon support multiple queueing mechanisms (file, memcache, redis)
 - will eventually support distributed spidering with a central queue
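
To make the first few features concrete, here is a minimal, language-neutral sketch in Python of how breadth-first and depth-first traversal, depth limiting, queue size limiting, and a prefetch URI filter can fit together. It is not PHP-Spider's API: `LINKS`, `crawl`, `same_host_filter`, `max_depth`, and `max_queued` are hypothetical names chosen for illustration, the "web" is an in-memory dictionary rather than real HTTP, and the queue limit is assumed to cap the total number of URIs ever queued.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": each URI maps to the links discovered on it.
LINKS = {
    "http://example.com/": [
        "http://example.com/a",
        "http://example.com/b",
        "http://other.test/x",
    ],
    "http://example.com/a": ["http://example.com/a1"],
    "http://example.com/b": ["http://example.com/b1"],
}


def same_host_filter(seed):
    """Prefetch filter (assumed shape): keep only URIs on the seed's host."""
    host = urlparse(seed).netloc
    return lambda uri: urlparse(uri).netloc == host


def crawl(seed, strategy="breadth", max_depth=2, max_queued=100,
          prefetch_filters=()):
    """Return URIs in visiting order for the chosen traversal strategy."""
    frontier = deque([(seed, 0)])
    seen = {seed}  # every URI ever queued, including the seed
    visited = []
    while frontier:
        # Breadth-first takes the oldest entry, depth-first the newest.
        if strategy == "breadth":
            uri, depth = frontier.popleft()
        else:
            uri, depth = frontier.pop()
        visited.append(uri)
        if depth >= max_depth:
            continue  # depth limit: do not follow links from this page
        fresh = []
        for link in LINKS.get(uri, []):
            if len(seen) >= max_queued:
                break  # queue size limit reached
            if link in seen:
                continue
            if not all(keep(link) for keep in prefetch_filters):
                continue  # rejected before any fetch would happen
            seen.add(link)
            fresh.append((link, depth + 1))
        # Reverse for depth-first so the first link on a page is popped first.
        frontier.extend(fresh if strategy == "breadth" else reversed(fresh))
    return visited


seed = "http://example.com/"
only_seed_host = [same_host_filter(seed)]
print(crawl(seed, "breadth", prefetch_filters=only_seed_host))
print(crawl(seed, "depth", prefetch_filters=only_seed_host))
```

The only difference between the two strategies in this sketch is which end of the frontier is consumed, which is why a spider can offer both behind a single setting. A postfetch filter would sit after the fetch and inspect the resource content instead of the URI.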
 
 Uploaded and shared by user jopen. All rights remain with the original author.