jopen
Published 10 years ago

Goutte: a PHP web scraping library

Goutte is a PHP library for scraping data from websites. It provides an elegant API that makes it easy to select specific elements from remote pages.

Require the Goutte phar file to use Goutte in a script:

require_once '/path/to/goutte.phar'; 

Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\Client):

use Goutte\Client;
$client = new Client();

Make requests with the request() method:

$crawler = $client->request('GET', 'http://www.symfony-project.org/'); 

The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).
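As a minimal sketch of what the returned Crawler can be used for, the following lists every link on a page. It reuses the phar path placeholder from above; the https://example.com/ URL and the $links variable are illustrative and not from the original. It also assumes a reasonably recent Symfony DomCrawler, where each() passes every matched node to the closure as a Crawler.

```php
<?php
// Placeholder path, as in the snippet above.
require_once '/path/to/goutte.phar';

use Goutte\Client;

$client = new Client();

// Illustrative URL (an assumption, not from the original article).
$crawler = $client->request('GET', 'https://example.com/');

// filter() takes a CSS selector; each() runs the closure once per matched
// node and returns an array of the closure's return values.
$links = $crawler->filter('a')->each(function ($node) {
    return array(
        'text' => trim($node->text()),
        'href' => $node->attr('href'),
    );
});

foreach ($links as $link) {
    printf("%s -> %s\n", $link['text'], $link['href']);
}
```

The same pattern works with any CSS selector, so swapping 'a' for a more specific selector narrows the result to one part of the page.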

Click on links:

$link = $crawler->selectLink('Plugins')->link();
$crawler = $client->click($link);

Submit forms:

$form = $crawler->selectButton('sign in')->form();
$crawler = $client->submit($form, array('signin[username]' => 'fabien', 'signin[password]' => 'xxxxxx'));

Extract data:

$nodes = $crawler->filter('.error_list');
if ($nodes->count()) {
    die(sprintf("Authentication error: %s\n", $nodes->text()));
}

printf("Nb tasks: %d\n", $crawler->filter('#nb_tasks')->text());
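To try the extraction logic without a network connection or Goutte itself, here is a rough offline sketch that uses PHP's built-in DOM extension instead of Goutte. It is not how Goutte is implemented. The sample HTML, the firstText() helper, and the hand-written XPath expressions (standing in for the CSS selectors '.error_list' and '#nb_tasks') are all assumptions made for illustration. It needs PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical sample page standing in for a real response body.
$html = <<<HTML
<html><body>
  <ul class="error_list"><li>Invalid username or password</li></ul>
  <span id="nb_tasks">42</span>
</body></html>
HTML;

/**
 * Return the trimmed text of the first node matching an XPath query,
 * or null when nothing matches. Hypothetical helper, not part of Goutte.
 */
function firstText(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hand-written XPath counterpart of the CSS class selector '.error_list'.
$error = firstText(
    $xpath,
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' error_list ')]"
);

// Hand-written XPath counterpart of the CSS id selector '#nb_tasks'.
$tasks = firstText($xpath, "//*[@id='nb_tasks']");

if ($error !== null) {
    printf("Authentication error: %s\n", $error);
}
printf("Nb tasks: %d\n", (int) $tasks);
```

The sample page contains both an error list and a task count so that each branch produces output. A real page would normally show only one of the two.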

Project home page: http://www.open-open.com/lib/view/home/1388458699125
