Node.js Web Scraper: Node Osmosis

Osmosis is a Node.js module for parsing HTML/XML and scraping web content.

Features

Fast: uses libxml C bindings
Lightweight: no dependencies like jQuery, cheerio, or jsdom
Clean: promise-based interface, no more nested callbacks
Flexible: supports both CSS and XPath selectors
Predictable: same input, same output, same order
Detailed logging for every step
Precise and natural I/O flow: no setTimeout or process.nextTick
Easy debugging with built-in stack size and memory usage reporting
Memory-leak free
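
The "no more nested callbacks" point is about control flow. As a generic illustration (not Osmosis code), the sketch below crawls three pages in a row twice: once with nested Node-style callbacks and once as a flat Promise chain. The pages map and the helpers fetchPageCb, fetchPage and nextUrl are hypothetical stand-ins for real HTTP requests and HTML parsing, so the example runs offline.

```javascript
// Hypothetical in-memory "web": each page body holds a label and, after the
// first ':', either the next URL or (on the last page) a listing title.
const pages = {
  '/sites': 'city:/city',
  '/city': 'category:/jobs',
  '/jobs': 'listing:Web developer',
};

// Hypothetical Node-style async fetch: calls callback(err, body) on a later tick.
function fetchPageCb(url, callback) {
  setImmediate(() => {
    if (Object.prototype.hasOwnProperty.call(pages, url)) callback(null, pages[url]);
    else callback(new Error(`not found: ${url}`));
  });
}

// The same fetch wrapped in a Promise.
function fetchPage(url) {
  return new Promise((resolve, reject) => {
    fetchPageCb(url, (err, body) => (err ? reject(err) : resolve(body)));
  });
}

// Returns the text after the first ':' in a page body.
const nextUrl = (body) => body.slice(body.indexOf(':') + 1);

// Callback style: every additional page adds a level of nesting
// and another copy of the error check.
function crawlNested(done) {
  fetchPageCb('/sites', (err, a) => {
    if (err) return done(err);
    fetchPageCb(nextUrl(a), (err, b) => {
      if (err) return done(err);
      fetchPageCb(nextUrl(b), (err, c) => {
        if (err) return done(err);
        done(null, nextUrl(c));
      });
    });
  });
}

// Promise style: the same three steps as a flat chain; a rejection at any
// step propagates to a single .catch() on the returned promise.
function crawlFlat() {
  return fetchPage('/sites')
    .then((a) => fetchPage(nextUrl(a)))
    .then((b) => fetchPage(nextUrl(b)))
    .then((c) => nextUrl(c));
}

crawlNested((err, title) => console.log(err || title)); // prints "Web developer"
crawlFlat().then((title) => console.log(title));        // prints "Web developer"
```

Both functions produce the same result; the difference is only in how the sequencing and error handling are written.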

Example: scrape all Craigslist listings

    var osmosis = require('osmosis');

    osmosis
        .get('www.craigslist.org/about/sites')
        .find('h1 + div a')                 // each city link on the sites page
        .set('location')
        .follow('@href')                    // open the city page
        .find('header + div + div li > a')  // each category link
        .set('category')
        .follow('@href')                    // open the category page
        .find('p > a', '.totallink + a.button.next:first')
        .follow('@href')                    // open each listing
        .set({
            'title':        'section > h2',
            'description':  '#postingbody',
            'subcategory':  'div.breadbox > span[4]',
            'date':         'time@datetime',
            'latitude':     '#map@data-latitude',
            'longitude':    '#map@data-longitude',
            'images[]':     'img@src'
        })
        .data(function(listing) {
            // do something with listing data
        });
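
The chain above reads roughly as a pipeline: each find() fans out into one context per matched element, set() attaches fields to that context, and follow() loads the linked page for the next step. The sketch below models only that data flow. It is not Osmosis's implementation: the Pipeline class and the in-memory site table are hypothetical, selector strings are used as plain lookup keys rather than evaluated, follow() takes a bare field name instead of Osmosis's '@href', and everything runs synchronously instead of issuing HTTP requests.

```javascript
// Hypothetical in-memory site. Real Osmosis fetches pages over HTTP and
// evaluates CSS/XPath with libxml; here each selector string is only a
// lookup key into pre-extracted matches.
const site = {
  '/about/sites': {
    'h1 + div a': [
      { text: 'boston', href: '/boston' },
      { text: 'chicago', href: '/chicago' },
    ],
  },
  '/boston': { 'header + div + div li > a': [{ text: 'jobs', href: '/boston/jobs' }] },
  '/chicago': { 'header + div + div li > a': [{ text: 'housing', href: '/chicago/housing' }] },
  '/boston/jobs': { 'p > a': [{ text: 'Web developer' }] },
  '/chicago/housing': { 'p > a': [{ text: '2BR apartment' }] },
};

// Hypothetical pipeline: every step maps one context (current URL,
// current node, collected data) to zero or more new contexts.
class Pipeline {
  constructor() {
    this.steps = [];
  }

  // Start at a URL.
  get(url) {
    this.steps.push((ctx) => [{ ...ctx, url }]);
    return this;
  }

  // Fan out: one new context per matched node (none if nothing matches).
  find(selector) {
    this.steps.push((ctx) => {
      const matches = (site[ctx.url] || {})[selector] || [];
      return matches.map((node) => ({ ...ctx, node }));
    });
    return this;
  }

  // Store the current node's text under `key`.
  set(key) {
    this.steps.push((ctx) => [{ ...ctx, data: { ...ctx.data, [key]: ctx.node.text } }]);
    return this;
  }

  // Move to the URL held in the current node's `attr` field.
  follow(attr) {
    this.steps.push((ctx) => [{ ...ctx, url: ctx.node[attr] }]);
    return this;
  }

  // Hand each finished record to the callback.
  data(callback) {
    this.steps.push((ctx) => {
      callback(ctx.data);
      return [ctx];
    });
    return this;
  }

  // Run all steps in order and return the collected records.
  run() {
    let contexts = [{ data: {} }];
    for (const step of this.steps) contexts = contexts.flatMap(step);
    return contexts.map((ctx) => ctx.data);
  }
}

const records = new Pipeline()
  .get('/about/sites')
  .find('h1 + div a').set('location').follow('href')
  .find('header + div + div li > a').set('category').follow('href')
  .find('p > a').set('title')
  .data((listing) => console.log(listing))
  .run();
// records: [{ location: 'boston', category: 'jobs', title: 'Web developer' },
//           { location: 'chicago', category: 'housing', title: '2BR apartment' }]
```

Treating each step as a function from one context to a list of contexts is what lets a single chain yield one record per listing: find fans out, set and follow transform, and an empty match list simply drops that branch.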

Project homepage: http://www.open-open.com/lib/view/home/1428322356791

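This article was uploaded and shared by user n6xb for study and discussion only. All rights belong to the original author; if your rights have been infringed, please contact the administrator.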

Article URL: https://www.open-open.com/lib/view/open1428322356791.html
