jopen
Published 12 years ago

DotNetWikiBot Framework

DotNetWikiBot Framework is a full-featured client API and console application for building bots that crawl MediaWiki-based sites. It is written in .NET.

using DotNetWikiBot; // Reference DotNetWikiBot namespace for easy access

class MyBot : Bot // Derive your bot class from framework's Bot class
{
    public static void Main()
    {
        // Firstly make Site object, specifying site's URL and your bot account
        Site enWiki = new Site("http://en.wikipedia.org", "myBotLogin", "myPassword");
        // Then make Page object, specifying site and page title in constructor
        Page p = new Page(enWiki, "Art");
        // Load actual page text from live wiki
        p.Load();
        // Add "Visual arts" category link to "Art" page's text
        p.AddToCategory("Visual arts");
        // Save "Art" article's text back to live wiki with specified comment
        p.Save("comment: category link added", true);

        // Make empty PageList object, representing collection of pages
        PageList pl = new PageList(enWiki);
        // Fill it with 100 pages, where "nuclear disintegration" is mentioned
        pl.FillFromGoogleSearchResults("nuclear disintegration", 100);
        // Load texts and metadata of all found pages from live wiki
        pl.LoadEx();
        // Now suppose that we must correct some typical mistake in all our pages
        foreach (Page i in pl)
            // In each page we will replace one phrase with another
            i.text = i.text.Replace("fusion products", "fission products");
        // Finally we'll save all changed pages to wiki with 5 seconds interval
        pl.SaveSmoothly(5, "comment: mistake autocorrection", true);

        // Now clear our PageList so we could re-use it
        pl.Clear();
        // Fill it with all articles in "Astronomy" category and its subcategories
        pl.FillFromCategoryTree("Astronomy");
        // Download and save all PageList's articles to specified local XML file
        pl.SaveXMLDumpToFile("Dumps\\ArticlesAboutAstronomy.xml");
    }
}
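
The phrase correction in the sample is an ordinary string replacement on each page's text, so it can be tried offline before anything is saved to a live wiki. The following is only a minimal sketch under that assumption: it uses no DotNetWikiBot types, and the names ReplacementPreview and TitlesAffected, as well as the sample page titles and texts, are hypothetical and made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of DotNetWikiBot: previews a phrase
// replacement on in-memory page texts without contacting any wiki.
static class ReplacementPreview
{
    // Returns the titles of the pages whose text would change.
    public static List<string> TitlesAffected(
        IDictionary<string, string> pages, string oldPhrase, string newPhrase)
    {
        var affected = new List<string>();
        foreach (var entry in pages)
        {
            // Same case-sensitive replacement the sample applies to each page's text
            string updated = entry.Value.Replace(oldPhrase, newPhrase);
            if (!string.Equals(updated, entry.Value, StringComparison.Ordinal))
                affected.Add(entry.Key);
        }
        return affected;
    }

    public static void Main()
    {
        // Made-up page titles and texts standing in for downloaded pages
        var pages = new Dictionary<string, string>
        {
            ["Page A"] = "The fusion products of uranium decay.",
            ["Page B"] = "No matching phrase here."
        };

        foreach (string title in TitlesAffected(pages, "fusion products", "fission products"))
            Console.WriteLine(title);
    }
}
```

A check like this shows how many pages a correction would touch, which is useful to know before running a bulk save with an edit comment on every page in the list.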

Project homepage: http://www.open-open.com/lib/view/home/1349946678556

 本文地址:https://www.open-open.com/lib/view/open1349946678556.html