Posted by jopen, 11 years ago

Hadoopy: a Python wrapper for Hadoop, implemented in Cython

Hadoopy is a Python wrapper for Hadoop Streaming, written in Cython. It is simple, fast, and easy to hack on, and it has been tested on clusters of more than 700 nodes. Hadoopy's goals are:

  • Similar interface to the Hadoop API (design patterns carry over between the Python and Java interfaces)
  • General compatibility with dumbo, so users can switch back and forth
  • Usable on Hadoop clusters without Python or admin access
  • Fast conversion and processing
  • Stay small and well documented
  • Be transparent with what is going on
  • Handle programs with complicated .so files, ctypes, and extensions
  • Code written for hack-ability
  • Simple HDFS access (e.g., reading, writing, ls)
  • Support (rather than replicate) the wider Hadoop ecosystem (e.g., Oozie, Whirr)
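
The goal of an interface similar to the Hadoop API means a job is written as a mapper plus a reducer. The sketch below assumes the generator style commonly used by Python Streaming wrappers such as Hadoopy: a mapper(key, value) that yields key/value pairs and a reducer(key, values) that yields results. The run_local helper is hypothetical; it only simulates the map, sort/shuffle, and reduce phases in memory and is not Hadoopy's launch_local, which runs the actual job script.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    """Word-count mapper: key is the line offset (unused), value is a line of text."""
    for word in value.split():
        yield word.lower(), 1


def reducer(key, values):
    """Sum every count emitted for one word."""
    yield key, sum(values)


def run_local(map_fn, reduce_fn, records):
    """Hypothetical in-memory stand-in for Hadoop's map -> sort/shuffle -> reduce."""
    # Map phase: call the mapper once per input (key, value) record.
    mapped = [kv for k, v in records for kv in map_fn(k, v)]
    # Sort/shuffle phase: Hadoop sorts by key so equal keys arrive together.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key, with an iterator of its values.
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reduce_fn(key, (v for _, v in group)))
    return output


if __name__ == "__main__":
    lines = [(0, "the quick brown fox"), (1, "The lazy dog")]
    print(run_local(mapper, reducer, lines))
```

Running the script prints [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]. Because mapper and reducer are plain generators with no Hadoop dependency, they can be unit-tested this way before being submitted to a cluster.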

Killer features (what sets Hadoopy apart):

  • Automated job parallelization (‘auto-oozie’), available in the hadoopy flow project (maintained outside the main branch)
  • Local execution of unmodified MapReduce jobs with launch_local
  • Read/write sequence files of TypedBytes directly to HDFS from python (readtb, writetb)
  • Allows printing to stdout and stderr in Hadoop tasks without causing problems (uses the ‘pipe hopping’ technique; both streams end up in the task’s stderr)
  • Works on clusters without any extra installation, Python, or any Python libraries (uses Pyinstaller that is included in this source tree)
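
In Hadoop Streaming, a task's stdout is the data channel, so a stray print() would corrupt the job output. A minimal sketch of the pipe-hopping idea, assuming it amounts to swapping file descriptors at task startup (this is an illustration, not Hadoopy's code; hop and install_pipe_hop are hypothetical names): keep a private duplicate of the original stdout for real records, then point file descriptor 1 at stderr so ordinary prints land in the task's stderr log.

```python
import os
import sys


def hop(data_fd, log_fd):
    """Swap descriptors so stray writes to data_fd land on log_fd.

    Returns a fresh duplicate of the original data_fd, which still reaches
    the real output channel (in a Streaming task: the pipe Hadoop reads
    key/value records from).
    """
    saved = os.dup(data_fd)   # private handle on the real output channel
    os.dup2(log_fd, data_fd)  # data_fd now points wherever log_fd points
    return saved


def install_pipe_hop():
    """Hypothetical task-startup helper: print() goes to stderr from now on."""
    sys.stdout.flush()        # push out anything already buffered for fd 1
    saved = hop(1, 2)         # fd 1 (stdout) -> fd 2 (stderr)
    return os.fdopen(saved, "wb")  # write real key/value records here
```

A task would call install_pipe_hop() once at startup, write its records to the returned binary file object, and keep using print() freely for debugging. Swapping descriptors rather than reassigning sys.stdout also catches output from C extensions and ctypes code that write to fd 1 directly.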

Additional features:

  • Works on OS X
  • Critical path is in Cython
  • Simple HDFS access (readtb and ls) inside Python, even inside running jobs
  • Unit test interface
  • Reporting using status and counters (and print statements! no need to be scared of them in Hadoopy)
  • Supports the design patterns in the Lin & Dyer book (Data-Intensive Text Processing with MapReduce)
  • TypedBytes support (very fast)
  • Oozie support
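
TypedBytes is the binary serialization Hadoop Streaming can use in place of tab-separated text, and Hadoopy's readtb/writetb work with sequence files of TypedBytes. The sketch below is a pure-Python encoder/decoder for a subset of the format, based on Hadoop's typedbytes type codes (0 bytes, 2 boolean, 3 int, 4 long, 6 double, 7 string, 8 vector, 10 map; lengths and numbers are big-endian). It is not Hadoopy's Cython implementation: the names encode/decode, mapping Python int to INT or LONG by range, and turning lists into vectors that decode as tuples are assumptions made for illustration.

```python
import struct

# Hadoop typedbytes type codes (subset handled here).
BYTES, BOOL, INT, LONG, DOUBLE, STRING, VECTOR, MAP = 0, 2, 3, 4, 6, 7, 8, 10


def encode(obj):
    """Serialize a Python value as one typedbytes record."""
    if isinstance(obj, bool):  # check before int: bool is an int subclass
        return struct.pack(">B?", BOOL, obj)
    if isinstance(obj, int):
        if -2**31 <= obj < 2**31:
            return struct.pack(">Bi", INT, obj)
        return struct.pack(">Bq", LONG, obj)
    if isinstance(obj, float):
        return struct.pack(">Bd", DOUBLE, obj)
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return struct.pack(">Bi", STRING, len(raw)) + raw
    if isinstance(obj, bytes):
        return struct.pack(">Bi", BYTES, len(obj)) + obj
    if isinstance(obj, (tuple, list)):  # both become a length-prefixed vector
        return struct.pack(">Bi", VECTOR, len(obj)) + b"".join(encode(x) for x in obj)
    if isinstance(obj, dict):
        body = b"".join(encode(k) + encode(v) for k, v in obj.items())
        return struct.pack(">Bi", MAP, len(obj)) + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def decode(buf, pos=0):
    """Decode one value starting at pos; return (value, next_pos)."""
    code = buf[pos]
    pos += 1
    if code == BOOL:
        return buf[pos] != 0, pos + 1
    if code == INT:
        return struct.unpack_from(">i", buf, pos)[0], pos + 4
    if code == LONG:
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if code == DOUBLE:
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if code in (STRING, BYTES):
        (n,) = struct.unpack_from(">i", buf, pos)
        raw = buf[pos + 4:pos + 4 + n]
        return (raw.decode("utf-8") if code == STRING else raw), pos + 4 + n
    if code == VECTOR:
        (n,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        items = []
        for _ in range(n):
            item, pos = decode(buf, pos)
            items.append(item)
        return tuple(items), pos
    if code == MAP:
        (n,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        out = {}
        for _ in range(n):
            key, pos = decode(buf, pos)
            value, pos = decode(buf, pos)
            out[key] = value
        return out, pos
    raise ValueError(f"unsupported type code {code}")
```

For example, the key/value pair ("word", 3) encodes as a vector header (0x08, count 2), a string record (0x07, length 4, b"word"), and an int record (0x03, four bytes). Binary records avoid escaping tabs and newlines and skip text parsing, which is where much of TypedBytes' speed advantage over plain-text Streaming comes from.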

Project homepage: http://www.open-open.com/lib/view/home/1357887387624

Article URL: https://www.open-open.com/lib/view/open1357887387624.html