jopen
Published 10 years ago

Apache Flink: An Efficient, Distributed, General-Purpose Data Processing Platform

Apache Flink is an efficient, distributed, general-purpose data processing platform.

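Apache Flink is an open source system for declarative data analysis. It combines the efficiency, programming flexibility, and scalability of distributed MapReduce-like platforms with the query optimization techniques found in parallel databases.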

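    // Imports as found in Flink's Java DataSet API; exact packages depend on the Flink version.
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    // env, inputPath, and outputPath are assumed to be defined by the surrounding program.
    DataSet<String> input = env.readTextFile(inputPath);

    input.flatMap(new FlatMapFunction<String, Tuple2<String, Long>>() {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Long>> out) {
            // Emit a (word, 1) pair for every space-separated token.
            for (String s : value.split(" ")) {
                out.collect(new Tuple2<String, Long>(s, 1L));
            }
        }
    })
    .groupBy(0)   // group by the word (tuple field 0)
    .sum(1)       // add up the counts (tuple field 1)
    .writeAsText(outputPath);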
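
To make clear what the word-count pipeline computes, here is a minimal sketch of the same flatMap, group, and sum steps in plain Java on an in-memory list. It uses no Flink classes; the class name WordCountSketch and the method countWords are hypothetical names chosen for illustration, and the sample input is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration only: no Flink classes are used here.
class WordCountSketch {

    // Mirrors flatMap -> groupBy(0) -> sum(1) on an in-memory list of lines.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // flatMap step: split each line on single spaces, one element per token
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // groupBy + sum step: group by the word and count its occurrences
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "be quick");
        // Prints the per-word counts in alphabetical order of the words.
        System.out.println(countWords(lines));
    }
}
```

The sketch runs on a single machine and holds all data in memory; the Flink program above expresses the same logic so that the system can plan and distribute the execution.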

System Stack

The Apache Flink stack consists of:

  • Programming APIs for different languages (Java, Scala) and paradigms (record-oriented, graph-oriented).
  • A program optimizer that decides how to execute the program for good performance. Among other things, it chooses data movement and caching strategies.
  • A distributed runtime that executes programs in parallel across many machines.
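
As a rough illustration of what "data movement" and parallel execution mean for a grouped count, the following toy sketch partitions words by key hash and counts each partition on its own thread. It is not Flink's runtime or optimizer: threads stand in for machines, and ParallelCountSketch, partitionByKey, and countInParallel are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of data-parallel execution; threads stand in for machines. Not Flink's runtime.
class ParallelCountSketch {

    // Data movement step: route every word to the partition chosen by its hash,
    // so all occurrences of one word end up in the same partition.
    static List<List<String>> partitionByKey(List<String> words, int parallelism) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String word : words) {
            partitions.get(Math.floorMod(word.hashCode(), parallelism)).add(word);
        }
        return partitions;
    }

    // Each worker counts only its own partition; the partial results are merged at the end.
    static Map<String, Long> countInParallel(List<String> words, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Map<String, Long>>> pending = new ArrayList<>();
            for (List<String> partition : partitionByKey(words, parallelism)) {
                pending.add(pool.submit(() -> {
                    Map<String, Long> local = new HashMap<>();
                    for (String word : partition) {
                        local.merge(word, 1L, Long::sum);
                    }
                    return local;
                }));
            }
            Map<String, Long> result = new TreeMap<>();
            for (Future<Map<String, Long>> future : pending) {
                // Partitions hold disjoint keys, so putAll never overwrites a count.
                result.putAll(future.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to be or not to be".split(" "));
        System.out.println(countInParallel(words, 3));
    }
}
```

Partitioning by key before counting is one possible strategy; a real optimizer could pick a different plan, for example pre-aggregating locally before moving data.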

Flink runs independently from Hadoop, but integrates seamlessly with YARN (Hadoop's next-generation scheduler). Various file systems (including the Hadoop Distributed File System) can act as data sources.


Project homepage: http://www.open-open.com/lib/view/home/1409195666666
