The Journey of Druid: A Real-Time Big Data Analytics Data Store Framework

c6g3

Contributed on 2015-02-17

Keywords: distributed systems / cloud computing / big data

THE JOURNEY OF DRUID, A BIG DATA ANALYTICS DATA STORE
ERIC TSCHETTER, CREATOR OF DRUID

DEMO

REQUIREMENTS
• Data Ingestion Rate
  • Ingest data and make it queryable in real time
• Arbitrary Drill-Downs, Slice-n-Dice
  • Arbitrary boolean filters
• Availability
  • Downtime is evil

WHAT WE TRIED
I. RDBMS - Relational Database

I. RDBMS - THE SETUP
• Star Schema
• Aggregate Tables
• Query Caching

I. RDBMS - THE RESULTS
• Queries that were cached
  • fast
• Queries against aggregate tables
  • fast to acceptable
• Queries against base fact table
  • generally unacceptable

I. RDBMS - PERFORMANCE
Select COUNT(*) scan rate                        ~5.5M rows / second / core
1 day of summarized aggregates                   60M+ rows
1 query over 1 week, 16 cores                    ~5 seconds
Page load with 20 queries over a week of data    long time

WHAT WE TRIED
I. RDBMS - Relational Database
II. NoSQL - Key/Value Store

II. NOSQL - THE SETUP
• Pre-aggregate all dimensional combinations (truncate time)
• Store results in a NoSQL store

ts   gender   age   revenue
1    M        18    $0.15
1    F        25    $1.03
2    F        18    $0.01

Key       Value
1         revenue=$1.19
1,M       revenue=$0.15
1,F       revenue=$1.04
1,18      revenue=$0.16
1,25      revenue=$1.03
1,M,18    revenue=$0.15
1,F,18    revenue=$0.01
1,F,25    revenue=$1.03

II. NOSQL - THE RESULTS
• Queries were fast
  • range scan on primary key
• Inflexible
  • not aggregated, not available
• Not continuously updated
  • aggregate first, then display
• Processing scales exponentially

II. NOSQL - PERFORMANCE
• Dimensional combinations => exponential increase
• Tried limiting dimensional depth
  • still expands exponentially
• Example: ~500k records
  • 11 dimensions, 5-deep
    • 4.5 hours on a 15-node Hadoop cluster
  • 14 dimensions, 5-deep
    • 9 hours on a 25-node Hadoop cluster

WHAT WE TRIED
I. RDBMS - Relational Database
II. NoSQL - Key/Value Store
III. ???

WHAT WE LEARNED
• Problem with RDBMS: scans are slow
• Problem with NoSQL: computationally intractable
• Tackling RDBMS issue seems easier

INTRODUCING DRUID

DRUID – KEY FEATURES
1. Real-Time Ingestion (Indigestion?)
2. Slicing-n-Dicing, Drill-Down (Fruit Ninjas)
3. Available

ARCHITECTURE
[Architecture diagram, built up over several slides. Labels: Realtime Nodes, Historical Nodes, Broker Nodes, Query API, Hand Off Data, Query Rewrite, Scatter/Gather]

DATA
timestamp              publisher          advertiser  gender  country  ...  click  price
2011-01-01T00:01:35Z   bieberfever.com    google.com  Male    USA           0      0.65
2011-01-01T00:03:63Z   bieberfever.com    google.com  Male    USA           0      0.62
2011-01-01T00:04:51Z   bieberfever.com    google.com  Male    USA           1      0.45
2011-01-01T01:00:00Z   ultratrimfast.com  google.com  Female  UK            0      0.87
2011-01-01T02:00:00Z   ultratrimfast.com  google.com  Female  UK            0      0.99
2011-01-01T02:00:00Z   ultratrimfast.com  google.com  Female  UK            1      1.53
...

COLUMN COMPRESSION - DICTIONARIES
• Create ids
  • bieberfever.com -> 0, ultratrimfast.com -> 1
• Store
  • publisher -> [0, 0, 0, 1, 1, 1]
  • advertiser -> [0, 0, 0, 0, 0, 0]

BITMAP INDEXES
• bieberfever.com -> [0, 1, 2] -> [111000]
• ultratrimfast.com -> [3, 4, 5] -> [000111]
• Compress
  • CONCISE (http://ricerca.mat.uniroma3.it/users/colanton/concise.html)

FAST AND FLEXIBLE QUERIES
Rows   POETS
0      JUSTIN BIEBER
1      JUSTIN BIEBER
2      KE$HA
3      KE$HA

JUSTIN BIEBER             [1, 1, 0, 0]
KE$HA                     [0, 0, 1, 1]
JUSTIN BIEBER OR KE$HA    [1, 1, 1, 1]

AVAILABILITY
• Fault-tolerant
• Rolling deployments/restarts
  • 3 years, no downtime for update
• Grow == start processes
• Shrink == kill processes

EXTENSIBLE
• Extensibility model allows for significant customizability
  • Deep Storage
  • Service Discovery
  • Extra queries
  • Extra aggregations
  • Extra column types/storage formats

DRUID BENCHMARKS
• Scan speed
  • ~53M rows / second / core
• Realtime ingestion rate
  • ~20k events / second / node on “real” data
  • Highest benchmark so far: 250k/second on toy data
• http://druid.io/blog/2014/03/17/benchmarking-druid.html

DRUID BENCHMARKS
• Metamarkets cluster
  • ~10 trillion events (impressions, bids, etc.)
  • >200 TB
  • 175 machines
  • 90% query latency < 1s, 95% < 2s
  • 300k/s event ingestion sustained

DRUID IS OPEN SOURCE

USE CASES
• Interactive dashboards of
  • KPIs
    • Page Views, Impressions, Uniques, Revenue
  • Server Metrics
    • Request latency, etc.
  • Network Metrics
    • Packets, bytes, etc.

USE CASES
• If you ever say the following, investigate Druid
  • “I wish I could slice and dice into this data”
  • “I just did X, I wish I could see if it was working in real-time”
  • “I wish I could see this data with more fine-grained granularity”
  • “I wish I didn’t have to use pre-canned drill downs”

DRUID IS OPEN
URL: http://druid.io
Language: Java
License: GPLv2
Used by ~10 companies in production
Contributions from 40 people
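The COLUMN COMPRESSION - DICTIONARIES, BITMAP INDEXES, and FAST AND FLEXIBLE QUERIES slides can be illustrated with a small sketch. This is not Druid's code: the function names are hypothetical, plain Python integers stand in for bitmaps, and no CONCISE compression is applied. It only shows, under those assumptions, how a string column from the sample data becomes integer ids, how a per-value bitmap is derived, and how a boolean OR filter reduces to a bitwise OR.

```python
from collections import defaultdict

# Rows taken from the sample data slide: (publisher, click, price).
rows = [
    ("bieberfever.com", 0, 0.65),
    ("bieberfever.com", 0, 0.62),
    ("bieberfever.com", 1, 0.45),
    ("ultratrimfast.com", 0, 0.87),
    ("ultratrimfast.com", 0, 0.99),
    ("ultratrimfast.com", 1, 1.53),
]


def dictionary_encode(values):
    """Assign each distinct value an integer id in order of first appearance."""
    ids = {}
    encoded = []
    for value in values:
        if value not in ids:
            ids[value] = len(ids)
        encoded.append(ids[value])
    return ids, encoded


def build_bitmaps(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = defaultdict(int)
    for row, value in enumerate(values):
        bitmaps[value] |= 1 << row
    return dict(bitmaps)


def bitmap_to_string(bitmap, n_rows):
    """Render a bitmap with row 0 leftmost, matching the slide notation."""
    return "".join("1" if (bitmap >> i) & 1 else "0" for i in range(n_rows))


def matching_rows(bitmap, n_rows):
    """List the row numbers whose bit is set in the bitmap."""
    return [i for i in range(n_rows) if (bitmap >> i) & 1]


publishers = [r[0] for r in rows]
ids, encoded = dictionary_encode(publishers)
bitmaps = build_bitmaps(publishers)

# A boolean OR filter over two values is a bitwise OR of their bitmaps.
either = bitmaps["bieberfever.com"] | bitmaps["ultratrimfast.com"]

# Aggregate a metric over only the rows selected by a filter bitmap.
bieber_rows = matching_rows(bitmaps["bieberfever.com"], len(rows))
bieber_price = sum(rows[i][2] for i in bieber_rows)

print(ids)
print(encoded)
print(bitmap_to_string(bitmaps["bieberfever.com"], len(rows)))
print(bitmap_to_string(bitmaps["ultratrimfast.com"], len(rows)))
print(bitmap_to_string(either, len(rows)))
print(bieber_rows, round(bieber_price, 2))
```

The filter is resolved on the bitmaps before any metric column is touched, so only the selected rows of `price` are read when aggregating.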
