Big Data in Real-Time


Big Data in Real-Time at Twitter
Nick Kallen (@nk), April 25, 2010
Follow along: http://www.slideshare.net/nkallen/qcon

What is Real-Time Data?
• On-line queries for a single web request
• Off-line computations with very low latency
• Latency and throughput are equally important
• Not talking about Hadoop and other high-latency, Big Data tools

The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices

What is a Tweet?
• 140 character message, plus some metadata
• Query patterns: by id; by author (also @replies, but not discussed here)
• Row Storage

Find by primary key: 4376167936
Find all by user_id: 749863

Original Implementation
• Relational: single table, vertically scaled
• Master-Slave replication and Memcached for read throughput

  id  user_id  text                                 created_at
  20  12       just setting up my twttr             2006-03-21 20:50:14
  29  12       inviting coworkers                   2006-03-21 21:02:56
  34  16       Oh shit, I just twittered a little.  2006-03-21 21:08:09

Problems w/ solution
• Disk space: did not want to support disk arrays larger than 800GB
• At 2,954,291,678 tweets, disk was over 90% utilized

PARTITION

Possible implementations

Partition by primary key:

  Partition 1      Partition 2
  id   user_id     id   user_id
  20   ...         21   ...
  22   ...         23   ...
  24   ...         25   ...

• Finding recent tweets by user_id queries N partitions

Partition by user id:

  Partition 1      Partition 2
  id   user_id     id   user_id
  ...  1           21   2
  ...  1           23   2
  ...  3           25   2

• Finding a tweet by id queries N partitions

Current Implementation

Partition by time:

  Partition 1      Partition 2
  id   user_id     id   user_id
  24   ...         22   ...
  23   ...         21   ...

• Queries try each partition in order until enough data is accumulated

LOCALITY

Low Latency (PK lookup)
  Memcached  1ms
  MySQL      <10ms*
  * Depends on the number of partitions searched

Principles
• Partition and index
• Exploit locality (in this case, temporal locality)
• New tweets are requested most frequently, so usually only 1 partition is checked

Problems w/ solution
• Write throughput: have encountered deadlocks in MySQL at crazy tweet velocity
• Creating a new temporal shard is a manual process and takes too long; it involves setting up a parallel replication hierarchy. Our DBA hates us

Future solution
• Cassandra (non-relational)
• Primary Key partitioning
• Manual secondary index on user_id
• Memcached for 90+% of reads

  Partition k1     Partition k2
  id   user_id     id   user_id
  20   ...         21   ...
  22   ...         23   ...

  Partition u1           Partition u2
  user_id  ids           user_id  ids
  12       20, 21, ...   13       48, 27, ...
  14       25, 32, ...   15       23, 51, ...
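The primary-key partitions (k1, k2) and the manual secondary index partitions (u1, u2) above can be sketched in a few lines of Scala. This is only an illustration of the scheme, not Twitter's code: plain in-memory maps stand in for the Cassandra partitions and for Memcached, and the class name, modulo placement, and partition count are assumptions.

  import scala.collection.mutable

  case class Tweet(id: Long, userId: Long, text: String)

  class TweetStore(numPartitions: Int) {
    // "k" partitions: rows keyed by primary key, spread across partitions by id
    private val byId = Array.fill(numPartitions)(mutable.Map.empty[Long, Tweet])
    // "u" partitions: manual secondary index, user_id -> tweet ids
    private val byUser = Array.fill(numPartitions)(mutable.Map.empty[Long, mutable.ArrayBuffer[Long]])
    // stand-in for Memcached, which serves 90+% of reads
    private val cache = mutable.Map.empty[Long, Tweet]

    private def kPart(id: Long): Int = (id % numPartitions).toInt
    private def uPart(userId: Long): Int = (userId % numPartitions).toInt

    def insert(t: Tweet): Unit = {
      byId(kPart(t.id)).update(t.id, t)            // write the row into its primary-key partition
      byUser(uPart(t.userId))                      // maintain the secondary index by hand
        .getOrElseUpdate(t.userId, mutable.ArrayBuffer.empty) += t.id
      cache.update(t.id, t)                        // keep the cache warm
    }

    // Find by primary key: cache first, then exactly one partition
    def findById(id: Long): Option[Tweet] =
      cache.get(id).orElse(byId(kPart(id)).get(id))

    // Find all by user_id: one index partition gives the ids, then point lookups by id
    def findByUser(userId: Long): Seq[Tweet] = {
      val ids = byUser(uPart(userId)).getOrElse(userId, mutable.ArrayBuffer.empty[Long])
      ids.iterator.flatMap(findById).toSeq
    }
  }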
The four data problems: Tweets, Timelines, Social graphs, Search indices

What is a Timeline?
• Sequence of tweet ids
• Query pattern: get by user_id
• Operations: append, merge, truncate
• High-velocity bounded vector
• Space-based (in-place mutation)

Tweets from 3 different people

Original Implementation

  SELECT * FROM tweets
  WHERE user_id IN
    (SELECT source_id FROM followers WHERE destination_id = ?)
  ORDER BY created_at DESC
  LIMIT 20

Crazy slow if you have lots of friends or indices can't be kept in RAM

OFF-LINE VS. ONLINE COMPUTATION

Current Implementation
• Sequences stored in Memcached
• Fanout off-line, but has a low latency SLA
• Truncate at random intervals to ensure bounded length
• On cache miss, merge user timelines (these operations are sketched at the end of this section)

Throughput Statistics

  date       average tps  peak tps  fanout ratio  deliveries
  10/7/2008  30           120       175:1         21,000
  4/15/2010  700          2,000     600:1         1,200,000

1.2m deliveries per second

MEMORY HIERARCHY

Possible implementations
• Fanout to disk
  • Ridonculous number of IOPS required, even with fancy buffering techniques
  • Cost of rebuilding data from other durable stores not too expensive
• Fanout to memory
  • Good if cardinality of corpus * bytes/datum not too many GB

Low Latency
  get     1ms
  append  1ms
  fanout  <1s*
  * Depends on the number of followers of the tweeter

Principles
• Off-line vs. on-line computation
  • The answer to some problems can be pre-computed if the amount of work is bounded and the query pattern is very limited
• Keep the memory hierarchy in mind
  • The efficiency of a system includes the cost of generating data from another source (such as a backup) times the probability of needing to do so
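The fanout, truncate, and merge-on-cache-miss operations above can be sketched with an in-memory map standing in for Memcached. This is a minimal illustration under assumed interfaces: the follower and followee lookups, the cache-miss rebuild, the random truncation interval, and the length bound are hypothetical stand-ins, not the production pipeline.

  import scala.collection.mutable
  import scala.util.Random

  // Minimal sketch of fanout-on-write home timelines: tweet ids only, newest first.
  class TimelineStore(followersOf: Long => Seq[Long],       // hypothetical lookup: author -> follower ids
                      followeesOf: Long => Seq[Long],       // hypothetical lookup: user -> users they follow
                      recentTweetIdsOf: Long => Seq[Long],  // hypothetical per-user tweet lookup for cache misses
                      maxLength: Int = 800) {
    // Stand-in for Memcached: user_id -> bounded vector of tweet ids
    private val cache = mutable.Map.empty[Long, mutable.ArrayBuffer[Long]]

    // Fanout: run off-line (e.g. from a queue), but expected to finish within a low-latency SLA
    def fanout(authorId: Long, tweetId: Long): Unit =
      for (followerId <- followersOf(authorId); timeline <- cache.get(followerId)) {
        timeline.prepend(tweetId)                                    // append the new id (newest first)
        if (Random.nextInt(10) == 0 && timeline.length > maxLength)  // truncate at random intervals
          timeline.remove(maxLength, timeline.length - maxLength)    // so the vector stays bounded
      }

    // Read path: serve from the cache; on a miss, rebuild by merging the followed users' tweets
    def getTimeline(userId: Long, limit: Int = 20): Seq[Long] = {
      val timeline = cache.getOrElseUpdate(userId, {
        val merged = followeesOf(userId).flatMap(recentTweetIdsOf).sorted(Ordering[Long].reverse)
        mutable.ArrayBuffer.from(merged.take(maxLength))
      })
      timeline.take(limit).toSeq
    }
  }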
The four data problems: Tweets, Timelines, Social graphs, Search indices

What is a Social Graph?
• List of who follows whom, who blocks whom, etc.
• Operations: enumerate by time; intersection, union, difference; inclusion; cardinality; mass-deletes for spam
• Medium-velocity unbounded vectors
• Complex, predetermined queries

Temporal enumeration, Inclusion, Cardinality
Intersection: Deliver to people who follow both @aplusk and @foursquare

Original Implementation
• Single table, vertically scaled
• Master-Slave replication
• Indexes on source_id and on destination_id

  source_id  destination_id
  20         12
  29         12
  34         16

Problems w/ solution
• Write throughput
• Indices couldn't be kept in RAM

Current solution
• Partitioned by user id
• Edges stored in "forward" and "backward" directions
• Indexed by time
• Indexed by element (for set algebra)
• Denormalized cardinality
• (these operations are sketched at the end of this section)

  Forward (partitioned by source user):
     source_id  destination_id  updated_at
  x  20         12              20:50:14
  x  20         13              20:51:32
     20         16

  Backward (partitioned by destination user):
     destination_id  source_id  updated_at
  x  12              20         20:50:14
  x  12              32         20:51:32
     12              16

Challenges
• Data consistency in the presence of failures
• Write operations are idempotent: retry until success
• Last-Write Wins for edges (with an ordering relation on State for time conflicts)
• Other commutative strategies for mass-writes

Low Latency
  cardinality        1ms
  iteration          100 edges/ms*
  write ack          1ms
  write materialize  16ms
  inclusion          1ms
  * 2ms lower bound

Principles
• It is not possible to pre-compute set algebra queries
• Simple distributed coordination techniques work
• Partition, replicate, index. Many efficiency and scalability problems are solved the same way
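The forward and backward edge lists, the per-element and temporal indexes, and the intersection query above can be sketched as follows. It is an in-memory, single-node illustration: the class and its method names are assumptions, and partitioning, replication, and the mass-write strategies are omitted.

  import scala.collection.mutable

  // Minimal sketch of the forward/backward edge store: one edge list per user,
  // kept in both directions so either endpoint's partition can answer locally.
  class SocialGraph {
    private case class Edges(byTime: mutable.ArrayBuffer[Long],  // indexed by time (insertion order)
                             byElement: mutable.Set[Long]) {     // indexed by element, for set algebra
      def add(id: Long): Unit = if (byElement.add(id)) byTime.prepend(id)
      def size: Int = byElement.size                             // denormalized cardinality
    }
    private def newEdges = Edges(mutable.ArrayBuffer.empty, mutable.Set.empty)

    private val forward  = mutable.Map.empty[Long, Edges]   // source -> destinations (who X follows)
    private val backward = mutable.Map.empty[Long, Edges]   // destination -> sources (who follows X)

    // Idempotent write: retrying a follow after a failure is safe
    def follow(source: Long, destination: Long): Unit = {
      forward.getOrElseUpdate(source, newEdges).add(destination)
      backward.getOrElseUpdate(destination, newEdges).add(source)
    }

    def followerCount(user: Long): Int =                                   // cardinality
      backward.get(user).map(_.size).getOrElse(0)
    def follows(source: Long, destination: Long): Boolean =                // inclusion
      forward.get(source).exists(_.byElement.contains(destination))
    def recentFollowers(user: Long, limit: Int): Seq[Long] =               // temporal enumeration
      backward.get(user).map(_.byTime.take(limit).toSeq).getOrElse(Seq.empty)

    // Intersection, e.g. "deliver to people who follow both @aplusk and @foursquare"
    def commonFollowers(a: Long, b: Long): Set[Long] = {
      val followersA = backward.get(a).map(_.byElement.toSet).getOrElse(Set.empty[Long])
      val followersB = backward.get(b).map(_.byElement.toSet).getOrElse(Set.empty[Long])
      followersA intersect followersB
    }
  }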
The four data problems: Tweets, Timelines, Social graphs, Search indices

What is a Search Index?
• “Find me all tweets with these words in it...”
• Posting list
• Boolean and/or queries
• Complex, ad hoc queries
• Relevance is recency*
  * Note: there is a non-real-time component to search, but it is not discussed here

Intersection of three posting lists

Original Implementation
• Single table, vertically scaled
• Master-Slave replication for read throughput

  term_id  doc_id
  20       12
  20       86
  34       16

Problems w/ solution
• Index could not be kept in memory

Current Implementation
• Partitioned by time
• Uses MySQL
• Uses delayed key-write
• (see the sketch at the end of the deck)

  Partition 1        Partition 2
  term_id  doc_id    term_id  doc_id
  24       ...       22       ...
  23       ...       21       ...

Problems w/ solution
• Write throughput
• Queries for rare terms need to search many partitions
• Space efficiency/recall
• MySQL requires lots of memory

DATA NEAR COMPUTATION

Future solution
• Document partitioning
• Time partitioning too
• Merge layer
• May use Lucene instead of MySQL

Principles
• Partition so that work can be parallelized
• Temporal locality is not always enough

The four data problems: Tweets, Timelines, Social graphs, Search indices

Summary Statistics

             reads/second  writes/second  cardinality  bytes/item  durability
  Tweets     100k          850            12b          300b        durable
  Timelines  80k           1.2m           a lot        3.2k        non-durable
  Graphs     100k          20k            13b          110         durable
  Search     13k           21k†           315m‡        1k          durable

  † tps * 25 postings
  ‡ 75 partitions * 4.2m tweets

Principles
• All engineering solutions are transient
• Nothing’s perfect but some solutions are good enough for a while
• Scalability solutions aren’t magic. They involve partitioning, indexing, and replication
• All data for real-time queries MUST be in memory. Disk is for writes only.
• Some problems can be solved with pre-computation, but a lot can’t
• Exploit locality where possible

Appendix
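The time-partitioned posting lists described in the search section can be sketched as below: intersect per-term posting lists inside each partition and walk partitions from newest to oldest until enough hits are found, since relevance is recency. The class, the AND-only query, and the in-memory maps (standing in for the MySQL or Lucene partitions) are assumptions for illustration, not the production index.

  import scala.collection.mutable

  // Minimal sketch of a time-partitioned term index: the head partition holds the
  // newest tweets, and queries stop as soon as they have accumulated enough results.
  class SearchIndex {
    private type Partition = mutable.Map[String, mutable.ArrayBuffer[Long]]  // term -> posting list, newest first
    private var partitions: List[Partition] = List(mutable.Map.empty)

    // Close the current time window and start a fresh, newest partition
    def rotate(): Unit =
      partitions = mutable.Map.empty[String, mutable.ArrayBuffer[Long]] :: partitions

    def index(docId: Long, terms: Seq[String]): Unit =
      for (t <- terms)
        partitions.head.getOrElseUpdate(t, mutable.ArrayBuffer.empty).prepend(docId)

    // AND query: intersect the terms' posting lists within each partition,
    // walking partitions newest to oldest until `limit` results are found
    def search(terms: Seq[String], limit: Int = 20): Seq[Long] = {
      val results = mutable.ArrayBuffer.empty[Long]
      val it = partitions.iterator
      while (results.length < limit && it.hasNext) {
        val part = it.next()
        val postings = terms.map(t => part.get(t).map(_.toSet).getOrElse(Set.empty[Long]))
        if (postings.nonEmpty && postings.forall(_.nonEmpty))
          results ++= postings.reduce(_ intersect _).toSeq.sorted(Ordering[Long].reverse)
      }
      results.take(limit).toSeq
    }
  }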
