By Vadim Tkachenko, Muhammad Irfan and Peter Zaitsev MySQL Query Optimization MySQL query optimization should not be based on guesses, but exact measurements. A proper measurement of these units is the first important task in MySQL performance optimization. This Percona eBook explains how to do this with Percona Cloud Tools. We'll also examine the best way to analyze WordPress MySQL queries with "Query Analytics. We'll also share other tools and tips for analyzing MySQL's slow query log; and finally we'll compare PERFORMANCE_SCHEMA vs Slow Query Log. 3 5 12 Chapter 1: Don't guess! Measure with Percona Cloud Tools Analyzing WordPress Chapter 2: MySQL queries with Query Analytics Chapter 3: Tools and tips for analysis of MySQL's Slow Query Log Chapter 4: PERFORMANCE_SCHEMA vs Slow Query Log 16 MySQL Query Optimization Table of Contents Percona was founded in August 2006 by Peter Zaitsev and Vadim Tkachenko and now employs a global network of experts with a staff of more than 100 people. Our customer list is large and diverse, including Fortune 50 firms, popular websites, and small startups. We have over 1,800 customers and, although we do not reveal all of their names, chances are we're working with every large MySQL user you've heard about. To put Percona's MySQL expertise to work for you, please contact us. About Percona Skype: oncall.percona GTalk: email@example.com AIM (AOL Instant Messenger): oncallpercona Telephone direct-to-engineer: +1-877-862-4316 or UK Toll Free: +44-800-088-5561 Telephone to live operator: +1-888-488-8556 Customer portal: https://customers.percona.com/ Is this an emergency? Get immediate assistance from Percona Support 24/7. Click here Copyright © 2006-2014 Percona LLC MySQL Query Optimization Chapter 1: Don't guess! Measure with Percona Cloud Tools Chapter 1: Don't guess! Measure with Percona Cloud Tools In our practice we often see that MySQL performance optimization is done in a sort of “black magic” way. A common opinion is: “If there is a performance problem – then it must be a database issue, because what else could it be? And if this is a database issue, then it must involve IO problems because the reason for a slow database is always a slow IO…” Following this logic might actually give a result, but achieving a fully successful resolution would require magic. At Percona we use a different approach. Performance optimization should not be based on guesses, but exact measurements. In application to databases, I described previously, we use queries as units of work. And a proper measurement of these units is the first important task in performance optimization. Let me list the metrics of what our Percona Cloud Tools provides: Query count – How many times query was executed Query_time – Total time that MySQL spent on query execution Lock_time – Time spent in waiting on Table level locks Rows_sent – How many rows query returned to application Rows_examined – How many rows MySQL actually had to read In Chapter 2 I'll explain that Rows_sent/Rows_examined ratio is one of interest for OLTP workloads. And the following metrics are available only for Percona Server, and not because we crippled our tools, but because MySQL simply does not provide them. It is worth reminding that one of main goals in making Percona Server was to provide diagnostics and transparency. Percona Server metrics: Rows_affected Merge_passes InnoDB_IO_r_bytes InnoDB_IO_r_ops InnoDB_IO_r_wait InnoDB_pages_distinct InnoDB_queue_wait InnoDB_rec_lock_wait Query_length . By Vadim Tkachenko MySQL Query Optimization Chapter 1: Don't guess! Measure with Percona Cloud Tools Bytes_sent Tmp_tables Tmp_disk_tables Tmp_table_sizes Meaning of all these metrics are available from our docs page Now, often it is not enough to provide a single value for a given metric, so the following stats are available: Total, Average, Minimum, Median, 95%, Maximum and Stddev. Add for each of these trending graphs and you will see a matrix on your query: Having all of these comprehensive metrics about your queries, you can now make an intelligent decision on how to approach performance optimization of your database. You can do it all by yourself today with the free Beta of Percona Cloud Tools. I understand that all of this information might be too much too figure out at first, but as with any tool – it takes skills and some experience to analyze and apply metrics properly. To help to bootstrap with Percona Cloud Tools, I will be running a webinar “Analyze MySQL Query Performance with Percona Cloud Tools” on Feb-12, 2014; if you register and install the Percona Cloud Tools agent 4 you may win a free consulting hour from me during which I will examined your queries and provide an advice for optimization. See all conditions there. MySQL Query Optimization Chapter 2: Analyzing WordPress MySQL queries with Query Analytics Chapter 2: Analyzing WordPress MySQL queries with Query Analytics Our blog, the MySQLPerformanceBlog.com, is powered by WordPress, but we never really looked into what kind of queries to MySQL are used by WordPress. So for couple months we ran a Query Analytics (part of Percona Cloud Tools) agent there, and now it is interesting to take a look on queries. Query Analytics uses reports produced by pt-query-digest, but it is quite different as it allows to see trends and dynamics of particular query, in contrast to pt-query-digest, which is just one static report. Why looking into queries important? I gave an intro in my previous post from this series. So Query Analytics give the report on the top queries. How to detect which query is “bad”? One of metrics I am typically looking into is ratio of “Rows examined” to “Rows sent”. In OLTP workload I expect “Rows sent” to be close to “Rows examined”, because otherwise it means that a query handles a lot of rows (“examined”) which are not used in final result set (“sent”), and it means wasted CPU cycles and even unnecessary IOs if rows are not in memory. Looking on WordPress queries it does not take long to find one: This one actually looks quite bad… It examines up to ~186000 rows to return 0 or in the best case 1 row. The full query text is (and this is available in Query Analytics, you do not need to dig through logs to find it): 5 By Vadim Tkachenko MySQL Query Optimization Chapter 2: Analyzing WordPress MySQL queries with Query Analytics SELECT comment_ID FROM wp_comments WHERE comment_post_ID = '154' AND comment_parent = '0' AND comment_approved != 'trash' AND ( comment_author = 'poloralphlauren.redhillrecords' OR comment_author_email = 'firstname.lastname@example.org' ) AND comment_content = 'Probabilities are in case you are like the ma spam jorityof people nowadays, you\'re f lululemonaddictoutletcanadaSale.angelasauceda ighting tooth and nail just looking to keep up together with your existence. Acquiring organized can help you win the fight. Appear to the ugg factors in just abo spam ut every of your spaces (desk, workplace, living room, bed' LIMIT 1; We can see how execution time of this query changes overtime... ... and also how many rows it examines for the last month 6 MySQL Query Optimization Chapter 2: Analyzing WordPress MySQL queries with Query Analytics It is clearly an upward trend, and obviously the query does not scale well as there more and more data. I find these trending graphs very useful and they are available in Query Analytics as we continuously digest and analyze queries. We can see that only for the last month amount of rows this query examines increased from ~130K to ~180K. So, the obvious question is how to optimize this query? We look into the explain plan 7 Here's a closer look... and SHOW CREATE TABLE MySQL Query Optimization Chapter 2: Analyzing WordPress MySQL queries with Query Analytics Obviously WordPress did not design this schema to handle 180000 comments to a single post. There are several ways to fix it, I will take the easiest way and change the key KEY `comment_post_ID` (`comment_post_ID`) to KEY `comment_post_ID` (`comment_post_ID`,`comment_content`(300)) and it changes execution plan to 8 From 186000 rows to 910 rows – that’s quite improvement! How does it affect execution time? Let’s query run for a while and see again in our trending graph: The drop from ~600ms to ~34ms and for Rows examined: --> See next page MySQL Query Optimization Chapter 2: Analyzing WordPress MySQL queries with Query Analytics The 2nd query is also not to hard to find, and it is again on wp_comments table The query examines up to 16K rows, sending only 123 in the best case. Query text is (this one is from different instance of WordPress, so the table structure is different) SELECT comment_post_ID FROM wp_comments WHERE LCASE(comment_author_email) = 'email@example.com' AND comment_subscribe='Y' AND comment_approved = '1' GROUP BY comment_post_ID and EXPLAIN for this particular one 9 --> See next page MySQL Query Optimization Chapter 2: Analyzing WordPress MySQL queries with Query Analytics This table structure is 10 ... a closer look at that... MySQL Query Optimization Chapter 2: Analyzing WordPress MySQL queries with Query Analytics .n conclusion: Query Analytics from Percona Cloud Tools gives immediate insight as to which query requires attention With continuously running reports we can see the trends and effects from our optimizations Please also note, that Query Analytics does not require you to install MySQL proxy, some third-party middle-ware or any kind of tcp-traffic analyzers. It fully operates with slow-log generated by MySQL, Percona Server or MariaDB (Percona Server provides much more information in slow- log than vanilla MySQL). So try Percona Cloud Tools for free while it’s still in beta. You’ll be up and running minutes! For more information, you can view a recorded webinar from a few months ago titled: “Analyze MySQL Query Performance with Percona Cloud Tools” that is free and my slides are also available there, too. 11 MySQL Query Optimization Chapter 3: Tools and tips for analysis of MySQL's Slow Query Log Chapter 3: Tools and tips for analysis of MySQL's Slow Query Log Query optimization is essential for good database server performance and usually DBAs need to ensure the top performance possible for all queries. In MySQL, the desirable way is to generate a query log for all running queries within a specific time period and then run a query analysis tool to identify the bad queries. Percona Toolkit’s pt-query-digest is one of the most powerful tools for SQL analysis. That’s because pt-query-digest can generate a very comprehensive report that spots problematic queries very efficiently. It works equally well with Oracle MySQL server. This post will focus mainly on pt-query-digest. Slow query log is great at spotting really slow queries that are good candidates for optimization. Beginning with MySQL 5.1.21, the minimum value is 0 for long_query_time, and the value can be specified to a resolution of microseconds. In Percona Server additional statistics may be output to the slow query log. You can find the full details here. For our clients, we often need to identify queries that impact an application the most. It does not always have to be the slowest queries –queries that runs more frequently with lower execution time per call put more load on a server than queries running with lower frequency. We of course want to get rid of really slow queries but to really optimize application throughput, we also need to investigate queries that generate most of the load. Further, if you enable option log_queries_not_using_indexes then MySQL will log queries doing full table scans which doesn’t always reflect that the query is slow, because in some situations the query optimizer chooses full table scan rather than using any available index or probably showing all records from a small table. Our usual recommendation is to generate the slow log with long_query_time=0. This will record all the traffic but this will be I/O intensive and will eat up disk space very quickly depending on your workload. So beware of running with long_query_time=0 for only a specific period of time and revert it back to logging only very slow queries. In Percona Server there is nice option where you can limit the rate of logging, log_slow_rate_limit is the option to handle it. Filtering slow query log is very helpful too in some cases e.g. if we know the main performance issue is table scans we can log queries only doing full table scans or if we see I/O is bottleneck we can collect queries doing full scans and queries creating on disk temporary tables. 12 MySQL has a nice feature, slow query log, which allows you to log all queries that exceed a predefined about of time to execute. Query optimization is essential for good database server performance and usually DBAs need to ensure the top performance possible for all queries. In MySQL, the desirable way is to generate a query log for all running queries within a specific time period and then run a query analysis tool to identify the bad queries. By Muhammad Irfan MySQL Query Optimization Chapter 3: Tools and tips for analysis of MySQL's Slow Query Log Again, this is only possible in Percona Server with the log_slow_filter option. Also, you may want to collect everything on slow query log and then filter with pt-query-digest. Depending on I/O capacity, you might prefer one or another way, as collecting everything in slow query log allows us to investigate other queries too if needed. Finally, use pt-query-digest to generate an aggregate report over slow query log which highlights the problematic part very efficiently. Again, pt-query-digest can bring up server load high so our usual recommendation on it is to move slow query log to some staging/dev server and run pt-query-digest over there to generate the report. Note: changing the long_query_time parameter value only affects newly created connections to log queries exceeds long_query_time threshold. In Percona Server there is feature which changes variable scope to global instead of local. Enabling slow_query_log_use_global_control log queries for connected sessions too after changing long_query_time parameter threshold. You can read more about this patch here. Next, let's examine some of the other aspects of pt-query-digest tool. Let me show you code snippets that enable slow query log for only a specific time period with long_query_time=0 and log_slow_verbosity to ‘full’. log_slow_verbosity is a Percona Server variable which logs extra stats such as information on query cache, Filesort, temporary tables, InnoDB statistics etc. Once you are done collecting logs, revert back the values for long_query_time to the previous value, and finally run pt-query-digest on the log to generate report. Note: run the below code in same MySQL session. 13 See next page for larger view MySQL Query Optimization Chapter 3: Tools and tips for analysis of MySQL's Slow Query Log My colleague Bill Karwin wrote bash script that does almost the same as the above code. You can find the script to collect slow logs here. This script doesn’t hold connection to the database session while you wait for logs to accumulate and it sets all the variables back to the state they were before. For full documentation view this. 14 Further, you can also get explain output into the report from the pt-query-digest tool. For that you need to use –explain parameter similar to as follows: MySQL Query Optimization Chapter 3: Tools and tips for analysis of MySQL's Slow Query Log Explain output in query report will get you all the information for query execution plan and explain output signal towards how that particular query going to be executed. Note that, if you execute pt-query-digest over slow query log other than originated server of slow query log as I mentioned above e.g. staging/dev you may get different execution path for the query in the report or lower number of rows to examined, etc., because usually staging/dev servers has different data distribution, different MySQL versions, or different indexes. MySQL explain adds overhead as queries needs to be prepared on the server to generate intended query execution path. For this reason, you may want to run pt-query-digest with – explain on a production replica. It’s worth mentioning that logging queries with log_slow_verbosity in Percona Server is really handy as it shows lots of additional statistics and it is more helpful in situations when the explain plan reports a different execution path than when the query is executed. On that particular topic, you may want to check this nice post. pt-query-digest also supports filters. You can read more about it here. Let me show you an example. The following command will discard everything apart from insert/update/delete queries in pt-query-digest output report. If you’re looking for some GUI tools for pt-query-digest then I would recommend reading this nice blogpost from my colleague Roman. Further, our CEO Peter Zaitsev also wrote a post recently where he shows the comparison between performance_schema and slow query log. Check here for details. In related news, Percona recently announced Percona Cloud Tools, the next generation of tools for MySQL. It runs a client-side agent (pt-agent) which runs pt-query-digest on the server with some intervals and uploads the aggregated data to the Percona Cloud Tools API which process it further. Query Analytics is one tool from the Percona Cloud Tools that provides advanced query metrics. It is a nice visualization tool. You may be interested to learn more about it here, and it’s also worth viewing this related webinar about Percona Cloud Tools from our CTO Vadim Tkachenko. 15 Conclusion: pt-query-digest from Percona Toolkit is a versatile (and free) tool for slow query log analysis. It.provides good insight about every individual query, especially in Percona Server with log_slow_verbosity enabled, e.g. log queries with microsecond precision, log information about the query’s execution plan. On top of that, Percona Cloud Tools includes Query Analytics which provides you with good visuals about query performance and also provides a view of historical data. Chapter 4: PERFORMANCE_SCHEMA vs Slow Query Log Back in January, shortly after Vadim wrote about Percona Cloud Tools and using Slow Query Log to capture the data, Mark Leith asked why don’t we just use Performance Schema instead? This is an interesting question and I think it deserves its own blog post to talk about. First, I would say main reason for using Slow Query Log is compatibility. Basic Slow query log with microsecond query time precision is available starting in MySQL 5.1, while events_statements_summary_by_digest table was only added in MySQL 5.6 which was out for about a year now but which is still far from complete market domination. It is especially interesting if you look at the low-end market – users who just run some web applications using whatever MySQL Version their hosting provider installed for them. If you look at WordPress Users for example you will see MySQL 5.6 at just 1.3% as of today. As time passes and MySQL 5.6 takes a larger share of the market we surely should add support for Performance Schema based query sampling to Percona Cloud Tools. The second reason is amount of data available. There is a fair amount of data which Performance Schema digest table providers including some which are not available in Percona Server logs: 16 By Peter Zaitsev MySQL Query Optimization Chapter 4: PERFORMANCE_SCHEMA vs Slow Query Log BTW – note Rows Sent here not being equal to rows examined while in reality they should be exactly the same for this benchmark. This is the approximate accounting of Performance Schema in action, though. Now compare it to the sample for the same query in the slow query log in Percona Server: What I believe is the most valuable here is the information about Innodb IO which instantly allows us to isolate the query patterns which are disk IO bound as well as information about Bytes Sent which allows to see which queries are responsible for generating high volumes of network traffic. 17 I wish Performance Schema would be enhanced to return data in something like JSON where for each digest the top waits are accounted as in reality it can be different by the query. Some queries might be waiting on IO other on Locks, yet another could be bound by some specific mutexes. Having exact information about what limits performance of queries of the specific type would be a gem. The third reason for using Slow Query Log is using placeholders. Note in the query above has “SELECT c FROM sbtest WHERE id = ?” which is not very convenient – I can’t even run EXPLAIN for such query to see what could be the reason for its slowness. Log contains exact queries and we are able to show exact queries in reports (pt-query-digest and Percona Cloud Tools) or you can opt for seeing only query digests if you do not want to see the values for privacy/security reasons. Picking the constant for a query with worse plan usually works very well to check out worse case scenario. This might look like very simple problem – why you can’t just come up with ID and reconstruct the query but for more complicated queries with multiple conditions it is virtually impossible to reconstruct the realistic query. Now in theory you can look up actual query from events_statements_history_long and join the data together, however it does not really work at the high query rates as it is very likely rare queries will not have a sample available in the history table. MySQL Query Optimization Chapter 4: PERFORMANCE_SCHEMA vs Slow Query Log The forth reason is support for prepared statements. Enable prepared statements and you will not see the actual query in the digest. This may or may not be an issue for your application but it further limits usability of this feature. I can’t count on simply looking at events_statements_summary_by_digest to always find which queries are responsible for majority of the load. The fifth reason is performance or actually not much of a reason. I really think Performance Schema overhead is reasonable for most workloads. In simple queries benchmark which I’ve done: On my old server I got some 20.2K QPS with Performance Schema Disabled and 19.4 QPS with Performance Schema enabled which is overhead of less than 5%. For most workloads paying 5% to have insight about what is happening with the system is a very fair trade. 18 The slow query log overhead actually can be much larger. The moderate level of details “microtime” resulted in 15.1K queries and log_slow_verbosity=”full” takes this value down to 11.6K having over 40% overhead. Note I designed this test as worse case scenario and for more complicated queries the overhead is likely to be less (while overhead with Performance Schema can stay the same or even increase depending on what queries are doing). Some people set long_query_time to some non zero value to reduce amount of queries logged. This is bad idea because the workload logged will be very different from your real one – chances are majority of your load comes from simple quick queries which will be very unrepresented with non zero long query time with only outliers logged. A much better idea is to enable Sampling which is available in latest version of Percona Server –this way only one out of every so many queries will be logged: MySQL Query Optimization Chapter 4: PERFORMANCE_SCHEMA vs Slow Query Log This will get one out ever 100 queries randomly logged which should give you good idea of your workload without such skew. It works well unless you have some rare and very complicated queries which impact your workload disproportionally and which you can’t ignore for performance analyses. To deal with this situation we added slow_query_log_always_write_time option to Percona Server, which allows you to always log such queries in the log even if they would not be selected because of sampling. Enabling sampling 1/100 queries for this workload with full level of details I get 19.8K queries giving us overhead less than 2% which is even less than Performance Schema and selecting 1/1000 queries to be logged I can get overhead to about 1%. So with Slow Query Log I can make a choice between accuracy and overhead. I wish Performance Schema would offer something similar – instead of figuring out what probes and statistic tables I need (not all of them are enabled by default) I could just chose to get the sampled data and play with accuracy vs overhead instead of missing the data all together. Summary: There is a great amount of data in Performance Schema in MySQL 5.6 and Percona Server 5.6 though there are a number of reasons that you also might not want to discard the old and proven tools based on the slow query log just yet. 19 MySQL Query Optimization Chapter 4: PERFORMANCE_SCHEMA vs Slow Query Log Visit the Percona library for more free MySQL eBook selections http://www.percona.com/resources/mysql-ebooks Powered by TCPDF (www.tcpdf.org) 21 About the authors Copyright © 2006-2014 Percona LLC Vadim Tkachenko co-founded Percona in 2006 after four years in the High Performance Group within the official MySQL Support Team. He serves on Percona's executive team as CTO. He leads Percona's development group, which produces Percona Server, Percona XtraDB, and Percona XtraBackup. He is an expert in LAMP performance, especially optimizing MySQL and InnoDB internals to take full advantage of modern hardware using his multi-threaded programming background. Source code patches authored by Vadim have been incorporated by Oracle Corporation and its predecessors into the mainstream MySQL and InnoDB products. He now lives in California with his wife and their two children. Peter Zaitsev, Percona's CEO and founder, is arguably the world's foremost expert in MySQL performance and scaling, with a special expertise in hardware and database internals. Peter's work has contributed to dozens of MySQL appliances, storage engines, replication systems, and other technologies. Peter co-authored High Performance MySQL along with two other Percona experts. He is a frequently invited guest at open source conferences, and has been a sell-out speaker at the yearly MySQL User Conference since its inception. Peter currently lives in North Carolina with his wife and their two children. Muhammad Irfan is vastly experienced in the LAMP Stack. Prior to joining Percona, he worked in the role of MySQL DBA & LAMP Administrator, maintained high traffic websites, and worked as a Consultant. His professional interests focus on MySQL scalability and on performance optimization. In his spare time, he normally spends time with family and friends and loves to play and watch cricket. He currently lives in Karachi, Pakistan, with his wife and son.