Thursday, May 3, 2012

How to Measure the Performance of a Server?

I have repeated the following too many times in the last few years and decided to write it up. If I have missed something, please add a comment.

Understanding Server Performance

[Figure: Characteristic performance graphs of a server]

The above graphs capture the characteristic behavior of a server. As they show, server performance is gauged by measuring latency and throughput against concurrency.
  • Latency measures the end-to-end processing time. In a messaging environment, we determine latency by measuring the time between sending a request and receiving the response. Latency is measured from the client machine and includes the network overhead as well.
  • Throughput measures the number of messages that a server processes during a specific time interval (e.g. per second). Throughput is calculated by measuring the time taken to process a set of messages and then using the following equation.

Throughput = number of completed requests / time to complete the requests
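For example, if 10 concurrent clients together complete 10,000 requests in 25 seconds, the throughput is 10,000 / 25 = 400 requests per second.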

It is worth noting that these two values are often loosely related; however, we cannot directly derive one measurement from the other.

As shown by the figure, a server has an initial range where throughput increases at a roughly linear rate and latency either remains constant or grows slowly. As concurrency increases, the approximately linear relationship decays and system performance rapidly degrades. Performance tuning attempts to modify the relationship between concurrency and throughput and/or latency, maintaining the near-linear relationship for as long as possible.

For more details about latency and throughput, read the following online resources:
  1. Understanding Latency versus Throughput
  2. Latency vs Throughput

Unlike static server capacity measurements (e.g. CPU processing speed, memory size), performance is a dynamic measurement. Latency and throughput are strongly influenced by concurrency and work unit size. A larger work unit size usually influences latency and throughput negatively. Concurrency is the number of work units (e.g. messages, business processes, transformations, or rules) being processed in parallel at a given time. Higher concurrency values tend to increase latency (wait time) and eventually decrease throughput (units processed per unit time).

To visualize server performance across the range of possible workloads, we draw a graph of latency or throughput against concurrency or work unit size as shown by the above graph.

Doing the Performance Test

The goal of running a performance test is to draw a graph like the one above. To do that, you have to run the performance test multiple times with different concurrency levels, and measure latency and throughput for each run.

The following are some common steps and a checklist.

Workload and client Setup

  1. Each concurrent client simulates a different user. Each runs in a separate thread and performs a series of operations (requests) against the server.
  2. The first step is finding a workload. If there is a well-known benchmark for the particular kind of server you are testing, use that. If not, create a benchmark by simulating real user operations as closely as possible.
  3. Messages generated by the test must not be identical; otherwise, caching might come into play and yield overly optimistic results. The best method is to capture and replay a real workload. If that is not possible, generate a randomized workload. Use a known data set whenever it makes sense.
  4. We measure latency and throughput from the client. For each test run, we need to measure the following:
  5. The end-to-end time taken by each operation. Latency is the AVERAGE of all end-to-end latencies.
  6. For each client, the test start time, the test end time, and the number of completed messages. Throughput is the SUM of the throughput measured at each client.
  7. To measure the time in Java, you can use System.nanoTime() or System.currentTimeMillis(); in other programming languages, use the equivalent methods.
  8. For each test run, it is best to take readings for about 10,000 messages. For example, with concurrency 10, each client should send at least 1,000 messages. Even if there are many clients, each client should send at least 200 messages.
  9. There are many tools that can run performance tests, e.g. JMeter, LoadUI, javabench, and ab. Use them when applicable; a hand-rolled client is sketched below.
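Where no existing tool fits, the measurement logic described in steps 4–8 is straightforward to write yourself. The following is a minimal Java sketch, not a definitive implementation: each simulated user runs in its own thread, every operation is timed with System.nanoTime(), latencies are averaged, and per-client throughputs are summed. The sendRequest() method is a hypothetical placeholder for the real operation against the server under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LoadTestClient {

    static final int CONCURRENCY = 10;            // number of simulated users
    static final int MESSAGES_PER_CLIENT = 1000;  // 10 x 1000 = 10,000 readings

    public static void main(String[] args) throws Exception {
        // Warm-up: send a few hundred unmeasured requests first
        // (see "Running the test" below).
        for (int w = 0; w < 500; w++) {
            sendRequest();
        }

        ExecutorService pool = Executors.newFixedThreadPool(CONCURRENCY);
        List<Future<double[]>> results = new ArrayList<Future<double[]>>();

        for (int i = 0; i < CONCURRENCY; i++) {
            results.add(pool.submit(new Callable<double[]>() {
                public double[] call() throws Exception {
                    long latencySumNanos = 0;
                    long clientStart = System.nanoTime();
                    for (int m = 0; m < MESSAGES_PER_CLIENT; m++) {
                        long t0 = System.nanoTime();
                        sendRequest();  // one end-to-end operation
                        latencySumNanos += System.nanoTime() - t0;
                    }
                    double elapsedSeconds = (System.nanoTime() - clientStart) / 1e9;
                    double avgLatencyMillis =
                            latencySumNanos / 1e6 / MESSAGES_PER_CLIENT;
                    double clientThroughput = MESSAGES_PER_CLIENT / elapsedSeconds;
                    return new double[] { avgLatencyMillis, clientThroughput };
                }
            }));
        }

        // Latency is the AVERAGE over all operations (every client sends the
        // same number of messages, so averaging per-client averages is valid);
        // throughput is the SUM over clients.
        double latencyTotal = 0, throughputTotal = 0;
        for (Future<double[]> f : results) {
            double[] r = f.get();
            latencyTotal += r[0];
            throughputTotal += r[1];
        }
        pool.shutdown();
        System.out.printf("avg latency = %.2f ms, throughput = %.1f msg/s%n",
                latencyTotal / CONCURRENCY, throughputTotal);
    }

    // Hypothetical placeholder: replace with a real request against the
    // server under test (HTTP, JMS, etc.). Sleeps 1 ms to stand in for work.
    static void sendRequest() throws Exception {
        Thread.sleep(1);
    }
}
```

Sweeping CONCURRENCY over a range of values (and restarting the server between runs) yields the data points for the characteristic graphs above.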

Experimental Setup

  1. You may need to tune the server for best performance with settings such as sufficient heap memory, open file limits, etc.
  2. Do not run the client and the server on the same machine (they interfere with each other and the results are affected).
  3. You need at least a 1Gbit network to keep the network itself from skewing the results.
  4. Generally, you should not run more than 200 clients from the same machine. In some cases, you might need multiple machines to run the clients.
  5. You have to note down and report the environment (memory, CPU, number of cores, operating system of each machine) with the results. It is good practice to measure CPU usage and memory while the test is running. You can use JConsole (for a Java server), and on a Linux machine you can run the “watch cat /proc/loadavg” command to track the load average. CPU usage is a very unreliable metric as it changes very fast; load average, however, is a very reliable metric. A small sketch for sampling it from Java follows this list.
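As a minimal sketch, assuming a JVM is available on the machine, the load average can also be sampled with the standard OperatingSystemMXBean; the one-second interval here is an arbitrary choice:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Samples the system load average once per second while a test runs.
// getSystemLoadAverage() returns -1.0 on platforms where it is unavailable.
public class LoadSampler {
    public static void main(String[] args) throws InterruptedException {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        while (true) {
            System.out.printf("load average: %.2f%n", os.getSystemLoadAverage());
            Thread.sleep(1000);
        }
    }
}
```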

Running the test

  1. Make sure you restart the server between test runs.
  2. When you start the server, send it a few hundred requests to warm it up before starting the real test.
  3. Automate as much as possible. Ideally, running one command should run the test, collect the results, verify them, and print the summary/graphs.
  4. Make sure nothing else is running on the machines at the same time.
  5. After a test run has finished, check the logs and results to make sure the operations were actually successful.

Verifying your results

  1. You must catch and print errors at both the server and the client. If there are too many errors, your results might be useless. It is also a good idea to verify the results at the client side.
  2. A performance test is often the first time you really stress your system, and more often than not you will run into errors. Leave time to fix them.
  3. You might want to use a profiler to detect any obvious performance bottlenecks before you actually run the tests. (Java Profilers: JProfiler, YourKit)

Analyze your results and write it up

  1. Data cleanup (optional) – it is common practice to remove outliers. The general method is to remove anything that is more than 3 × the standard deviation away from the mean, or to remove the 1% of data points furthest from the mean (see the sketch after this list).
  2. Draw the graphs. Make sure you have a title, captions for both the X and Y axes with units, and a legend if you have more than one dataset in the same graph (you can use Excel, OpenOffice, or gnuplot). The rule of thumb is that a reader should be able to understand the graph without reading the text.
  3. Optionally, draw 90% or 95% confidence intervals (error bars).
  4. Try to interpret what the results mean:
    • Generalize
    • Understand the trends and explain them
    • Look for any odd results and explain them
    • Make sure to have a conclusion
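As a minimal sketch of the 3-standard-deviation rule mentioned in step 1, assuming the latency readings have already been collected into an array:

```java
import java.util.ArrayList;
import java.util.List;

public class OutlierFilter {

    // Returns a copy of data with every reading more than 3 standard
    // deviations away from the mean removed.
    public static double[] removeOutliers(double[] data) {
        double sum = 0;
        for (double x : data) sum += x;
        double mean = sum / data.length;

        double sqDiffSum = 0;
        for (double x : data) sqDiffSum += (x - mean) * (x - mean);
        double stddev = Math.sqrt(sqDiffSum / data.length);

        List<Double> kept = new ArrayList<Double>();
        for (double x : data) {
            if (Math.abs(x - mean) <= 3 * stddev) kept.add(x);
        }
        double[] result = new double[kept.size()];
        for (int i = 0; i < result.length; i++) result[i] = kept.get(i);
        return result;
    }
}
```

The 1% variant can be written similarly by sorting the readings by their distance from the mean and dropping the furthest ones.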
Hope this was useful.



6 comments:

Srinath Perera said...

Another useful link http://vanillajava.blogspot.com/2012/04/what-is-latency-throughput-and-degree.html

Srinath Perera said...

Also see chapter 22 of Unix Systems Programming (http://usp.cs.utsa.edu/usp/contents.html). It covers measuring server performance in depth, and also how to report the results.

Srinath Perera said...

If you need a comprehensive treatment, read R. Jain, "The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling," Wiley-Interscience, New York, NY, April 1991, ISBN 0471503361. http://www.cse.wustl.edu/~jain/books/perfbook.htm

Srinath Perera said...

One more point: make sure of the following two things

1) Each thread sends at least about 1,000 messages
2) You have sent at least about 3 times as many messages as the final throughput value (else the throughput will not be accurate)

Srinath Perera said...

A few thoughts from Paul on ESB performance: http://pzf.fremantle.org/2012/09/understanding-esb-performance.html

Srinath Perera said...

Thinking Clearly about Performance by Cary Millsap

This is a very good article on the subject: http://queue.acm.org/detail.cfm?id=1854041