Archive for the ‘Performance Engineering’ Category

One accurate measurement is worth a thousand expert opinions

Monday, April 6th, 2009

“One accurate measurement is worth a thousand expert opinions” – Adm. Grace Murray Hopper (Dec 9, 1906 to Jan 1, 1992)

Measurement is the most important aspect of Performance Engineering. There is no place for guesswork in performance engineering. Measuring tools are the most important tools in a Performance Engineer’s tool set.

On one project, on only my second day on the job, I was validating the solution for high throughput. Everything was going fine until the performance test started throwing a lot of errors. The solution involved around 15 servers, and I had set up resource monitoring on all of them. The first thing I did was look at the resource utilization of every server. On one server the memory utilization kept growing. I narrowed it down to the process that was growing in memory. The graph of memory utilization for that process showed linear growth: it climbed as high as 1 GB and then the process terminated. This graph was proof enough to show the developer that the process had a big memory leak. The leak was fixed within a day and the solution was ready for further testing.
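
The "linear growth" reading of such a graph can also be checked numerically. The sketch below is only an illustration, not the tooling used on that project: the sample values are invented, and the 1024 MB ceiling simply mirrors the roughly 1 GB at which the process terminated. It fits a least-squares line through (time, memory) samples and estimates how long the process has left.

```python
def growth_rate(samples):
    """Least-squares slope of (seconds, megabytes) samples, in MB per second."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den


# Hypothetical data: process memory polled once a minute (seconds, MB).
samples = [(0, 200), (60, 230), (120, 260), (180, 290), (240, 320)]

rate = growth_rate(samples)                      # MB per second
limit_mb = 1024                                  # assumed ceiling (about 1 GB)
seconds_left = (limit_mb - samples[-1][1]) / rate

print(f"growth: {rate * 60:.1f} MB/min, about {seconds_left / 60:.0f} min until {limit_mb} MB")
```

A slope that stays positive and steady under constant load is the kind of evidence that is hard to argue with; a process that merely warms up its caches would show the slope flattening out.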

Performance engineering requires a lot of discipline and a methodical approach. Often, when we come across a problem, we tend to start giving expert opinions, guessing where the problem might be, or looking at the code. One needs to take a scientific approach instead: look at the facts, start with resource utilization, and then narrow down by looking at logs, timestamps, database metrics and application server metrics, and by comparing with historical benchmarks.

Check this out. The very first computer bug.

Scalability in Performance Engineering

Sunday, March 22nd, 2009

There is a common misconception that you can increase the throughput of your application just by adding more hardware. Yes, hardware is very cheap these days, but improving performance is not as easy as adding more of it. A solution can have many layers: web servers, application servers, database servers, authentication servers and so on. Let's say your solution currently supports 50 transactions per second (tps) with 20 servers spread across the different layers. Now there is a need to support 100 tps.

Are you going to add 20 more servers since the load has doubled?

Are you sure that the application will scale up to 100 tps with added hardware?

Let's assume we have a perfectly scalable application. The first step is finding out which server is the bottleneck. Let's say the authentication server is at 90% CPU busy during peak load and is the only bottleneck in the system. Probably all you need is one more authentication server to support 100 tps. By buying 20 more servers instead of the single server that would have supported the 100 tps load, you would be wasting time, money and resources. That is where Performance Engineering comes into the picture. The performance engineer is responsible for determining the scalability of the system and for identifying the bottlenecks in both the hardware resources and the software application.
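
The arithmetic behind that argument can be sketched as follows. Everything except the 50 tps, the 100 tps, the 20 servers and the 90% busy authentication server is an assumption made for illustration: the split of servers across tiers, the CPU figures of the other tiers, the 90% ceiling, and above all the premise that CPU demand grows linearly with load.

```python
def servers_needed(servers, cpu_pct, current_tps, target_tps, max_cpu_pct=90):
    """Servers a tier needs at target_tps, assuming CPU demand grows linearly with load.

    Equivalent to ceil(servers * cpu_pct * (target_tps / current_tps) / max_cpu_pct),
    written with integers to avoid floating-point rounding surprises.
    """
    demand = servers * cpu_pct * target_tps
    capacity = max_cpu_pct * current_tps
    required = -(-demand // capacity)      # integer ceiling division
    return max(servers, required)          # never plan to remove servers


# Hypothetical peak-load measurements at 50 tps: tier -> (servers, CPU busy %).
tiers = {
    "web": (8, 40),
    "app": (7, 35),
    "database": (4, 30),
    "authentication": (1, 90),
}

for name, (count, cpu) in tiers.items():
    needed = servers_needed(count, cpu, current_tps=50, target_tps=100)
    print(f"{name:15s} has {count}, needs {needed}, add {needed - count}")
```

With these made-up figures only the authentication tier needs an extra server. Real systems rarely scale this cleanly, so a plan like this is a starting hypothesis to confirm with a benchmark rather than a result.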

There are two types of scalability: vertical and horizontal. Vertical scalability involves verifying that the software scales up when more resources (CPU, memory, IO, network) are added within a single server machine. Horizontal scalability involves verifying that the software scales up when more physical server machines are added (probably balanced via a load balancer).

You determine the vertical scalability of the software by benchmarking on a particular hardware configuration (CPU, memory, IO and network) for the maximum throughput, then increasing the relevant resources and running another benchmark for the maximum throughput. If the throughput grows proportionately, the software is scaling vertically. Most of the time in the real world you do not have time to test for vertical scalability; you just find out the maximum throughput on a single server and then see if it scales horizontally.

Testing for horizontal scalability requires adding servers. Try to get three points on a graph to better understand the scalability. For example, on a 6 server architecture you could get benchmarking results on 1 server, 3 servers and 6 servers, plot the throughput graph and see whether the software scales. If the software is not scaling proportionally, or not scaling up at all, and none of the hardware resources are found to be a bottleneck, then there is probably a software bottleneck, and we have a tough task at hand to find it.
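
As a rough sketch of how the three benchmark points could be compared, the snippet below expresses each result as a fraction of perfectly proportional scaling from the smallest configuration. The throughput numbers are invented, and the function name is just a label chosen for this example.

```python
def scaling_efficiency(results):
    """Map server count -> measured tps as a fraction of perfect linear scaling.

    results is a list of (servers, max_tps) pairs; the smallest configuration
    is taken as the baseline, so its efficiency is 1.0 by definition.
    """
    base_servers, base_tps = min(results)
    per_server = base_tps / base_servers
    return {n: tps / (n * per_server) for n, tps in sorted(results)}


# Hypothetical benchmark results: (servers, maximum throughput in tps).
benchmarks = [(1, 100), (3, 285), (6, 420)]

for servers, eff in scaling_efficiency(benchmarks).items():
    print(f"{servers} server(s): {eff:.0%} of linear scaling")
```

An efficiency that stays close to 100% suggests the software scales horizontally over the tested range. A sharp drop between two points, as between 3 and 6 servers in this invented data, is the cue to go looking for a shared bottleneck.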


Performance testing with data representative of production.

Sunday, March 22nd, 2009

Applications are often certified for good performance after testing with minimal data. Performance tests should instead be done with a data set that is representative of the production environment. One needs to load the performance test environment with the peak data volume expected under the data retention scheme in use, and then validate the performance. The results can vary significantly with the size of the data held in the database.

For example, a particular query might be doing a full table scan, and this might not be noticed when testing with small amounts of data. When it goes into production and there is a substantial amount of data, this single query can bring the whole solution to its knees. It could eat up the CPU of the database machine, thereby adversely affecting all the other queries. The time needed to find the root cause while the product is in production could be substantial, and the customer impact could be huge. You might also need to optimize your database configuration (SGA size, tablespace settings, redo log settings, archival settings etc.) in order to support huge amounts of data. Some of your queries may use complex joins and deal with a large number of records, which could impact performance. All these issues can be found and resolved before going into production by running performance tests with production-like data.
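
A full table scan of this kind can often be spotted from the query plan before it hurts. The sketch below uses Python's built-in SQLite purely as a stand-in, since it needs no setup; it is not the database discussed in this post, and the `orders` table, its columns and the index name are made up for the illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)   # last column holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 1000, float(i)) for i in range(10000)),
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, sql, (42,))       # no index on customer_id yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))        # same query, index available

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan reports a scan of the table and the second a search using the index. Other databases have their own way of showing the plan; the habit of checking it against production-sized data is what carries over.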


Open Source Performance Testing

Friday, March 20th, 2009

Open Source has taken center stage in the development world. It is a community effort which cannot be stopped. Either you ride it or you get run over by it.

One area where open source has not yet made a big dent is test automation and performance testing. This area is still dominated by commercial tools, and most of those tools are extremely expensive. The leading tools from HP (QTP, LoadRunner) and Borland (Silk Performer, Silk Test) come with a huge price tag.

I used to be a developer before I jumped into Performance Testing back in 1998. At that time I was still not able to let go of my development skills. For my first Performance Testing assignment I built a load testing tool using MS VC++ (are we not talking of open source here?); the GUI for it is shown below. This was perfect for me. It suited all my needs for driving load on the server. I had other scripts to automate monitoring, logging and analyzing the results (see the tool image below). Right around then, the company I worked for decided to standardize its tool set and we settled on Silk Performer (then a product of Segue). Using the tool was good for my resume and it did have a lot more bells and whistles.

What is Response Time?

Thursday, March 12th, 2009

One of the primary goals of Software Performance Engineering is to satisfy the response time defined by Service Level Agreements (SLAs). Response time is one of the simplest concepts, yet it is not fully understood by many.

Let us take a real world example. You go to a restaurant and place an order for lunch. What is the total time to execute this order? Let’s look at the sequence of events. The waiter spends some time taking your order and then places it as the last item in a queue of orders. When this order reaches the top of the queue, one of the cooks takes it and cooks the dish, and when it is ready the waiter brings it to your table. So the total time for your order is the sum of the processing time (time to take the order + cooking time) and the wait time (time the order spent in the queue).

Response time can be defined as the total time taken to perform an action. This total includes the time spent processing the action (in the app server, database, client etc.) and the time spent waiting (on the network, IO, memory and so on). In Software Performance Engineering, response time can be summarized as the sum of processing time and wait time.
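
The restaurant picture can be turned into a tiny simulation. This is a sketch under simplifying assumptions that are not part of the example above: a single cook, orders served strictly in arrival order, and made-up arrival and cooking times in minutes.

```python
def response_times(arrivals, service_times):
    """Simulate one cook serving orders in arrival order.

    Returns a list of (wait, processing, response) tuples, one per order,
    where response = wait + processing.
    """
    results = []
    cook_free_at = 0
    for arrival, service in zip(arrivals, service_times):
        start = max(arrival, cook_free_at)   # the order waits if the cook is busy
        wait = start - arrival
        cook_free_at = start + service
        results.append((wait, service, wait + service))
    return results


# Hypothetical lunch rush: three orders one minute apart, five minutes each to cook.
orders = response_times(arrivals=[0, 1, 2], service_times=[5, 5, 5])

for number, (wait, processing, response) in enumerate(orders, start=1):
    print(f"order {number}: wait {wait} + processing {processing} = response {response}")
```

Every order takes the same time to cook, yet each one sees a longer response time than the one before because its wait grows. That is why tuning only the processing time may not be enough when the queue is where the time goes.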

Let's take an example from software engineering. In Unix there is a utility, “time” or “timex”, for measuring the elapsed time of a particular process. Here we are looking at a granular level, analyzing a particular process within a system rather than the response time of an action through the whole solution. Let’s say your application process is “myapp”; you can run the time command on that application as below.

>time myapp
4.6 real 0.5 user 0.8 sys
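
One way to read the figures above, sketched in Python: "real" is the elapsed wall-clock time, while "user" and "sys" are the CPU time spent in user code and in the kernel on behalf of the process. Assuming a single-threaded process, whatever is left over was spent off the CPU, which corresponds to the wait time in the restaurant example. The helper name is made up for this illustration.

```python
def breakdown(real, user, sys_time):
    """Split elapsed time into CPU (processing) time and off-CPU (wait) time.

    Assumes a single-threaded process; with several threads, user + sys
    can exceed real and this simple subtraction no longer applies.
    """
    cpu = user + sys_time
    wait = real - cpu
    return round(cpu, 1), round(wait, 1)


# Figures taken from the sample output of "time myapp".
cpu, wait = breakdown(real=4.6, user=0.5, sys_time=0.8)

print(f"processing (CPU): {cpu} s, waiting: {wait} s")
```

Read this way, most of the 4.6 seconds is not computation at all, so the place to look first is whatever the process is waiting on.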


Using gdb to find memory leaks in HP Unix

Sunday, March 1st, 2009

The following gdb commands are used to set up memory leak detection in C++ programs:
set heap-check leaks on
set heap-check free on
set heap-check bounds on
set heap-check scramble on

To show the leaks, the following command is used:

(gdb) info leaks

To view a particular leak from the list of leaks detected, use the following, where <leak number> is the number shown for that leak in the list:

(gdb) info leak <leak number>

It is very important that the program be linked with shared libraries in order to use heap profiling.

The following example uses xscAppAdapter, a C++ program, to demonstrate memory leak detection.


Handling Binary Data in Silk Performer

Thursday, February 12th, 2009

I have run into issues while performance testing Flex applications where the AMF responses sometimes come back in binary format instead of the more readable XML format. In one case the binary data was even base64 encoded when presented by Silk Performer (in this case, just disable the option to transform the Flex responses to XML format in your active profile).