What if the load test tool itself cannot handle the load?

September 22nd, 2009

You use load test tools to break the application software you are testing. What if the tool itself breaks while the applications under test do not even have a dent?

This is exactly what happened when using the JMS protocol in HP Performance Center (LoadRunner). There is a memory leak in HP Performance Center when using JMS: the mmdrv.exe process invokes a JVM in-process when one of the JMS functions (jms_send_message_queue) is used.

During one of our tests we started getting an out-of-memory error. Of course, if you increase the JVM heap (using -Xmx) the error will be delayed, but the fact remains that the tool is leaking memory. It was very embarrassing to tell the application team that our own tool was breaking during load testing.

We have been trying desperately to get HP to admit and fix this, but they keep making us go around in circles. We used JConsole to monitor the JVM and prove to them that there is a leak, but they still do not get it. The only thing left is for us to fix the code for them.

In order to use JConsole with LoadRunner JMS, you need to add -Dcom.sun.management.jmxremote to the “Additional VM Parameters” setting so that JConsole can connect to the JVM. Once you do that, you can watch the JVM memory keep climbing in spite of manually invoking GC.
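
For reference, here is a sketch of the JMX settings that go in the “Additional VM Parameters” field (the port number is arbitrary, and authentication/SSL are turned off only because this is a closed test environment):

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9004
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

With these in place, point JConsole at localhost:9004 and watch the heap graph on the Memory tab climb as the scenario runs.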

Are you listening, HP?

 JMS Advanced Settings

Oracle buying Sun: What it means for Open Source Community

April 20th, 2009

Sun, under the leadership of Jonathan Schwartz, had been investing all its resources in open sourcing the whole organization. Sun was one of the biggest recent converts to open source. When IBM was in talks to buy Sun, the threat to the open source community was minimal: IBM had also invested heavily in open source and could have kept Sun's open source efforts alive and ticking.

Now that Oracle is in talks to buy Sun, what is in store for the open source community? What will happen to all the open source projects initiated by Sun? The most interesting of them for Oracle is MySQL. MySQL was getting positioned to provide stiff competition to Oracle in the near future. What will be its fate under Larry Ellison's leadership?

It would actually be in Oracle's best interest to expand the reach of MySQL. If MySQL gets killed, other open source databases will take its place, and it will be a big headache for Oracle to keep fending off these open source initiatives. Oracle is in a great position now that it is acquiring the leading open source database, and it can really use this opportunity to be at the center of the open source action. Of course, the Java brand has tremendous value, and Larry Ellison might find ways to monetize it, which Sun had miserably failed to do. What happens to the rest of the open source initiatives is anybody's guess. It will be interesting to see what happens to OpenSolaris, GlassFish, NetBeans, etc.

The only thing that does not fit well in this acquisition is the hardware side of Sun. Oracle has always been a software company, but with this acquisition it suddenly has to have a hardware strategy. One possible scenario is that Oracle will just absorb Sun's software products and sell off the hardware business. It is difficult to envision Oracle putting itself at the center of an already overcrowded server market.

Is this acquisition a big blow to the open source community? I doubt it. It might just be a hiccup, and the open source community will probably keep chugging along and getting people on board. Whatever happens, it will surely be very interesting to see how this acquisition plays out.

One accurate measurement is worth a thousand expert opinions

April 6th, 2009

“One accurate measurement is worth a thousand expert opinions” - Rear Admiral Grace Murray Hopper (Dec 9, 1906 to Jan 1, 1992)

Measurement is the most important aspect of Performance Engineering. There is no place for guesswork in performance engineering. Measuring tools are the most important tools in a Performance Engineer's tool set.

In one of the projects I was involved in, on the second day of the job I was validating the solution for high throughput. Everything was going fine till the performance test started throwing a lot of errors. The solution involved around 15 servers, and I had set up resource monitoring on all of them. The first thing I did was look at the resource utilization of all the servers. There was one server where the memory utilization kept growing, and I narrowed it down to the process whose memory was growing. The graph of the memory utilization for that process showed linear growth; it went as high as 1 GB and then the process terminated. This graph was proof enough to show the developer that the process had a big memory leak. It was fixed within a day and the solution was ready for further testing.
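
Incidentally, a crude way to capture that kind of per-process memory graph on Unix is to sample the process size at a fixed interval and plot the log afterwards. A minimal sketch (the process id in $PID and the vsz column are assumptions; column names vary across platforms):

# Log a timestamped virtual-size sample every 60 seconds
while true; do
  echo "$(date +%H:%M:%S) $(ps -o vsz= -p $PID)" >> mem.log
  sleep 60
done

A line that climbs steadily under a constant load, like the graph described above, is the classic signature of a leak.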

Performance engineering requires a lot of discipline and a methodical approach. A lot of times, when we come across problems, we tend to start giving expert opinions, guessing where the problem might be, or diving straight into the code. One needs to take a scientific approach: look at the facts, start with resource utilization, and then narrow down by looking at logs, timestamps, database metrics, application server metrics, comparisons with historical benchmarks, etc.

Check this out: the very first computer bug.

Scalability in Performance Engineering

March 22nd, 2009

There is a common misconception that just by adding more hardware you can increase the throughput of your application. Yes, hardware is very cheap these days, but things are not as simple as adding more hardware to improve performance. You could have many layers in your solution: web servers, application servers, database servers, authentication servers, etc. Let's say your solution currently supports 50 transactions per second (tps) and you have 20 servers spread across the different layers. Now there is a need to support 100 tps.

Are you going to add 20 more servers since the load has doubled?

Are you sure that the application will scale up to 100 tps with added hardware?

Let's assume we have a perfectly scalable application. The first step is finding out which server is the bottleneck. Let's say the authentication server is at 90% CPU busy during peak load and is the only bottleneck in the system. Probably all you need is to add just one more authentication server and you could support 100 tps. You would be wasting time, money, and resources by buying 20 more servers instead of the single server that would have supported the 100 tps load. That is where Performance Engineering comes into the picture. The performance engineer is responsible for determining the scalability of the system and identifying the bottlenecks, both in the hardware resources and in the software application.

There are two types of scalability: vertical and horizontal. Vertical scalability involves verifying that the software scales up when you add more resources (CPU, memory, IO, network) within a single server machine. Horizontal scalability involves verifying that the software scales up when you add more physical server machines (probably balanced via a load balancer).

You determine the vertical scalability of the software by benchmarking the maximum throughput on a particular hardware configuration (CPU, memory, IO, and network), then increasing the relevant resources and running another benchmark for the maximum throughput. If the software scales proportionately, it is scaling vertically. Most of the time in the real world you do not have time to test for vertical scalability; you just find the maximum throughput on a single server and then see if it scales horizontally. Testing for horizontal scalability requires adding additional servers. Try to get at least three points on a graph to better understand the scalability. For example, on a six-server architecture you could benchmark the throughput on one, three, and six servers and plot the results to see whether the software scales. If the software is not scaling proportionally, or not scaling up at all, there is probably a software bottleneck. If none of the hardware resources are found to be a bottleneck, then we have a tough task at hand to determine the software bottleneck.
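
A hypothetical worked example (all numbers invented for illustration): suppose one server sustains 10 tps, three servers sustain 28 tps, and six servers sustain 50 tps. Perfect linear scaling would give 6 x 10 = 60 tps on six servers, so the scaling efficiency is 50/60, roughly 83%. The flattening of the curve between three and six servers is the cue to start hunting for a software bottleneck or a shared resource, such as a common database, that all the servers contend for.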

Read the rest of this entry »

Performance testing with data representative of production

March 22nd, 2009

A lot of times, applications are certified for good performance by testing with minimal data. Performance tests should be done with a data set that is representative of the production environment. One needs to load the peak expected data, as per the data retention scheme employed, into the performance test environment and validate the performance there. Performance results can vary significantly with the size of the data in the database.

For example, a particular query might be doing a full table scan, and this issue might not be noticed when testing with small amounts of data. When the application goes into production and there is a substantial amount of data, this single query can bring the performance of the whole solution to its knees: it could eat up the CPU of the database machine, thereby adversely affecting all the other queries. The time to find the root cause of such an issue while the product is in production could be substantial, and the customer impact could be huge. There might also be a need to optimize your database configuration (SGA size, tablespace settings, redo log settings, archival settings, etc.) in order to support huge amounts of data. Some of your queries will use complex joins and deal with large numbers of records, and these could impact performance. All of these issues could be found and resolved prior to going into production by running performance tests with production-like data.
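
As an illustration of the kind of issue this testing catches, on Oracle you can check a query's execution plan against a production-sized table before go-live. A minimal sketch in SQL*Plus (the orders table, its columns, and the bind variable are hypothetical):

EXPLAIN PLAN FOR
  SELECT * FROM orders WHERE customer_id = :cust_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- A "TABLE ACCESS FULL" step on a large table in the plan output
-- usually means a supporting index is missing, for example:
CREATE INDEX idx_orders_customer ON orders (customer_id);

With only a handful of rows in the test database, the full table scan is cheap and goes unnoticed; at production volumes the same plan can consume the database server's CPU.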

Read the rest of this entry »

Open Source Performance Testing

March 20th, 2009

Open source has taken center stage in the development world. It is a community effort which cannot be stopped: either you ride it or get run over by it.

One area where open source has not yet made a big dent is test automation and performance testing. This area is still dominated by commercial tools, and most of the commercial tools available for automation and performance testing are extremely expensive. The leading tools from HP (QTP, LoadRunner) and Borland (SilkPerformer, SilkTest) come with a huge price tag.

I used to be a developer before I jumped into performance testing back in 1998. At that time I was still not able to let go of my development skills, so for my first performance testing assignment I built a load testing tool using MS VC++ (weren't we just talking about open source?); its GUI is shown below. It was perfect for me and suited all my needs for driving load on the server, and I had other scripts to automate monitoring, logging, and analyzing the results (see the tool image below). Right around then the company I worked for decided to standardize its tool set, and we settled on SilkPerformer (then a product of Segue). Using the tool was good for my resume value, and it did have lots more bells and whistles.
Read the rest of this entry »

What is Response Time?

March 12th, 2009

One of the primary goals of Software Performance Engineering is to satisfy the response time defined by Service Level Agreements (SLAs). Response time is one of the simplest concepts, yet it is not fully understood by many.

Let us take a real world example. You go to a restaurant and place an order for lunch. What is the total time to execute this order? Let's look at the sequence of events. The waiter spends some time taking your order and then places it as the last item in a queue of orders. When the order reaches the top of the queue, one of the cooks takes it and cooks the dish, and when it is ready the waiter brings it to your table. So the total time for your order is the sum of the processing time (time to take the order + cooking time) and the wait time (time the order spent in the queue).

Response time can be defined as the total time taken to perform an action. This total time includes the time spent processing the action (in the app server, database, client, etc.) and the time spent waiting (network, IO, memory, etc.). In Software Performance Engineering, response time can be summarized as the sum of processing time and wait time.

Let's take an example from software engineering. In Unix there is a utility, "time" (or "timex"), for measuring the elapsed time of a particular process. Here we are looking at a granular level, analyzing a particular process within a system rather than the response time of an action through the whole solution. Let's say your application process is "myapp"; you can run the time command on it as below.

>time myapp
4.6 real 0.5 user 0.8 sys
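
Reading this output in terms of the definition above: "real" (4.6 s) is the wall-clock elapsed time, while "user" (0.5 s) and "sys" (0.8 s) are the CPU time spent in user code and in the kernel, i.e. the processing time. The difference, 4.6 - (0.5 + 0.8) = 3.3 s, is roughly the time the process spent waiting (on IO, the network, or the run queue).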

Read the rest of this entry »

Using the Sitemap plugin for Rails

March 6th, 2009

If you have a web site and you want search engines to index it properly, you need a sitemap. A sitemap is a way to describe your web site, revealing all the static and dynamic links within it.

I have recently been building a web site based on Ruby on Rails (RoR), and I was looking for a plugin to accomplish this. The plugin I used is queso/sitemap from http://github.com/queso/sitemap/tree/master. That link has instructions, and the readme that comes with the plugin gives enough information to install it. The problem I faced was with setting up the widgets and the static link on the sitemap_settings page.

Since I am new to Ruby on Rails, it took me a while to figure out how to set up the widgets and the static link at the sitemap_settings URL. To explain how I set up the sitemap, I am going to use the model "Term" as an example. Basically, I needed Sitemap to list the URL of the list of terms, http://www.mydomain.com/terms, and all the individual terms, like http://www.mydomain.com/terms/term1, http://www.mydomain.com/terms/term2, etc.

In the Widgets settings screen I added a widget named Term and entered the information as shown in the image below. The model is "Term". The named route is "terms_url", which resolves to http://www.mydomain.com/terms (I originally used "terms_path", which caused the problem described below). Make sure to look up your routes to check that the named route you are providing is available (use the "rake routes" command). If you leave the finder method blank, the Sitemap plugin will use the model's find(:all) method to get the list of all items. The plugin then generates an XML file comprising the URL of the list of terms (http://www.mydomain.com/terms) and the URLs of the individual terms (http://www.mydomain.com/terms/term1, http://www.mydomain.com/terms/term2, etc.).

You can create a custom finder method. We have created a find_sitemap class method in the Term model as shown in the code below:

def self.find_sitemap(*args)
  # Include only terms in the "published" state in the sitemap
  find(:all, :conditions => ['state = ?', 'published'])
end

This method finds all terms that are in the published state. One issue I ran into was that the URL the sitemap generated for the list came out as "/terms" instead of "http://www.mydomain.com/terms". At first I hacked the plugin to get this right, changing the code in show.xml.builder to prepend "root_url.chop + " to the url_for call on line 3. I was not sure that was the most graceful way, and I was in a hurry; pardon my ignorance, because it turns out there was nothing wrong with the Sitemap plugin. It expects a named URL (containing the entire URL) instead of a path (containing the relative path), so the named route in the image below should be "terms_url" instead of "terms_path".

Sitemap Widget

Making Permalink_fu work

March 3rd, 2009

I recently came across an issue while trying to use the permalink_fu plugin for Rails. The problem happened while trying to create permalinks for existing records. Let's say you are adding the permalink facility to a model "Designers". You already have a few records in the database and would like to generate permalinks for them. This can be done as follows:

Designers.find(:all).each(&:save)

This would give an error: Iconv::InvalidEncoding: invalid encoding ("ascii//translit//IGNORE", "utf-8")

It turns out the code needs to be changed in permalink_fu.rb at line 90: the order of the Iconv transliteration options is not correct. The commented-out line below is what ships with the plugin; the statement after it is the correction.

# PermalinkFu.translation_to = 'ascii//translit//IGNORE'
PermalinkFu.translation_to = 'ascii//ignore//translit'

Please refer to this tutorial on using permalink_fu.

Also see this reference.

Using gdb to find memory leaks in HP Unix

March 1st, 2009

The following gdb commands are used to set up memory leak detection in C++ programs:
set heap-check leaks on
set heap-check free on
set heap-check bounds on
set heap-check scramble on

To list the detected leaks, the following command is used:

(gdb) info leaks

To view a particular leak from a list of leaks detected use the following:

(gdb) info leak <leak number>   (where <leak number> is the number of the relevant leak from the list)

It is very important that the program be linked with the librt.sl shared library to use heap profiling.
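
As a rough sketch of the overall workflow (the aCC invocation below is an assumption; your compiler, flags, and library name may differ on your HP-UX release):

# Build with debug symbols and link against librt
aCC -g -o xscAppAdapter xscAppAdapter.cpp -lrt

# Run under gdb with heap checking enabled
gdb xscAppAdapter
(gdb) set heap-check leaks on
(gdb) run
# ... drive the suspected scenario, then interrupt with Ctrl-C ...
(gdb) info leaks
(gdb) info leak 1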

The following example is using xscAppAdapter as a C++ program to demonstrate memory leak detection.

Read the rest of this entry »