Archive for March 22nd, 2009

Scalability in Performance Engineering

Sunday, March 22nd, 2009

A common misconception is that you can increase your application's throughput simply by adding more hardware. Hardware is indeed cheap these days, but improving performance is rarely as easy as buying more of it. A solution can have many layers: web servers, application servers, database servers, authentication servers, and so on. Let's say your solution currently supports 50 transactions per second (tps) on 20 servers spread across those layers, and it now needs to support 100 tps.

Are you going to add 20 more servers since the load has doubled?

Are you sure that the application will scale up to 100 tps with added hardware?

Let's assume we have a perfectly scalable application. The first step is to find out which server is the bottleneck. Let's say the authentication server is at 90% CPU utilization during peak load and is the only bottleneck in the system. In that case, adding just one more authentication server is probably all you need to support 100 tps. Buying 20 more servers would waste time, money, and resources when a single additional server would have handled the load. That is where performance engineering comes into the picture. A performance engineer is responsible for determining the scalability of the system and identifying the bottlenecks in both the hardware resources and the software application.
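The arithmetic behind that example can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: load is taken to scale linearly with tps, 90% CPU is treated as the acceptable ceiling per server, and the tier names, server counts, and utilization figures other than the authentication server's 90% are made up for illustration.

```python
import math

def servers_needed(current_servers, peak_utilization, current_tps, target_tps,
                   max_utilization=0.90):
    """Estimate servers required in one tier at target_tps, assuming linear scaling."""
    # CPU demand at the target load, measured in "fully busy servers".
    demand = current_servers * peak_utilization * (target_tps / current_tps)
    # Round before ceil so floating-point noise cannot add a phantom server.
    needed = math.ceil(round(demand / max_utilization, 9))
    # This sketch never recommends removing servers.
    return max(current_servers, needed)

# Hypothetical tiers: name -> (server count, peak CPU utilization at 50 tps).
tiers = {
    "web": (8, 0.35),
    "application": (7, 0.40),
    "database": (4, 0.30),
    "authentication": (1, 0.90),
}

extra = {
    name: servers_needed(count, util, 50, 100) - count
    for name, (count, util) in tiers.items()
}
for name, added in extra.items():
    print(f"{name}: add {added} server(s)")
```

With these illustrative numbers, only the authentication tier needs another server. A real capacity plan would be based on measured utilization for every resource, not just CPU.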

There are two types of scalability: vertical and horizontal. Vertical scalability means the software's throughput increases as you add resources (CPU, memory, I/O, network) within a single server. Horizontal scalability means throughput increases as you add more physical servers (probably balanced via a load balancer).

You determine vertical scalability by benchmarking the software on a particular hardware configuration (CPU, memory, I/O, and network) to find its maximum throughput, then increasing the relevant resources and benchmarking again. If throughput increases in proportion to the added resources, the software scales vertically. In the real world there is often no time to test vertical scalability, so you just find the maximum throughput on a single server and then check whether it scales horizontally.

Testing horizontal scalability requires adding servers. Try to get at least three points on a graph to understand the scalability better. For example, on a 6-server architecture you could benchmark with 1, 3, and 6 servers, plot throughput against server count, and see whether the software scales. If throughput does not grow proportionally, or does not grow at all, there is probably a software bottleneck. If none of the hardware resources turns out to be a bottleneck, we have a tough task at hand: tracking down the software bottleneck.
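One way to read a three-point benchmark is to compare each result with ideal linear scaling from the smallest configuration. The sketch below does that. The throughput figures are invented purely for illustration and `scaling_efficiency` is a hypothetical helper name.

```python
def scaling_efficiency(points):
    """Return (servers, efficiency) pairs relative to the smallest configuration.

    points: list of (server_count, max_throughput_tps) benchmark results.
    An efficiency of 1.0 means throughput grew in proportion to server count.
    """
    ordered = sorted(points)
    base_servers, base_tps = ordered[0]
    result = []
    for servers, tps in ordered:
        # Throughput we would expect if scaling were perfectly linear.
        ideal_tps = base_tps * servers / base_servers
        result.append((servers, tps / ideal_tps))
    return result

# Hypothetical benchmark results for 1, 3 and 6 servers.
results = [(1, 50.0), (3, 140.0), (6, 210.0)]
for servers, efficiency in scaling_efficiency(results):
    print(f"{servers} server(s): {efficiency:.0%} of linear scaling")
```

With these made-up numbers, efficiency falls from 100% to about 93% and then 70%. A drop like that suggests a bottleneck worth investigating before buying more servers.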


Performance Testing with Data Representative of Production

Sunday, March 22nd, 2009

Applications are often certified for good performance after testing with minimal data. Performance tests should instead be run against a data set that is representative of the production environment. Load the performance test environment with the peak data volume expected under the data retention scheme in use, and validate performance against that. Results can vary significantly with the size of the data in the database.

For example, a particular query might do a full table scan, and the problem might go unnoticed when testing with small amounts of data. Once the application goes into production and there is a substantial amount of data, this single query can bring the performance of the whole solution to its knees. It could eat up the CPU of the database machine, adversely affecting all the other queries. Finding the root cause while the product is in production could take substantial time, and the customer impact could be huge. You might also need to tune your database configuration (SGA size, tablespace settings, redo log settings, archival settings, etc.) to support large amounts of data. Some of your queries may use complex joins and deal with large numbers of records, which could impact performance. All of these issues can be found and resolved before going into production by running performance tests with production-like data.
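A quick way to catch a full table scan before production is to inspect the query plan. The sketch below uses SQLite, through Python's built-in `sqlite3` module, purely as a stand-in for whatever database you actually run. The `orders` table, its columns, and the index name are hypothetical, and other databases expose plans through their own tools.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description text.
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
# Load some rows so the table is not empty (hypothetical data).
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 1000, float(i)) for i in range(10_000)),
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index on customer_id, SQLite has to scan the whole table.
plan_without_index = query_plan(conn, sql, (42,))

# After adding an index, the same query becomes an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = query_plan(conn, sql, (42,))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The first plan reports a scan of `orders` and the second reports a search using the index. Checking the plans of your heaviest queries against production-sized data is one way to surface this class of problem early.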
