Performance testing with data representative of production.

Applications are often certified as performing well after being tested with only minimal data. Performance tests should instead be run against a data set that is representative of the production environment. Load the performance test environment with the peak data volume expected under your data retention scheme, and validate performance at that volume. Results can vary significantly with the amount of data in the database.
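To make "peak expected data" concrete, here is a rough sizing sketch in Python. The function names and every number in it (daily row rate, retention period, average row size) are hypothetical placeholders, not figures from any real project; substitute your own estimates.

```python
def peak_row_count(rows_per_day, retention_days, growth_factor=1.0):
    """Rows held in the live database once the retention window is full.

    growth_factor is an optional safety margin (1.0 means no margin).
    """
    return int(rows_per_day * retention_days * growth_factor)


def peak_size_gb(rows_per_day, retention_days, avg_row_bytes, growth_factor=1.0):
    """Approximate raw data size in GiB, ignoring indexes and storage overhead."""
    rows = peak_row_count(rows_per_day, retention_days, growth_factor)
    return rows * avg_row_bytes / 1024 ** 3


# Hypothetical inputs: 200,000 rows per day, kept for 90 days, about 2 KB per row.
rows = peak_row_count(200_000, 90)
size = peak_size_gb(200_000, 90, 2048)
print(f"rows to load: {rows:,}")
print(f"approximate raw size: {size:.1f} GiB")
```

This only estimates raw table data. Indexes, logs and backups add to the real footprint, so treat the result as a lower bound when provisioning the test environment.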

For example, a particular query might be doing a full table scan, and this might go unnoticed when testing with small amounts of data. Once the application is in production and the data has grown substantially, this single query can bring the whole solution to its knees. It can eat up the CPU of the database machine and adversely affect all the other queries. Finding the root cause while the product is in production can take substantial time, and the customer impact can be huge. You might also need to tune your database configuration (SGA size, tablespace settings, redo log settings, archival settings, etc.) to support large amounts of data. Queries that use complex joins over a large number of records can also hurt performance. All of these issues can be found and resolved before going to production by running performance tests with production-like data.
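The full table scan case is easy to demonstrate on a small scale. The sketch below uses SQLite from Python's standard library purely as a stand-in (the database settings mentioned above, such as SGA size, are Oracle-specific and are not modeled here). The `orders` table, its columns and the index name are made up for illustration. It asks the database for its query plan before and after adding an index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
# Load enough rows that a scan is no longer trivially cheap.
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 1000, i * 0.5) for i in range(100_000)),
)

sql = "SELECT COUNT(*) FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the plan is a scan of the whole table.
plan_before = query_plan(conn, sql, (42,))

# With an index, the plan becomes an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, and other databases have their own plan tools, but the idea carries over: check the plans of your important queries against a production-sized data set, not a handful of rows.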

In a recent consulting engagement, I was involved in a project where not much thought had been put into data retention. In fact, there was no data retention strategy at all. The application dealt with large amounts of data, kept all of it in the live database, and would have grown to 15 TB in just a few months. There was no way we could test with that much data, and no storage was immediately available to hold it. Nor was there any need to store all this data in the live database. The data (documents) would only be looked at if there was an issue, and a downstream system did most of the processing, so it was safe to archive the data once processing succeeded. This was an architecture issue, and all it required was an archival strategy.
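As a rough illustration of what such an archival step could look like (this is not the design used on that project), the sketch below moves successfully processed documents out of a live table into an archive table in a single transaction. The table names, the `status` column and its values are assumptions made for the example, and SQLite stands in for the real database. In practice the archive would live on separate, cheaper storage rather than in the same database.

```python
import sqlite3


def archive_processed(conn):
    """Move rows marked PROCESSED from documents to documents_archive.

    Both statements run in one transaction, so a failure leaves the live
    table untouched. Returns the number of rows archived.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO documents_archive "
            "SELECT * FROM documents WHERE status = 'PROCESSED'"
        )
        cur = conn.execute("DELETE FROM documents WHERE status = 'PROCESSED'")
    return cur.rowcount


conn = sqlite3.connect(":memory:")
schema = "(id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
conn.execute("CREATE TABLE documents " + schema)
conn.execute("CREATE TABLE documents_archive " + schema)
conn.executemany(
    "INSERT INTO documents (name, status) VALUES (?, ?)",
    [("a.pdf", "PROCESSED"), ("b.pdf", "FAILED"), ("c.pdf", "PROCESSED")],
)

archived = archive_processed(conn)
print("archived rows:", archived)
```

Only the rows that still need attention (here, the failed document) remain in the live table, which keeps the volume you have to test against bounded.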

Another challenge is loading huge amounts of data for performance testing, especially if it is the first release and there is no production data available. The data has to be generated. Should you use the performance load tool to drive the load and create the data? I would advise against that. It might take a very long time just to populate the database, especially if you drive the load from the front end and go through the whole solution. If possible, use a script to load data directly into the database, bypassing the rest of the solution and speeding up the process.
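A minimal sketch of such a loading script is shown below, again using SQLite from Python's standard library as a stand-in for the real database. The `documents` table, its columns and the generated values are hypothetical. The points it illustrates are generating rows lazily, inserting them in batches, and committing once per batch rather than once per row.

```python
import itertools
import random
import sqlite3


def synthetic_documents(count, seed=0):
    """Yield (name, status, size_bytes) tuples; seeded so runs are repeatable."""
    rng = random.Random(seed)
    statuses = ("NEW", "PROCESSED", "FAILED")
    for i in range(count):
        yield (f"doc-{i:08d}", rng.choice(statuses), rng.randint(1_000, 500_000))


def bulk_load(conn, rows, batch_size=10_000):
    """Insert rows in batches, one transaction per batch. Returns rows loaded."""
    rows = iter(rows)
    loaded = 0
    while True:
        batch = list(itertools.islice(rows, batch_size))
        if not batch:
            return loaded
        with conn:  # commit once per batch instead of once per row
            conn.executemany(
                "INSERT INTO documents (name, status, size_bytes) VALUES (?, ?, ?)",
                batch,
            )
        loaded += len(batch)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, name TEXT, status TEXT, size_bytes INTEGER)"
)
loaded = bulk_load(conn, synthetic_documents(50_000))
print("rows loaded:", loaded)
```

For very large volumes, the database's own bulk loader is usually faster still than row inserts from a script. Whichever route you take, make sure the generated data has realistic value distributions, since uniformly random data can hide the skew that causes problems in production.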



