Is Stackify APM+ Safe for Production Servers?

Designed for Production Usage

Application performance monitoring (APM) products like Stackify APM+ are powerful tools for understanding the performance and behavior of your web applications. The downside is that an APM solution adds overhead, which can slow your applications down. From day one, we have designed Stackify APM+ to be lightweight and safe for production servers.

Three key reasons why Stackify is safe for production usage:

  • Code profiling is limited to key application framework methods.
  • The profiler is implemented in highly optimized C++ code.
  • Data processing is done in a separate process, outside of your application code.

The last point above is particularly important. Some APM solutions collect, aggregate, and upload processed data all within the same process, and that process is your IIS worker process. This can cause erratic and severe performance problems in your applications. Stackify avoids this by using a separate Windows service to process the profiler output, which minimizes the performance impact on your application.
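To illustrate the general idea of keeping processing outside the application, here is a minimal Python sketch. It is not Stackify's implementation: the function names `record_call` and `aggregate`, the JSON line format, and the in-memory buffer are all assumptions made for illustration. The in-process side only appends a raw measurement, and the more expensive aggregation is a separate function that a different process or service would run.

```python
import io
import json
from collections import defaultdict


def record_call(sink, method, elapsed_ms):
    """In-process side (hypothetical): append one raw measurement and return."""
    sink.write(json.dumps({"method": method, "ms": elapsed_ms}) + "\n")


def aggregate(lines):
    """Out-of-process side (hypothetical): build per-method counts and totals."""
    totals = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
    for line in lines:
        event = json.loads(line)
        entry = totals[event["method"]]
        entry["count"] += 1
        entry["total_ms"] += event["ms"]
    return dict(totals)


# Usage: an in-memory buffer stands in for the real hand-off channel.
buffer = io.StringIO()
record_call(buffer, "HomeController.Index", 1.5)
record_call(buffer, "HomeController.Index", 0.5)
print(aggregate(buffer.getvalue().splitlines()))
```

In this sketch the application pays only for one small write per tracked call. Parsing, summing, and any upload happen wherever `aggregate` runs.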

Impact on Real-World Applications

Our experience and industry research show that most real-world applications receive fewer than 30 requests per second. For applications under this level of load, as you will see below, Stackify’s impact on performance is negligible. We also see a significant number of apps that make synchronous database calls and quickly become IO bound. In these settings, where most of the wait time is spent on database calls, our testing has again shown Stackify’s overhead to be minimal.

Note: Each web request causes our profiler to inspect and track roughly 50-60 method calls. That number varies widely depending on what the request does, since we automatically inspect all DB calls, cache calls, etc.
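As a rough back-of-the-envelope illustration, the per-request figure in the note can be turned into an approximate number of tracked method calls per second. The helper name `tracked_calls_per_second` and the sample request rates are assumptions chosen for this sketch, and the result is only as good as the 50-60 estimate.

```python
# Rough range of tracked method calls per request, taken from the note above.
CALLS_PER_REQUEST = (50, 60)


def tracked_calls_per_second(requests_per_second):
    """Return the (low, high) estimate of tracked method calls per second."""
    low, high = CALLS_PER_REQUEST
    return requests_per_second * low, requests_per_second * high


# Sample request rates (illustrative only).
for rps in (10, 30):
    print(rps, tracked_calls_per_second(rps))
```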

Key Metrics to Watch When Testing APM Profiler Overhead

When testing profiler overhead, there are a few metrics you want to track. One of the most important, and most easily overlooked, is throughput itself. With APM turned on, page load times may be barely affected while the actual number of requests being handled drops. Here are the four most important metrics to measure:

  • Request response times
  • Requests per second
  • Total requests during test (total throughput)
  • Application & server CPU usage
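The first three metrics can be computed from the raw request timings of a test run. The following is a minimal Python sketch, assuming you have a list of per-request response times in milliseconds and the wall-clock length of the test. The function names `summarize` and `overhead_percent` are hypothetical, and CPU usage is not covered because it has to come from the operating system's own counters.

```python
from statistics import mean, quantiles


def summarize(durations_ms, test_seconds):
    """Summarize one load-test run (needs at least two samples).

    durations_ms: response time of every completed request, in milliseconds.
    test_seconds: wall-clock length of the test, in seconds.
    """
    total = len(durations_ms)
    return {
        "total_requests": total,
        "requests_per_second": total / test_seconds,
        "mean_ms": mean(durations_ms),
        # The last of the 19 cut points is the 95th percentile.
        "p95_ms": quantiles(durations_ms, n=20)[-1],
    }


def overhead_percent(baseline, with_apm):
    """Percent change of each metric with APM on versus off.

    Assumes every baseline value is non-zero.
    """
    return {
        key: 100.0 * (with_apm[key] - baseline[key]) / baseline[key]
        for key in baseline
    }
```

Comparing a run with APM off against a run with APM on shows why throughput matters: if both runs report the same mean response time but the second completes 10% fewer requests, `overhead_percent` reports roughly 0% for `mean_ms` and -10% for `requests_per_second`.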

Comparison of Various Sites With Stackify APM+ On vs. Off

Large blog site

This application is an old codebase written with ASP.NET WebForms. The app itself makes a ton of database calls. The site runs on two Azure servers and has over a million monthly blog visitors. Each server receives about 7-10 requests per second and the workload varies wildly since it is a public site.

Turning Stackify APM+ on caused no noticeable overhead.

Stackify Web Services

This application is a mix of web services written in WCF, MVC, and Web API. It handles all the communication from our agents deployed on our clients’ servers. It handles a lot of database calls as well as writing to Azure table storage and queues. The site runs on multiple Azure servers and each server receives 10-12 requests per second.

Again, turning Stackify APM+ on caused no noticeable overhead.

MVC load test with very high request volume

We also performed a load test that represents a basic MVC site receiving 100 requests per second in traffic, which is up to 10x what most basic web applications receive. You can read more about it below.

Load Testing Stackify APM+ Under Heavy Load

Configuration

For this test, we used loader.io against a single dual-core server hosted on Windows Azure and hit a single URL: an MVC controller that returns a simple Razor view and performs no other operations. Most APM overhead is tied to the volume of methods that are inspected and instrumented. A high request volume on a simple page is the best way to run a controlled test of APM overhead, because it removes the wait time and variability of anything that might be IO bound or cross boundaries to other servers or resources.

Stackify APM+ Results

Over a 10-minute window, you can see that the application provided very consistent throughput and response times with little variance. The blue line in the chart represents the response times.

Note: These numbers are virtually identical to having APM disabled. The response times of ~80ms include the network latency seen by loader.io. Server-side times were 0-1ms.

Results From a Competing APM Provider

For comparison, here is the same test with a different leading APM vendor’s solution enabled. This APM product caused random page load time spikes every minute or so. This is because their product is engineered to aggregate and upload the APM data it collects inside the IIS worker process itself, which can cause random thread blocking and performance issues in your app. :-(

We point this out because we want you to understand that Stackify was truly designed for speed, stability, and production usage. You can’t say that about every APM solution.

Load Test Results

Stackify APM+ is engineered specifically to have very little impact on the response times and throughput of your application while keeping CPU overhead as low as possible.

For a server handling a very high number of requests per second (100), we consider this additional CPU overhead, combined with excellent response times, to be very good and production safe.

Conclusion

Stackify APM+ has minimal to no impact on most web applications, making it safe to run at all times on production servers. Naturally, your results may not exactly mirror our test results, since all apps are different. We hope this in-depth look at what you can expect with Stackify APM+ enabled gives you confidence that you can trust Stackify to provide deep visibility into your application’s performance hot spots without contributing new ones!
