Thursday, March 4, 2010

Fixing a Mysterious Memory Leak on ColdFusion 8

Last year I noticed a strange problem while developing the Mach-II Dashboard on an Adobe CF8 based application.   If I reloaded my entire Mach-II application through an URL argument which called the bootstrapper loadFramework() method everything was fine.  However, if I reloaded my application via the Dashboard I would see a slow memory leak of 30MB of Java heap space per reload.  Suffice it to say that in normal development, you can reload an application many times every day.  Thus this was very bad.  I do remind you that my application is fairly large (200-250MB of heap space to load up).

The only difference between the two methods of reloading was the bootstrapper method instantiated a Mach-II AppLoader CFC and replaced the current one in the application scope.  The dashboard on the other hand would reload the application by reusing the current AppLoader and calling an method on it.  I was able to replicate the memory leak using the bootstrapper by calling the same method that the Dashboard was using instead instantiating a new AppLoader CFC.

I immediately tried to replicate this memory leak on OpenBD and Railo test beds however I did not observe any memory leaks.  So the problem only existed on CF8 and probably CF7 based on another blog post posted by the great development team at Dave Ramsey.  To be fair, I personally haven't tested on CF9 but have heard reports of this from another Mach-II user.

Basically, it appears that CF8 holds on to a strong reference of some type of object in memory and is never released  from the heap for reuse.  I checked a lot stuff in our code base for accidental circular references or something strange like that, but I was unable to figure out the problem.  I even devised a stripped down application based on the Mach-II skeleton and nothing else. This was attempt #1 where I spent about four hours.  Frustrated, I went to bed.

It took a couple of months of just ignoring the problem for me to release all the frustration that I have had about this issue.  Plus, I was out of ideas even after discussing this Kurt Wiersma and Matt Woodward.  So on attempt #2, I decided that I needed to use some more high powered tools available to me.  So I fired up VisualVM which is bundled with Java 6 these days (before it was separate download).  VisualVM allows you to get detailed information about Java applications that are running in a JVM (local or remote) such as monitoring memory consumption, profile applications and take/display heap or thread dumps.  Since ColdFusion runs on Java, this is perfect tool to figure out the problem.

I'll spare you my frustrations, but about four hours in I was able to see that CF8 was holding on to a strong reference to the MachII.framework.AppLoader component which does have a circular reference to another AppLoader when Mach-II modules are in use.  However, my frustrations were not of what was the problem, but how to fix the issue.  So I Googled and Googled for blog posts and after a while I came upon this post (thanks Dave Ramsey guys - great blog name):

ColdFusion 7 Memory Leak (via Web Monkeys With Laser Beams)

So, what the hell?  If I leave off a "output" attribute on a <cffunction> there could be a memory leak?  That seems like a totally stupid bug.  So I did a code base search of Mach-II for missing "output" attributes on <cffunction> tags.  I knew there were a few because CF doesn't treat the "output" attribute like a true boolean.

1. output="true" - output everything
2. output="false - suppresses all output
3. no output attribute defined-  suppresses all output except if you have something wrapped in a <cfoutput> tag

Option number three is used in a couple of places because it is actually quite usual.  Now, this "feature" of the "output" is not documented, but is known to I guess a more advanced CFML programmers.  So I added in cfoutput="true" for all the places in all the places where we had left off the "output" attribute.  I thought I would worry about whitespace suppression later if this work.  So, restarted my instance of Tomcat and did a few reloads of my application via the Dashboard.  I took a heap space dump before I started to get a baseline and did a few more snapshots after a few reloads to see what was happening.

I think after seeing the first results I yelled some expletive along the lines of Holy Cow!  This small change was like a tourniquet on the memory leak.  It was now down to 6MB memory leak per reload.  So I knew there was something else up.  A quick search of ColdSpring showed a ton of missing "output" attributes on <cffunction> tags.  A RegEx replace of those was quickly performed.

After restarting my Tomcat instance, I repeated my analysis process of heap dumps and reloading of my application.  Viola! The memory leak was gone.  Literally, the JVM would recover all the memory -- down the almost 1kb of heap space after a full GCC was requested (which is why I wrote a "request garbage collection" feature in the Dashboard that night).

Summary: Leaving off the "output" attribute of <cffunction> tags can cause memory leaks in certain situations on CF7 and CF8 (even with CHF 3 installed which supposedly fixes memory leak issues with CFCs and memory scopes).

Solution: Never leaver off the "output" attribute!

Cases in the wild: I'm sure everything had a few of these in their code base.

I can't confirm how CF9 handles this stuff since aren't running that on any of our applications at work, but I suspect the same problem exists.  So if anybody wants check, feel free to comment.


----


Update: March 6, 2010


Because I have been asked this a few times.  The CF8 server I was running my tests against was 8.01 with CHF3 installed.


I have filed a ticket for this issue with Adobe as ColdFusion Bug 82362.

15 comments:

  1. Peter, excellent work finding this! Due to some odd behavior a long time ago I have always set the output attribute but I would have never thought to do it because of a memory leak. Has the change to M2 been added to the nightly build? How bad were things in ColdSpring?

    ReplyDelete
  2. @Brandon, the changes to Mach-II were rolled into the framework some time late last November if I believe and they are in the 1.8 gold / stable release from last month (Februrary, 2010).  As for ColdSpring, I added in a lot of 'output="false" to <cffunction> tags. Comment to Posterous said the following on 07/03/10 06:05:

    ReplyDelete
  3. Could you share the regex with us?

    ReplyDelete
  4. If you could share the regex that would be great. Our CF Apps are eating memory at the mo.

    ReplyDelete
  5. I just created a regex for it, looks like it finds stuff, at least in my eclipse file

    ^((?!output).)*

    ReplyDelete
  6. That RegEx is about what I had before.  I wish I would have saved it.

    ReplyDelete
  7. Do you know if this memory leak affects functions created in a cfscript block or is it just cffunction that is the problem?

    ReplyDelete
  8. Peter, has anybody that you know of carried forward testing on ColdFusion 9? I'm in a position to do that right now.

    ReplyDelete
  9. No, I don't believe it has been tested on CF9 yet.  I haven't had the chance, but considering weirdness I suspect it is still there. The test case is unbelievably complicated and I don't exactly have the time to rip it out of my application.  I saw the behavior on Mach-II 1.6 (it has been fixed in 1.8) using the dashboard and reloading the base application from the dashboard repeatedly.

    ReplyDelete
  10. We fixed the issue in Mach-II 1.8 so you wouldn't see the behavior on 1.8 at all.  You'd need to use 1.6

    ReplyDelete
  11. I was expecting to see the problem with other components that do not include the output attribute, just so I could tell you about the behavior of ColdFusion 9. I figured it was fixed in the version of Mach-II running.

    ReplyDelete
  12. It has to do with circular references, the output attribute value (or lack thereof), the scope in which the CFC lives and what references it had to other CFC resources.  I definitely could the circular reference issue in VisualVM and that the number of instantiated classes rose as the reloads happened. Meaning that if the Mach-II application have three modules (on top of the base app) -- there should be four AppLoaders in memory.  When the application is reloaded, those references to those AppLoaders should be released, but they weren't.  So I'd see it go from 4 -> 7 ->10 (the base AppLoader reference is never re-instantiated so the factor of 3 has to do module AppLoaders). Both Mach-II and ColdSpring do a lot of wiring of CFCs that are stashed in persistent scopes.  I think the large dependency trees are required to show the memory grow over time.  FYI, I just checked in the "output" attribute changes for ColdSpring to the CVS repo -- check out the head version.

    ReplyDelete
  13. Thanks for all the info, Peter. I appreciate it.

    ReplyDelete
  14. This issue was fixed in Cumulative Hot fix 4 for ColdFusion 8.0.1. You can download it from here:
    http://kb2.adobe.com/cps/402/kb402604.html#CF801

    ReplyDelete