After I finally managed to write last week’s report (have you ever tried to write something, and the words wouldn’t come out at all?), I continued looking at those nasty menu leaks.
In an attempt to make the task simpler, I started looking for software with the ability to detect USER handle leaks. I checked the free (libre) alternatives first, but after not much luck I moved on to free (beer) software, and to fully commercial and proprietary software, when I finally found Deleaker.
Deleaker is a small program that hooks into an executable (lets you choose which DLLs to include in the hooking), and tracks 4 types of resources: memory allocations, Kernel handles, USER handles, and GDI handles.
In Windows, the experience was flawless. I chose for it to track USER leaks, ran filebrowser.exe, terminated it after a few navigations, and then it gathered the data and told me exactly which functions were leaking.
The result was a big clue: all the leaking menus were coming exactly from the same call to one specific function: Shell_MergeMenus. While it’s perfectly possible that this function is bugged in ReactOS, I was running the code in Windows, so the easier explanation was that no one was calling DestroyMenu on the menubar menu. So after deciding on the best place to destroy the menu, I coded the call and re-ran the tests, and in Windows the navigation leaks had gone from 20 each, down to 1 or 2! And I wasn’t even sure if those were actually leaking! I quickly went to try this in ReactOS, but it turns out Deleaker isn’t able to install correctly, so I don’t know if it would have worked at all. I did measure the number of handles though, and noticed that instead of growing by 60 it now just grows by 20, which is still bad, but a reasonable improvement.
While I was bashing my head against a wall before, I had started looking into why the filebrowser process doesn’t want to die when the window closes. I noticed that other threads remain active, which causes the process to keep living. Those threads seemed to point to shell extensions, which made me start thinking of how bad we are at releasing everything. So I went to add some debug code to track the refcounts of the COM objects when window is closed, and I noticed we didn’t even try to cleanup at all! The whole filebrowser class was left hanging there for eternity, with all the child components also hanging from it. That’s bad.
I added code to destroy the objects, and to track how many other references were keeping them alive, and when preparing other related classes similarly, I realized just how poorly our shell classes handle the whole cleanup code flow. A lot of the explorer-related classes had (and many still do) circular references, where one object keeps another alive, and this other object also keeps the first alive. In many cases, none of the involved objects made any effort to handle the close/destroy cases, with the necessary functions unimplemented. I tried to fix some of them, but others resisted me. I moved back into the leaks world when Amine messaged me saying he had been trying to help find leaks in the shell, but running the filebrowser in Windows always crashed for him.
Over the weekend, we painstakingly tracked the issue to a mismatch in calling convention to an exported function which was corrupting the stack pointer, and causing functions to overwrite a stack variable that shouldn’t have changed at all. This did fix the issue for Amine, but then he got another crash, that I haven’t been able to reproduce and may or may not be fixed by using VS2010 SP1 instead of the original 2010. I decided to install the original 2010 in a virtual machine, which I’ll use to compare the generated code and see if I find the culprit.
Since there are no visual changes to show, I decided to take a few screenshots of how Deleaker looks like:
The shell32.dll!Ordinal681 function is Shell_MergeMenus. This one menu leak may or may not be related to the issues with cleaning up when the window closes, so I can't blame it on it being an actual leak.