by zhu48 | July 11, 2016
Last week ended with Art going through decompiled assembly to find a bug for me, because the stack trace in the kernel debugger was pointing me to the wrong line in the C source code. It turns out that the problem was a NULL pointer dereference in the RECEIVE callback. As always in programming, a lot of effort went into catching a small oversight.
With that leftover bug from last week resolved, I moved on to my tasks for this week - more debugging, a code review done by Thomas Faber, and finally setting up WinDBG.
After some general debugging, I quickly ran into memory access violations, page faults, and other mysterious issues that I suspect had to do with locking and linked lists in my code. As a result, the kind GSoC mentors decided to give me a code review focusing on LIST_ENTRY usage.
Thomas, who performed the code review, pointed out a great deal of problems for me. The first, most glaring problem was my abuse of the global cancel spin lock (IoAcquireCancelSpinLock). This lock is used to globally serialize IRP cancellation state modifications, and, like all spin locks, should be held for the minimum amount of time necessary. Following the example set by existing code, I was acquiring this lock unnecessarily and holding it for too long. Further reading up on this lock (starting from an article given to me by Thomas) revealed that for correctness, I actually didn't have to acquire this lock in my code at all. For drivers that keep their own IRP queues, like my driver, use of this lock was deprecated. The only reason I had to explicitly acquire this lock was to catch very rare corner cases, which I am not exactly in a position to worry about yet.
Thomas also had some other suggestions for me. He caught several pieces of my code that were infinite while loops, forced me to think more about the scope of my locks, and got me to clean up my coding style.
When I was done fixing the problems Thomas caught, I did less than half a day of debugging before running into a memory access violation originating from inside lwIP. I had two choices about how to proceed - figure out how to pipe debug printouts from inside lwIP to the ReactOS debugging serial port, or actually hook up WinDBG so I can directly inspect memory. I chose to start setting up WinDBG, since doing that would probably speed up everything I do.
Setting up WinDBG turned out to be a full-day task.
To use WinDBG, I needed to use Microsoft's compiler to generate debug symbols. At first, the compiler refused to even find stdlib.h. It turned out, for some reason, my install of Visual Studio did not include the Windows 10.0 SDK, and my environment variables were set up in such a way that a mix of Windows 8.1 and Windows 10.0 SDKs were specified. A good chunk of the rest of the day was spent downloading and installing the SDK, after which WinDBG worked beautifully.
When I first dove into the kernel with WinDBG hooked up, the old bug was gone and I ran into a different bug. I guess that's the first bug I'll be work on next week.