How do security issues happen?

by ThFabba | February 11, 2015

If you are a Windows user, you may be used to seeing a bunch of updates pop up around the second Tuesday of every month that purport to fix "critical security issues." Microsoft's continuous effort to fix these vulnerabilities in its operating system consumes a lot of resources, but it is well worth it to the users: in a world of ever-increasing cyber security threats, no hole in our computers' defenses should be left open to potential attackers.

The recent mailing list discussion around ReactOS SVN revision 66192 shows just how easy it is to introduce a critical security issue into kernel code. I will be using this instance as an example of a simple but security-relevant bug, and to illustrate some of the steps kernel code must take to ensure the security of the system it runs on.

Spotting the vulnerability

So let's get right to it and have a look at the code in question:


NTSTATUS
APIENTRY
NtUserSetInformationThread(IN HANDLE ThreadHandle,
                           IN USERTHREADINFOCLASS ThreadInformationClass,
                           IN PVOID ThreadInformation,
                           IN ULONG ThreadInformationLength)
{
    [...]
    switch (ThreadInformationClass)
    {
        case UserThreadInitiateShutdown:
        {
            ERR("Shutdown initiated\n");

            if (ThreadInformationLength != sizeof(ULONG))
            {
                Status = STATUS_INFO_LENGTH_MISMATCH;
                break;
            }

            Status = UserInitiateShutdown(Thread, (PULONG)ThreadInformation);
            break;
        }
        [...]
}

The above is a snippet from NtUserSetInformationThread, which is a system call routine in win32k.sys that can be (more or less) directly called from a user-mode program. ThreadInformation is a pointer to some arbitrary piece of data whose meaning depends on the ThreadInformationClass parameter. In the UserThreadInitiateShutdown case, it is supposed to be a single 4-byte integer. ThreadInformationLength describes the size of this data, and as we can see, the code validates that this value is correct and fails with STATUS_INFO_LENGTH_MISMATCH if it is not. Note that both these parameters come directly from the user program, so a piece of malware calling this function would have direct control over their values.

Now let's look at what happens with ThreadInformation as it gets passed to UserInitiateShutdown:


NTSTATUS
UserInitiateShutdown(IN PETHREAD Thread,
                     IN OUT PULONG pFlags)
{
    NTSTATUS Status;
    ULONG Flags = *pFlags;
    [...]
    *pFlags = Flags;
    [...]
    /* If the caller is not Winlogon, do some security checks */
    if (PsGetThreadProcessId(Thread) != gpidLogon)
    {
        // FIXME: Play again with flags...
        *pFlags = Flags;
        [...]
    }
    [...]
    *pFlags = Flags;

    return STATUS_SUCCESS;
}

Because large parts of this function are unimplemented, all that happens is that the 4-byte value the user pointed us to gets read, and then written back unmodified a number of times.

So what's the problem, you ask?

Well, the act of dereferencing an unchecked pointer alone is enough to allow a denial-of-service (DoS) attack: a malicious program could simply crash the system without acquiring any permission to do so. As an easy example, a program could just pass in a NULL pointer to exploit this vulnerability. This would cause UserInitiateShutdown to dereference said pointer, which would result in an instance of the infamous Blue Screen of Death, usually referred to as a "bug check" by kernel programmers.

In this case, though, the caller even gets an opportunity to write to memory (and remember that this is an arbitrary pointer, so it can even point to kernel memory!). While writing back the same value that was previously read from this memory location may not seem too bad at first glance, it is actually quite problematic. Some memory locations change frequently, and restoring a previous value found at such a location may, for example, compromise the quality of the entropy used for cryptographic random number generation; or an attacker may choose to restore an old page table mapping that should have been deleted, gaining access to memory that can then be used to compromise the system.

However, these are just examples I came up with within a few minutes. Determined attackers may have months to figure out the best way to achieve their goals, so a seemingly small vulnerability like this may well be enough for them to steal all your secrets and gain complete control over your machine. Finally, of course, if the function were fully implemented, it would actually change the Flags variable prior to writing it back, offering up the option to modify arbitrary (kernel) memory in a controlled way: a feast for any attacker.
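
To make the problem with writing back an unchanged value more concrete, here is a small user-mode simulation in C. It is only a sketch, not kernel code: g_kernel_word, concurrent_update and read_then_write_back are names made up for this illustration, and the "concurrent" update is simulated by an ordinary function call rather than a real second thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel variable that other code updates
   frequently (a counter, an entropy pool word, a mapping entry...). */
static uint32_t g_kernel_word = 100;

/* Simulates another thread or interrupt handler changing the value. */
static void concurrent_update(void)
{
    g_kernel_word += 1;
}

/* Same shape as the vulnerable function: read through the pointer,
   do some work, then write the value that was read back again. */
static void read_then_write_back(uint32_t *p_flags, void (*in_between)(void))
{
    uint32_t flags = *p_flags;   /* the value is read once */

    if (in_between != NULL)
        in_between();            /* the location changes in the meantime */

    *p_flags = flags;            /* the stale value overwrites the update */
}

/* Returns what is left in the simulated kernel word afterwards. */
uint32_t demo_lost_update(void)
{
    g_kernel_word = 100;
    read_then_write_back(&g_kernel_word, concurrent_update);
    return g_kernel_word;
}
```

In this sketch the update made by concurrent_update is silently lost: the word ends up holding the value that was read before the update happened, even though the function never "changed" anything.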

Knowing all this, how do we fix it?

To protect against this kind of problem, the NT kernel offers two mechanisms: probing and Structured Exception Handling (SEH). Probing the memory fixes a large part of the problem: it ensures that the pointer we receive from the user program actually points to a user-mode address. Performing this check on all pointer parameters ensures that user-mode software does not get access to kernel-mode memory in this way. However, this does not protect us from simply receiving a NULL or otherwise invalid pointer.

This is where the second technique, SEH, comes in: wrapping every access to data through a user-provided (and thus untrusted) pointer in an exception handling block ensures that the code retains control even if this pointer is invalid. The kernel-mode code provides an exception handler for this case, which gets called whenever the protected piece of code raises an exception (such as an access violation due to the use of an invalid pointer). The exception handler collects available information (such as an exception code), performs any necessary clean-up and usually simply returns to the user with a failure code.
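
As a rough illustration of what probing checks, here is a portable C sketch of a range and alignment test. It is not the kernel's implementation: sketch_probe_range and SKETCH_USER_PROBE_LIMIT are hypothetical names, the limit value is an arbitrary placeholder (the real boundary is platform-specific), and where the kernel routine would raise an exception this sketch simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: placeholder for the top of the user-mode address range.
   The real limit is defined by the kernel and differs per platform. */
#define SKETCH_USER_PROBE_LIMIT ((uintptr_t)0x7FFF0000u)

/* Returns true if [address, address + length) is suitably aligned and
   lies entirely below the user-mode limit. Only the address value is
   inspected; the memory itself is never touched. */
bool sketch_probe_range(uintptr_t address, size_t length, size_t alignment)
{
    /* The alignment must be a power of two for the mask test below. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    if (length == 0)
        return true;    /* sketch choice: nothing will be accessed */

    if ((address & (alignment - 1)) != 0)
        return false;   /* misaligned pointer */

    if (address >= SKETCH_USER_PROBE_LIMIT)
        return false;   /* starts outside the user-mode range */

    if (length > SKETCH_USER_PROBE_LIMIT - address)
        return false;   /* range would extend past the limit */

    return true;
}
```

Note that sketch_probe_range(0, 4, 4) returns true: address zero is below the limit and aligned, so a range check alone does not reject a NULL pointer. Catching that case is left to the exception handler around the actual access.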

Let's have a look at the fixed code (as of r66223):


            ULONG CapturedFlags = 0;

            ERR("Shutdown initiated\n");

            if (ThreadInformationLength != sizeof(ULONG))
            {
                Status = STATUS_INFO_LENGTH_MISMATCH;
                break;
            }

            /* Capture the caller value */
            Status = STATUS_SUCCESS;
            _SEH2_TRY
            {
                ProbeForWrite(ThreadInformation, sizeof(CapturedFlags), sizeof(PVOID));
                CapturedFlags = *(PULONG)ThreadInformation;
            }
            _SEH2_EXCEPT(EXCEPTION_EXECUTE_HANDLER)
            {
                Status = _SEH2_GetExceptionCode();
            }
            _SEH2_END;

            if (NT_SUCCESS(Status))
                Status = UserInitiateShutdown(Thread, &CapturedFlags);

            /* Return the modified value to the caller */
            _SEH2_TRY
            {
                *(PULONG)ThreadInformation = CapturedFlags;
            }
            _SEH2_EXCEPT(EXCEPTION_EXECUTE_HANDLER)
            {
                Status = _SEH2_GetExceptionCode();
            }
            _SEH2_END;

Notice that all accesses to the untrusted ThreadInformation pointer are now performed inside _SEH2_TRY blocks. Exceptions occurring inside these blocks will be handled in a controlled manner by the code in the _SEH2_EXCEPT block. Additionally, ProbeForWrite is called before the pointer is dereferenced for the first time; this will raise a STATUS_ACCESS_VIOLATION exception if it detects an invalid (e.g. kernel-mode) pointer or non-writable memory, or a STATUS_DATATYPE_MISALIGNMENT exception if the pointer is not aligned as requested.

Finally, note the use of a "CapturedFlags" variable that's being passed to UserInitiateShutdown. This is a trick that simplifies handling of the untrusted pointer: instead of having to use SEH every single time pFlags is accessed inside the function, the value is saved to a trusted location by NtUserSetInformationThread, and written back to user memory after UserInitiateShutdown returns. This way, there is no need to modify UserInitiateShutdown itself, since its argument is now a trusted kernel-mode pointer (to CapturedFlags). As a result of applying these measures, this case of the function can now handle arbitrary valid or invalid user input without adverse effects on the system. Mission accomplished!
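
The capture pattern itself can be sketched in portable C, independent of the kernel. Everything here is hypothetical: sketch_read_user and sketch_write_user stand in for a probe plus an SEH-protected access, a NULL pointer plays the role of any invalid address, and sketch_worker stands in for a function like UserInitiateShutdown. Unlike the excerpt above, this sketch returns as soon as the capture fails, so a rejected pointer is never written to; that is a design choice of the sketch, not a description of the ReactOS code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t sketch_status;
#define SKETCH_SUCCESS          0
#define SKETCH_ACCESS_VIOLATION (-1)

/* Hypothetical fallible read: reports failure instead of faulting. */
static bool sketch_read_user(const uint32_t *user_ptr, uint32_t *out)
{
    if (user_ptr == NULL)
        return false;
    *out = *user_ptr;
    return true;
}

/* Hypothetical fallible write: reports failure instead of faulting. */
static bool sketch_write_user(uint32_t *user_ptr, uint32_t value)
{
    if (user_ptr == NULL)
        return false;
    *user_ptr = value;
    return true;
}

/* Hypothetical worker: it only ever sees a trusted pointer, so it can
   dereference it freely without any protection of its own. */
static sketch_status sketch_worker(uint32_t *flags)
{
    *flags |= 0x1u;
    return SKETCH_SUCCESS;
}

sketch_status sketch_syscall(uint32_t *user_flags)
{
    uint32_t captured = 0;
    sketch_status status;

    /* 1. Capture: copy the untrusted value to a trusted local. */
    if (!sketch_read_user(user_flags, &captured))
        return SKETCH_ACCESS_VIOLATION;

    /* 2. Work exclusively on the captured copy. */
    status = sketch_worker(&captured);
    if (status != SKETCH_SUCCESS)
        return status;

    /* 3. Write the result back through the untrusted pointer. */
    if (!sketch_write_user(user_flags, captured))
        return SKETCH_ACCESS_VIOLATION;

    return SKETCH_SUCCESS;
}
```

The untrusted pointer is touched in exactly two places, both of which can fail cleanly, while the worker in the middle needs no special handling at all.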

What should we learn from this?

As we've seen, with a watchful eye early on in development, it is possible to spot code that may prove to be a severe security issue later on. We can't afford to have too many of those, because to be honest, you can be sure we'll have enough security problems anyway. In fact, if all goes well, we'll be hunting for them in the future, and providing regular fixes just like you see in Windows Update every month.

As an interesting side note, Alex Ionescu pointed out that Windows has been shown to have a vulnerability in the very same function, NtUserSetInformationThread. According to Alex, this one is still unfixed, and is used for example to jailbreak devices such as the Surface RT. It was described in 2012 by well-known security researcher Mateusz "j00ru" Jurczyk (who likes to hang out in our IRC channel, too ;]). You can find his blog entry on this subject at http://j00ru.vexillium.org/?p=1393

Discussion: https://reactos.org/forum/viewtopic.php?f=2&t=13999