
The Service and the Beast: Building a Windows Service that Does Not Fail to Restart

Dec 28, 2022 20 min read

Key Takeaways

Windows Services play a key role in the Microsoft Windows operating system, and support the creation and management of long-running processes.
Service Isolation is important and powerful. It makes things harder when the service needs to interact with the user’s space, but this can be managed.
Services are ideal for use alongside a watchdog mechanism. Such a mechanism will ensure a given application is always running, and in case it shuts down abnormally, it will restart.
A good logging mechanism is always useful during development, using either a simple, or, when needed, a complex logging tool.
Testing the final solution is imperative. Once the code is checked and verified to work, up to 2% of testers might still report bugs, which is within reason.

When programming in C++ for Windows, working with Windows Services is almost inevitable. Windows Services play a key role in the Microsoft Windows operating system, and support the creation and management of long-running processes that survive sleep, hibernation, restart and shutdown. But what happens if they don’t? The inability to restart a service after shutting down the PC when Fast Startup is checked can be catastrophic for a program that depends on it. Service Isolation, introduced by Microsoft in Windows Vista, can cause this type of havoc, and here’s how you can solve it.

Thank You for Your Service …

We’ve been working with Windows Services for years, yet it seems that no matter how much we think we know about Services, or how much we believe we can handle them, we keep encountering more problems, challenges and issues. Some of these issues are undocumented or, if we’re "lucky", they are poorly documented.

Ever since Service Isolation was introduced by Microsoft, one of the most annoying problems we’ve encountered is the inability to restart a service after shutting down the PC when Fast Startup is checked. As we could not find a solution, we decided to roll up our sleeves and created one ourselves, which led to the development of a persistent Service.

But before we dive deeper and explain more about our solution, let’s start with the basics and explain what Services are and why we even need to use Windows Services in the first place.

NT Service (also known as Windows Service) is the term given to a special process which is loaded by the Service Control Manager of the NT kernel, and runs in the background right after Windows starts (before users log on). We use services to perform core and low-level OS tasks, such as Web serving, event logging, file serving, help and support, printing, cryptography, and error reporting.

Additionally, services enable us to create executable, long-running applications. The reason is that a Service runs in its own Windows session environment, so it does not interfere with other components or sessions of your application. Obviously, Services are expected to start automatically once the computer boots – and we’ll get to that in a minute.

Moving further, the obvious question is - why do we need persistent Services? The answer is pretty clear, a service is able to:

run continuously in the background;
invoke itself under the logged-in user’s session;
act as a watchdog and make sure a given application is always running.

A Windows Service needs to survive sleep, hibernation, restart and shutdown. However, as explained, there are specific and dangerous issues when "Fast Startup" is checked and the PC is turned off and on again. In most of these cases, the service fails to restart.

Since we were developing an Anti-Virus, which is supposed to restart after reboot or shutdown, this issue created a serious problem which we were eager to solve.

Stay! Good Service …

To create the near perfect persistent Windows service, we had to solve several underlying issues first.

One of those issues was related to Service Isolation - the isolated Service can’t access any context associated with any specific user. One of our software products used to store data in c:\users\<USER NAME>\appdata\local\ but when it ran from our service, the path was invalid since the service runs from Session 0. Moreover, after reboot, the Service starts before any user logs in, which leads to the first piece of the solution: waiting for the user to log in.

To figure out how to do this, we posted our question here.

This turned out to be a problem with no perfect solution. However, the code that accompanies this article has been used and fully tested with no issues.

The Basics

The structure and the flow of our code may look complex, and that is for a reason. Over the last 10 years, Services have become isolated from other processes. Since then, Windows Services operate under the SYSTEM user account as opposed to any other user account, and run isolated.

Services are isolated because they are powerful and can be a potential security risk, which is why Microsoft introduced service isolation. Before that change, all services ran in Session 0 along with applications.

However, after the introduction of isolation, which took place with Windows Vista, things have changed.
The idea behind our code was to have the Windows Service launch itself as a user, by calling CreateProcessAsUserW, as will be explained further.
Our Service, named SG_RevealerService, has several commands and when called using the following command line parameters, it acts accordingly.

#define SERVICE_COMMAND_INSTALL L"Install"             // The command line argument
                                                       // for installing the service

#define SERVICE_COMMAND_LAUNCHER L"ServiceIsLauncher"  // Launcher command for
                                                       // NT service

When calling SG_RevealerService, there are three options:

Option 1 - called without any command line argument - nothing will happen.

Option 2 - called with the Install command line argument. In this case, the service will install itself and if a valid executable path is added after a hash (#) separator, it will start, and the Windows watchdog will keep it running.

The Service then runs itself using CreateProcessAsUserW(), and the new process runs under the user account. This gives the Service the ability to access the context that the calling instance has no access to due to Service Isolation.

Option 3 - called with the ServiceIsLauncher command line argument. The service client main application will start. The entry function then indicates that the service has started itself with the current user's privileges, and you can see two instances of SG_RevealerService in the Task Manager: one under SYSTEM, and the other under the currently logged-in user.
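The three options above can be sketched as a small parser. This is only an illustration that compiles without the Windows SDK: ServiceMode, ParsedCommand and ParseServiceCommand are hypothetical names, and the case-sensitive comparison is an assumption rather than something the service is known to do.

```cpp
#include <string>

// Hypothetical names for illustration; the article's source may differ.
enum class ServiceMode { None, Install, Launcher };

struct ParsedCommand
{
    ServiceMode mode;
    std::wstring hostPath;   // Only meaningful for Install; may be empty.
};

// Splits an argument such as L"Install#SampleApp.exe" at the first '#'
// and classifies the part before it.
ParsedCommand ParseServiceCommand(const std::wstring& arg)
{
    ParsedCommand result{ ServiceMode::None, L"" };

    const std::wstring::size_type hash = arg.find(L'#');
    const std::wstring command = arg.substr(0, hash);   // npos => whole string

    if (command == L"Install")
    {
        result.mode = ServiceMode::Install;
        if (hash != std::wstring::npos)
            result.hostPath = arg.substr(hash + 1);
    }
    else if (command == L"ServiceIsLauncher")
    {
        result.mode = ServiceMode::Launcher;
    }
    return result;
}
```

With this sketch, an empty or unknown argument maps to ServiceMode::None, which corresponds to Option 1 where nothing happens.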

/*
RunHost
*/

BOOL RunHost(LPWSTR HostExePath,LPWSTR CommandLineArguments)
{
    WriteToLog(L"RunHost '%s'",HostExePath);

    STARTUPINFO startupInfo = {};
    startupInfo.cb = sizeof(STARTUPINFO);
    startupInfo.lpDesktop = (LPTSTR)_T("winsta0\\default");

    HANDLE hToken = 0;
    BOOL bRes = FALSE;

    LPVOID pEnv = NULL;
    CreateEnvironmentBlock(&pEnv, hToken, TRUE);

    PROCESS_INFORMATION processInfoAgent = {};
    PROCESS_INFORMATION processInfoHideProcess = {};
    PROCESS_INFORMATION processInfoHideProcess32 = {};

    if (PathFileExists(HostExePath))
    {
        std::wstring commandLine;
        commandLine.reserve(1024);

        commandLine += L"\"";
        commandLine += HostExePath;
        commandLine += L"\" \"";
        commandLine += CommandLineArguments;
        commandLine += L"\"";

        WriteToLog(L"launch host with CreateProcessAsUser ...  %s",
                     commandLine.c_str());

        bRes = CreateProcessAsUserW(hToken, NULL, &commandLine[0],
               NULL, NULL, FALSE, NORMAL_PRIORITY_CLASS |
               CREATE_UNICODE_ENVIRONMENT | CREATE_NEW_CONSOLE |
               CREATE_DEFAULT_ERROR_MODE, pEnv,
            NULL, &startupInfo, &processInfoAgent);
        if (bRes == FALSE)
        {
            DWORD   dwLastError = ::GetLastError();
            TCHAR   lpBuffer[256] = _T("?");
            if (dwLastError != 0)    // Don't want to see an
                                     // "operation done successfully" error ;-)
            {
                ::FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM,    // It's a system error
                    NULL,                                      // No string to be
                                                               // formatted needed
                    dwLastError,                               // Hey Windows: Please
                                                               // explain this error!
                    MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), // Do it in the standard
                                                               // language
                    lpBuffer,              // Put the message here
                    255,                   // Buffer size in TCHARs (not bytes)
                    NULL);
            }
            // Pass a wchar_t* to the variadic logger, not the std::wstring object itself
            WriteToLog(L"CreateProcessAsUser failed - Command Line = %s Error : %s",
                         commandLine.c_str(), lpBuffer);
        }
        else
        {
            if (!writeStringInRegistry(HKEY_LOCAL_MACHINE,
               (PWCHAR)SERVICE_REG_KEY, (PWCHAR)SERVICE_KEY_NAME, HostExePath))
            {
                WriteToLog(L"Failed to write registry");
            }
        }
    }
    else
    {
        WriteToLog(L"RunHost failed because path '%s' does not exists", HostExePath);
    }
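    hPrevAppProcess = processInfoAgent.hProcess;

    // Release what we no longer need; the process handle is kept in hPrevAppProcess
    if (processInfoAgent.hThread) CloseHandle(processInfoAgent.hThread);
    if (pEnv) DestroyEnvironmentBlock(pEnv);
    if (hToken) CloseHandle(hToken);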
    WriteToLog(L"Run host end!");

    return bRes;
}
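RunHost always wraps the arguments in quotes, so when CommandLineArguments is empty the child process receives one extra, empty argument (""). Whether that is intended is not clear from the article. A minimal sketch of an alternative, using a hypothetical helper named BuildCommandLine that only appends the arguments when there are any, might look like this:

```cpp
#include <string>

// Builds a command line of the form "exe" or "exe" args.
// Sketch only: it does not escape quotes embedded in exePath, and it
// passes args through unchanged.
std::wstring BuildCommandLine(const std::wstring& exePath,
                              const std::wstring& args)
{
    std::wstring commandLine;
    commandLine.reserve(exePath.size() + args.size() + 3);

    commandLine += L'"';
    commandLine += exePath;
    commandLine += L'"';

    if (!args.empty())
    {
        commandLine += L' ';
        commandLine += args;
    }
    return commandLine;
}
```

Quoting the executable path matters when it contains spaces. Because CreateProcessAsUserW can modify the command-line string, the result should still be passed as a writable buffer, as RunHost does with &commandLine[0].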

Detecting User Log On

The first challenge is to start some of the actions only when and if a user logs in.
In order to detect a user log on, we first define a global variable.

bool g_bLoggedIn = false;

It will be set to true when a user logs in.

Subscribing to the Logon event

We defined the following preprocessor directives:

#define EVENT_SUBSCRIBE_PATH    L"Security"
#define EVENT_SUBSCRIBE_QUERY    L"Event/System[EventID=4624]"

After the Service starts, we subscribe to the logon event, so the moment a user has logged in, we get an alert via the callback function we have set, and we can continue.
To implement this, we need a class to handle the creation of the subscription and wait for the event callback.

class UserLoginListner
{
    HANDLE hWait = NULL;
    HANDLE hSubscription = NULL;

public:
    ~UserLoginListner()
    {
        CloseHandle(hWait);
        EvtClose(hSubscription);
    }

    UserLoginListner()
    {
        const wchar_t* pwsPath = EVENT_SUBSCRIBE_PATH;
        const wchar_t* pwsQuery = EVENT_SUBSCRIBE_QUERY;

        hWait = CreateEvent(NULL, FALSE, FALSE, NULL);

        hSubscription = EvtSubscribe(NULL, NULL,
            pwsPath, pwsQuery,
            NULL,
            hWait,
            (EVT_SUBSCRIBE_CALLBACK)UserLoginListner::SubscriptionCallback,
            EvtSubscribeToFutureEvents);
        if (hSubscription == NULL)
        {
            DWORD status = GetLastError();

            if (ERROR_EVT_CHANNEL_NOT_FOUND == status)
                WriteToLog(L"Channel %s was not found.\n", pwsPath);
            else if (ERROR_EVT_INVALID_QUERY == status)
                WriteToLog(L"The query \"%s\" is not valid.\n", pwsQuery);
            else
                WriteToLog(L"EvtSubscribe failed with %lu.\n", status);

            CloseHandle(hWait);
            hWait = NULL;   // avoid closing the same handle again in the destructor
        }
    }

Next we need a function for the wait itself:

void WaitForUserToLogIn()
{
    WriteToLog(L"Waiting for a user to log in...");
    WaitForSingleObject(hWait, INFINITE);
    WriteToLog(L"Received a Logon event - a user has logged in");
}

We also need a callback function:

static DWORD WINAPI SubscriptionCallback(EVT_SUBSCRIBE_NOTIFY_ACTION action, PVOID
       pContext, EVT_HANDLE hEvent)
{
    if (action == EvtSubscribeActionDeliver)
    {
        WriteToLog(L"SubscriptionCallback invoked.");
        HANDLE Handle = (HANDLE)(LONG_PTR)pContext;
        SetEvent(Handle);
    }

    return ERROR_SUCCESS;
}

Then, all we need to do is add a block of code with the following lines:

WriteToLog(L"Launch client\n"); // launch client ...
{
    UserLoginListner WaitTillAUserLogins;
    WaitTillAUserLogins.WaitForUserToLogIn();
}

Once we reach the end of this block, we can be assured that a user has logged in.

Later in the article, we will explain how to retrieve the account/username of the logged-in user and how to use our GetLoggedInUser() function.

It’s Not You, It’s Me: Impersonating a User

When we know for sure that a user has logged in, we need to impersonate them.

The following function does the job. Not only does it impersonate the user, it also calls CreateProcessAsUserW() and runs itself as that user.
By doing so, we give the service access to the user’s context, including documents, desktop, etc. and allow the service to use the UI, which isn’t possible for a service running from Session 0.

CreateProcessAsUserW creates a new process along with its primary thread, which will run in the context of a given user.

//Function to run a process as active user from Windows service
void ImpersonateActiveUserAndRun()
{
    DWORD session_id = -1;
    DWORD session_count = 0;
    WTS_SESSION_INFOW *pSession = NULL;

    if (WTSEnumerateSessions(WTS_CURRENT_SERVER_HANDLE, 0, 1, &pSession, &session_count))
    {
        WriteToLog(L"WTSEnumerateSessions - success");
    }
    else
    {
        WriteToLog(L"WTSEnumerateSessions - failed. Error %d",GetLastError());
        return;
    }
    TCHAR szCurModule[MAX_PATH] = { 0 };

    GetModuleFileName(NULL, szCurModule, MAX_PATH);


    for (size_t i = 0; i < session_count; i++)
    {
        session_id = pSession[i].SessionId;
        WTS_CONNECTSTATE_CLASS wts_connect_state = WTSDisconnected;
        WTS_CONNECTSTATE_CLASS* ptr_wts_connect_state = NULL;
        DWORD bytes_returned = 0;
        if (::WTSQuerySessionInformation(
            WTS_CURRENT_SERVER_HANDLE,
            session_id,
            WTSConnectState,
            reinterpret_cast<LPTSTR*>(&ptr_wts_connect_state),
            &bytes_returned))
        {
            wts_connect_state = *ptr_wts_connect_state;
            ::WTSFreeMemory(ptr_wts_connect_state);
            if (wts_connect_state != WTSActive) continue;
        }
        else
        {
            continue;
        }

        HANDLE hImpersonationToken;
        if (!WTSQueryUserToken(session_id, &hImpersonationToken))
        {
            continue;
        }

        //Get the actual token from impersonation one
        DWORD neededSize1 = 0;
        HANDLE *realToken = new HANDLE;
        if (GetTokenInformation(hImpersonationToken, (::TOKEN_INFORMATION_CLASS) TokenLinkedToken, realToken, sizeof(HANDLE), &neededSize1))
        {
            CloseHandle(hImpersonationToken);
            hImpersonationToken = *realToken;
        }
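        else
        {
            // Don't leak the token or the heap-allocated HANDLE on failure
            CloseHandle(hImpersonationToken);
            delete realToken;
            continue;
        }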
        HANDLE hUserToken;
        if (!DuplicateTokenEx(hImpersonationToken,
            TOKEN_ASSIGN_PRIMARY | TOKEN_ALL_ACCESS | MAXIMUM_ALLOWED,
            NULL,
            SecurityImpersonation,
            TokenPrimary,
            &hUserToken))
        {
            CloseHandle(hImpersonationToken);
            delete realToken;
            continue;
        }


        // Get user name of this process
        WCHAR* pUserName = NULL;   // stays NULL if the query below fails
        DWORD user_name_len = 0;
        if (WTSQuerySessionInformationW(WTS_CURRENT_SERVER_HANDLE, session_id, WTSUserName, &pUserName, &user_name_len))
        {
            //Now we got the user name stored in pUserName
        }
        // Free allocated memory                         
        if (pUserName) WTSFreeMemory(pUserName);
        ImpersonateLoggedOnUser(hUserToken);
        STARTUPINFOW StartupInfo;
        GetStartupInfoW(&StartupInfo);
        StartupInfo.cb = sizeof(STARTUPINFOW);
        PROCESS_INFORMATION processInfo = {};
        void* lpEnvironment = NULL;

        // Obtain all needed necessary environment variables of the logged in user.
        // They will then be passed to the new process we create.

        BOOL resultEnv = CreateEnvironmentBlock(&lpEnvironment, hUserToken, FALSE);
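        if (!resultEnv)
        {
            WriteToLog(L"CreateEnvironmentBlock - failed. Error %d",GetLastError());
            // Undo the impersonation and release the tokens before trying the next session
            RevertToSelf();
            CloseHandle(hImpersonationToken);
            CloseHandle(hUserToken);
            delete realToken;
            continue;
        }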
        std::wstring commandLine;
        commandLine.reserve(1024);
        commandLine += L"\"";
        commandLine += szCurModule;
        commandLine += L"\" \"";
        commandLine += SERVICE_COMMAND_LAUNCHER;
        commandLine += L"\"";
        WCHAR PP[1024] = {}; // path and parameters, zero-initialized writable buffer
        wcscpy_s(PP, commandLine.c_str());

        // Next we impersonate - by starting the process as if the current logged in user, has started it
        BOOL result = CreateProcessAsUserW(hUserToken,
            NULL,
            PP,
            NULL,
            NULL,
            FALSE,
            NORMAL_PRIORITY_CLASS | CREATE_NEW_CONSOLE | CREATE_UNICODE_ENVIRONMENT,
            lpEnvironment,   // the user's environment block obtained above
            NULL,
            &StartupInfo,
            &processInfo);

        if (!result)
        {
            WriteToLog(L"CreateProcessAsUser - failed. Error %d",GetLastError());
        }
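        else
        {
            WriteToLog(L"CreateProcessAsUser - success");
            // We don't track the child here, so release its handles
            CloseHandle(processInfo.hThread);
            CloseHandle(processInfo.hProcess);
        }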
        DestroyEnvironmentBlock(lpEnvironment);
        CloseHandle(hImpersonationToken);
        CloseHandle(hUserToken);
        delete realToken;   // realToken is a heap-allocated HANDLE*, not a handle; the token it held was closed above
        RevertToSelf();
    }
    WTSFreeMemory(pSession);
}
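The session loop in ImpersonateActiveUserAndRun boils down to "find the first session whose state is active". The following sketch isolates that selection so it can be read and tested without the Windows SDK. SessionState, SessionEntry, kNoSession and FindActiveSession are hypothetical stand-ins for WTS_CONNECTSTATE_CLASS, WTS_SESSION_INFO and the loop body, not part of the article's code.

```cpp
#include <cstddef>
#include <vector>

// Stand-ins for WTS_CONNECTSTATE_CLASS and WTS_SESSION_INFO so the sketch
// compiles without the Windows SDK; the names are hypothetical.
enum class SessionState { Active, Connected, Disconnected, Listen };

struct SessionEntry
{
    unsigned long id;
    SessionState state;
};

const unsigned long kNoSession = static_cast<unsigned long>(-1);

// Returns the id of the first session in the Active state, or kNoSession.
unsigned long FindActiveSession(const std::vector<SessionEntry>& sessions)
{
    for (std::size_t i = 0; i < sessions.size(); ++i)
    {
        if (sessions[i].state == SessionState::Active)
            return sessions[i].id;
    }
    return kNoSession;
}
```

Keeping the selection separate from the token handling makes it easier to see which session the new process will be created in, and to return early when no session qualifies.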

Finding the Logged-In User

In order to find the logged-in user's account name, we use the following function:

std::wstring GetLoggedInUser()
{
    std::wstring user{L""};
    WTS_SESSION_INFO *SessionInfo;
    unsigned long SessionCount;
    unsigned long ActiveSessionId = -1;

    if(WTSEnumerateSessions(WTS_CURRENT_SERVER_HANDLE,
                            0, 1, &SessionInfo, &SessionCount))
    {
        for (size_t i = 0; i < SessionCount; i++)
        {
            if (SessionInfo[i].State == WTSActive ||
                SessionInfo[i].State == WTSConnected)
            {
                ActiveSessionId = SessionInfo[i].SessionId;
                break;
            }
        }

        wchar_t *UserName;
        if (ActiveSessionId != -1)
        {
            unsigned long BytesReturned;
            if (WTSQuerySessionInformation(WTS_CURRENT_SERVER_HANDLE,
                ActiveSessionId, WTSUserName, &UserName, &BytesReturned))
            {
                user = UserName;        // Now we have the logged in user name
                WTSFreeMemory(UserName);    
            }
        }
        WTSFreeMemory(SessionInfo);
    }
    return user;
}

We use this function soon after the Service kicks in. As long as there is no user logged in, this function returns an empty string, and while it does, we know we should wait.
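The "wait while the name is empty" idea can be sketched as a bounded polling loop. This is an assumption about how such a wait could be written, not the article's implementation: WaitForLoggedInUser is a hypothetical helper, getUser stands in for GetLoggedInUser(), and the attempt count and interval are illustrative parameters.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Polls getUser until it returns a non-empty name or maxAttempts is reached.
// getUser stands in for GetLoggedInUser(); interval and maxAttempts are
// illustrative parameters, not values taken from the article.
std::wstring WaitForLoggedInUser(const std::function<std::wstring()>& getUser,
                                 int maxAttempts,
                                 std::chrono::milliseconds interval)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        std::wstring user = getUser();
        if (!user.empty())
            return user;
        if (attempt + 1 < maxAttempts)   // no need to sleep after the last try
            std::this_thread::sleep_for(interval);
    }
    return std::wstring();   // still nobody logged in
}
```

In the service this could be called as WaitForLoggedInUser(GetLoggedInUser, 60, std::chrono::milliseconds(1000)), with the caller deciding what to do when the result is still empty.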

A Watchdog Is a Service's Best Friend

Services are ideal for use along with a Watchdog mechanism.

Such a mechanism will ensure a given application is always running, and in case it shuts down abnormally, it will restart it. We always need to remember that the user may just select Quit, and in such a case, we don’t want to restart the process. However, if the process is stopped via the Task Manager, or by any other means, we would want to restart it. A good example would be an antivirus program. We want to make sure that malware is not able to terminate the antivirus that is supposed to detect it.

To achieve that, we need the Service to provide some sort of an API to the program using it, so when the user of that program selects "Quit", the program informs the Service that its job is done, and it can uninstall itself.
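The restart rule described here can be written down as a single predicate. The sketch below is a hedged illustration: HostStatus and ShouldRestartHost are hypothetical names, and how the host actually reports a deliberate Quit to the service is left open, as it is in the article.

```cpp
// Inputs the watchdog would gather on each check (hypothetical names).
struct HostStatus
{
    bool hostRunning;        // Was the host process found?
    bool userRequestedQuit;  // Did the host report a deliberate Quit?
    bool userLoggedIn;       // Is there a user session to start the host in?
};

// Restart only when the host is gone, nobody asked it to quit, and a
// user is logged in.
bool ShouldRestartHost(const HostStatus& s)
{
    return !s.hostRunning && !s.userRequestedQuit && s.userLoggedIn;
}
```

Keeping the decision in one place makes the difference between "killed via the Task Manager" and "closed by the user" explicit, which is the distinction the watchdog depends on.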

Some Building Blocks

Next, we will explain some building blocks that are required to understand the code in this article.

GetExePath

In order to obtain the path of our Service, or any executable, the following function will be handy.

/**
 * GetExePath() - returns the directory that contains the current executable,
 * including the trailing path separator.
 *
 * @param values - none.
 * @return a std::wstring containing that directory path.
 */
std::wstring GetExePath()
{
    wchar_t buffer[65536];
    GetModuleFileName(NULL, buffer, sizeof(buffer) / sizeof(*buffer));
    int pos = -1;
    int index = 0;
    while (buffer[index])
    {
        if (buffer[index] == L'\\' || buffer[index] == L'/')
        {
            pos = index;
        }
        index++;
    }
    buffer[pos + 1] = 0;
    return buffer;
}
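The same splitting can be expressed with std::wstring::find_last_of, which avoids the manual index loop. The two helpers below are a sketch with hypothetical names (DirectoryOf and FileNameOf); they operate on a path passed in, so they do not call GetModuleFileName themselves.

```cpp
#include <string>

// Returns everything up to and including the last '\\' or '/',
// or an empty string when the path has no separator.
std::wstring DirectoryOf(const std::wstring& fullPath)
{
    const std::wstring::size_type pos = fullPath.find_last_of(L"\\/");
    return pos == std::wstring::npos ? std::wstring()
                                     : fullPath.substr(0, pos + 1);
}

// Returns the part after the last separator (the whole string if none).
std::wstring FileNameOf(const std::wstring& fullPath)
{
    const std::wstring::size_type pos = fullPath.find_last_of(L"\\/");
    return pos == std::wstring::npos ? fullPath : fullPath.substr(pos + 1);
}
```

DirectoryOf mirrors what GetExePath returns, while FileNameOf corresponds to the executable-name extraction the watchdog needs when it looks for the host process by name.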

WriteToLog

When developing a Windows Service (and any software, for that matter), it’s important to have a logging mechanism. We have a very complex logging mechanism, but for the purposes of this article, we added a minimal logging function named WriteToLog. It works like printf, but everything sent to it is not only formatted but also stored in a log file, which can be checked later. The log file grows as new entries are appended to it.

The path of the log file would normally be the path of the Service's EXE. However, due to Service Isolation, for a short while after rebooting the PC, the current directory is c:\Windows\System32, and we don't want the log to end up there. So our log function checks for the path of our exe and does not assume the Current Directory will remain the same throughout the lifecycle of the Service.

/**
 * WriteToLog() - writes formatted text into a log file, and on screen (console)
 *
 * @param values - formatted text, such as L"The result is %d",result.
 * @return - none
 */
void WriteToLog(LPCTSTR lpText, ...)
{
    FILE *fp = NULL;
    static wchar_t log_file[MAX_PATH]{L""};   // resolved once, on the first call
    if (wcscmp(log_file, L"") == 0)
    {
        wcscpy(log_file,GetExePath().c_str());
        wcscat(log_file,L"log.txt");
    }
    // find gmt time, and store in buf_time
    time_t rawtime;
    struct tm* ptm;
    wchar_t buf_time[DATETIME_BUFFER_SIZE];
    time(&rawtime);
    ptm = gmtime(&rawtime);
    wcsftime(buf_time, sizeof(buf_time) / sizeof(*buf_time), L"%d.%m.%Y %H:%M", ptm);

    // store passed messsage (lpText) to buffer_in
    wchar_t buffer_in[BUFFER_SIZE];

    va_list ptr;
    va_start(ptr, lpText);

    vswprintf(buffer_in, BUFFER_SIZE, lpText, ptr);
    va_end(ptr);

    // store output message to buffer_out - enabled multiple parameters in swprintf
    wchar_t buffer_out[BUFFER_SIZE];

    swprintf(buffer_out, BUFFER_SIZE, L"%s %s\n", buf_time, buffer_in);

    _wfopen_s(&fp, log_file, L"a,ccs=UTF-8");
    if (fp)
    {
        fwprintf(fp, L"%s\n", buffer_out);
        fclose(fp);
    }
    wcscat(buffer_out,L"\n");
    HANDLE stdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    if (stdOut != NULL && stdOut != INVALID_HANDLE_VALUE)
    {
        DWORD written = 0;
        WriteConsole(stdOut, buffer_out, wcslen(buffer_out), &written, NULL);
    }
}
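The formatting step of WriteToLog can be separated from the file and console output. The sketch below shows only that step, under stated assumptions: FormatLogLine is a hypothetical helper, the 1024-character limit is arbitrary, and the timestamp is passed in instead of being computed. It uses %ls for wide strings because standard wide printf functions treat %s as a narrow string, whereas the Microsoft wide variants have traditionally treated %s as wide; %ls means a wide string in both.

```cpp
#include <cstdarg>
#include <cwchar>
#include <string>

// Formats "<timestamp> <message>" into a std::wstring.
// The 1024-character limit is an arbitrary choice for this sketch.
std::wstring FormatLogLine(const wchar_t* timestamp, const wchar_t* format, ...)
{
    wchar_t message[1024];

    va_list args;
    va_start(args, format);
    const int written = std::vswprintf(message, 1024, format, args);
    va_end(args);

    std::wstring line(timestamp);
    line += L' ';
    if (written >= 0)
        line += message;
    else   // vswprintf reports errors and truncation with a negative value
        line += L"<message could not be formatted>";
    return line;
}
```

Returning a std::wstring keeps the buffer handling in one place, so the caller can write the same line to the log file and to the console without concatenating into fixed-size arrays.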

More Building Blocks - Registry Stuff

Here are some functions we use to store the watchdog executable's path, so when the Service restarts after a PC shutdown or reboot, it will have that path available.

BOOL CreateRegistryKey(HKEY hKeyParent, PWCHAR subkey)
{
    DWORD dwDisposition; //Verify new key is created or open existing key
    HKEY  hKey;
    DWORD Ret;
    Ret =
        RegCreateKeyEx(
            hKeyParent,
            subkey,
            0,
            NULL,
            REG_OPTION_NON_VOLATILE,
            KEY_ALL_ACCESS,
            NULL,
            &hKey,
            &dwDisposition);
    if (Ret != ERROR_SUCCESS)
    {
        WriteToLog(L"Error opening or creating new key\n");
        return FALSE;
    }
    RegCloseKey(hKey); //close the key
    return TRUE;
}

BOOL writeStringInRegistry(HKEY hKeyParent, PWCHAR subkey,
                           PWCHAR valueName, PWCHAR strData)
{
    DWORD Ret;
    HKEY hKey;
    //Check if the registry exists
    Ret = RegOpenKeyEx(
        hKeyParent,
        subkey,
        0,
        KEY_WRITE,
        &hKey
    );
    if (Ret == ERROR_SUCCESS)
    {
        if (ERROR_SUCCESS !=
            RegSetValueEx(
                hKey,
                valueName,
                0,
                REG_SZ,
                (LPBYTE)(strData),
                (DWORD)((lstrlen(strData) + 1) * sizeof(WCHAR))))
        {
            RegCloseKey(hKey);
            return FALSE;
        }
        RegCloseKey(hKey);
        return TRUE;
    }
    return FALSE;
}

LONG GetStringRegKey(HKEY hKey, const std::wstring &strValueName,
                     std::wstring &strValue, const std::wstring &strDefaultValue)
{
    strValue = strDefaultValue;
    TCHAR szBuffer[MAX_PATH];
    DWORD dwBufferSize = sizeof(szBuffer);
    ULONG nError;
    nError = RegQueryValueEx(hKey, strValueName.c_str(), 0, NULL,
             (LPBYTE)szBuffer, &dwBufferSize);
    if (nError == ERROR_SUCCESS)
    {
        strValue = szBuffer;
        if (strValue.size() >= 2 && strValue.front() == _T('"') && strValue.back() == _T('"'))
        {
            strValue.erase(0, 1); // erase the first character
            strValue.erase(strValue.size() - 1); // erase the last character
        }
    }
    return nError;
}

BOOL readStringFromRegistry(HKEY hKeyParent, PWCHAR subkey,
                            PWCHAR valueName, std::wstring& readData)
{
    HKEY hKey;
    DWORD len = 1024;                  // buffer size in bytes
    DWORD readDataLen = len;
    PWCHAR readBuffer = (PWCHAR)malloc(len);
    if (readBuffer == NULL)
        return FALSE;
    //Check if the registry exists
    DWORD Ret = RegOpenKeyEx(
        hKeyParent,
        subkey,
        0,
        KEY_READ,
        &hKey
    );
    if (Ret == ERROR_SUCCESS)
    {
        Ret = RegQueryValueEx(
            hKey,
            valueName,
            NULL,
            NULL,
            (BYTE*)readBuffer,
            &readDataLen
        );
        while (Ret == ERROR_MORE_DATA)
        {
            // Get a buffer that is big enough.
            len += 1024;
            readBuffer = (PWCHAR)realloc(readBuffer, len);
            readDataLen = len;
            Ret = RegQueryValueEx(
                hKey,
                valueName,
                NULL,
                NULL,
                (BYTE*)readBuffer,
                &readDataLen
            );
        }
        if (Ret != ERROR_SUCCESS)
        {
            RegCloseKey(hKey);
            free(readBuffer);
            return FALSE;
        }
        readData = readBuffer;
        free(readBuffer);   // readData holds its own copy
        RegCloseKey(hKey);
        return TRUE;
    }
    else
    {
        free(readBuffer);
        return FALSE;
    }
}
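Two small details from the registry helpers above are easy to get wrong: stripping surrounding quotes from a stored path, and computing the byte size of a REG_SZ value. The sketch below shows both as standalone helpers with hypothetical names (StripSurroundingQuotes and RegSzByteSize); it does not touch the registry itself.

```cpp
#include <cstddef>
#include <string>

// Removes one pair of surrounding double quotes, if present.
// The size check keeps front() and back() from being called on an
// empty string, which would be undefined behavior.
std::wstring StripSurroundingQuotes(const std::wstring& value)
{
    if (value.size() >= 2 && value.front() == L'"' && value.back() == L'"')
        return value.substr(1, value.size() - 2);
    return value;
}

// Size in bytes of a string stored as REG_SZ, including the terminator.
std::size_t RegSzByteSize(const std::wstring& value)
{
    return (value.size() + 1) * sizeof(wchar_t);
}
```

Using sizeof(wchar_t) instead of a literal 2 states the intent directly; on Windows the two are equal.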

Checking If Our Host Is Running

One key ability of the program in this article is to guard our SampleApp (which we call "the host"), and when it’s not running, restart it (hence the watchdog name). In real life, we would check if the host was terminated by the user, which is OK, or terminated by some malware (which isn't OK), and in the latter case, restart it (otherwise, the user will select Quit, but the App would continue to "haunt" the system and be executed again and again).

Here is how it's done:

We create a Timer event and every given amount of time (which shouldn't be too frequent) we check if the host's process is running, and if it isn’t, we start it. We use a static boolean flag (is_running) to indicate that we are already in this block of code, so it won't be entered again while it is still being handled. This is something I always do in WM_TIMER code blocks, because when a timer is set at too high a frequency, the code block may be called while the code from the previous WM_TIMER event is still being executed.

We also check if a user is logged in by examining the g_bLoggedIn boolean flag.

        case WM_TIMER:
        {
            if (is_running) break;
            WriteToLog(L"Timer event");
            is_running = true;
            HANDLE hProcessSnap;
            PROCESSENTRY32 pe32;
            bool found{ false };

            WriteToLog(L"Enumerating all processes...");
            // Take a snapshot of all processes in the system.
            hProcessSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
            if (hProcessSnap == INVALID_HANDLE_VALUE)
            {
                WriteToLog(L"Failed to call CreateToolhelp32Snapshot(). Error code %d",GetLastError());
                is_running = false;
                return 1;
            }

            // Set the size of the structure before using it.
            pe32.dwSize = sizeof(PROCESSENTRY32);

            // Retrieve information about the first process,
            // and exit if unsuccessful
            if (!Process32First(hProcessSnap, &pe32))
            {
                WriteToLog(L"Failed to call Process32First(). Error code %d",GetLastError());
                CloseHandle(hProcessSnap);          // clean the snapshot object
                is_running=false;
                break;
            }

            // Now walk the snapshot of processes, and
            // display information about each process in turn
            DWORD svchost_parent_pid = 0;
            DWORD dllhost_parent_pid = 0;
            std::wstring szPath = L"";

            if (readStringFromRegistry(HKEY_LOCAL_MACHINE, (PWCHAR)SERVICE_REG_KEY, (PWCHAR)SERVICE_KEY_NAME, szPath))
            {
                m_szExeToFind = szPath.substr(szPath.find_last_of(L"/\\") + 1);    // The process name is the executable name only
                m_szExeToRun = szPath;                                            // The executable to run is the full path
            }
            else
            {
                WriteToLog(L"Error reading ExeToFind from the Registry");
            }

            do
            {
                // Compare whole executable names, ignoring case, instead of a substring search
                if (_wcsicmp(m_szExeToFind.c_str(), pe32.szExeFile) == 0)
                {
                    WriteToLog(L"%s is running",m_szExeToFind.c_str());
                    found = true;
                    is_running=false;
                    break;
                }
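                if (!g_bLoggedIn)
                {
                    WriteToLog(L"WatchDog isn't starting '%s' because user isn't logged in",m_szExeToFind.c_str());
                    CloseHandle(hProcessSnap);   // release the snapshot before leaving
                    is_running = false;          // otherwise later timer events are skipped
                    return 1;
                }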
            }
            while (Process32Next(hProcessSnap, &pe32));
            if (!found)
            {
                WriteToLog(L"'%s' is not running. Need to start it",m_szExeToFind.c_str());
                if (!m_szExeToRun.empty())    // watchdog start the host app
                {
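                    if (!g_bLoggedIn)
                    {
                        WriteToLog(L"WatchDog isn't starting '%s' because user isn't logged in",m_szExeToFind.c_str());
                        CloseHandle(hProcessSnap);   // release the snapshot before leaving
                        is_running = false;          // otherwise later timer events are skipped
                        return 1;
                    }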
                    ImpersonateActiveUserAndRun();

                    RunHost((LPWSTR)m_szExeToRun.c_str(), (LPWSTR)L"");

                }
                else
                {
                    WriteToLog(L"m_szExeToRun is empty");
                }
            }
            CloseHandle(hProcessSnap);
        }
        is_running=false;
        break;
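Because the handler has several exit paths, the is_running flag has to be reset on each of them, and a missed reset would make every later timer event return immediately. One way to make that automatic is a small RAII guard. This is a sketch of an alternative, not the article's code: ReentryGuard and OnTimerTick are hypothetical names, and like the original flag it only protects against re-entry on the same thread.

```cpp
// Marks a section as busy for the lifetime of the guard and clears the
// flag again on every exit path (break, return, exception).
// Like the article's flag, this is not thread-safe; it only protects
// against re-entry on the same thread.
class ReentryGuard
{
public:
    explicit ReentryGuard(bool& flag)
        : m_flag(flag), m_acquired(!flag)
    {
        if (m_acquired)
            m_flag = true;
    }

    ~ReentryGuard()
    {
        if (m_acquired)
            m_flag = false;
    }

    bool Acquired() const { return m_acquired; }

    ReentryGuard(const ReentryGuard&) = delete;
    ReentryGuard& operator=(const ReentryGuard&) = delete;

private:
    bool& m_flag;
    bool m_acquired;
};

// Example handler body: returns false when a previous call is still running.
bool OnTimerTick(bool& isRunning, bool simulateEarlyExit)
{
    ReentryGuard guard(isRunning);
    if (!guard.Acquired())
        return false;          // Already inside the handler.

    if (simulateEarlyExit)
        return true;           // Flag is still reset by the destructor.

    // ... enumerate processes, restart the host, etc. ...
    return true;
}
```

The guard only clears the flag if it was the one that set it, so a nested call that is turned away leaves the outer call's state untouched.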

How to Test the Service

When we wanted to test the solution, we hired 20 qualified and cooperative testers. As the work progressed, more and more tests succeeded. At some point, it worked perfectly on our own Surface Pro laptops, but luckily, one of our employees reported that on his PC, after shutting it down, the service wasn’t coming up again, or came up but without starting itself under Ring 3. That’s good news, because during development, when you suspect a bug, the worst outcome is being unable to find or reproduce it. All in all, 10% of the testers reported a problem. The version posted here works perfectly on our employee's PC, however 2% of the testers still report problems from time to time. In other words, SampleApp doesn't start after shutting down the PC and turning it on.

Here are instructions for testing the service and the watchdog.

The SampleApp

We have included a sample application, generated by the Visual Studio Wizard, as the "host" application that the watchdog will keep running. You can run it on its own, and it should look like the image below. This application doesn't do much. In fact, it doesn't do anything ...

In the following section, we will provide the instructions for testing the service and the watchdog. You can download the source code from GitHub.

Running from CMD

Open CMD as an Administrator. Change the current directory to where the Service's EXE resides and type:

SG_RevealerService.exe Install#SampleApp.exe

As you can see, we have two elements:

The command, which is Install
The argument, which is attached to the command through a hash (#) and should be any executable you want your watchdog to watch.
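
To illustrate the command format, here is a minimal sketch of how such a string could be split at the hash. `ServiceCommand` and `ParseServiceCommand` are hypothetical names introduced for this example; the actual service may parse its command line differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical type, introduced for illustration only.
struct ServiceCommand
{
    std::wstring command;   // e.g. L"Install"
    std::wstring argument;  // e.g. L"SampleApp.exe"; empty if no '#' is present
};

// Splits "Command#Argument" at the first '#'.
// Everything after the first '#' is treated as the argument.
ServiceCommand ParseServiceCommand(const std::wstring& raw)
{
    const std::wstring::size_type hash = raw.find(L'#');
    if (hash == std::wstring::npos)
        return { raw, L"" };
    return { raw.substr(0, hash), raw.substr(hash + 1) };
}
```

With this sketch, `ParseServiceCommand(L"Install#SampleApp.exe")` yields the command `Install` and the argument `SampleApp.exe`, while a string without a hash is returned as a command with an empty argument.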

The Service will first start SampleApp. From that moment, if you try to terminate or kill SampleApp, the watchdog will restart it after a few seconds. If you reboot, or turn the PC off and on again, you can check whether the Service comes back and starts SampleApp again. That sums up the goal and functionality of our Service.

Uninstalling

Finally, to stop and uninstall the service we have included the uninstall.bat script, which goes like this:

sc stop sg_revealerservice
sc delete sg_revealerservice
taskkill /f /im sampleapp.exe
taskkill /f /im sg_revealerservice.exe

Conclusion:

Windows Services play a key role in the Microsoft Windows operating system, and support the creation and management of long-running processes.
In some cases, when "Fast Startup" is checked, and the PC is started after a regular shutdown, services tend to fail to restart.
The aim of this article is to create a persistent service that will always run and restart after Windows restarts, or after shutdown.
One of the main issues relates to Service Isolation. The isolation itself (which was introduced in Windows Vista) is important and powerful; however, it imposes some limitations when we need to interact with the user’s space.
When a service restarts, we want it to interact with the user’s space, but it can’t do so too early (before any user logs in). You can solve this problem by subscribing to the logon event.
Services are ideal for use alongside a watchdog mechanism. Such a mechanism ensures that a given application is always running and, if it shuts down abnormally, restarts it. We developed one as well, based on the methods described earlier, which make it possible for the service to always run, be alerted when users log in, and interact with the user’s space.
A Timer event was used to monitor the operation of the watched process.
A good logging mechanism is always useful during development, using either a simple, or, when needed, a complex logging tool.
Testing the final solution is imperative. Once the code is checked and verified to work, up to 2% of testers might still report bugs, which is within reason.

About the Authors

Michael Haephrati

Ruth Haephrati
