Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

netcoredbg process crashes when pausing after the last debugged thread ended #151

Open
qgindi opened this issue Dec 8, 2023 · 6 comments

Comments

@qgindi
Copy link

qgindi commented Dec 8, 2023

Code to reproduce:

using System;
using System.Threading;
using System.Diagnostics;

var t = new Thread(() => {
	Debugger.Break(); //or any statement with a breakpoint
});
t.Start();

for (int i = 0; i < 100; i++) {
	Thread.Sleep(1000);
	//Console.WriteLine(i);
	if (!t.IsAlive) {
		Thread.Sleep(500);
		Debugger.Break(); //or any statement with a breakpoint
	}
}

Steps to reproduce:

  1. Run this program with the debugger, in launch or attach mode.
  2. When stops at the first Debugger.Break, execute 'finish' (step out) or two 'next'. The thread ends. The main thread continues and runs until reaches the second Debugger.Break.

Result:
Sometimes netcoredbg process crashes at the second Debugger.Break.
Not every time. About 1/5 times.

Tested and can be reproduced on Windows 11 with CLI and MI.

If can't reproduce, I can attach .dmp.

@viewizard
Copy link
Member

Looks like some Windows related only issue (some MS compiler optimization?), I was able to reproduce some time "silent" exit after "finish" command and thread exited, id: 5964 line output on netcoredbg release build (Windows 10, .NET SDK 8.0), but was not able reproduce this issue with netcoredbg debug build... In the same time, I was not able to reproduce this issues on Linux with same sample dll and .NET SDK 8.0 on both (release and debug netcoredbg builds).

@qgindi
Copy link
Author

qgindi commented Dec 8, 2023

The exception is in mscordbi.dll!CordbStepper::Deactivate() Line 380. It is called directly from netcoredbg.exe.
The reason is an invalid pointer (access violation).
The mscordbi source code shows that the pointer is stored as a field in an CorDebugThread instance.
Probably the CorDebugThread object is released and its memory is already overwritten.
The CorDebugThread* is a field of CordbStepper. It is used by CordbStepper::Deactivate.
Maybe netcoredbg releases the ICorDebugThread* too early, before calling ICordbStepper::Deactivate.

@viewizard
Copy link
Member

The point is, we deactivate all steppers that active in app domain and this list provided by runtime debug API itself on each stop (break, breakpoint) this is common logic:

ToRelease<ICorDebugAppDomainEnum> domains;
IfFailRet(pProcess->EnumerateAppDomains(&domains));
ICorDebugAppDomain *curDomain;
ULONG domainsFetched;
while (SUCCEEDED(domains->Next(1, &curDomain, &domainsFetched)) && domainsFetched == 1)
{
ToRelease<ICorDebugAppDomain> pDomain(curDomain);
ToRelease<ICorDebugStepperEnum> steppers;
IfFailRet(pDomain->EnumerateSteppers(&steppers));
ICorDebugStepper *curStepper;
ULONG steppersFetched;
while (SUCCEEDED(steppers->Next(1, &curStepper, &steppersFetched)) && steppersFetched == 1)
{
ToRelease<ICorDebugStepper> pStepper(curStepper);
pStepper->Deactivate();
}
}

runtime debug API (mscordbi.dll part of it, that executed on debugger's side) care about lists of active threads and steppers.

Maybe netcoredbg releases the ICorDebugThread* too early, before calling CordbStepper::Deactivate.

netcoredbg don't hold and don't release managed threads, care about threads and steppers COM object is part of runtime debug API. netcoredbg could only request thread COM objects references (that will increase internal ref counter by 1) at execution stop (break, breakpoint, step...) for interuction and "release" this ref (that only mean decrease ref counter for this COM object). But all real memory related work (allocation and release) is part of debugger API (in case of debugger's side - DAC/DBI).

For example, from code I posted above, at ICorDebugStepperEnum::Next() call, debug API provide us we ref of ICorDebugStepper that already have invreased (for our ref) counter inside. This stepper can't be released (since we hold 1 ref). As you could see, curend debug API don't requeire hold thread COM object here at all (ICorDebugProcess -> ICorDebugAppDomainEnum -> ICorDebugAppDomain -> ICorDebugStepperEnum -> ICorDebugStepper).

@qgindi
Copy link
Author

qgindi commented Dec 9, 2023

The VS debugger clearly shows that the CorDebugThread memory is damaged at the time of calling ICordbStepper::Deactivate. For example the invalid pointer value stored in it as m_pAppDomain is 0x50. I only guess the the reason.

@viewizard
Copy link
Member

This issue looks like some race condition in runtime code (https://github.com/dotnet/runtime). For example, debuggee process already have thread dead, but mscordbi don't have this info and provide to debugger data, mscordbi get this updated thread status from debuggee process and change internal states, but debuger call in mscordbi something that by mscordbi internal logic can't be called any more (and mscordbi don't count on this). Usually this mean 1) mscordbi don't prevent data changes from one side when another side do something with this data; or 2) mscordbi don't check changes in proper place.

I really doubt that this is m_pAppDomain related, since this object never changes during managed process execution till process exit. Find and fix "race condition" like issues not so easy, you should find differences in execution sequence in mscordbi/runtime with and without this issue, analize current mscordbi code and find solution that will not broke current logic but prevent race condition... Knowledge that some pointer is wrong give nothing, in order to fix this issue you should understand why it wrong in this time and ok in another.

@qgindi
Copy link
Author

qgindi commented Dec 20, 2023

A workaround. Possible memory leak, but now never crashes.

HRESULT STDMETHODCALLTYPE ManagedCallback::ExitThread(ICorDebugAppDomain *pAppDomain, ICorDebugThread *pThread)
{
    LogFuncEntry();

    ThreadId threadId(getThreadId(pThread));
    m_debugger.m_sharedThreads->Remove(threadId);

    m_debugger.m_sharedEvalWaiter->NotifyEvalComplete(pThread, nullptr);
    if (m_debugger.GetLastStoppedThreadId() == threadId) {
        m_debugger.InvalidateLastStoppedThreadId();

        //Workaround for netcoredbg process crash later when calling pStepper->Deactivate() from SimpleStepper::DisableAllSteppers.
        //https://github.com/Samsung/netcoredbg/issues/151
        //Never mind: possible memory leak.
        pThread->AddRef();
    }

    m_debugger.m_sharedBreakpoints->ManagedCallbackExitThread(pThread);

    m_debugger.pProtocol->EmitThreadEvent(ThreadEvent(ManagedThreadExited, threadId, m_debugger.m_interopDebugging));
    return m_sharedCallbacksQueue->ContinueAppDomain(pAppDomain);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants