Skip to content

Conversation

@Scusemua
Copy link

@Scusemua Scusemua commented Jul 20, 2024

I found what appears to be a bug in bind/symbols.go.

Consider the following Go function that was generated by gopy:

func gopackage_GoObject_ExampleFunc(_handle CGoHandle, val CGoHandle, pythonCallback *C.PyObject, callbackArgument *C.char, goRun C.char) {
	_fun_arg := pythonCallback
	_saved_thread := C.PyEval_SaveThread() // Release GIL
	defer C.PyEval_RestoreThread(_saved_thread) // Acquire GIL (deferred) 
	vifc, __err := gopyh.VarFromHandleTry((gopyh.CGoHandle)(_handle), "*gopackage.GoObject")
	if __err != nil {
		return
	}
	if boolPyToGo(goRun) {
		go gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
			if C.PyCallable_Check(_fun_arg) == 0 {
				return
			}
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			_fcargs := C.PyTuple_New(2)
			C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
			C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
			C.PyObject_CallObject(_fun_arg, _fcargs)
			C.gopy_decref(_fcargs)
			C.gopy_err_handle()
			C.PyGILState_Release(_gstate) // Release GIL
		}, C.GoString(callbackArgument))
	} else {
		gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
			if C.PyCallable_Check(_fun_arg) == 0 {
				return
			}
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			_fcargs := C.PyTuple_New(2)
			C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
			C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
			C.PyObject_CallObject(_fun_arg, _fcargs)
			C.gopy_decref(_fcargs)
			C.gopy_err_handle()
			C.PyGILState_Release(_gstate) // Release GIL
		}, C.GoString(callbackArgument))
	}
}

This is generated for a Go method ExampleFunction of a GoObject struct. The intent is for this method to be called from Python.

The bug is that we access the C/Python API function C.PyCallable_Check before acquiring the GIL, leading to a SEGFAULT. The problem lies in the following excerpt of the above function:

			if C.PyCallable_Check(_fun_arg) == 0 {
				return
			}
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			_fcargs := C.PyTuple_New(2)

Notice that we only acquire the GIL via PyGILState_Ensure after the call to C.PyCallable_Check(_fun_arg), which leads to a SEGFAULT.

The changes in this PR will result in the above Go function being generated differently as:

func gopackage_GoObject_ExampleFunc(_handle CGoHandle, val CGoHandle, pythonCallback *C.PyObject, callbackArgument *C.char, goRun C.char) {
	_fun_arg := pythonCallback
	_saved_thread := C.PyEval_SaveThread()  // Release GIL
	defer C.PyEval_RestoreThread(_saved_thread) // Acquire GIL (deferred) 
	vifc, __err := gopyh.VarFromHandleTry((gopyh.CGoHandle)(_handle), "*gopackage.GoObject")
	if __err != nil {
		return
	}
	if boolPyToGo(goRun) {
		go gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			if C.PyCallable_Check(_fun_arg) == 0 {
				C.PyGILState_Release(_gstate) // Release GIL
				return
			}
			_fcargs := C.PyTuple_New(2)
			C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
			C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
			C.PyObject_CallObject(_fun_arg, _fcargs)
			C.gopy_decref(_fcargs)
			C.gopy_err_handle()
			C.PyGILState_Release(_gstate) // Release GIL
		}, C.GoString(callbackArgument))
	} else {
		gopyh.Embed(vifc, reflect.TypeOf(gopackage.GoObject{})).(*gopackage.GoObject).ExampleFunc(*ptrFromHandle_gopackage_Bytes(val), func(arg_0 interface{}, arg_1 string) {
			_gstate := C.PyGILState_Ensure() // Acquire GIL
			if C.PyCallable_Check(_fun_arg) == 0 {
				C.PyGILState_Release(_gstate) // Release GIL
				return
			}
			_fcargs := C.PyTuple_New(2)
			C.PyTuple_SetItem(_fcargs, 0, C.gopy_build_string(C.CString(fmt.Sprintf("%s", (arg_0)))))
			C.PyTuple_SetItem(_fcargs, 1, C.gopy_build_string(C.CString(arg_1)))
			C.PyObject_CallObject(_fun_arg, _fcargs)
			C.gopy_decref(_fcargs)
			C.gopy_err_handle()
			C.PyGILState_Release(_gstate) // Release GIL
		}, C.GoString(callbackArgument))
	}
}

Notice that we now call C.PyGILState_Ensure() before the call to C.PyCallable_Check(_fun_arg). Likewise, we call C.PyGILState_Release(_gstate) before returning within the body of the associated if-statement to ensure that there is a matching call (to match the _gstate := C.PyGILState_Ensure()).

In terms of testing, I've just been using the modified gopy in an application I'm developing. I'm in the process of doing more testing in the meantime; however, I am confident that this is a bug, as you must hold the Python GIL when interfacing/invoking any of the C/Python API.

From the "Thread State and the Global Interpreter Lock" subsection of the "Initialization, Finalization, and Threads" Python C-API documentation:

"Therefore, the rule exists that only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions."

On a separate (but related) note -- since we defer C.PyEval_RestoreThread(_saved_thread) at the beginning of the method's execution, it's a little silly to release the GIL via C.PyGILState_Release(_gstate) immediately before returning (even in the case where we don't call into Go code); however, we need to have matching calls between C.PyEval_SaveThread() and C.PyEval_RestoreThread() as well as PyGILState_Ensure() and PyGILState_Release(), so I think it ultimately makes sense to do things this way for now. In theory, we could provide more complex logic to handle the GIL more efficiently in the future.

@Scusemua Scusemua changed the title Addressing Incorrect Handling of GIL Addressing Incorrect Handling of GIL that Leads to a SEGFAULT Jul 20, 2024
@Scusemua Scusemua changed the title Addressing Incorrect Handling of GIL that Leads to a SEGFAULT Addressing Incorrect Handling of Python GIL that Leads to a SEGFAULT Jul 20, 2024
@nickeygit
Copy link

Hi @Scusemua !

Thanks for your patch! You saved me several hours of life! :))
And now my turn.

PyGILState locking is needed for sure, but Go is insidious too :)
Goroutines can run on different OS threads
and switch them anytime...

even between PyGILState_Ensure/PyGILState_Release calls!

And sure, PyGILState_Release will panic that current thread has no python thread info and SEGFAULT etc.

My simple two clients-loaded Web server was crashing on this even with your patch.

To fix - we need to lock current goroutine to OS thread during the PyGILState_Ensure/PyGILState_Release section.

Simply add runtime.LockOSThread() call before PyGILState_Ensure (1 place in bind/symbols.go)

and

runtime.UnlockOSThread() after every PyGILState_Release call (3 places in bind/symbols.go)

And now my test server is running fine, calling Python callback in the web handler :)

Feel free to add this to your branch and be brave to propose your PR for merge!

Have nice days guys!

@Scusemua
Copy link
Author

Scusemua commented Dec 9, 2024

Hi @nickeygit,

Yeah, I also found that I needed to pin the goroutines to OS threads, or else I would get really strange errors -- just like you're describing. I just think I forgot to add an update about this to my original post. 😅

With these two fixes, everything has been working fine on my end with no more issues for months! :)

@Scusemua
Copy link
Author

Scusemua commented Mar 6, 2025

I've encountered a new issue in which the application encounters a segmentation fault when calling the resolve callback function.

Sometimes, the segmentation fault occurs on this line:

if C.PyCallable_Check(_fun_arg) == 0 {

Other times, it doesn't occur until C.PyObject_CallObject(_fun_arg, _fcargs).

The issue is with the _fun_arg variable. The _fcargs is fine. I've inspected the callstack in GDB and whatnot, and when the segmentation fault occurs, it's because the _fun_arg is pointing to garbage, rather than the intended PyObject*.

At first, I thought it was perhaps because the callback was being executed before the original thread state was restored by

defer C.PyEval_RestoreThread(_saved_thread)

But I tried adding a sync.WaitGroup such that the callback cannot run until the C.PyEval_RestoreThread(_saved_thread) statement is restored, and the segmentation fault still occurs, with the _fun_arg pointing to garbage...

Has anybody else encountered this? Have you ever experienced this issue, @nickeygit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants