2015-06-28

Midterm evaluation

The midterm-evaluation milestone is as follows:
Have JyNI detect and break reference-cycles in native objects backed by Java-GC. This must be done by Java-GC in order to deal with interfering non-native PyObjects. Further this functionality must be monitorable, so that it can transparently be observed and confirmed.

Sketch of some issues

The issues to overcome for this milestone were manifold:
  • The ordinary reference-counting for scenarios that actually should work without GC contained a lot of bugs in JyNI C-code. This had to be fixed. When I wrote this code initially, the GC-concept was still an early draft and in many scenarios it was unclear whether and how reference-counting should be applied. Now all this needed to be fixed (and there are probably still remaining issues of this type)
  • JNI defines a clear policy how to deal with provided jobject-pointers. Some of them must be freed explicitly. On the other hand some might be freed implicitly by the JVM - without your intention, if you don't get it right. Also on this front vast clean-up in JyNI-code was needed, also to avoid immortal trash.
  • JyNI used to keep alive Java-side-PyObjects that were needed by native objects indefinitely.
    Now these must be kept alive by the Java-copy of the native reference-graph instead. It was hard to make this mechanism sufficiently robust. Several bugs caused reference-loss and had to be found to make the entire construct work. On the other hand some bugs also caused hard references to persist, which kept Java-GC from collecting the right objects and triggering JyNI's GC-mechanism.
  • Issues with converting self-containing PyObjects between native side and Java-side had to be solved. These were actually bugs unrelated to GC, but still had to be solved to achieve the milestone.
  • A mechanism to monitor native references from Java-side, especially their malloc/free actions had to be established.
    Macros to report these actions to Java/JyNI were inserted into JyNI's native code directly before the actual calls to malloc or free. What made this edgy is the fact that some objects are not freed by native code (which was vastly inherited from CPython 2.7), but cached for future use (e.g. one-letter strings, small numbers, short tuples, short lists). Acquiring/returning an object from/to such a cache is now also reported as malloc/free, but specially flagged. For all these actions JyNI records timestamps and maintains a native object-log where one can transparently see the lifetime-cycle of each native object.
  • The original plan to explore native object's connectivity in the GC_Track-method is not feasible because for tuples and lists this method is usually called before the object is populated.
    JyNI will have a mechanism to make it robust of invalid exploration-attempts, but this mechanism should not be used for normal basic operation (e.g. tuple-allocation happens for every method-call) but only for edgy cases, e.g. if an extension defines its own types, registers instances of them in JyNI-GC and then does odd stuff with them.
    So now GC_track saves objects in a todo-list regarding exploration and actual exploration is performed at some critical JyNI-operations like on object sync-on-init or just before releasing the GIL. It is likely that this strategy will have to be fine-tuned later.

Proof of the milestone

To prove the achievement of the explained milestone I wrote a script that creates a reference-cycle of a tuple and a list such that naive reference-counting would not be sufficient to break it. CPython would have to make use of its garbage collector to free the corresponding references.
  1. I pass the self-containing tuple/list to a native method-call to let JyNI create native counterparts of the objects.
  2. I demonstrate that JyNI's reference monitor can display the corresponding native objects ("leaks" in some sense).
  3. The script runs Java-GC and confirms that it collects the Jython-side objects (using a weak reference).
  4. JyNI's GC-mechanism reports native references to clear. It found them, because the corresponding JyNI GC-heads were collected by Java-GC.
  5. Using JyNI's reference monitor again, I confirm that all native objects were freed. Also those in the cycle.

The GC demonstration-script


import time
from JyNI import JyNI
from JyNI import JyReferenceMonitor as monitor
from JyNI.gc import JyWeakReferenceGC
from java.lang import System
from java.lang.ref import WeakReference
import DemoExtension

#Note:
# For now we attempt to verify JyNI's GC-functionality independently from
# Jython concepts like Jython weak references or Jython GC-module.
# So we use java.lang.ref.WeakReference and java.lang.System.gc
#
to monitor and control Java-gc.

JyNI.JyRefMonitor_setMemDebugFlags(1)
JyWeakReferenceGC.monitorNativeCollection = True

l = (123, [0, "test"])
l[1][0] = l
#We create weak reference to l to monitor collection by Java-GC:
wkl = WeakReference(l)
print "weak(l): "+str(wkl.get())

# We pass down l to some native method. We don't care for the method itself,
# but conversion to native side causes creation of native PyObjects that
# correspond to l and its elements. We will then track the life-cycle of these.
print "make l native..."
DemoExtension.argCountToString(l)

print "Delete l... (but GC not yet ran)"
del l
print "weak(l) after del: "+str(wkl.get())
print ""
# monitor.list-methods display the following format:
# [native pointer]{'' | '_GC_J' | '_J'} ([type]) #[native ref-count]: [repr] *[creation time]
# _GC_J means that JyNI tracks the object
# _J means that a JyNI-GC-head exists, but the object is not actually treated by GC
# This can serve monitoring purposes or soft-keep-alive (c.f. java.lang.ref.SoftReference)
# for caching.
print "Leaks before GC:"
monitor.listLeaks()
print ""

# By inserting this line you can confirm that native
# leaks would persist if JyNI-GC is not working:
#JyWeakReferenceGC.nativecollectionEnabled = False

print "calling Java-GC..."
System.gc()
time.sleep(2)
print "weak(l) after GC: "+str(wkl.get())
print ""
monitor.listWouldDeleteNative()
print ""
print "leaks after GC:"
monitor.listLeaks()

print ""
print "===="
print "exit"
print "===="


It is contained in JyNI in the file JyNI-Demo/src/JyNIRefMonitor.py

Instructions to reproduce this evaluation

  1. You can get the JyNI-sources by calling
    git clone https://github.com/Stewori/JyNI
    Switch to JyNI-folder:
    cd JyNI
  2. (On Linux with gcc) edit the makefile (for OSX with llvm/clang makefile.osx) to contain the right paths for JAVA_HOME etc. You can place a symlink to jython.jar (2.7.0 or newer!) in the JyNI-folder or adjust the Jython-path in makefile.
  3. Run make (Linux with gcc)
    (for OSX with clang use make -f makefile.osx)
  4. To build the DemoExtension enter its folder:
    cd DemoExtension
    and run setup.py:
    python setup.py build
    cd ..
  5. Confirm that JyNI works:
    ./JyNI_unittest.sh
  6. ./JyNI_GCDemo.sh

Discussion of the output


Running JyNI_GCDemo.sh:

JyNI: memDebug enabled!
weak(l): (123, [(123, [...]), 'test'])
make l native...
Delete l... (but GC not yet ran)
weak(l) after del: (123, [(123, [...]), 'test'])

Leaks before GC:
Current native leaks:
139971370108712_GC_J (list) #2: "[(123, [...]), 'test']" *28
139971370123336_J (str) #2: "test" *28
139971370119272_GC_J (tuple) #1: "((123, [(123, [...]), 'test']),)" *28
139971370108616_GC_J (tuple) #3: "(123, [(123, [...]), 'test'])" *28

calling Java-GC...
weak(l) after GC: None

Native delete-attempts:
139971370108712_GC_J (list) #0: -jfreed- *28
139971370123336_J (str) #0: -jfreed- *28
139971370119272_GC_J (tuple) #0: -jfreed- *28
139971370108616_GC_J (tuple) #0: -jfreed- *28

leaks after GC:
no leaks recorded

====
exit
====
Let's briefly discuss this output. We created a self-containing tuple called l. To allow it to self-contain we must put a list in between. Using a Java-WeakReference, we confirm that Java-GC collects our tuple. Before that we let JyNI's reference monitor print a list of native objects that are currently allocated. We refer to them as "leaks", because all native calls are over and there is no obvious need for natively allocated objects now. #x names their current native ref-count. It explains as follows (observe that it contains a cycle):
139971370108712_GC_J (list) #2: "[(123, [...]), 'test']"

This is l[1]. One reference is from JyNI to keep it alive, the second one is from l.

139971370123336_J (str) #2: "test"

This is l[1][1]. One reference is from JyNI to keep it alive, the second one is from l[1].


139971370119272_GC_J (tuple) #1: "((123, [(123, [...]), 'test']),)"
This is the argument-tuple that was used to pass l to the native method. The reference is from JyNI to keep it alive.
139971370108616_GC_J (tuple) #3: "(123, [(123, [...]), 'test'])" 
This is l. One reference is from JyNI to keep it alive, the second one is from the argument-tuple (139971370108616)and the third one is from l[1]. Thus it builds a reference-cycle with l[1].

After running Java-GC (and giving it some time to finnish) we confirm that our weak reference to l was cleared. And indeed, JyNI's GC-mechanism reported some references to clear, all reported leaks among them. Finally another call to JyNI's reference monitor does not list leaks any more.

Check that this behavior is not self-evident

In JyNI-Demo/src/JyNIRefMonitor.py go to the section:

# By inserting this line you can confirm that native
# leaks would persist if JyNI-GC is not working:
#JyWeakReferenceGC.nativecollectionEnabled = False

Change it to
# By inserting this line you can confirm that native
# leaks would persist if JyNI-GC is not working:

JyWeakReferenceGC.nativecollectionEnabled = False

Run JyNI_GCDemo.sh again. You will notice that the native leaks persist.

Next steps

The mechanism currently does not cover all native types. While many should already work I expect that some bugfixing and clean-up will be required to make this actually work. With the demonstrated reference-monitor-mechanism the necessary tools to make this debugging straight forward are now available.

After fixing the remaining types and providing some tests for this, I will implement an improvement to the GC-mechanism that makes it robust against silent modification of native PyObjects (e.g. via macros). And provide tests for this.

Finally I will add support for the PyWeakReference builtin type. As far as time allows after that I'll try to get ctypes working.

2015-06-19

GSoC status-update for 2015-06-19

I finally completed the core-GC routine that explores the native PyObject reference-connectivity graph and reproduces it on Java-side. Why mirror it on Java side? Let me comprehend the reasoning here. Java performs a mark-and-sweep GC on its Java-objects, but there is no way to extend this to native objects. On the other hand using CPython's reference-counting approach for native objects is not always feasible, because there are cases where a native object must keep its Java-counterpart alive (JNI provides a mechanism for this), allowing it to participate in an untracable reference cycle. So we go the other way round here, and let Java-GC track a reproduction of the native reference connectivity-graph. Whenever we observe that it deletes a node, we can discard the underlying native object. Keeping the graph up to date is still a tricky task, which we will deal with in the second half of GSoC.

The native reference-graph is explored using CPython-style traverseproc mechanism, which is also implemented by extensions that expect to use GC at all. To mirror the graph on Java-side I am distinguishing 8 regular cases displayed in the following sketch. These cases deal with representing the connection between native and managed side of th JVM.

In the sketch you can see that native objects have a so-called Java-GC-head assigned that keeps alive the native object (non-dashed arrow), but is only weakly reachable from it (dashed arrow). The two left-most cases deal with objects that only exist natively. The non-GC-case usually needs no Java-GC-head as it cannot cause reference cycles. Only in GIL-free mode we would still track it as a replacement for reference-counting. However GIL-free mode is currently a vague consideration and out of scope for this GSoC-project. Case 3 and 4 from left deal with objects where Jython has no corresponding type and JyNI uses a generic PyCPeer-object - a PyObject-subclass forwarding the magic methods to native side. PyCPeer in both variants serves also as a Java-GC-head. CStub-cases refer to situations where the native object needs a Java-object as backend. In these cases the Java-GC-head must not only keep alive other GC-heads, but also the Java-backend. Finally in mirror-mode both native and managed representations can be discarded independently from each other at any time, but for performance reasons we try to softly keep alive the counterparts for a while. On Java-side we can use a soft reference for this.

PyList is a special case in several ways. It is a mutable type that can be modified by C-macros at any time. Usually we move the java.util.List backend to native side. For this it is replacing by JyList - a List-implementation that is backed by native memory, thus allowing C-macros to work on it. The following sketch illustrates how we deal with this case.


It works roughly equivalent to mirror mode but with the difference that the Jython-PyList must keep alive its Java-GC-head. For a most compact solution we build the GC-head functionality into JyList.

Today I finished the implementation of the regular cases, but testing and debugging still needs to be done. I can hopefully round this up for midterm evaluation and also include the PyList-case.

2015-06-05

Establish a proper referencing-paradigm (as preparation for GC-implementation)

In the last days I tidied up a lot reference-related stuff and clarified what reference-types the type-conversion methods deliver. Before starting with the actual GC-work, much clean-up regarding ordinary reference-counting needed to be done, especially related to singletons and pre-allocated small integers and length-one strings/unicode-objects (characters). Both CPython and Jython do pre-allocate such frequently used small objects and I felt these caches should be linked to improve conversion-performance. Now the integer-conversion methods check whether the integers are small pre-allocated objects and if so establishes a permanent linkage between the CPython- and Jython-occurrence of the converted object.

More important: I established some rules regarding referencing and worked out their JyNI-wide appliance.

- JyObjects (i.e. JyNI-specific pre-headers of native PyObjects) store a link to a corresponding jobject (as of the very beginning of JyNI-development). It is now specified that this is always of JNI-type WeakGlobalReference. Accordingly I re-declared the JyObject-struct to use jweak-type instead of jobject.

- The conversion-method
jobject JyNI_JythonPyObject_FromPyObject(PyObject* op)
now is specified to return a JNI-LocalReference. If this is obtained from the jweak stored in a JyObject, it is checked to be still alive. If so, a LocalReference is created to keep it alive until the caller is done with it. If the caller wants to store it, he must create a GlobalReference from it, see
http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#weak_global_references
for more details on JNI-references. The macros JyNIClearRef and JyNIToGlobalRef are now provided as efficient helpers in this context.

- The conversion-method
PyObject* JyNI_PyObject_FromJythonPyObject(jobject jythonPyObject)
now consistently always returns a *new* reference, i.e. the caller must decref it after using it. (GIL-free mode will be another story, but will most likely just ignore ref-counting, so the decref command won't harm this mode.)

These rules clarify the usage of the conversion-functions. Fixing all calls to these methods JyNI-wide to comply with the new semantics (having to use decref or JyNIClearRef etc) is still in progress. However having this referencing-paradigm specified and cleaned up is a crucial initial step to implement a proper garabge collction in JyNI.