Omar: lots of progress on new evaluator. I spent much of this month trying different locking and reference counting and memory reclamation schemes.
A lot of it was frustrating and had me stuck, but in the last week or two I think I've figured out something that works, where on each statement and match you have an atomic reference count (number of active pointers to that statement/match) + a bunch of either immutable or individually atomic fields: stuff like clause terms, parent count, whether to free at next opportunity, list of children. Freeing each statement or match is deferred until a pointer release that makes the number of active pointers hit 0 + parent count is 0 (if statement) or free was requested (if match). Note that the pointer count is a reference count, but it's not the full reference count – you have to combine its 'program liveness' information with 'database liveness' information (was match free requested? / did statement parent count hit 0?).
And now that we have reasonable refcount/locking, we've been able to do the long-awaited table test, where we actually run the new parallel evaluator on a physical Folk table with webcam and camera and physical programs. (We expected it to be slow – we need the physical table test to actually know what the performance is like and how we need to go about optimizing things.) It works:
So, still a lot of work to do on performance, top-level flame graph not very informative:
Another trick that reduces the locking/refcounting complexity (and which I mentioned briefly last month): I've gotten rid of all the backward (child→parent) pointers in the system. That reduces the number of fields on each statement/match and the amount of locking needed, since you don't need to walk back and remove the child from each parent when the child is removed. Matches' backward pointer list is removed entirely, because match removal only happens when a parent is removed (you never need to walk back from a match to see what its parents are); statements' backward pointer list is replaced with the flat parent count which is just an integer (needed so you know to remove the statement when parent count hits 0) and is easy to make atomic.
(Without backward pointers to go back to the parent, how do we get rid of parent→child forward pointers when the child they're pointing at is destroyed? The answer, hopefully, is that we can just check the pointers [they're actually weak refs that contain a slot generation, not raw pointers] and throw away / ignore any that point at dead matches/statements. We can even amortize that because we already do child pointer list resizing anyway, which is already a linear scan – just additionally do the weak ref check on resize and throw away dangling refs then.)
But that's not all implemented yet – still need to clean up some memory leaks (I don't think we're collecting clauses yet?) and dangling pointers:
Nice side benefit of the new evaluator: it boots much faster, making it faster to iterate and test things, because it's only compiling virtual-program-level C code (Gpu, AprilTags); the kernel is statically compiled into a monolithic folk binary. (this feels like a reasonable tradeoff, since no one was messing with the kernel in practice anyway)
Some remaining stuff to do:
There are still some high-level challenges from parallelizing the workqueue, stuff that we 'got for free' from having a single thread and converging to a fixed point. like, what order do you do operations in, how do you avoid wasted work, what does 'order' mean if things can happen in parallel.. I think we can hack around a lot of this by messing with priorities and specifying manual 'wait a few milliseconds' type stuff for specific Whens, or maybe having some notion of local convergence or subconvergence.
Omar: I've been working on massively simplifying the Folk installation process (reducing the number of steps, reporting the system state better to the user so you don't just boot to black screen or terminal, making a proper calibration/setup process and props).
The main thing is making a live USB that just boots into Folk and walks through calibration & network setup and can install to disk. (the near-term application is to distribute a version of the CNC stuff as a demo that is immediately useful)
I've been using Debian live-build, which seems to be the de facto standard for making live USBs: documentation/resources are pretty good, you can preinstall any packages from Debian, etc. Good resources are https://terkeyberger.wordpress.com/2022/03/07/live-build-how-to-build-an-installable-debian-10-buster-live-cd/ and https://dquinton.github.io/debian-install/netinstall/live-build.html (it's an interesting game to figure out the minimal set of Debian packages to preinstall so you get a 1. working Folk on boot and 2. working internet connection, and after that you don't need to rebuild until you publish, hopefully, since you can hot patch from online on boot)
(I've been building the live .iso in an Intel Linux UTM/qemu VM on my Mac laptop, which is a little annoying. I think the live build process is already slow, it's not incremental at all, so it's installing all the stuff for this 1.4GB Debian setup from scratch on each build, and it's even slower because it's in emulation. I also burned some time trying to make the live USB itself work in a VM to test on my laptop – the graphics doesn't work, since there's no Vulkan driver for the VM. could use SwiftShader or llvmpipe, but they don't have VK_KHR_display, so now you need a new codepath in Folk and maybe you need libdrm again to put your software-rendered buffer onto the VM's screen…)