November 2023 newsletter
What we've been up to
Applications and demos
- Naveen Michaud-Agrawal has been playing with the new shader system that we introduced last month to draw various complex things that were impossible (or at least very slow) before. Here's a 3D bunny that Naveen rotates by turning his cardboard dial:
- (Omar: “it would be cool if instead of a planar dial you had some kind of spacemouse or dodecapen-like 3d proxy object/cube that you could orient arbitrarily to control the 3d model, that would have fiducials on all sides [maybe multiple cameras pointed at it too]”)
- Cristóbal Sciutto has been working on a new region implementation in C (mostly for performance reasons) and started on a convex hull demo
- Arcade Wise created an interactive 'page' on an iPad, so you can touch the iPad and your touches propagate to the Folk system:
Tabletop / in-system editor
- Naveen working on a local editor on his system
- Implemented inline logging (where you can call log and see the output in context on that line of code)
- Andrés is working on multi-keyboard support, moving keyboard handling into a virtual program
- (we'll finally be able to dump tcl-thread and pi.tcl and have everything in virtual programs and subprocesses and statement communication!)
- Here's a preview of it monitoring key presses from multiple keyboards plugged into folk0:
Calibration and CNC
Omar has been continuing to work on 3D/real-world/accurate/metric calibration. One early application we plan on is an interface to control our CNC machine through Folk, including accurate (better than 1mm) projected 'print preview' of what lines the machine will cut on the actual material.
(the material might be an inch or two thick, so you need to be able to projection-map 3D surfaces above the base plane of the Folk system, and you need high accuracy – current Folk is only calibrated to a single plane homography and is often 1cm off or worse)
Basically, in order to do 3D calibration, you need to hold up some known pattern in a few different poses and project onto it, and that lets you characterize the behavior of the camera and projector in 3D space. Most people say that a pretty simple pinhole-camera (linear) model with some adjustment for radial distortion is sufficient for better-than-a-pixel accuracy.
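For reference, the model in question is just the textbook pinhole projection plus a radial distortion term (nothing Folk-specific here): a point at camera-frame coordinates (X, Y, Z), in meters, lands at pixel coordinates (u, v) roughly like

```latex
\begin{aligned}
x &= X/Z, \qquad y = Y/Z, \qquad r^{2} = x^{2} + y^{2} \\
x_d &= x\,(1 + k_1 r^{2} + k_2 r^{4}), \qquad y_d = y\,(1 + k_1 r^{2} + k_2 r^{4}) \\
u &= f_x\, x_d + c_x, \qquad v = f_y\, y_d + c_y
\end{aligned}
```

where (f_x, f_y) and (c_x, c_y) are the focal lengths and principal point in pixels and k_1, k_2 are the radial distortion coefficients. The purely linear model is just the k_1 = k_2 = 0 case, and a projector can be modeled the same way, as an 'inverse camera'.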
Using AprilTags for the projected calibration pattern is nice, because we already link the AprilTag library (so no need for more custom computer vision), and because they're self-verifying and error-correcting and give you a lot of information – if you see an AprilTag, you know it was the exact AprilTag you tried to project, and you know its identity and corners and angle, as opposed to projecting stripes or something where you need your own thresholds and classification to figure out what stripes you're seeing (if any).
- Early November: the first 3D calibration test/demo, using a calibration pattern based on Audet (2009), but with AprilTag fiducials and with the printed tags on the outer perimeter instead of alternating even/odd
- It's useful to leave gaps in the printed and projected patterns so it's easy to move the board & get the tags 'aligned enough' that the system can see at least 1 projected + 1 printed tag on the same plane, which is then enough information to properly rewarp all the projected tags to fit the calibration board (there's a rough sketch of this rewarp step below, after this list)
- Then once you can see most of the projected + printed tags on the board, you record that as a set of example points for calibration (you now have both board-plane coordinates in meters and camera and/or projector coordinates in pixels for all corners of all those tags)
- Notice how tags in the center of the calibration board are projected dynamically, while tags on the outer perimeter are printed:
- Mid-November: We ran into a problem: how do we know when a rewarp of the projected tags has actually taken effect? There's some latency (tens of milliseconds? hundreds of milliseconds?) between when Folk rewarps the tags to fit the board and when that rewarp actually appears in the webcam feed. If you read the tags too early and do another rewarp based on that, the tags become unreadable and you have to abort calibration. And you obviously don't want just a fixed-length sleep (which is what we did at first); you want to dynamically adapt to whatever latency happens to be present.
- So we introduced a concept of versions, where we rotate half the projected tags by 90 degrees every rewarp, so we can tell if the rewarp has gone through by the rotations of the detected tags
- Late November: We started testing the 3D calibration but found that tag outlines were weirdly offset – on folk-convivial, they were a few inches off, and on my home system, they were actually flipped, where the outline would appear on the other side of the table and move in the opposite direction from the physical tag.
- Why was it projecting upside-down or offset? The answer is surprisingly cool and physically intuitive: I was using camera frame coordinates (x, y, z from the center of the camera lens) where I should have been using projector frame coordinates (x, y, z from the center of the projector lens), so I was projecting the outline as though the projector was where the camera is.
- And in folk-convivial, the projector was a few inches from the camera but oriented the same way, while on my home system, the projector was upside-down relative to the camera!
- Had to figure out extrinsic information during camera calibration and projector calibration (the rotation and translation of each board pose relative to the camera & relative to the projector), then use those extrinsics to derive the rotation and translation of the projector frame relative to the camera frame (“stereo calibration”) – the relation is written out after this list
- Late November: Finally got 3D calibration to the point of testing it and seeing the outlines align with the tags:
- The outline is still ~1cm off, but notice how it's the right size no matter what distance from the camera or skew I hold the tag up at. That's a 3D calibration (and internally, coordinates here are all in meters from the center of the camera lens, too)
- Next: try to improve smoothness and accuracy by using nonlinear refinement & adding a radial distortion term to the calibration
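(Here's a rough sketch of the rewarp step mentioned above, in Python/OpenCV purely for illustration – the real pipeline lives inside Folk, and the variable names are hypothetical. The idea: one detected printed tag pins down the board-plane → camera-pixel mapping, one detected projected tag pins down the camera-pixel → projector-pixel mapping on that same plane, and composing the two tells you where to draw every other projected tag so it lands on the board.)

```python
import cv2
import numpy as np

# Hypothetical inputs, each a 4x2 float array of one tag's corners:
#   printed_board:  board-plane coordinates (meters) of a detected printed tag
#   printed_cam:    camera-pixel coordinates of that same printed tag
#   projected_proj: projector-pixel coordinates where we drew a projected tag
#   projected_cam:  camera-pixel coordinates where that projected tag was detected

def board_to_projector(printed_board, printed_cam, projected_proj, projected_cam):
    # Board plane -> camera pixels, from the printed tag (its board position is known).
    H_board_cam, _ = cv2.findHomography(printed_board, printed_cam)
    # Camera pixels -> projector pixels, valid on the board plane, from the projected tag.
    H_cam_proj, _ = cv2.findHomography(projected_cam, projected_proj)
    # Compose: board plane -> projector pixels.
    return H_cam_proj @ H_board_cam

def rewarped_corners(H_board_proj, corners_board):
    # Where to draw a tag, given the corners (in meters) it should occupy on the board.
    pts = np.asarray(corners_board, dtype=np.float64).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H_board_proj).reshape(-1, 2)
```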
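(And the stereo-calibration relation mentioned above is the standard one: if a given board pose has extrinsics (R_c, t_c) relative to the camera and (R_p, t_p) relative to the projector – i.e. X_cam = R_c X_board + t_c, and likewise for the projector – then the transform from the camera frame to the projector frame is

```latex
R_{pc} = R_p R_c^{\top}, \qquad t_{pc} = t_p - R_{pc}\, t_c,
\qquad\text{so that}\qquad X_{proj} = R_{pc}\, X_{cam} + t_{pc}
```

which is what lets you take a tag pose measured in camera-frame coordinates and express it in projector-frame coordinates before projecting its outline.)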
I'm excited about real-world calibration because it feels like the first system change in a long time (6 months? a year?) that will actually bring in new modes of interaction, that will look different to the untrained eye from putting pieces of paper on a table and watching them light up. (This is also why the button is great.)
I've spent a lot of the last 6-12 months on performance and on making cleaner and more powerful APIs, but I haven't pushed the interaction paradigm or the application set enough yet. Seeing preview lines on chunks of wood on a CNC machine bed, having a projection-mapped dollhouse or town, highlighting and sharing sections of a book, writing in blank fields on printed programs – these are the directions where we should be pushing.
Other system improvements
- The text rendering improvement boils down to a really simple diff.
- Naveen added custom anchors for text alignment: instead of always centering the text on the given position, you can have it extend to the right, to the left, etc.
- Naveen added non-uniform (different in X and Y) image scaling
-
- but not (yet) when a subprocess goes down due to an internal crash (like a segfault) or due to an arbitrary external Unix kill (we have this whole supervisor thing already that's meant to be able to waitpid on each subprocess and detect those conditions, we're just not using it)
Proposal: Parallel Folk
Omar has been throwing around ideas for a parallel Folk evaluator. It's an interesting design problem – I've been talking it over on Discord with Naveen and Arcade and others.
The main motivation is actually sort of a user interface motivation: I want people to feel comfortable writing programs that hang or segfault, and I want people to feel comfortable putting 30 different programs on the table, without being scared that they'll crash the whole Folk system or break their (in-system) editor. That requires Unix process isolation, I think, and it requires preemptive multitasking.
And having the system schedule and parallelize everything itself could reduce latency: right now, even though the system is multiprocess (camera intake, tag detection, and display run on separate Folk subprocesses), the main process is a latency bottleneck for the entire system.
(And I think it'd be more elegant – it's a return to some of the original tuplespace pitch, as a concurrency model – it could free the user from some implementation details of processes and memory allocation and what calls block and what calls are fast – it could let debugging tools have more access to the running database and not interfere with the runtime – it would enforce a very attractive purity that you can't use global variables or functions, everything really must be shared through the database, forcing us to improve the database and language ergonomics accordingly)
Continued thinking about parallel Folk: I feel like there are two sort of platonic designs with different advantages & disadvantages:
* One Unix process per Folk program (page or virtual program)
* One Unix process ~per CPU / process pool
The process-per-program design is probably more intuitive based on the interface we present to users – you can have global variables or open file descriptors (or talk to the GPU, or a webcam, or whatever) and they'll work within the same program; it's like 'lexical scope'.
But it doesn't parallelize at all: for example, what if you have a program that does When /p/ is a tag { 5ms of computation on $p }? You'd want to parallelize that across 7 tags, not run that block serially. (Maybe the answer here is that the system can spawn threads within the process for within-program parallelism.)
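(As a toy illustration of that last idea – Python threads standing in for whatever mechanism Folk would actually use, so nothing here is real Folk API – the system could fan the matched invocations of a single When body out over a pool instead of running them back to back:)

```python
import time
from concurrent.futures import ThreadPoolExecutor

def when_body(p):
    time.sleep(0.005)          # stands in for "5ms of computation on $p"
    return f"processed {p}"

matches = [f"tag {i}" for i in range(7)]   # the current matches for /p/ is a tag

# Run the 7 invocations concurrently so they overlap, instead of queuing serially.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(when_body, matches))
```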
The process-pool design is easier to schedule and probably more efficient; you would do something like work stealing (https://en.wikipedia.org/wiki/Work_stealing). But it would be really hard to get stuff like Vulkan or webcams to work without adding some barnacles like pinned blocks (since those rely on process-local resources).
And I don't know how well we could do fault isolation in a process pool since the process boundaries would be sort of arbitrary (it might be OK, like you could just regenerate the pool and rerun things that go down, but it's less obvious than if program = process)
One way to think about it is that there are a few properties we want, and they kind of conflict with each other. Among them are:
* if C code segfaults, we want to contain the fault to a reasonable level and not take down the whole system
* if arbitrary Folk code does sleep 5, exec glslc, or an infinite loop, we want it to not block the rest of the system (including, potentially, other invocations/matches of the same code that are responding to different statements)
* we want to be able to open a webcam or Vulkan context and then be able to read/write it later
One thought is that a parallel Folk evaluator would basically be a mix of jimtcl + a C kernel/database/evaluator which knows all the statements and can pull a When body string out and jimtcl-evaluate it on any CPU
(since those strings should be completely self-contained, other than parameters which come in via bound variables on the When)
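Here's a toy analogy of that shape in Python – purely illustrative, since the actual proposal is jimtcl embedded in a C kernel, and every name below is made up. The point it shows: because a When body is a self-contained string, the kernel only has to hand out (body string, bound variables) pairs, and any worker on any CPU can evaluate one.

```python
from multiprocessing import Pool

def run_when_body(job):
    body, bindings = job
    # Evaluate the body string in a fresh environment: no shared globals or
    # functions, only the match's bound variables -- mirroring the constraint
    # that everything else has to flow through the statement database.
    exec(body, {}, dict(bindings))

if __name__ == "__main__":
    # e.g. one job per match of: When /p/ is a tag { ... }
    jobs = [("print('highlighting', p)", {"p": f"tag {i}"}) for i in range(7)]
    with Pool() as workers:   # roughly one worker per CPU, like the process-pool design
        workers.map(run_when_body, jobs)
```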
Proposal: Real-world 3D regions/points
Now that 3D calibration is almost ready, Omar is starting to think about how the geometry representation and display API in Folk will have to change to accommodate points in real-world units and real-world 3D coordinate frames (and how to future-proof it so you can eventually throw more projectors/cameras/other sensors and actuators at a Folk system and integrate all the data cleanly).
It will probably require dumping/breaking a lot of existing hard-coded display and geometry code that thinks in terms of 2D planar pixel coordinates.
Ideas on new point and region formats:
A point would be a list {frame x y z}, where frame is a string that (ideally) universally-uniquely identifies a projector/camera/apriltag/other coordinate system, and x y z are in meters. Transformations between frames should just be rigid (rotation and translation) since they're all real-world coordinates.
An example frame would be folk0:/dev/video0, indicating a point in the coordinate system of the camera /dev/video0 on the node folk0. Then x y z would be in meters from the center of that camera lens, where a point 2 meters away would have z=2, etc. (similar for a projector or an apriltag id being the frame – in those cases, you'd measure from center of projector lens or from center of apriltag, respectively)
A region might be similar to regions now – list of 2D points and list of edges between those points, or maybe some other way of specifying 2D areas – but with the addition of a frame field like the frame field on points, and with all 2D points now being in meters. (Many regions would probably have frame being an AprilTag)
(we might not actually need points and could do everything with regions)
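To make that concrete (illustrative only – in Folk a point would literally be a Tcl list {frame x y z}; the Python representation, the frame names, and the reframe helper here are just for this sketch): a point carries its frame plus metric coordinates, and moving it between frames is a single rigid transform.

```python
import numpy as np

# A point {frame x y z}: 1.2 m in front of folk0's camera, coordinates in meters.
point = ("folk0:/dev/video0", 0.10, -0.05, 1.20)

def reframe(point, R, t, dst_frame):
    # Re-express a point in dst_frame, given the rigid transform (3x3 rotation R,
    # translation t, in meters) from the point's current frame to dst_frame.
    frame, x, y, z = point
    px, py, pz = np.asarray(R) @ np.array([x, y, z]) + np.asarray(t)
    return (dst_frame, float(px), float(py), float(pz))

# e.g. with (R, t) from stereo calibration, express a camera-frame point in the
# projector's frame before rendering (frame name hypothetical):
# point_in_proj = reframe(point, R_pc, t_pc, "folk0:projector0")
```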
One goal is to make it possible to have ensembles of cameras and projectors – as well as depth cameras, RFID, phone/tablet localization, etc. – and seamlessly integrate them all.
Outreach and new systems
- Built a new folk-beads system in the front of Hex House, for an event with Amant
- (using a 1080p Optoma EH320UST ultra-short-throw that we found on eBay, which is fun because it's relatively easy to mount and very bright)
- Would be cool to put this on a cart and be able to wheel it around and bring it in and out
- Fun interim stage during setup, when it was projecting on a wall before we pointed it down at the table:
- I think this image is a fun provocation because it's sort of about the wall, the drawing on the whiteboard, instead of being about the system. Imagine wheeling a system over in 10 seconds and pointing it at part of your environment because you want a thin layer of computation or sharing or dynamism on top of what's there (and not because you want a generic “Folk system” setup that could be put anywhere)
- (btw, whenever I see a projection mapping demo, I always feel like we should be able to subsume it and make it really easy to set up and reprogram, like it was trivial to make the meow text here wiggle around over time)
- Simplified install instructions to just use Vulkan binaries from apt, instead of compiling mesa from scratch
- Another successful open house on the 29th:
- It's nice that we can put up and tear down folk-beads in like 15 minutes total – the PC doesn't need to be reformatted and we know the exact physical setup to do (including getting the projector at the right height and clamping it to be stable) and recalibration is fast
- Got a Logitech C922 camera for folk-beads: it can do 60fps at 720p, at least, which feels much smoother than the C920 or the Azure Kinect, which can only do 30fps at any resolution
Hmm
(I think part of our job has to be to just broaden the examples so that fitting to them still gives you a pretty productive working set)
What we'll be up to in December
- The next Folk open house will be on the evening of Wednesday, December 13, at our studio in East Williamsburg, Brooklyn.
- Still need to complete the RFID refactor. I need to make the ringbuffer faster and to wait on less data before starting to parse.
- Finishing work on new calibration – trying to bring up accuracy and stability now, then introduce new 3D/frame-oriented coordinates
- Might work on Vulkan user-facing shader language
- Hacking on parallel Folk?
- Finishing up tabletop editor