I’ve been toying around more with machine learning. This time it is not just about using someone else’s pre-trained model, but about training models for a purpose that I already have in mind. The push to do this became stronger, and more feasible, because I was semi-forced to replace my aging 2013-vintage MacBook Pro with a more recent model. Long story short, I bought a 2018 mid-range model with an AMD 560X GPU (Graphics Processing Unit) and custom-ordered it to max out its memory configuration. Yes, I finally got a real Touch Bar, but let’s save the story of the highs and lows of those touch-screen function keys for another post.
One major worry about doing machine learning on a laptop is that the machine gets really hot for a prolonged amount of time. Training a machine learning model means that the machine needs to do complex computations for a few hours, and sometimes overnight or longer. This is somewhat of an abuse as far as laptops are concerned, since they are compact computing devices without much room to dissipate heat.
Because of this I’ve been looking around to see whether it was worthwhile to buy a laptop cooler with built-in fans. My big objection is the bulkiness of these gadgets, which are typically larger than the laptops that they are supposed to cool. Being bulky means that it won’t travel with me, hence I won’t be able to do these heavy computations when I’m not at my home office. Curious, I did a few web searches to find out whether these coolers are worth it.
It turns out that yes, these external fans help lower the temperatures a bit. However, there was no noticeable increase in performance, notably in gaming performance, which is what many of these laptop coolers are marketed for. Moreover, the general opinion is to just rely on the laptop’s own cooling system and ensure that it is working in optimal conditions. If the unit was made by a reputable company, its engineers should have tested these thermal factors beforehand and designed the cooling system to match the laptop’s heat dissipation requirements. Hence it boils down to merely ensuring that the built-in fans are able to exchange air properly.
I ended up deciding on a laptop stand instead of a full-blown external cooling system. This allows the laptop to stand at an angle instead of lying flat on the table. These gadgets are mainly meant for ergonomic reasons, to bring the display closer to eye level. Since the MacBook Pro’s exhausts are located at the display hinge, using a laptop riser provides more room for heated air to move away from the unit as the built-in fan blows it out. True enough, with a laptop riser the MacBook Pro’s internals managed to stay below 60° Celsius even when the GPU was running at full load overnight. Of course that was measured in a room with the air conditioner set at a comfortable 27° Celsius for sleeping (yes, my MacBook Pro was hard at work while its master was sleeping nearby).
As for the actual product, I used a Ringke laptop stand: velcro-based “legs” glued to the laptop that can raise it to varying degrees. When fully folded, it becomes extra padding that provides about 4mm of distance between the bottom of the laptop and the desk. Being glued on and made mostly of cloth and plastic, it doesn’t add much bulk or weight to the unit, hence it doesn’t hinder portability. However, one big shortcoming is that it makes the laptop a bit “shaky” when typed on with full force. That is not a big deal with the 2018 MacBook “butterfly” keyboard, which doesn’t require (or want) much pressure to be typed on anyway.
Machine Learning on the Mac
As this is my first real foray into machine learning, I needed to start with something that’s newbie-friendly. I settled on the Keras framework, version 2.3.2. This is a Python library that significantly simplifies the creation of neural networks. In many cases, constructing a neural network involves looking at the architecture diagram given in the corresponding research paper (what kinds of layers, how many of them, how they are connected to each other, and how big each one is) and then writing Python code that declaratively specifies the network configuration.
Keras works in conjunction with a backend library. This is another set of Python libraries that performs the actual computation of running the neural network. Keras’ primary function is to translate the declarative neural network configuration into a form that the backend library can execute. TensorFlow is the most popular backend at this writing, but Keras officially supports two others: Theano and CNTK. Unfortunately none of those backends has good support for AMD GPUs, the ones that are built into more recent MacBooks.
Enter PlaidML, a backend which aims to make deep learning work everywhere. It has support for OpenCL and Metal, hence AMD GPUs or any other GPU that Apple may choose to support down the line. Note that PlaidML is still experimental and personally I found it quite unstable. For example, its OpenCL driver seems to work with recurrent neural networks (RNN) but doesn’t seem to work with standard feed-forward networks, whereas the Metal driver is the other way around. By “doesn’t work” I mean that the training yields poor accuracy and is virtually hopeless to converge, not that it fails outright due to some programming error. I observed this in PlaidML version 0.3.5.
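To make the Keras-plus-backend arrangement concrete, here is a minimal sketch of pointing Keras at PlaidML and then declaring a network. It assumes both the keras and plaidml-keras packages are installed. The layer sizes and the function names are made up for illustration and are not taken from any model mentioned in this post.

```python
import os


def use_plaidml_backend():
    """Point Keras at PlaidML; must run before the first `import keras`."""
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"


def build_toy_classifier(input_dim=20, num_classes=3):
    """Declare a small feed-forward network (hypothetical sizes)."""
    # Imported lazily so that the backend choice above is already in effect.
    from keras.models import Sequential
    from keras.layers import Dense

    # The network is described layer by layer; Keras hands the resulting
    # configuration to whichever backend was selected.
    model = Sequential([
        Dense(64, activation="relu", input_shape=(input_dim,)),
        Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling use_plaidml_backend() first and then build_toy_classifier().summary() should list the two layers. The order matters, since Keras reads the backend setting when it is first imported.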
In any case, I managed to do some basic re-training of both U-Net and TextgenRNN on Keras with PlaidML using my laptop’s GPU. U-Net took several hours to reproduce the results from its paper (identifying cell membrane imagery), whereas TextgenRNN trains in a few minutes to create a useful text generator of my choice. During these training sessions, the laptop remains somewhat useful for other purposes, thanks to the integrated GPU and the 6-core CPU. However, I needed to be really careful not to launch any program that may require the discrete GPU, notably graphics software such as Affinity Designer. Whenever that happens, GPU contention occurs since the ongoing training also uses the AMD GPU. In other words, macOS’ graphics system would be trying to use the same GPU that is already busy with work. This caused a general slowdown of the entire screen (window switching lags, buttons appear unresponsive, scrolling stutters), which is a really big pain.
For Best Results, Don’t Use Your Laptop
Those long training times lead me to believe that a separate machine is needed to do anything serious with machine learning: a dedicated machine to handle training, which is typically long-running and may even last several days. A machine that won’t need to service other workloads (such as e-mail, Office work, or interactive debugging) or be interrupted for software updates or whatnot. A machine that doesn’t get folded, suspended, and then taken elsewhere (like what a typical laptop goes through). In other words, working in machine learning calls for a dedicated server that can run uninterrupted for training or data preparation purposes.
On top of that, you would need an NVidia GPU to do any serious machine learning work. There’s simply better support for NVidia’s hardware than for its competitors’ so far. This is likely due to NVidia’s investments in its CUDA platform, which is widely adopted by the machine learning community. Just like you need Mac hardware to run some applications, you’ll need an NVidia GPU to do serious machine learning.
Forget about Thunderbolt GPU enclosures if the primary use of the GPU is for machine learning. You can get a decent headless desktop PC with a good NVidia GPU for slightly more than $1000 — a tad more expensive than the Akitio Node Pro GPU enclosure combined with the same GPU (remember that this machine doesn’t need to have its own monitor). Just borrow a TV, keyboard, and mouse to install Linux on it and you’ve got yourself a GPU server. Afterwards access your machine via the network through FTP and SSH. You can still write code on the Mac to do some preliminary testing. But when you finally need to run the full-fledged training cycles, just copy the code and training data over to your GPU server and run it from there. It’s like a nod to the punch card era where you write the program using one tool and then submit the “batch job” to another machine.
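The “batch job” workflow can be sketched in a few lines of Python. This is only an illustration under some assumptions: it uses rsync over SSH rather than FTP for the copy step, it expects rsync, ssh, and python3 to be available on both machines, and the host name, directories, and train.py script name are all hypothetical.

```python
import shlex
import subprocess


def build_commands(host, local_dir, remote_dir, script="train.py"):
    """Return the rsync and ssh command lines for one remote training run."""
    # The trailing slash makes rsync copy the directory's contents.
    sync = ["rsync", "-az", local_dir.rstrip("/") + "/", f"{host}:{remote_dir}/"]
    # Quote the pieces that end up inside the remote shell command.
    remote = f"cd {shlex.quote(remote_dir)} && python3 {shlex.quote(script)}"
    run = ["ssh", host, remote]
    return sync, run


def submit(host, local_dir, remote_dir, script="train.py"):
    """Copy code and data to the GPU server, then run the training there."""
    for cmd in build_commands(host, local_dir, remote_dir, script):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

For a training run that lasts days, the remote command would likely need to be wrapped in something like tmux or nohup so that it survives a dropped SSH connection; that part is left out of the sketch.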
Core ML Tools
No discussion of machine learning on the Mac would be complete without touching on Core ML. One hiccup that I had when converting one of my Keras models to Core ML was that the process failed due to a missing Keras layer type, with the error message “ImportError: cannot import name relu6”. It turned out that support for the new layer type was already incorporated in the Core ML Community Tools library but had not been released yet. I ended up recompiling the library from source. Note that the tools depend on a number of native libraries written in C++, and thus simply running “pip install” from its repo would not install these native libraries, which reduces the functionality of the tools.
Your Own Poet in a Box
I’ve been fascinated by the topic of computer art, notably the kind that enables computers to generate art forms by themselves, such as imagery, music, and prose. I became intrigued with Recurrent Neural Networks (RNN) from Karpathy’s article on the “Shakespeare Generator”, that is, getting a computer to generate Shakespeare-like works.
After trying out a number of libraries and various code repositories, I settled on TextgenRNN. This is a Keras-based model and library that is roughly based on Karpathy’s algorithm with added optimizations to allow it to train faster. I was able to train a few text generators using the PlaidML backend with the OpenCL driver.
What’s interesting with the TextgenRNN library is that longer training or more data doesn’t necessarily create a better model. Re-training the default model on a few hundred kilobytes of text for one epoch yields a plausible result. But using the same configuration to train on a hundred-megabyte sample with many more epochs doesn’t yield any good results at all: the output is mostly spaces and punctuation. I’m not sure what the cause was, but my guess is that the network architecture has a limited amount of “memory” and thus giving it too much training data would “confuse” it. Not unlike a toddler having a speech delay due to being taught three languages at the same time (this happened to a neighbor of ours).
Virtual Green Screen
Another machine learning algorithm that I’ve been playing around with is U-Net. The algorithm was originally created as part of medical research to process microscope imagery and distinguish cell membranes from their background. However, it can also be re-trained to process everyday photographs and separate objects from their backgrounds. I’ve been experimenting with it to process selfies and automatically separate the subject from the background.
I’ve been lucky enough to have licensed a good amount of clipart images containing people with transparent backgrounds. These were purchased as part of clipart bundle deals, which I had originally intended to use for Internet marketing purposes. These clipart images formed the early dataset for re-training the U-Net algorithm to process selfies. Through the magic of ImageMagick and shell scripting, I was able to programmatically combine these cutout images with photographs of scenery to create the training data set.
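The idea behind that compositing step can be shown in plain Python. This is not the ImageMagick script itself, just a sketch of the same principle on tiny in-memory pixel grids: the cutout’s alpha channel blends it over the scenery, and the same alpha channel doubles as the ground-truth mask. The function name and the threshold of 128 are assumptions made for illustration.

```python
def composite(cutout, background, threshold=128):
    """Blend an RGBA cutout over an RGB background of the same size.

    Returns (image, mask): image is a grid of RGB tuples, and mask is a
    grid of 0/255 values marking where the cutout is mostly opaque.
    """
    image, mask = [], []
    for fg_row, bg_row in zip(cutout, background):
        img_row, mask_row = [], []
        for (r, g, b, a), bg in zip(fg_row, bg_row):
            alpha = a / 255
            # Standard "over" blend: alpha * foreground + (1 - alpha) * background.
            img_row.append(tuple(
                round(alpha * f + (1 - alpha) * k)
                for f, k in zip((r, g, b), bg)
            ))
            # The training label: person (255) or background (0).
            mask_row.append(255 if a >= threshold else 0)
        image.append(img_row)
        mask.append(mask_row)
    return image, mask
```

Each cutout paired with several different scenery photos gives several training samples that share one mask, which is a cheap way to stretch a small clipart collection.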
A while after, I discovered the COCO dataset. It consists of various sample photographs that already have markings of the objects in them. It also comes with software libraries to query the dataset, for example to filter just the photos of people and get their respective markings. Then it’s relatively straightforward to create a script which takes the people photos, crops and resizes them to the desired dimensions, and creates the corresponding image masks.
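As a rough illustration of that filtering step, the sketch below reads a COCO-style annotation file with nothing but the standard library and collects the bounding boxes of people per photo. The dataset’s own helper library (pycocotools) is the more complete route, since it can also turn the outlines into masks. The function names here are hypothetical; the “images”, “annotations”, and “categories” keys follow the COCO annotation format.

```python
import json


def load_annotations(path):
    """Read a COCO-style annotation file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def person_boxes(coco, category="person"):
    """Map each image file name to the bounding boxes of one category."""
    wanted = {c["id"] for c in coco["categories"] if c["name"] == category}
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = {}
    for ann in coco["annotations"]:
        if ann["category_id"] in wanted:
            # Each bbox is [x, y, width, height] in pixels.
            boxes.setdefault(names[ann["image_id"]], []).append(ann["bbox"])
    return boxes
```

The resulting boxes are enough to drive the crop-and-resize step; building the per-pixel masks would additionally need the “segmentation” entry of each annotation.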
These data preparation tasks are the less glamorous, and often less discussed, part of machine learning. This reminds me of Karpathy’s post on Software 2.0 vs Software 1.0. He argues that most current applications are Software 1.0, written mostly by people. In contrast, Software 2.0 is written primarily by the computer itself via training and optimization algorithms. However, it’s quite apparent that preparing the training data needed to build Software 2.0 is mostly done by Software 1.0.
In any case, I was able to re-train U-Net, convert it to a Core ML model and create a prototype selfie app. The frame rate isn’t great, about one prediction per second on an iPhone 8. But it works quite well. Have a look at the prototype shown below. In the prototype, the machine learning model’s result is overlaid on the camera image. The areas that the model thinks are the foreground are rendered lighter, whereas the background is rendered darker. It’s nothing much, but a good start, since neither a photograph of me nor of our living room was ever in the training data set.
The Hype of Machine Learning
This hype about machine learning being “Software 2.0” reminds me of an older hype around “4th generation languages” (4GL) back in the 1990s. The premise was that these 4GL systems could create application software with minimal programming, or even no programming at all. In turn, the earlier 3GL systems would eventually be displaced by these “low-code” environments. Fast-forward 20 years, and 3GL programming languages are still alive and well, spanning from old-timers such as C++ up to Python and even Swift, and have not been replaced by 4GL environments. What’s worse is that the 4GLs of the 1990s are nowadays mostly in niche or declining markets. Examples include PowerBuilder, Clarion, and Clipper. I guess history repeats itself.
That’s all for now, take care.