My personal computer was a 4-year-old MacBook Pro with a permanent battery service warning, a butterfly keyboard (AKA Apple’s greatest hardware design mistake), and a high chassis temperature issue whenever I used the left USB-C port to charge it, which throttled its Intel i7 CPU and spun the fan up so much that it was quite noticeable. I guess what I am trying to say here is that my personal computer was due for a replacement.
After weeks of research, I ended up purchasing a Mac Mini with Apple’s M1 chip and 16 GB of memory. It arrived on Dec 9th and I have been using it for about 2 months now as my daily driver and primary web development machine. In this post I will be sharing my experience migrating away from my Intel-based Mac and how I set up a clean and productive web development environment on the new Apple silicon based Macs.
I will start out by highlighting some of the research I did around the M1 SoC and explain a little about how Apple has managed to achieve great results emulating stronger memory models on their ARM processor, before I walk you through my web development environment setup with Homebrew, Go, Node.js, and Docker.
Researching Apple Silicon
Apple’s “One more thing…” event in November brought us their M1 SoC and a handful of numbers: up to 3.5x faster CPU performance, up to 6x faster graphics, up to 2x battery life, and faster than 98 percent of PC laptops.
Although some people were skeptical about Apple’s claims, enough third-party data emerged soon after the announcement to confirm that Apple’s M1 chip does provide a substantial performance increase.
In hindsight, the performance and energy efficiency of the M1 chip should not really have been that surprising. Over two years ago, the reputable tech review magazine AnandTech ran the industry-standard SPEC2006 benchmark on Apple’s A12 chip, and we learned that the A12 chips Apple used in the iPhone XS were capable of handling desktop workloads. They could outperform a moderately-clocked Skylake CPU in single-threaded tasks while having better energy efficiency than all Android SoCs.
As it quickly became evident that those who bought an M1 Mac would enjoy great responsiveness and blazing fast startup times, I shifted my attention to the main concern: the ARM architecture. I wanted to understand which Intel-based executables Rosetta 2 would not be able to translate, and how stable and performant Rosetta 2 translated binaries were.
Understanding Rosetta 2 Behavior and Performance
On the surface, nothing about Rosetta 2’s behavior is out of the ordinary. It takes x86_64 instructions and translates them to arm64 instructions ahead of time. Once the translation process is finished, the translated arm64 code blocks are cached so that subsequent launches of the executable do not need to repeat this process.
To get some insight into how Rosetta 2 translated executables perform compared to native arm64 ones, AnandTech ran various SPEC benchmarks on the M1 chip in both formats. The results showed that memory-intensive x86_64 workloads translated by Rosetta 2 consistently achieved more than 90% of the native arm64 speed, and CPU-intensive workloads performed at 70-80% of their native counterparts. Overall, these results are simply outstanding. Combined with the raw power of the M1 chip, people should, in theory, barely notice any difference using translated macOS x86_64 applications.
So, we now know that Rosetta 2 translated executables perform well in general, and especially well for memory-intensive workloads. The question becomes: how is this possible?
M1’s Approach To Emulating Strongly Ordered Memory Models on ARM
At a fundamental level, a multi-core processor requires reads and writes to memory to be communicated between its cores in a consistent manner. Each processor architecture defines the semantics of this communication as a part of its memory consistency model (often referred to as just the memory model). The ARM architecture used by the M1 SoC provides weaker memory ordering semantics than the x86 architecture used by Intel processors. To understand the difference between the two, let’s look at the four basic orderings of memory reads and writes:
- Write → Read: write must complete before subsequent read
- Read → Read: read must complete before subsequent read
- Read → Write: read must complete before subsequent write
- Write → Write: write must complete before subsequent write
The strongest memory model (known as sequential consistency) maintains all of the orderings above and ensures any change is communicated before the next instruction is run. This intuitive, sequential execution of instructions is bad for performance, and defeats the purpose of having multiple cores that can run things in parallel. Weaker consistency models allow some orderings to be violated, enabling the processor to overlap memory accesses with other operations. The table below captures which reorderings ARM’s “weaker” memory model and the “stronger” x86 one allow.
Reordering | ARM | x86 |
---|---|---|
Writes can be reordered after reads | Yes | Yes |
Reads can be reordered after reads | Yes | No |
Reads can be reordered after writes | Yes | No |
Writes can be reordered after writes | Yes | No |
As you can see, ARM chose to adopt a memory model that allows any of the four basic memory operations to be reordered as a way to beat sequential consistency and make a wide range of hardware optimizations possible. x86, on the other hand, chose to preserve orderings for the most part, and adopted per-core write buffers as its way to beat sequential consistency and hide memory write latency. x86’s behavior is often referred to as Total Store Ordering (or TSO): write operations are placed in the local write buffer, and subsequent read instructions can be executed before those writes are communicated to the other cores.
When compiling binaries for the ARM or x86 architectures, we are essentially telling the compiler which reorderings the target architecture allows, so that the generated code accounts for them and stays correct.
Typically, this makes it extremely difficult to emulate x86 correctly on ARM without introducing a significant performance disadvantage caused by the ARM processor trying to explicitly guarantee the orderings x86 executables expect. While there aren’t any official details available publicly on how Apple has managed to overcome this hurdle, a plausible explanation is that it’s been solved at the hardware level, and the M1 chip is capable of switching between ARM’s memory model and the x86 TSO.
Further reading on the topic:
- Emulation of strongly ordered memory models on Google Patents
- Kernel extension that enables TSO for Apple silicon processes on GitHub
- Social media summaries: Reddit, HN and Twitter
What are Rosetta’s limitations?
On an official documentation page, Apple stated that Rosetta 2 can translate most Intel-based apps, including the ones that contain just-in-time (JIT) compilers. The two exceptions are:
- Kernel extensions
- Virtual Machine apps that virtualize x86_64 computer platforms
Catalina is the last version of macOS that supports third-party kernel extensions. Without these, developers are now limited to the collection of APIs and frameworks that macOS officially provides and supports. In turn, this means that Mac developers’ freedom and creativity are limited with Big Sur, but that’s a topic for another day.
The lack of support for virtual machines was a big concern. It meant I wouldn’t be able to run x86_64 Docker because it runs virtual machines under the hood. Luckily there was a path forward, as Docker was able to shift to Apple’s new hypervisor framework and provide arm64 binaries. Today, Docker supports M1 chips natively.
Decision Day
Fascinated and feeling good about the M1’s performance and Rosetta 2’s efficiency, I decided it was safe to buy an M1 Mac Mini and give it my own series of real-world tests. I figured that if it did not meet my expectations or needs, I could take advantage of Apple’s return policy, or hang onto it as a secondary computer or a media server. (Spoiler: I did neither!)
Installing Rosetta 2
To make the process of setting up a new Mac less cumbersome and repetitive, I, like many others, have a shell script that configures new installations of macOS, copies my dotfiles and installs some of the necessary tools via Homebrew.
If this “setup my new macOS” shell script is being executed on an Apple silicon based Mac, one of the first things we need to do is to check for an existing installation of Rosetta 2, and if one was not found, we should go ahead and install it.
Detecting the processor’s architecture
We can use the uname utility to get details about the processor’s architecture (e.g. arm, i386, i686, etc.), the machine’s hardware class (e.g. arm64, x86_64, etc.), as well as various other characteristics of our system.
#!/bin/bash
PROC_ARCH="$(uname -m)"
if [ "${PROC_ARCH}" = "x86_64" ]; then
  echo "Intel-based mac with x86_64 architecture or Rosetta2 translated process"
elif [ "${PROC_ARCH}" = "arm64" ]; then
  echo "Apple silicon mac with arm64 architecture"
else
  echo "Unknown architecture: ${PROC_ARCH}"
fi
Perform a non-interactive installation of Rosetta 2
If the processor’s architecture is arm64 and no existing installation of Rosetta 2 is found, the setup script should install Rosetta 2 non-interactively.
#!/bin/bash
# credit: https://github.com/rtrouton/rtrouton_scripts/blob/master/rtrouton_scripts/install_rosetta_on_apple_silicon/install_rosetta_on_apple_silicon.sh
PROC_ARCH="$(uname -m)"
if [ "${PROC_ARCH}" = "arm64" ]; then
  # Check Rosetta LaunchDaemon. If it was not found
  # perform a non-interactive installation of Rosetta.
  if [[ ! -f "/Library/Apple/System/Library/LaunchDaemons/com.apple.oahd.plist" ]]; then
    # Branch directly on the installer's exit status.
    if /usr/sbin/softwareupdate --install-rosetta --agree-to-license; then
      echo "Rosetta2 has been successfully installed."
    else
      echo "Rosetta2 installation failed!"
    fi
  else
    echo "Rosetta2 is already installed."
  fi
fi
Bonus: determine native Intel vs Rosetta2 translated environment
If a new Terminal window is opened using Rosetta 2, the uname -m command will print x86_64 even though we are on an arm64 machine.
We can programmatically determine when a process is running under Rosetta 2 translation by checking the value of the sysctl.proc_translated kernel variable using the sysctl command. It can have one of the following values:
- 0: for an Apple silicon native process
- 1: for a Rosetta 2 translated process
- "": in case the OID was not found (e.g. you are looking for sysctl.proc_translated on an older Mac running Catalina)
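#!/bin/bash
IS_PROC_TRANSLATED="$(sysctl -n -i sysctl.proc_translated)"
if [ "${IS_PROC_TRANSLATED}" = "1" ]; then
  echo "Running with Rosetta 2"
elif [ "${IS_PROC_TRANSLATED}" = "0" ]; then
  # 0 means the process is not translated
  echo "Running native (not translated)"
else
  # empty output: the OID does not exist (e.g. Catalina on Intel)
  echo "Running native Intel"
fi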
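The uname -m check and the sysctl.proc_translated check can be combined into a single classification step, which is convenient in a setup script that branches on the result. The following is a minimal sketch, not part of my actual script: classify_runtime and the three labels it prints are names I made up for illustration, and it takes both values as arguments so the logic can be exercised on any machine.

```shell
#!/bin/bash
# classify_runtime MACHINE TRANSLATED
#   MACHINE:    output of `uname -m` (e.g. arm64 or x86_64)
#   TRANSLATED: output of `sysctl -n -i sysctl.proc_translated` (0, 1 or empty)
# classify_runtime and its labels are hypothetical, chosen for this example.
classify_runtime() {
  local machine="$1"
  local translated="$2"
  case "${machine}:${translated}" in
    arm64:*)  echo "apple-silicon-native" ;;   # native arm64 process
    x86_64:1) echo "apple-silicon-rosetta2" ;; # x86_64 process translated by Rosetta 2
    x86_64:*) echo "intel-native" ;;           # 0 or empty on an Intel-based Mac
    *)        echo "unknown" ;;
  esac
}

# Example (not run here):
#   classify_runtime "$(uname -m)" "$(sysctl -n -i sysctl.proc_translated)"
```

Passing the two values in, rather than calling uname and sysctl inside the function, keeps the decision table in one place and easy to check by hand.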
Now that we know how to determine the processor’s architecture and have Rosetta 2 installed, we can proceed to install and set up Homebrew!
Installing Homebrew
Homebrew started offering support for Apple silicon in v2.6.0, but given that not all of the binaries you may need are arm64-ready yet, it’s recommended to have two Homebrew installations side-by-side. If a formula offers arm64 binaries, we will install it using the arm64 Homebrew located in /opt/homebrew, otherwise we will fall back to the x86_64 Homebrew installed using Rosetta 2 and located under /usr/local.
Customizing the Homebrew installation location is not possible via their installer script, so for the Apple silicon Homebrew installation we will grab their latest tarball and untar it under /opt/homebrew, as you will see in the following snippet:
#!/bin/bash
PROC_ARCH="$(uname -m)"
# native Intel-based mac or Rosetta2
if [ "${PROC_ARCH}" = "x86_64" ]; then
  # checking if an installation already exists
  if [[ ! -d "/usr/local/Homebrew" ]]; then
    echo "Installing Homebrew for x86_64"
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  fi
elif [ "${PROC_ARCH}" = "arm64" ]; then
  # checking if a Rosetta2 installation already exists
  if [[ ! -d "/usr/local/Homebrew" ]]; then
    echo "Installing Homebrew for x86_64 via Rosetta2"
    arch -x86_64 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  fi
  # checking if an Apple silicon native installation already exists
  if [[ ! -d "/opt/homebrew" ]]; then
    echo "Installing Homebrew for arm64"
    # https://docs.brew.sh/Installation
    # https://soffes.blog/homebrew-on-apple-silicon
    sudo mkdir -p /opt/homebrew
    sudo chown -R "$(whoami)":staff /opt/homebrew
    # unfortunately, `main` tarball does not exist, it's still called master
    curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip 1 -C /opt/homebrew
  fi
else
  echo "Sorry can't install Homebrew on this unknown architecture: ${PROC_ARCH}"
fi
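Once the script has run, it can be handy to confirm that each expected prefix actually contains a brew executable. Here is a minimal sketch, assuming the two prefixes used in this post; brew_prefix_for and has_brew are hypothetical helper names of my own, not Homebrew commands.

```shell
#!/bin/bash
# brew_prefix_for ARCH
# Prints the Homebrew prefix this post uses for the given architecture.
brew_prefix_for() {
  case "$1" in
    arm64)  echo "/opt/homebrew" ;;
    x86_64) echo "/usr/local" ;;
    *)      echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# has_brew PREFIX
# Succeeds when PREFIX/bin/brew exists and is executable.
has_brew() {
  [ -x "$1/bin/brew" ]
}

# Example (not run here):
#   for arch in arm64 x86_64; do
#     prefix="$(brew_prefix_for "${arch}")"
#     if has_brew "${prefix}"; then
#       echo "${arch}: brew found in ${prefix}"
#     else
#       echo "${arch}: brew missing from ${prefix}"
#     fi
#   done
```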
Dual brew aliases
Since we have two Homebrew installations, we need to have two brew aliases. The first one is brew, the default, pointing to the Apple silicon Homebrew installation, and the second is brewr2 (which stands for Brew Rosetta 2), pointing to the x86_64 one.
# arm64 homebrew in /opt/homebrew is the default option
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"
# rosetta 2 homebrew alias
alias brewr2='arch -x86_64 /usr/local/bin/brew'
After reloading your shell profile, you should be able to see something like this:
$ which brew
/opt/homebrew/bin/brew
$ which brewr2
brewr2: aliased to arch -x86_64 /usr/local/bin/brew
Force-launch Rosetta2 brew packages
In many instances, developers are now starting to offer experimental or beta arm64 support for their packages on Homebrew, but you may still want to specifically install and use the older, and usually more stable, x86_64 one via Rosetta 2.
For such cases, I will show you another alias, brewr2x (which stands for brew Rosetta 2 execute), that will help you launch the x86_64 version of a Homebrew package:
alias brewr2x='PATH=/usr/local/bin'
Here’s an example usage:
# installs arm64 go (1.16beta1 or above)
$ brew install go
# installs x86_64 go (1.15.6 or above)
$ brewr2 install go
# execute the arm64 binary
$ go version
go version go1.16beta1 darwin/arm64
# force execute the x86 binary
$ brewr2x go version
go version go1.15.7 darwin/amd64
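One thing to keep in mind is that the brewr2x alias replaces PATH entirely for the launched command, so a tool that shells out to something in /usr/bin or /bin will not find it. If that becomes a problem, one possible alternative is a small function that prepends the x86_64 Homebrew directory instead. This is only a sketch of that idea, not what I use day to day; run_x86_brew_bin and the X86_BREW_BIN variable are hypothetical names.

```shell
#!/bin/bash
# Hypothetical alternative to the brewr2x alias.
# X86_BREW_BIN defaults to the x86_64 Homebrew bin directory used in this post.
X86_BREW_BIN="${X86_BREW_BIN:-/usr/local/bin}"

# run_x86_brew_bin COMMAND [ARGS...]
# Runs COMMAND with the x86_64 Homebrew bin directory first on PATH,
# keeping the rest of PATH so system tools stay reachable.
run_x86_brew_bin() {
  # Refuse to run without a command, otherwise the PATH assignment
  # below would apply to the current shell instead of a child process.
  [ "$#" -gt 0 ] || return 1
  PATH="${X86_BREW_BIN}:${PATH}" "$@"
}

# Example (not run here):
#   run_x86_brew_bin go version
```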
Miscellaneous developer tools
As of today (February 13th, 2021), many of the tools I use for my personal development work offer Apple silicon support: WebStorm and GoLand IDEs, all major Web browsers, Docker (via Tech Preview), VS Code (via insiders builds) and iTerm2, just to name a few. If you are interested in learning whether an app or a Homebrew package you heavily rely on is now Apple silicon ready, my favorite website is DoesItARM. It’s much cleaner and better organized than the other ones out there.
- Go started offering arm64 binaries with version 1.16 (unstable), and the Homebrew bottle has been updated to offer those. However, I still use the stable x86_64 Go binaries via Rosetta 2 and can’t notice any difference building, running and testing my Go projects.
- I use Node.js purely for front-end development, e.g. installing packages via npm, running local web development servers and building my client-side assets. The Node.js Homebrew bottle seems to have been updated to support Apple silicon, though I have not yet migrated from my x86_64 Rosetta 2 installation. The performance installing packages, building and running local servers via Rosetta 2 is so great that I don’t feel the urgency to migrate to arm64 Node.js yet, even though I should.
- Docker’s “Tech Preview” arm64 builds that use Apple’s new hypervisor framework are extremely stable. More importantly, Docker has been supporting multi-arch images for almost two years now, meaning you can build and run both x86 and ARM images on the M1 Macs. Almost all of the images I rely on are multi-arch by default, including KIND’s (Kubernetes IN Docker) base image. The exception is the kind-node image, which specifically targets the x86 architecture. Until that one is also multi-arch, I am using rossgeorgiev/kind-node-arm64 without any issues.
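With multi-arch images, you can also ask Docker for a specific variant through the --platform flag of docker run. The sketch below maps a uname -m value to the platform string Docker expects. docker_platform_for is a hypothetical helper name, and the alpine image in the commented example is just an assumption for illustration.

```shell
#!/bin/bash
# docker_platform_for MACHINE
# Maps a `uname -m` style machine name to a Docker platform string.
# docker_platform_for is a hypothetical helper, not a Docker command.
docker_platform_for() {
  case "$1" in
    arm64|aarch64) echo "linux/arm64" ;;
    x86_64|amd64)  echo "linux/amd64" ;;
    *)             echo "unsupported machine type: $1" >&2; return 1 ;;
  esac
}

# Example (not run here): explicitly request the x86 variant of an image.
#   docker run --rm --platform "$(docker_platform_for x86_64)" alpine uname -m
```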
Looking ahead
I am extremely happy with my M1 Mac Mini. It’s fast and so completely silent that I now consider my Nintendo Switch to be noisy. Add to that the advances Apple has made to ensure Rosetta 2 translated binaries perform well, and the speed at which the developer ecosystem and software vendors are moving to offer native experiences for Apple silicon Mac users, and an M1 Mac Mini is a great choice for many web developers.
This leaves us with one last question: for how long are the Apple silicon based Macs going to feel as responsive as they do today?
One of my favorite software engineering bloggers, Fabien Sanglard, referenced Andy and Bill’s law in one of his recent posts and brought up an extremely valid point: for every cycle a hardware engineer saves, a software engineer will add two instructions. While it’s hard to predict how long it will take for the M1’s responsiveness and power efficiency to degrade, one can argue that certain Apple strategies, like moving developers away from extending the kernel and requiring VMs to use their new hypervisor framework, are measures to ensure their new Macs remain in good shape, in terms of performance and power efficiency, for longer than their competitors.