My Dev Environment Setup on Apple Silicon

My personal computer was a four-year-old MacBook Pro with a permanent battery service warning, a butterfly keyboard (AKA Apple’s greatest hardware design mistake), and a chassis that ran hot whenever I charged it through the left USB-C port, which throttled its Intel i7 CPU and spun the fan up loudly enough to be quite noticeable. I guess what I am trying to say here is that my personal computer was due for a replacement.

After weeks of research, I ended up purchasing a Mac Mini with Apple’s M1 chip and 16 GB of memory. It arrived on Dec 9th and I have been using it for about 2 months now as my daily driver and primary web development machine. In this post I will be sharing my experience migrating away from my Intel-based Mac and how I set up a clean and productive web development environment on the new Apple silicon based Macs.

I will start out by highlighting some of the research I did around the M1 SoC and explain a little on how Apple has managed to achieve great results emulating stronger memory models on their ARM processor, before I walk you through my web development environment setup with Homebrew, Go, Node.js, and Docker.

Researching Apple Silicon

Apple’s “One more thing…” event in November brought us their M1 SoC and a handful of numbers: up to 3.5x faster CPU performance, up to 6x faster graphics, up to 2x battery life, and faster than 98 percent of PC laptops.

Although some people were skeptical about Apple’s claims, enough third-party data appeared soon after the announcement to confirm that Apple’s M1 chip does provide a substantial performance increase.

In hindsight, the performance and energy efficiency of the M1 chip should not really have been that surprising. Over two years ago, the reputable tech review magazine Anandtech ran the industry-standard SPEC2006 benchmark on Apple’s A12 chip, and we learned that the A12 chips Apple used in their iPhone XS are capable of handling desktop workloads. They were capable of outperforming a moderately-clocked Skylake CPU in single-threaded tasks while having better energy efficiency than all Android SoCs.

As it quickly became evident that those who bought an M1 Mac would enjoy great responsiveness and blazing fast startup times, I shifted my attention to the main concern: the ARM architecture. I wanted to understand which Intel-based executables Rosetta 2 would not be able to translate, and how stable and performant Rosetta 2 translated binaries were.

Understanding Rosetta 2 Behavior and Performance

On the surface, nothing about Rosetta 2’s behaviour is out of the ordinary. It takes x86_64 instructions and translates them to arm64 instructions ahead-of-time. Once the translation process is finished, the translated arm64 code blocks are cached so the subsequent executable launches do not need to repeat this process.

To get some insight into how Rosetta 2 translated executables perform compared to native arm64 ones, Anandtech ran various SPEC benchmarks on the M1 chip in both formats. The results showed that memory-intensive x86_64 workloads translated by Rosetta 2 consistently achieved more than 90% of the native arm64 speed, and CPU-intensive workloads performed at 70-80% of the speed of their native counterparts. Overall, these results are simply outstanding. Combined with the raw power of the M1 chip, people should, in theory, barely notice any difference using translated macOS x86_64 applications.

So, we now know that the performance of Rosetta 2 translated executables is great in general, and especially so for memory-intensive workloads. The question becomes: how is this possible?

M1’s Approach To Emulating Strongly Ordered Memory Models on ARM

At a fundamental level, a multi-core processor requires reads and writes to memory to be communicated between its cores in a consistent manner. Each processor architecture defines the semantics of this communication as a part of its memory consistency model (often referred to as just the memory model). The ARM architecture used by the M1 SoC provides weaker memory ordering semantics than the x86 architecture used by Intel processors. To understand the difference between the two, let’s look at the four basic orderings of memory reads and writes:

Write → Read: write must complete before subsequent read
Read → Read: read must complete before subsequent read
Read → Write: read must complete before subsequent write
Write → Write: write must complete before subsequent write

The strongest memory model (known as sequential consistency) maintains all of the orderings above, and ensures any change is communicated before the next instruction is run. This intuitive, sequential execution of instructions is bad for performance, and defeats the purpose of having multiple cores that can run things in parallel. Weaker consistency models allow some orderings to be violated, enabling the processors to overlap memory access with other operations. The table below captures which reorderings ARM’s “weaker” memory model and the “stronger” x86 one allow.

Type	ARM	X86
Writes can be reordered after reads	Yes	Yes
Reads can be reordered after reads	Yes	-
Reads can be reordered after writes	Yes	-
Writes can be reordered after writes	Yes	-

Source: Wikipedia

As you can see, ARM chose to adopt a memory model that allows any of the four basic memory orderings to be relaxed as a way to beat sequential consistency and make a wide range of hardware optimizations possible. x86, on the other hand, chose to preserve orderings for the most part, and adopted on-core write buffers as a way to beat sequential consistency and hide memory write latency. x86’s behavior is often referred to as Total Store Ordering (or TSO): write operations are placed in the local write buffer, and subsequent read instructions can be executed before those writes are communicated to the other cores.

When compiling binaries for ARM or x86 architectures, we are essentially telling our software which reorderings the target architecture allows, so that it can account for them to ensure correctness.

Typically, this makes it extremely difficult to emulate x86 correctly on ARM without introducing a significant performance disadvantage caused by the ARM processor trying to explicitly guarantee the orderings x86 executables expect. While there aren’t any official details available publicly on how Apple has managed to overcome this hurdle, a plausible explanation is that it’s been solved at the hardware level, and the M1 chip is capable of switching between ARM’s memory model and the x86 TSO.

What are Rosetta’s limitations?

In an official documentation page, Apple stated that Rosetta 2 can translate most Intel-based apps, including the ones that contain just-in-time (JIT) compilers. The two exceptions are:

Kernel extensions
Virtual Machine apps that virtualize x86_64 computer platforms

Catalina is the last version of macOS that fully supports third-party kernel extensions. Without these, developers are now limited to the collection of APIs and frameworks that macOS officially provides and supports. In turn, this means that Mac developers’ freedom and creativity are limited with Big Sur, but that’s a topic for another day.

The lack of support for virtual machines was a big concern. It meant I wouldn’t be able to run x86_64 Docker because it runs virtual machines under the hood. Luckily there was a path forward, as Docker was able to shift to Apple’s new hypervisor framework and provide arm64 binaries. Today, Docker supports M1 chips natively.

Decision Day

Fascinated and feeling good about the M1’s performance and Rosetta 2’s efficiency, I decided it was safe to buy an M1 Mac Mini and give it my own series of real-world tests. I figured that if it did not meet my expectations or needs, I could take advantage of Apple’s return policy, or hang onto it as a secondary computer or as a media server. (Spoiler: It’s neither!)

Installing Rosetta 2

To make the process of setting up a new Mac less cumbersome and repetitive, I, like many others, have a shell script to configure new installations of macOS, copy my dotfiles and install some of the necessary tools via Homebrew.

If this “setup my new macOS” shell script is being executed on an Apple silicon based Mac, one of the first things we need to do is to check for an existing installation of Rosetta 2, and if one was not found, we should go ahead and install it.

Detecting the processor’s architecture

We can use the uname utility to get details about the processor’s architecture (e.g. arm, i386, i686, etc.), the machine’s hardware class (e.g. arm64, x86_64, etc.), as well as various other characteristics of our system.

#!/bin/bash

PROC_ARCH="$(uname -m)"

if [ "${PROC_ARCH}" = "x86_64" ]; then
  echo "Intel-based mac with x86_64 architecture or Rosetta2 translated process"
elif [ "${PROC_ARCH}" = "arm64" ]; then
  echo "Apple silicon mac with arm64 architecture"
else
  echo "Unknown architecture: ${PROC_ARCH}"
fi

Perform a non-interactive installation of Rosetta 2

If the processor’s architecture is arm64 and no existing installation of Rosetta 2 is found, the setup script should install Rosetta 2 non-interactively.

#!/bin/bash

# credit: https://github.com/rtrouton/rtrouton_scripts/blob/master/rtrouton_scripts/install_rosetta_on_apple_silicon/install_rosetta_on_apple_silicon.sh

PROC_ARCH="$(uname -m)"

if [ "${PROC_ARCH}" = "arm64" ]; then
  # Check Rosetta LaunchDaemon. If it was not found
  # perform a non-interactive installation of Rosetta.
  if [[ ! -f "/Library/Apple/System/Library/LaunchDaemons/com.apple.oahd.plist" ]]; then
    if /usr/sbin/softwareupdate --install-rosetta --agree-to-license; then
      echo "Rosetta2 has been successfully installed."
    else
      echo "Rosetta2 installation failed!"
    fi
  else
    echo "Rosetta2 is already installed."
  fi
fi

Bonus: determine native Intel vs Rosetta2 translated environment

If a new Terminal window is opened using Rosetta 2, the uname -m command will print x86_64 even though we are on an arm64 machine.

We can programmatically determine when a process is running under Rosetta 2 translation by checking the value of the sysctl.proc_translated kernel variable using the sysctl command. It can have one of the following values:

0: for Apple silicon native process
1: for Rosetta2 translated process
"": in case the OID was not found (e.g. you are looking for sysctl.proc_translated on an older mac running Catalina.)

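#!/bin/bash

PROC_ARCH="$(uname -m)"
# -i: ignore unknown OIDs, so older systems yield an empty string
IS_PROC_TRANSLATED="$(sysctl -n -i sysctl.proc_translated)"

if [ "${IS_PROC_TRANSLATED}" = "1" ]; then
  echo "Running with Rosetta 2"
elif [ "${PROC_ARCH}" = "x86_64" ]; then
  echo "Running native Intel"
else
  echo "Running native Apple silicon"
fi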

Now that we know how to determine the processor’s architecture and have Rosetta 2 installed, we can proceed to install and set up Homebrew!

Installing Homebrew

Homebrew started offering support for Apple silicon in v2.6.0, but given that not all of the binaries you may need are arm64-ready yet, it’s recommended to have two Homebrew installations side-by-side. If a formula offers arm64 binaries, we will install it using the arm64 Homebrew located in /opt/homebrew, otherwise we will fall back to the x86_64 Homebrew installed using Rosetta 2 located under /usr/local.

Customizing the Homebrew installation location is not possible via their installer script, so for the Apple silicon Homebrew installation we will grab their latest tarball and untar it under /opt/homebrew, as you will see in the following snippet:

#!/bin/bash

PROC_ARCH="$(uname -m)"

# native Intel-based mac or Rosetta2
if [ "${PROC_ARCH}" = "x86_64" ]; then
  # checking if an installation already exists
  if [[ ! -d "/usr/local/Homebrew" ]];then
    echo "Installing Homebrew for x86_64"
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  fi
elif [ "${PROC_ARCH}" = "arm64" ]; then
  # checking if a Rosetta2 installation already exists
  if [[ ! -d "/usr/local/Homebrew" ]];then
    echo "Installing Homebrew for x86_64 via Rosetta2"
    arch -x86_64 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  fi
  # checking if an Apple silicon native installation already exists
  if [[ ! -d "/opt/homebrew" ]];then
    echo "Installing Homebrew for arm64"
    # https://docs.brew.sh/Installation
    # https://soffes.blog/homebrew-on-apple-silicon
    sudo mkdir -p /opt/homebrew
    sudo chown -R "$(whoami)":staff /opt/homebrew
    cd /opt
    # unfortunately, a `main` tarball does not exist, it's still called master
    curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip 1 -C homebrew
  fi
else
  echo "Sorry can't install Homebrew on this unknown architecture: ${PROC_ARCH}"
fi

Dual brew aliases

Since we have two Homebrew installations, we need to have two brew aliases. The first one is brew, the default, pointing to the Apple silicon Homebrew installation, and the second is brewr2 (which stands for Brew Rosetta 2), pointing to the x86_64 one.


# arm64 homebrew in /opt/homebrew is the default option
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"

# rosetta 2 homebrew alias
alias brewr2='arch -x86_64 /usr/local/bin/brew'

After reloading your shell profile, you should be able to see something like this:

$ which brew
/opt/homebrew/bin/brew

$ which brewr2
brewr2: aliased to arch -x86_64 /usr/local/bin/brew
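
One caveat for the setup script itself: bash does not expand aliases in non-interactive scripts by default, so brew and brewr2 are mostly a convenience at the prompt. Below is a minimal sketch of a script-friendly equivalent, assuming the two install locations used above. The function name brew_path_for_arch is made up for this post and is not part of Homebrew.

```shell
#!/bin/bash

# brew_path_for_arch ARCH  (hypothetical helper, not a Homebrew command)
# Prints the brew executable that belongs to ARCH, based on the two
# install locations used in this post. Fails for any other value.
brew_path_for_arch() {
  case "$1" in
    arm64)  echo "/opt/homebrew/bin/brew" ;;
    x86_64) echo "/usr/local/bin/brew" ;;
    *)
      echo "No Homebrew location known for architecture: $1" >&2
      return 1
      ;;
  esac
}

# Example, not executed here:
#   "$(brew_path_for_arch arm64)" install go
#   arch -x86_64 "$(brew_path_for_arch x86_64)" install go
```

Passing the architecture in as an argument, rather than calling uname -m inside the function, keeps the choice explicit. That matters because uname -m reports x86_64 inside a Rosetta 2 translated shell.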

Force-launch Rosetta2 brew packages

In many instances, developers are now starting to offer experimental or beta arm64 support for their packages on Homebrew, but you may still want to specifically install and use the older, and usually more stable, x86_64 one via Rosetta2.

For such cases, I will show you another alias, brewr2x (which stands for brew Rosetta2 execute), that will help you launch the x86_64 version of a Homebrew package:

alias brewr2x='PATH="/usr/local/bin:$PATH"'

Here’s an example usage:


# installs arm64 go (1.16beta1 or above)
$ brew install go

# installs x86_64 go (1.15.6 or above)
$ brewr2 install go

# execute the arm64 binary
$ go version
go version go1.16beta1 darwin/arm64

# force execute the x86 binary
$ brewr2x go version
go version go1.15.7 darwin/amd64
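
The alias works because the shell lets you prefix a command with a variable assignment that applies to that one command only. Here is a rough function version of the same idea, written as a sketch: run_with_prefix and brewr2x_fn are names I made up, and the function prepends the directory to PATH instead of replacing it, so the launched tool can still find helpers such as git.

```shell
#!/bin/bash

# run_with_prefix DIR COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND with DIR placed first on PATH for that one invocation.
# The caller's own PATH is left untouched afterwards.
run_with_prefix() {
  local dir="$1"
  shift
  PATH="${dir}:${PATH}" "$@"
}

# Function flavour of the brewr2x alias (hypothetical name): prefer
# whatever the x86_64 Homebrew installed under /usr/local/bin.
brewr2x_fn() {
  run_with_prefix /usr/local/bin "$@"
}

# Example, not executed here:
#   brewr2x_fn go version
```

A function also has the advantage of working inside scripts, where aliases are usually not expanded.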

Miscellaneous developer tools

As of today, February 13th, 2021, many of the tools I use for my personal development work offer Apple silicon support: WebStorm and GoLand IDEs, all major Web browsers, Docker (via Tech Preview), VS Code (via insiders builds) and iTerm2, just to name a few. If you are interested in learning whether an app or a Homebrew package you heavily rely on is now Apple silicon ready, my favorite website is DoesItARM; it’s much cleaner and better organized than other ones out there.

Go started offering arm64 binaries with version 1.16 (unstable), and the Homebrew bottle has been updated to offer those. However, I still use the stable x86_64 Go binaries via Rosetta 2 and can’t notice any difference building, running and testing my Go projects.
I use Node.js purely for front-end development, e.g. installing packages via npm, running local web development servers and building my client-side assets. The Node.js Homebrew bottle seems to have been updated to support Apple silicon, though I have not yet migrated from my x86_64 Rosetta 2 installation. The performance installing packages, building and running local servers via Rosetta 2 is so great that I don’t feel the urgency to migrate to arm64 Node.js yet, even though I should.
Docker’s “Tech Preview” arm64 builds that use Apple’s new hypervisor framework are extremely stable. More importantly, Docker has been supporting multi-arch images for almost two years now, meaning you can build and run both x86 and ARM images on the M1 Macs. While the other images I rely on are multi-arch by default, including KIND’s (Kubernetes IN Docker) base image, the kind-node image specifically targets the x86 architecture. Until that one is also multi-arch, I am using rossgeorgiev/kind-node-arm64 without any issues.
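
To make the multi-arch point a bit more concrete: docker run accepts a --platform flag that selects which variant of an image to use. The sketch below maps a uname -m value to the matching platform string. docker_platform_for is a hypothetical helper name, and alpine is just a convenient multi-arch image for the example.

```shell
#!/bin/bash

# docker_platform_for MACHINE  (hypothetical helper)
# Maps a `uname -m` style value to the matching Docker platform string.
docker_platform_for() {
  case "$1" in
    arm64|aarch64) echo "linux/arm64" ;;
    x86_64|amd64)  echo "linux/amd64" ;;
    *)
      echo "Unsupported machine type: $1" >&2
      return 1
      ;;
  esac
}

# Example, not executed here: ask for the x86 variant explicitly.
#   docker run --rm --platform "$(docker_platform_for x86_64)" alpine uname -m
```

Leaving --platform out lets Docker pick the variant that matches the host, which is what I rely on day to day.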

Looking ahead

I am extremely happy with my M1 Mac Mini. It’s fast, and so completely silent that I now consider my Nintendo Switch to be noisy. Add to that the advancements Apple has made to ensure Rosetta 2 translated binaries perform great, and the speed at which the developer ecosystem and software vendors are moving to offer native experiences for Apple silicon Mac users, and an M1 Mac Mini is a great choice for many web developers.

This leaves us with one last question: how long are the Apple silicon based Macs going to feel as responsive as they are today?

One of my favorite software engineering bloggers, Fabien Sanglard, referenced Andy and Bill’s law in one of his recent posts and brought up an extremely valid point: for every cycle a hardware engineer saves, a software engineer will add two instructions. While it’s hard for us to predict how long it will take for the M1’s responsiveness and power efficiency to degrade, one can argue that certain Apple strategies, like moving developers away from extending the kernel and requiring VMs to use their new hypervisor framework, are just measures to ensure their new Macs remain in good shape, performance and power efficiency wise, longer than their competitors.