ProtonAOSP includes a number of optimizations that significantly improve overall system performance:
- Up to 18% faster app/menu/screen opening
- 16% faster screenshot capturing
- Up to 4x faster low-level memory management
- Faster image loading and saving (JPEG and PNG)
Benchmark results are available to quantify these improvements. Most of the improvements are the result of empirically profiling for bottlenecks and optimizing accordingly.
The sections below describe the technical details of our optimizations. All pre-optimization profiling percentages were sourced from a Pixel 5 running ProtonAOSP 11.3.1, unless otherwise stated. “The Settings test” refers to opening and closing activities in Settings, specifically Developer Options because of the large number of preferences it contains.
Most of ProtonAOSP’s performance improvements are in native components, which comprise much of the system’s performance-critical code.
Android 11 switched to the Scudo memory allocator for security hardening, but this comes at the expense of performance. We instead use the latest stable version of jemalloc, updated from the official repository, trading the ability to detect memory usage bugs for better performance.
We use an optimized fork of the ubiquitous zlib data compression library, zlib-ng, to improve compression and decompression performance for many use cases:
- HTTP gzip compression
- PNG compression (e.g. screenshot saving and image editing)
- Android resource loading
- ZIP archives
Combined with other improvements, this speeds up screenshot saving by 16% on the Pixel 5 and likely contributes to the faster cold app launches as well.
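As a rough illustration of the kind of work zlib-ng accelerates, the sketch below runs a DEFLATE round trip and a CRC32 checksum through Python's standard zlib module. This is not ProtonAOSP code: the payload, the `roundtrip` helper, and the compression level are assumptions chosen for illustration, and Python's module links against whatever zlib the host provides rather than zlib-ng.

```python
import zlib


def roundtrip(payload, level=6):
    """Compress payload with DEFLATE, verify it decompresses, return (compressed, crc32)."""
    compressed = zlib.compress(payload, level)
    # Decompression must reproduce the input byte for byte.
    if zlib.decompress(compressed) != payload:
        raise ValueError("round trip did not reproduce the original payload")
    return compressed, zlib.crc32(payload)


# Hypothetical payload: repetitive data, loosely resembling text-heavy resources.
payload = b"ProtonAOSP resource data " * 4096
compressed, checksum = roundtrip(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes, crc32={checksum:#010x}")
```

Because zlib-ng can be built in a zlib-compatible mode, callers written against the zlib API, like this one, should not need to change. Only the library underneath them gets faster.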
Bionic libc includes string and memory routines used by nearly every process on Android. We use more optimized versions of these commonly-used functions by porting them from Arm’s arm-optimized-routines project.
The LLVM/Clang compiler includes support for ThinLTO, a DSO-wide (i.e. program-wide or library-wide) optimization that can improve performance significantly in many cases because it improves the compiler’s ability to inline code.
We added Android 12’s experimental mode that enables ThinLTO globally for most components in the system, and turned it on by default. This improves app launch performance by ~2% on top of the gains from the individual components that already had ThinLTO enabled.
On modern CPUs, leveraging SIMD is key to maximizing performance in compute-heavy workloads such as image and data compression.
We have either updated or switched to accelerated forks of the following libraries in order to take (more) advantage of NEON SIMD and/or other ARMv8.2-A extensions (e.g. CRC32 and polynomial multiplication):
We’ve disabled the statsd daemon, which collects diagnostic statistics that are normally unused on ProtonAOSP. In our testing, statsd itself accounted for 0.04% of CPU time in the Settings test, with more overhead (over 0.02%) from clients serializing and sending stats for collection.
Similarly, we disabled debug tracing in ART to save 33% of the 0.1% CPU time spent checking whether specific trace tags are enabled.
Google enabled additional compiler optimizations (-O3) for some components in Android 12. While this can cause breakage in some cases, we followed suit with the components that Google deemed safe to optimize:
We use a newer version of the Clang compiler from Android 12: Clang 12.0.4. This does not make much of a difference by itself, but it helps global ThinLTO work better by avoiding compiler bugs and taking advantage of newer LLVM optimizations.
Some other miscellaneous optimizations in native libraries have been ported from Android 12:
Although most of our optimizations target native code, we have also optimized higher-level Java code based on simpleperf profiles.
Percentages in the following sections refer to global CPU time in the Settings test. For indented items, the percentage is a fraction of the parent item’s share.
- Reduce reflection overhead (12% of 0.39% ART JNI trampolines)
- Replace ArrayMap with HashMap for performance (0.12%)
- Reduce debugging overhead
- Reduce interface method trampoline overhead (0.32%)
- Reduce unnecessary CPU-rendered screenshot captures (over 2%)
- More efficient app process forking
- Add zygote native fork loop (ported from Android 12)
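To make the nested percentages concrete, the sketch below converts a fraction of a parent item into a share of global CPU time, using the reflection figure from the list above. The `global_share` helper is a hypothetical name introduced for this illustration, not part of any ProtonAOSP tooling.

```python
def global_share(parent_pct, fraction_pct):
    """Return the share of global CPU time (in percent) for a nested item."""
    return parent_pct * fraction_pct / 100.0


# Reflection overhead: 12% of the 0.39% spent in ART JNI trampolines.
reflection = global_share(0.39, 12)
print(f"reflection overhead: {reflection:.4f}% of global CPU time")
```

For the reflection item, this works out to roughly 0.047% of global CPU time in the Settings test.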
Raw benchmark results and analysis spreadsheets can be found in the Google Drive folder. Most of the results are from a Pixel 5.