AVX Revectorization Evaluation Guide

Overview

The latest updates are tracked in issue #12716. The revectorization pass has been ported from V8's TurboFan compiler to the new Turboshaft pipeline. The Turboshaft wasm pipeline is enabled by default in versions after Chrome 132.0.6829.1 by CL.

The command line flags are as listed below:

Baseline turboshaft (by default): "--turboshaft-wasm --turboshaft-wasm-instruction-selection-staged"
Enable revectorization: "--experimental-wasm-revectorize"
Trace revectorization: "--trace-wasm-revectorize"

Node

To enable wasm revectorization with Node.js, make sure your build includes the PR below, which updates the build config files: https://github.com/nodejs/node/pull/54896 (merged after 2024, Dec 8th, in version 23.5.0). Please also update Node.js to the latest version, or to a version after Aug 25 that is patched to V8 12.8.374.22.

Default mode

Baseline

Starting from 24.0, Turboshaft wasm is enabled by default, so no additional flags are needed. If you run an older version of Node.js, you need to enable Turboshaft wasm manually as below:

$ node --turboshaft-wasm --turboshaft-wasm-instruction-selection-staged

Revec

$ node [--turboshaft-wasm --turboshaft-wasm-instruction-selection-staged] --experimental-wasm-revectorize

Test with turboshaft only

By default, V8 enables lazy compilation and Liftoff baseline compilation before tiering up to TurboFan/Turboshaft for advanced optimization. The AVX revectorization phase runs only in Turboshaft. If a test runs only a few times, its functions may never tier up to Turboshaft and so never get optimized to AVX-256. The flags below disable Liftoff and lazy compilation so that every function is compiled by Turboshaft directly.

Baseline

$ node [--turboshaft-wasm --turboshaft-wasm-instruction-selection-staged] --no-liftoff --no-wasm_lazy_compilation

Revec

$ node [--turboshaft-wasm --turboshaft-wasm-instruction-selection-staged] --no-liftoff --no-wasm_lazy_compilation --experimental-wasm-revectorize

Trace revec

By default, concurrent compilation is enabled, which interleaves the trace output of different functions. It is recommended to disable concurrent compilation when tracing revectorization.

$ node [--turboshaft-wasm --turboshaft-wasm-instruction-selection-staged] --experimental-wasm-revectorize --trace-wasm-revectorize --wasm-num-compilation-tasks=1
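The Node invocations above differ only in the flags they append, so they can be collected in a small wrapper. The following is a minimal sketch: the function name `revec_node_flags` and the mode names (`baseline`, `revec`, `trace`) are hypothetical, and only flags already listed in this guide are used. It assumes a Node.js version where Turboshaft wasm is on by default; for older versions, prepend the two Turboshaft flags yourself.

```shell
#!/bin/sh
# Hypothetical helper: print the node flags for one evaluation mode.
# Mode names are made up for this sketch; the flags come from this guide.
revec_node_flags() {
  mode="$1"
  # Skip Liftoff and lazy compilation so every function is compiled by Turboshaft.
  flags="--no-liftoff --no-wasm_lazy_compilation"
  case "$mode" in
    baseline)
      ;;
    revec)
      flags="$flags --experimental-wasm-revectorize"
      ;;
    trace)
      # A single compilation task keeps the trace of each function contiguous.
      flags="$flags --experimental-wasm-revectorize --trace-wasm-revectorize --wasm-num-compilation-tasks=1"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
  echo "$flags"
}
```

A typical use would be `node $(revec_node_flags revec) bench.js`, where `bench.js` stands in for your own test script.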

Chrome

Windows

Steps:

Download Chrome Canary from https://www.google.com/chrome/canary/ or an internal version after 132.0.6829.1.
[For manual test] Open a command window and go to the directory where Chrome.exe is located. It is normally located at "C:\Program Files\Google\Chrome\Application", or you can find the Executable Path by opening “chrome://version/” in the Chrome browser.
[For manual test] Run the commands below to launch the Chrome browser with a clean disk cache:

Baseline:

>chrome.exe --user-data-dir="%TEMP%\base"

Revec:

>chrome.exe --user-data-dir="%TEMP%\revec" --js-flags=--experimental-wasm-revectorize

[For automation] Set the Chrome flags to "--js-flags=--experimental-wasm-revectorize". Note that there are no additional quotation marks (") after --js-flags; adding them does not seem to work with some automation APIs.

Verify revectorization

You can quickly verify that revectorization is enabled by turning on logging and printing the trace to stderr. However, this may generate a large number of messages, and the console buffer has to be copied out manually. (On Linux, you can redirect the output to a file.)

>chrome.exe --user-data-dir="%TEMP%\revec" --js-flags="--experimental-wasm-revectorize --trace-wasm-revectorize" --no-sandbox --enable-logging

Or verify through the TensorFlow.js local benchmark: https://tensorflow.github.io/tfjs/e2e/benchmarks/local-benchmark/index.html Select wasm as the backend and MobileNetV2 as the model; you will likely see an obvious speedup in the “Subsequent average(50 runs)” time.

Generate console log

To generate a complete trace log, add the build flags below to “args.gn” and build Chromium manually:

is_debug = false
is_official_build = true
disable_fieldtrial_testing_config = true
win_console_app = true

Then run the command below to launch Chrome and redirect the output to a log file:

>chrome.exe --user-data-dir="%TEMP%\revec" --js-flags="--experimental-wasm-revectorize --trace-wasm-revectorize --wasm-num-compilation-tasks=1" --no-sandbox --enable-logging > run.log 2>&1

How to generate revectorizable code

There are three methods to choose from:

Compile native AVX/AVX2 intrinsic code to wasm directly through Emscripten. Support for 256-bit AVX/AVX2 intrinsics in wasm was added by these PRs:
- Add 256-bit AVX intrinsic support
- Add 256-bit AVX2 intrinsic support
Use the Highway C++ library targeting WASM_EMU256 (a 2x unrolled version of wasm128) or a similar code pattern.
For general C/C++ compilation, make sure interleaved unrolling is enabled. In practice, pass the flags below to the clang compiler:

-mllvm -force-vector-interleave=2 -mllvm --pre-RA-sched=source
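As a sketch of the third method, the helper below prints a full compile command that includes the interleaving flags. The function name `revec_compile_cmd` and the file names are hypothetical. The use of `emcc` as the driver, and of `-O3` and `-msimd128` to turn on optimization and 128-bit wasm SIMD, are assumptions that this guide does not state; adjust them to your toolchain.

```shell
#!/bin/sh
# Hypothetical helper: print a compile command for a C source with 2x interleaving.
# The two -mllvm options come from this guide; emcc, -O3 and -msimd128 are assumptions.
revec_compile_cmd() {
  src="$1"   # input C/C++ file, supplied by the caller
  out="$2"   # output file, supplied by the caller
  echo "emcc -O3 -msimd128 -mllvm -force-vector-interleave=2 -mllvm --pre-RA-sched=source $src -o $out"
}
```

For example, `revec_compile_cmd kernel.c kernel.js` prints the command so it can be reviewed before it is run; `kernel.c` is a placeholder for your own source file.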
