simulate your RTL with real multi-threaded speed
interface different simulators and chiplets together
RTL simulations:
- are typically single threaded and don't scale well when your DUT size increases
- run in 1 simulation environment, which makes it hard to simulate multiple chips together
multisim is a SystemVerilog/DPI library that lets multiple simulations run in parallel and communicate with each other to simulate your DUT.
Typically, you can have:
- 1 server simulation with your DUT skeleton (NOC, fabric, etc)
- N client simulations with 1 big instance each (computing core, chip, etc)
Assume your original simulation has N CPUs that take a lot of simulation time.
multisim lets you split it into N+1 simulation instances (1 server and N clients).

If the CPU is the performance bottleneck, you could speed up your simulation by up to N times.
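To make the "up to N times" estimate concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name and the timing model are assumptions for illustration, not part of multisim, and it ignores the cost of communication between simulations.

```python
def estimated_speedup(server_time: float, cpu_time: float, n_cpus: int) -> float:
    """Rough wall-clock speedup when each CPU moves to its own client simulation.

    Assumed model (not measured data):
      - single simulation: the skeleton and all N CPUs are evaluated by one thread
      - split simulation: server and clients run in parallel, so the slowest
        process bounds the wall-clock time
    Communication overhead between simulations is ignored.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    single = server_time + n_cpus * cpu_time
    split = max(server_time, cpu_time)
    if split <= 0:
        raise ValueError("at least one of the times must be positive")
    return single / split


# CPUs dominate: the speedup approaches N
cpu_bound = estimated_speedup(server_time=0.0, cpu_time=10.0, n_cpus=4)

# Skeleton dominates: splitting the CPUs out barely helps
server_bound = estimated_speedup(server_time=10.0, cpu_time=1.0, n_cpus=4)
```

In this model the gain only gets close to N when the per-CPU time is much larger than the time spent in the server skeleton.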
Reusing this example where we have:
- 1 server simulation with 1 NOC
- CPU number client simulations with 1 cpu (slow module) each
- core library (ready/valid protocol)
  - client->server: multisim_client_push and multisim_server_pull
  - server->client: multisim_server_push and multisim_client_pull
- other protocols:
  - axi
  - apb
  - quasi static signals (useful for signals without control signals like IRQ)
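For readers unfamiliar with the ready/valid protocol used by the core library, the handshake rule can be sketched with a small cycle-by-cycle Python model. This is an illustration of the generic protocol only; the function name and the "producer is always valid while it has data" behavior are assumptions, and none of this is multisim code.

```python
from collections import deque
from typing import Iterable, List, Tuple


def simulate_ready_valid(items: Iterable[int],
                         ready_per_cycle: Iterable[bool]) -> List[Tuple[int, int]]:
    """Return (cycle, item) for every cycle in which a transfer happens.

    The producer drives valid whenever it still has data and holds the same
    item until it is accepted. The consumer drives ready from the given
    per-cycle pattern. Data moves only when valid and ready are both high.
    """
    pending = deque(items)
    transfers = []
    for cycle, ready in enumerate(ready_per_cycle):
        valid = bool(pending)
        if valid and ready:
            transfers.append((cycle, pending.popleft()))
    return transfers


# The consumer stalls on cycles 0 and 2, so the 3 items go through on cycles 1, 3 and 4.
transfers = simulate_ready_valid([0xA, 0xB, 0xC], [False, True, False, True, True, True])
```

Because data only moves when both sides agree, either side can stall without losing items, which is what makes the protocol tolerant of variable latency between the two ends.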
See the following files from the example:
- server simulation and client simulations communicate through channels
- channel direction can be client->server or server->client
- each simulation can use multiple channels
- multisim modules need a unique server_name to link a client/server channel together
- client modules need to set SERVER_RUNTIME_DIRECTORY to know the port/ip address of each channel
- server simulation can stop the simulation normally ($finish, etc)
- client simulations must not stop the simulation themselves
  - they will automatically be stopped when the server simulation is done
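The rules about server_name and the runtime directory can be illustrated with a small Python sketch of one possible discovery scheme. Everything here is hypothetical: the one-file-per-channel layout, the file contents, the function names and the channel name are assumptions made for illustration and are not taken from multisim's implementation.

```python
import os
import socket
import tempfile


def server_open_channel(runtime_dir: str, server_name: str) -> socket.socket:
    """Listen on a free port and publish "ip port" in runtime_dir/server_name."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    sock.listen(1)
    ip, port = sock.getsockname()
    try:
        # Mode "x" fails if the file already exists, so a duplicate
        # server_name is rejected instead of silently overwritten.
        with open(os.path.join(runtime_dir, server_name), "x") as f:
            f.write(f"{ip} {port}\n")
    except FileExistsError:
        sock.close()
        raise
    return sock


def client_open_channel(runtime_dir: str, server_name: str) -> socket.socket:
    """Look up the address published for server_name and connect to it."""
    with open(os.path.join(runtime_dir, server_name)) as f:
        ip, port = f.read().split()
    return socket.create_connection((ip, int(port)))


def demo():
    with tempfile.TemporaryDirectory() as runtime_dir:
        listener = server_open_channel(runtime_dir, "cpu_0_to_noc")
        client = client_open_channel(runtime_dir, "cpu_0_to_noc")
        conn, _ = listener.accept()
        client.sendall(b"\x2a")  # client->server payload
        data = conn.recv(1)
        try:
            server_open_channel(runtime_dir, "cpu_0_to_noc")
            duplicate_rejected = False
        except FileExistsError:
            duplicate_rejected = True
        for s in (conn, client, listener):
            s.close()
        return data, duplicate_rejected


data, duplicate_rejected = demo()
```

In this sketch the client needs only two things to find its peer, the directory and the channel name, which mirrors why the name has to be unique and why the client has to be told where the server's runtime directory is.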
- source env.sh
- pass the right files to your simulator:
See the example:
- simulation (server+clients) run script
- server simulation run script
- client simulation run script
Pros:
- speed: split your big DUT in as many smaller parts as you want
- interoperability: each server/client can use different simulators (Verilator, VCS, Questa, Xcelium, etc)
- scalability: add as many client simulations as you want, as long as you have enough CPUs on your server
- cost: server CPUs are usually cheaper than emulation solutions
- bring-up time: easy-to-use modules with a simple interface (e.g. AXI is 5 channels)
Cons:
- ⚠ no cycle accuracy ⚠: functionally accurate, but not cycle accurate
- harder debug: waveforms are split across N+1 simulations, with no time coherency between them
- X/Z support
- self checking examples
- simple transaction logging to help debug

