Commit 2f3e1c0

[feat] Add training code for InternVLA-N1
1 parent f74a79b commit 2f3e1c0

23 files changed (+4318 / -680 lines)

README.md

Lines changed: 99 additions & 78 deletions
@@ -34,6 +34,7 @@ The toolbox supports the most comprehensive 6 datasets \& benchmarks and 10+ pop
3434
The toolbox supports InternData-N1, the most advanced high-quality navigation dataset, which includes 3k+ scenes and 830k VLN data covering diverse embodiments and scenes. It also supports InternVLA-N1, the first dual-system navigation foundation model, which achieves leading performance on all the benchmarks and zero-shot generalization in the real world.
3535

3636
## 🔥 News
37+
- [2025/12] Training code for InternVLA-N1 is now available. This release provides two dual-system configurations: **InternVLA-N1 (Dual System)**<span style="color: #28a745; font-size: 0.9em"> with NavDP*</span> (NavDP jointly tuned with System 2) and **InternVLA-N1 (Dual System)**<span style="color: #28a745; font-size: 0.9em"> DualVLN</span>. For details on the model architecture and training, please refer to the [DualVLN paper](TO_BE_UPDATED).
3738
- [2025/10] Added a simple [inference-only demo](scripts/notebooks/inference_only_demo.ipynb) of InternVLA-N1.
3839
- [2025/10] The InternVLA-N1 [technical report](https://internrobotics.github.io/internvla-n1.github.io/static/pdfs/InternVLA_N1.pdf) is released. Please check out our [homepage](https://internrobotics.github.io/internvla-n1.github.io/).
3940
- [2025/09] Real-world deployment code of InternVLA-N1 is released, together with 3D printing [files](assets/3d_printing_files/go2_stand.STEP) for the Unitree Go2.
@@ -55,133 +56,147 @@ The toolbox supports the most advanced high-quality navigation dataset, InternDa
5556

5657
Please refer to the [documentation](https://internrobotics.github.io/user_guide/internnav/quick_start/index.html) for a quick start with InternNav, covering installation as well as training and evaluating the supported models.
5758

58-
## 📦 Overview of Benchmark and Model Zoo
59+
## 📦 Overview
5960

60-
### Datasets \& Benchmarks
61+
### 🧪 Supported Benchmarks
6162

6263
<table align="center">
6364
<tbody>
6465
<tr align="center" valign="bottom">
6566
<td>
66-
<b>System2 (VLN-CE)</b>
67+
<b>VLN Benchmarks</b>
6768
</td>
6869
<td>
69-
<b>System1 (VN)</b>
70-
</td>
71-
<td>
72-
<b>Whole-system (VLN)</b>
70+
<b>VN Benchmarks</b>
7371
</td>
7472
</tr>
7573
<tr align="center" valign="top">
7674
<td>
7775
<ul>
78-
<li align="left"><a href="">VLN-CE R2R</a></li>
79-
<li align="left"><a href="">VLN-CE RxR</a></li>
76+
<li align="left"><a href="https://arxiv.org/abs/2004.02857">VLN-CE</a></li>
77+
<li align="left"><a href="https://arxiv.org/abs/2507.13019">VLN-PE</a></li>
8078
</ul>
8179
</td>
8280
<td>
8381
<ul>
84-
<li align="left"><a href="">Cluttered Envs</a></li>
85-
<li align="left"><a href="">GRScenes-100</a></li>
86-
</ul>
87-
</td>
88-
<td>
89-
<ul>
90-
<li align="left"><a href="">VLN-CE</a></li>
91-
<li align="left"><a href="">VLN-PE</a></li>
82+
<li align="left"><a href="https://arxiv.org/abs/2505.08712">Cluttered Environments</a></li>
83+
<li align="left"><a href="https://arxiv.org/abs/2505.08712">GRScenes-100</a></li>
9284
</ul>
9385
</td>
9486
</tbody>
9587
</table>
9688

97-
### Models
89+
### 🤗 Model Zoo & Downloads
9890

9991
<table align="center">
10092
<tbody>
10193
<tr align="center" valign="bottom">
10294
<td>
103-
<b>System2 (VLN-CE)</b>
95+
<b>🧠 VLN Single-System</b>
10496
</td>
10597
<td>
106-
<b>System1 (VN)</b>
98+
<b>🎯 VN System (System 1)</b>
10799
</td>
108100
<td>
109-
<b>Whole-system (VLN)</b>
101+
<b>🤝 VLN Multi-System</b>
110102
</td>
111103
</tr>
112104
<tr align="center" valign="top">
113105
<td>
114106
<ul>
115-
<li align="left"><a href="">StreamVLN</a></li>
116-
<li align="left"><a href="">InternVLA-N1-Preview (S2)</a></li>
117-
<li align="left"><a href="">InternVLA-N1 (S2)</a></li>
107+
<li align="left"><a href="https://huggingface.co/InternRobotics/VLN-PE">Seq2Seq</a></li>
108+
<li align="left"><a href="https://huggingface.co/InternRobotics/VLN-PE">CMA</a></li>
109+
<li align="left"><a href="https://huggingface.co/InternRobotics/VLN-PE">RDP</a></li>
110+
<li align="left"><a href="https://github.com/InternRobotics/StreamVLN">StreamVLN</a> <em>(coming soon)</em></li>
118111
</ul>
119112
</td>
120113
<td>
121114
<ul>
122-
<li align="left"><a href="">DD-PPO</a></li>
123-
<li align="left"><a href="">iPlanner</a></li>
124-
<li align="left"><a href="">ViPlanner</a></li>
125-
<li align="left"><a href="">GNM</a></li>
126-
<li align="left"><a href="">ViNT</a></li>
127-
<li align="left"><a href="">NoMad</a></li>
128-
<li align="left"><a href="">NavDP</a></li>
115+
<li align="left"><a href="https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library">DD-PPO</a></li>
116+
<li align="left"><a href="https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library">iPlanner</a></li>
117+
<li align="left"><a href="https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library">ViPlanner</a></li>
118+
<li align="left"><a href="https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library">GNM</a></li>
119+
<li align="left"><a href="https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library">ViNT</a></li>
120+
<li align="left"><a href="https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library">NoMad</a></li>
121+
<li align="left"><a href="https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library">NavDP <small>InternVLA-N1 (System 1)</small></a></li>
129122
</ul>
130123
</td>
131124
<td>
132125
<ul>
133-
<li align="left"><a href="">Seq2Seq</a></li>
134-
<li align="left"><a href="">CMA</a></li>
135-
<li align="left"><a href="">RDP</a></li>
136-
<li align="left"><a href="">InternVLA-N1-Preview</a></li>
137-
<li align="left"><a href="">InternVLA-N1</a></li>
126+
<li align="left"><a href="https://huggingface.co/InternRobotics/InternVLA-N1-System2">InternVLA-N1 (System 2)</a> + <a href="https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library" style="color: #1e90ff;">Decoupled System 1</a></li>
127+
<li align="left"><a href="https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP">InternVLA-N1 (Dual System) <small>w/ NavDP*</small></a> <small>(NavDP* indicates joint tuning with System 2)</small></li>
128+
<li align="left"><a href="https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN">InternVLA-N1 (Dual System) <small>DualVLN</small></a></li>
138129
</ul>
139130
</td>
140131
</tbody>
141132
</table>
142133

143-
### Benchmark Results
144-
145-
#### VLN-CE Task
146-
| Model | Dataset/Benchmark | NE | OS | SR | SPL | Download |
147-
| ------ | ----------------- | -- | -- | --------- | -- | --------- |
148-
| `InternVLA-N1 (S2)` | R2R | 4.89 | 60.6 | 55.4 | 52.1| [Model](https://huggingface.co/InternRobotics/InternVLA-N1-S2) |
149-
| `InternVLA-N1` | R2R | **4.83** | **63.3** | **58.2** | **54.0** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1) |
150-
| `InternVLA-N1 (S2)` | RxR | 6.67 | 56.5 | 48.6 | 42.6 | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-S2) |
151-
| `InternVLA-N1` | RxR | **5.91** | **60.8** | **53.5** | **46.1** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1) |
152-
| `InternVLA-N1-Preview (S2)` | R2R | 5.09 | 60.9 | 53.7 | 49.7 | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview-S2) |
153-
| `InternVLA-N1-Preview` | R2R | **4.76** | **63.4** | **56.7** | **52.6** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview) |
154-
| `InternVLA-N1-Preview (S2)` | RxR | 6.39 | 60.1 | 50.5 | 43.3 | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview-S2) |
155-
| `InternVLA-N1-Preview` | RxR | **5.65** | **63.2** | **53.5** | **45.7** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview) |
156-
157-
#### VLN-PE Task
158-
| Model | Dataset/Benchmark | NE | OS | SR | SPL | Download |
159-
| ------ | ----------------- | -- | -- | -- | --- | --- |
160-
| `Seq2Seq` | Flash | 8.27 | 43.0 | 15.7 | 9.7 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
161-
| `CMA` | Flash | 7.52 | 45.0 | 24.4 | 18.2 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
162-
| `RDP` | Flash | 6.98 | 42.5 | 24.9 | 17.5 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
163-
| `InternVLA-N1-Preview` | Flash | **4.21** | **68.0** | **59.8** | **54.0** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview) |
164-
| `InternVLA-N1` | Flash | **4.13** | **67.6** | **60.4** | **54.9** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1) |
165-
| `Seq2Seq` | Physical | 7.88 | 28.1 | 15.1 | 10.7 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
166-
| `CMA` | Physical | 7.26 | 31.4 | 22.1 | 18.6 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
167-
| `RDP` | Physical | 6.72 | 36.9 | 25.2 | 17.7 | [Model](https://huggingface.co/InternRobotics/VLN-PE) |
168-
| `InternVLA-N1-Preview` | Physical | **5.31** | **49.0** | **42.6** | **35.8** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1-Preview) |
169-
| `InternVLA-N1` | Physical | **4.73** | **56.7** | **50.6** | **43.3** | [Model](https://huggingface.co/InternRobotics/InternVLA-N1) |
170-
171-
#### Visual Navigation Task - PointGoal Navigation
172-
| Model | Dataset/Benchmark | SR | SPL | Download |
173-
| ------ | ----------------- | -- | -- | --------- |
174-
| `iPlanner` | ClutteredEnv | 84.8 | 83.6 | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
175-
| `ViPlanner` | ClutteredEnv | 72.4 | 72.3 | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
176-
| `InternVLA-N1 (S1)` | ClutteredEnv | **89.8** | **87.7** | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
177-
| `iPlanner` | InternScenes | 48.8 | 46.7 | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
178-
| `ViPlanner` | InternScenes | 54.3 | 52.5 | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
179-
| `InternVLA-N1 (S1)` | InternScenes | **65.7** | **60.7** | [Model](https://github.com/InternRobotics/NavDP?tab=readme-ov-file#%EF%B8%8F-installation-of-baseline-library) |
180-
181-
182-
183-
**NOTE:**
184-
- VLN-CE RxR benchmark and StreamVLN will be supported soon.
134+
<!-- **📝 Note:**
135+
- VLN-CE RxR benchmark and StreamVLN model will be supported soon.
136+
- **NE**: Navigation Error (lower is better) • **OS**: Oracle Success (higher is better) • **SR**: Success Rate (higher is better) • **SPL**: Success weighted by Path Length (higher is better) -->
137+
138+
139+
### 📊 Benchmark Results
140+
141+
142+
#### <u>VLN-CE Benchmarks</u>
143+
144+
**📍 R2R Dataset**
145+
| Model | Observation | NE ↓ | OS ↑ | SR ↑ | SPL ↑ |
146+
|-------|-------------|------|------|------|-------|
147+
| InternVLA-N1-wo-dagger (S2) + [ShortestPathFollower](https://aihabitat.org/docs/habitat-lab/habitat.tasks.nav.shortest_path_follower.ShortestPathFollower.html) | - | 4.89 | 60.6 | 55.4 | 52.1 |
148+
| InternVLA-N1-wo-dagger (Dual System) <span style="color: #28a745; font-size: 0.9em"> with NavDP*</span> | RGB-D | 4.83 | 63.3 | 58.2 | 54.0 |
149+
| InternVLA-N1 (S2) + [ShortestPathFollower](https://aihabitat.org/docs/habitat-lab/habitat.tasks.nav.shortest_path_follower.ShortestPathFollower.html) | - | 4.25 | 68.3 | 60.9 | 55.2 |
150+
| InternVLA-N1 (Dual System)<span style="color: #28a745; font-size: 0.9em"> with NavDP*</span> | RGB-D | 4.22 | 70.4 | 64.1 | 58.1 |
151+
| InternVLA-N1 (Dual System)<span style="color: #28a745; font-size: 0.9em"> DualVLN </span> | RGB | **4.05** | **70.7** | **64.3** | **58.5** |
152+
153+
**📍 RxR Dataset**
154+
| Model | Observation | NE ↓ | SR ↑ | SPL ↑ | nDTW ↑ |
155+
|-------|-------------|------|------|------|-------|
156+
| InternVLA-N1 (S2) + [ShortestPathFollower](https://aihabitat.org/docs/habitat-lab/habitat.tasks.nav.shortest_path_follower.ShortestPathFollower.html) | - | 5.71 | 63.5 | 55.0 | 46.8 |
157+
| InternVLA-N1 (Dual System)<span style="color: #28a745; font-size: 0.9em"> with NavDP*</span> | RGB-D | 4.70 | 59.7 | 50.6 | 69.7 |
158+
| InternVLA-N1 (Dual System)<span style="color: #28a745; font-size: 0.9em"> DualVLN </span> | RGB | **4.58** | **61.4** | **51.8** | **70.0** |
159+
160+
---
161+
162+
#### <u>VLN-PE Benchmarks</u>
163+
164+
**📍 Flash Controller on R2R Unseen**
165+
| Model | NE ↓ | OS ↑ | SR ↑ | SPL ↑ |
166+
|-------|------|------|------|-------|
167+
| Seq2Seq | 8.27 | 43.0 | 15.7 | 9.7 |
168+
| CMA | 7.52 | 45.0 | 24.4 | 18.2 |
169+
| RDP | 6.98 | 42.5 | 24.9 | 17.5 |
170+
| InternVLA-N1 (System 2) + iPlanner | 4.91 | 55.53 | 47.07 | 41.09 |
171+
| InternVLA-N1 (System 2) + NavDP | 4.22 | 67.33 | 58.72 | 50.98 |
172+
| InternVLA-N1 (Dual System)<span style="color: #28a745; font-size: 0.9em"> DualVLN </span> | **3.90** | **69.93** | **63.62** | **56.49** |
173+
174+
**📍 Physical Controller on R2R Unseen**
175+
| Model | NE ↓ | OS ↑ | SR ↑ | SPL ↑ |
176+
|-------|------|------|------|-------|
177+
| Seq2Seq | 7.88 | 28.1 | 15.1 | 10.7 |
178+
| CMA | 7.26 | 31.4 | 22.1 | 18.6 |
179+
| RDP | 6.72 | 36.9 | 25.2 | 17.7 |
180+
| InternVLA-N1 (Dual System)<span style="color: #28a745; font-size: 0.9em"> DualVLN </span> | **4.66** | **55.9** | **51.6** | **42.49** |
181+
182+
183+
#### <u>Visual Navigation Benchmarks</u>
184+
185+
**📍 ClutteredEnv Dataset**
186+
| Model | SR ↑ | SPL ↑ |
187+
|-------|------|-------|
188+
| iPlanner | 84.8 | 83.6 |
189+
| ViPlanner | 72.4 | 72.3 |
190+
| NavDP (InternVLA-N1 System 1) | **89.8** | **87.7** |
191+
192+
**📍 InternScenes Dataset**
193+
| Model | SR ↑ | SPL ↑ |
194+
|-------|------|-------|
195+
| iPlanner | 48.8 | 46.7 |
196+
| ViPlanner | 54.3 | 52.5 |
197+
| NavDP (InternVLA-N1 System 1) | **65.7** | **60.7** |
198+
199+
---
185200

186201
## 🔧 Customization
187202

@@ -236,6 +251,12 @@ If you use the specific pretrained models and benchmarks, please kindly cite the
236251
year = {2025},
237252
booktitle={arXiv},
238253
}
254+
@misc{dualvln,
255+
title = {{InternVLA-N1: An} Open Dual-System Navigation Foundation Model with Learned Latent Plans},
256+
author = {InternVLA-N1 Team},
257+
year = {2025},
258+
booktitle={arXiv},
259+
}
239260
```
240261

241262
</details>
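The benchmark tables in this README report the standard VLN metrics NE, OS, SR, SPL and (for RxR) nDTW. As a rough illustration only, here is a minimal Python sketch of how these metrics are commonly computed from per-episode trajectories. It is not the InternNav evaluation code: the 3 m success radius, the Euclidean (rather than geodesic) distance, the `Episode` record and all function names are assumptions made for this example. SPL uses the usual form S · l / max(p, l), and nDTW uses exp(-DTW(R, Q) / (|R| · d_th)) with d_th set to the success radius.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Assumption: 3 m success radius, a common VLN-CE convention (not stated in this README).
SUCCESS_RADIUS = 3.0


def dist(a: Point, b: Point) -> float:
    """Euclidean distance (real benchmarks typically use geodesic distance)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def path_length(path: Sequence[Point]) -> float:
    """Sum of segment lengths along a polyline."""
    return sum(dist(p, q) for p, q in zip(path, path[1:]))


@dataclass
class Episode:
    """Hypothetical per-episode record used only for this sketch."""
    trajectory: List[Point]  # agent positions from start to stop
    reference: List[Point]   # ground-truth shortest path from start to goal

    @property
    def goal(self) -> Point:
        return self.reference[-1]


def navigation_error(ep: Episode) -> float:
    """NE: distance from the agent's final position to the goal."""
    return dist(ep.trajectory[-1], ep.goal)


def success(ep: Episode) -> float:
    """SR term: 1 if the agent stops within the success radius."""
    return float(navigation_error(ep) <= SUCCESS_RADIUS)


def oracle_success(ep: Episode) -> float:
    """OS term: 1 if any visited position came within the success radius."""
    return float(any(dist(p, ep.goal) <= SUCCESS_RADIUS for p in ep.trajectory))


def spl(ep: Episode) -> float:
    """SPL term: success weighted by shortest / max(taken, shortest)."""
    shortest = path_length(ep.reference)
    denom = max(path_length(ep.trajectory), shortest)
    return success(ep) if denom == 0 else success(ep) * shortest / denom


def dtw(query: Sequence[Point], ref: Sequence[Point]) -> float:
    """Classic O(n*m) dynamic time warping with Euclidean point cost."""
    n, m = len(query), len(ref)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(query[i - 1], ref[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ndtw(ep: Episode) -> float:
    """nDTW = exp(-DTW(R, Q) / (|R| * d_th)), with d_th = success radius."""
    return math.exp(-dtw(ep.trajectory, ep.reference) / (len(ep.reference) * SUCCESS_RADIUS))


def summarize(episodes: Sequence[Episode]) -> Dict[str, float]:
    """Average over episodes; OS, SR, SPL and nDTW are reported as percentages."""
    def mean(metric) -> float:
        return sum(metric(e) for e in episodes) / len(episodes)

    return {
        "NE": mean(navigation_error),
        "OS": 100 * mean(oracle_success),
        "SR": 100 * mean(success),
        "SPL": 100 * mean(spl),
        "nDTW": 100 * mean(ndtw),
    }


# Toy usage: one episode that stops 1 m short of the goal, one that wanders off.
ref = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
ok = Episode(trajectory=[(0.0, 0.0), (5.0, 0.0), (9.0, 0.0)], reference=ref)
lost = Episode(trajectory=[(0.0, 0.0), (0.0, 5.0), (5.0, 5.0)], reference=ref)
print(summarize([ok, lost]))
```

The quadratic DTW table is simple and adequate for trajectories of a few hundred points; swapping `dist` for a geodesic distance from the simulator would bring the sketch closer to how the benchmarks are actually scored.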
