Survey of embodied agent in context of foundation model

Songyuan LI, Xiangwei ZHU, Xi LI
Tab.2 Large multimodal model

| Large multimodal model | Vision | Language | Proprioception | Action | Parameters | Image-text pairs | Trajectories |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ViLBERT[64] | ✓ | ✓ | - | - | 1.55×10⁸ | 3.1×10⁶ | - |
| UNITER[65] | ✓ | ✓ | - | - | 8.6×10⁷/3.03×10⁸ | 9.6×10⁶ | - |
| Oscar[66] | ✓ | ✓ | - | - | 1.10×10⁸/3.40×10⁸ | 6.5×10⁶ | - |
| CLIP[5] | ✓ | ✓ | - | - | 3.70×10⁸ | 4.00×10⁸ | - |
| ALIGN[28] | ✓ | ✓ | - | - | 7.90×10⁸ | 1.8×10⁹ | - |
| BASIC[67] | ✓ | ✓ | - | - | 3×10⁹ | 6.6×10⁹ | - |
| PaLI[29] | ✓ | ✓ | - | - | 1.7×10¹⁰ | 1×10⁹ | - |
| PaLI-X[30] | ✓ | ✓ | - | - | 5.5×10¹⁰ | - | - |
| Gato[54] | ✓ | ✓ | ✓ | ✓ | 1.2×10⁹ | 2.1×10⁹ | 6.3×10⁷ |
| RPT[63] | ✓ | - | ✓ | ✓ | 3.08×10⁸ | - | 2.0×10⁴ |
| RoboCat[57] | ✓ | - | ✓ | ✓ | 1.18×10⁹ | - | 2.8×10⁶ |