Multimodal cascaded document layout analysis network based on Transformer
Shaojie WEN, Ruigang WU, Chaowen FENG, Yingli LIU

Tab. 3 mAP values of the proposed model and existing models for each element category on the PubLayNet dataset
Model                      mAP/%
                           Text    Title   List    Table   Figure
PubLayNet [24]             91.6    84.0    88.6    96.0    94.9
DiT [13] (Mask R-CNN)      92.8    84.5    86.8    97.5    96.5
DiT [13] (Cascade R-CNN)   93.6    85.9    89.7    97.6    96.9
UDoc [25]                  92.6    86.5    92.4    96.5    95.4
BEiT [23]                  92.5    86.2    93.1    97.3    95.7
MCOD-Net                   94.4    90.5    95.4    97.8    97.0