Semi-supervised patent text classification method based on improved Tri-training algorithm
|
Yun-qing HU, Qing-ying QIU, Xiu YU, Jian-wei WU
|
|
Tab.1 Comparison results of feature selection on patent dataset (Test 1)

All values are F1 scores at the given feature dimension (Dim).

| Classifier | Method | Dim=150 | Dim=250 | Dim=350 | Dim=450 | Dim=550 | Dim=650 | Dim=750 | Dim=850 | Dim=950 |
|---|---|---|---|---|---|---|---|---|---|---|
| Xgboost | IG_New&Xgboost | 0.515 | 0.516 | 0.516 | 0.519 | 0.516 | 0.518 | 0.518 | 0.518 | 0.518 |
| Xgboost | IG&Xgboost | 0.469 | 0.471 | 0.471 | 0.480 | 0.473 | 0.474 | 0.475 | 0.474 | 0.475 |
| SVM | IG_New&SVM | 0.474 | 0.470 | 0.475 | 0.502 | 0.475 | 0.471 | 0.470 | 0.474 | 0.474 |
| SVM | IG&SVM | 0.430 | 0.432 | 0.432 | 0.450 | 0.441 | 0.439 | 0.430 | 0.432 | 0.432 |
| NB | IG_New&NB | 0.420 | 0.412 | 0.425 | 0.431 | 0.430 | 0.420 | 0.424 | 0.425 | 0.429 |
| NB | IG&NB | 0.362 | 0.375 | 0.367 | 0.370 | 0.355 | 0.383 | 0.352 | 0.360 | 0.354 |