Journal of ZheJiang University (Engineering Science)  2019, Vol. 53 Issue (10): 2013-2023    DOI: 10.3785/j.issn.1008-973X.2019.10.019
Automation Technology, Computer Technology     
Android malicious behavior recognition and classification method based on random forest algorithm
Dong-xiang KE, Li-min PAN*, Sen-lin LUO, Han-qing ZHANG
Information System and Security Countermeasure Experimental Center, Beijing Institute of Technology, Beijing 100081, China

Abstract  

To address the inability of existing Android malware detection methods to identify or classify the malicious behavior they detect, a method for identifying and classifying Android malicious behavior based on the random forest (RF) algorithm was proposed. The types of Android malicious behavior were defined, and potentially malicious behavior was triggered with an induction method that combines multiple trigger mechanisms. Application behavior was captured by hooking key system functions and recorded as a behavior log, from which a behavioral feature set was extracted. The random forest algorithm was then used to identify and classify the malicious behavior in the behavior log. The experimental results showed that the proposed method achieved 91.6% accuracy in malware identification and an average accuracy of 96.8% in malicious behavior classification.



Key words: Android security; machine learning; random forest (RF); malware detection; malicious behavior classification
Received: 15 November 2018      Published: 30 September 2019
CLC:  TP 399  
Corresponding Authors: Li-min PAN     E-mail: 384209891@qq.com;panlimin@bit.edu.cn
Cite this article:

Dong-xiang KE,Li-min PAN,Sen-lin LUO,Han-qing ZHANG. Android malicious behavior recognition and classification method based on random forest algorithm. Journal of ZheJiang University (Engineering Science), 2019, 53(10): 2013-2023.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2019.10.019     OR     http://www.zjujournals.com/eng/Y2019/V53/I10/2013


Fig.1 Android malicious behavior recognition and classification framework structure
Fig.2 Multi-level behavior data acquisition schematic
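| Log entry | Description |
|---|---|
| Date | Time at which the application called the function |
| Package Name | Full package name of the application under test |
| Method Name | Name of the called function and the package that the function belongs to |
| Arguments | Arguments passed when the function was called |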
Tab.1 Entry description of behavior log
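The four fields in Tab.1 map naturally onto a small record type. The sketch below is only an illustration: the tab-separated line layout, the comma-separated argument list, and the class name `BehaviorLogEntry` are assumptions, since the serialized format of the behavior log is not given here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** One behavior-log record holding the four fields listed in Tab.1. */
final class BehaviorLogEntry {
    final String date;             // time at which the application called the function
    final String packageName;      // full package name of the application under test
    final String methodName;       // called function, qualified by its package
    final List<String> arguments;  // arguments passed to the call

    BehaviorLogEntry(String date, String packageName, String methodName, List<String> arguments) {
        this.date = date;
        this.packageName = packageName;
        this.methodName = methodName;
        this.arguments = arguments;
    }

    /**
     * Parses one line of an ASSUMED layout: date TAB package TAB method TAB arg1,arg2,...
     * The argument column is optional. The real log format may differ.
     */
    static BehaviorLogEntry parse(String line) {
        String[] parts = line.split("\t", 4);
        if (parts.length < 3) {
            throw new IllegalArgumentException("malformed log line: " + line);
        }
        List<String> args = (parts.length == 4 && !parts[3].isEmpty())
                ? Arrays.asList(parts[3].split(","))
                : Collections.<String>emptyList();
        return new BehaviorLogEntry(parts[0], parts[1], parts[2], args);
    }
}
```

Keeping the method name as a single qualified string makes it straightforward to compare log entries against a list of monitored functions later on.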
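Code: stealing the user's geographic location information
import android.location.Location;
import android.location.LocationManager;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sends the given string to a remote server over HTTP.
public void leakInfo(String url, String info) throws IOException {
  HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
  connection.setDoOutput(true);  // must be set before writing a request body
  DataOutputStream wr = new DataOutputStream(connection.getOutputStream());
  wr.writeBytes(info);
  wr.flush(); wr.close(); connection.disconnect();
}

// Reads the last known GPS position and passes it to leakInfo.
// locationManager is assumed to be a LocationManager field of the enclosing class.
public void leakLocation(String url) throws IOException {
  String locationProvider = LocationManager.GPS_PROVIDER;
  Location location = locationManager.getLastKnownLocation(locationProvider);
  String loc = location.getLatitude() + ":" + location.getLongitude();
  leakInfo(url, loc);
}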
Tab.2 Geographic information stealing code
Fig.3 API sequence for privacy stealing
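| Monitored function | Remark |
|---|---|
| ContentResolver.query | Depending on its arguments, can be used to obtain private data such as SMS messages, contacts, and photos |
| LocationManager.getProvider | Used to obtain geographic location information |
| Location.getLatitude | |
| Location.getLongitude | |
| PackageManager.getInstalledPackages | Obtains the installed applications |
| PackageManager.getInstalledApplications | |
| TelephonyManager.getSubscriberId | Obtains the phone's IMSI number |
| TelephonyManager.getDeviceId | Obtains the phone's device ID |
| SmsManager.sendTextMessage | Can be used to send stolen private data by SMS |
| DefaultHttpClient.execute | Can be used to send stolen private data over the network |
| URL.openConnection | |
| AbstractHttpClient.execute | |
Tab.3 List of APIs related to privacy theft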
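The remarks in Tab.3 separate calls that obtain private data from calls that can send it out, which suggests a simple ordering check over a logged call sequence. The following is a minimal sketch under stated assumptions, not the detection method of the paper (which uses a random forest): the class name `PrivacyLeakSequence`, the source/sink split, and the treatment of rows without a remark as belonging to the preceding group are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Flags a call sequence in which a privacy source is later followed by a sink. */
final class PrivacyLeakSequence {
    // Calls from Tab.3 whose remarks describe obtaining private data (assumed grouping).
    static final Set<String> SOURCES = new HashSet<>(Arrays.asList(
            "ContentResolver.query", "LocationManager.getProvider",
            "Location.getLatitude", "Location.getLongitude",
            "PackageManager.getInstalledPackages", "PackageManager.getInstalledApplications",
            "TelephonyManager.getSubscriberId", "TelephonyManager.getDeviceId"));

    // Calls from Tab.3 whose remarks describe sending data out (assumed grouping).
    static final Set<String> SINKS = new HashSet<>(Arrays.asList(
            "SmsManager.sendTextMessage", "DefaultHttpClient.execute",
            "URL.openConnection", "AbstractHttpClient.execute"));

    /** Returns true if some source call occurs before some sink call. */
    static boolean sourceThenSink(List<String> calls) {
        boolean sourceSeen = false;
        for (String call : calls) {
            if (SOURCES.contains(call)) {
                sourceSeen = true;
            } else if (sourceSeen && SINKS.contains(call)) {
                return true;
            }
        }
        return false;
    }
}
```

For the call order of the listing in Tab.2 (Location.getLatitude, Location.getLongitude, then URL.openConnection) the check returns true, while a sink call that precedes every source call does not trigger it. A pure ordering rule like this ignores data flow, so it can only serve as a coarse signal.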
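| Malicious behavior | Monitored function | Remark |
|---|---|---|
| Malicious fee deduction | SmsManager.sendTextMessage | Can be used to send SP (service provider) SMS messages and subscribe to additional services |
| | BroadcastReceiver.abortBroadcast | |
| | ContentResolver.delete | |
| Tariff consumption | DefaultHttpClient.execute | Receives data over the network, consuming traffic |
| | AbstractHttpClient.execute | |
| | Socket.getInputStream | |
| | Socket.getOutputStream | |
| | URL.openConnection | |
| | OutputStream.write | Used to save data received from the network |
| | InputStream.read | Reads data received from the network |
| Privacy theft | ContentResolver.query | Depending on its arguments, can be used to obtain private data such as SMS messages, contacts, and photos |
| | LocationManager.getProvider | Used to obtain geographic location information |
| | Location.getLatitude | |
| | Location.getLongitude | |
| | PackageManager.getInstalledPackages | Obtains the installed applications |
| | PackageManager.getInstalledApplications | |
| | TelephonyManager.getSubscriberId | Obtains the phone's IMSI number |
| | TelephonyManager.getDeviceId | Obtains the phone's device ID |
| | SmsManager.sendTextMessage | Can be used to send stolen private data by SMS |
| | DefaultHttpClient.execute | Can be used to send stolen private data over the network |
| | URL.openConnection | |
| | AbstractHttpClient.execute | |
| Rogue behavior | Runtime.exec("su") | Can be used to obtain root privileges |
| | DevicePolicyManager.isAdminActive | Obtains device administrator privileges |
| | ApplicationPackageManager.setComponentEnabledSetting | Hides the application icon |
| | ApplicationPackageManager.installPackage | Silent installation |
| | ShortcutIconResource.fromContext | Creates a shortcut |
| | Dialog.onCreate | Can be used for pop-up advertisements |
| | java.lang.Runtime.exec("mount") | Sets the application as a system application |
| | java.lang.Runtime.exec("cp") | |
| | java.lang.Runtime.exec("chmod") | |
| System damage | Runtime.exec("su") | Can be used to obtain root privileges |
| | DevicePolicyManager.isAdminActive | Used to obtain device administrator privileges |
| | ActivityManager.getRunningAppProcesses | Used to view information about running processes |
| | ActivityManager.killBackgroundProcesses | Used to terminate other processes |
| | ActivityManager.forceStopPackage | Used to terminate other applications |
| | ApplicationPackageManager.deletePackage | Used to uninstall other applications |
| | File.delete | Used to delete user files |
| | Cipher.getInstance | Used to encrypt user files |
| | MessageDigest.getInstance | |
| | ApplicationPackageManager.setComponentEnabledSetting | Can be used to disable other components |
| | android.app.admin.DevicePolicyManager.resetPassword | Can be used to change the lock-screen password and lock the screen |
| | android.app.admin.DevicePolicyManager.lockNow | |
| Privilege escalation | Runtime.exec("su") | Can be used to obtain root privileges |
| | DevicePolicyManager.isAdminActive | Used to obtain device administrator privileges |
| | mmap | Native functions needed for privilege escalation attacks that exploit vulnerabilities such as Dirty COW, Futex, and zergRush |
| | madvise | |
| | malloc | |
| | pthread_create | |
| | getgid | |
| | futex_lock_pi | |
| | futex_lock_pi_atomic | |
| | mount | |
| | fopen("/proc/mounts","r") | |
| | setresuid | Sets the S permission bit of a file |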
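One plausible way to turn a behavior log into a fixed-length feature vector is to count how often each monitored function appears. The sketch below assumes this counting scheme for illustration only; the actual feature set used by the authors is not described here, and the class name `ApiCountFeatures` and the five-function subset are hypothetical choices.

```java
import java.util.Arrays;
import java.util.List;

/** Builds a call-count vector over a fixed list of monitored functions. */
final class ApiCountFeatures {
    // A small subset of the monitored functions listed above; the order fixes the vector layout.
    static final List<String> MONITORED = Arrays.asList(
            "SmsManager.sendTextMessage",
            "ContentResolver.query",
            "URL.openConnection",
            "File.delete",
            "Cipher.getInstance");

    /** Counts how often each monitored function occurs among the logged method names. */
    static int[] extract(List<String> loggedMethodNames) {
        int[] counts = new int[MONITORED.size()];
        for (String name : loggedMethodNames) {
            int idx = MONITORED.indexOf(name);
            if (idx >= 0) {
                counts[idx]++;  // calls to functions outside the list are ignored
            }
        }
        return counts;
    }
}
```

A log containing two File.delete calls, one Cipher.getInstance call, and one unmonitored Dialog.onCreate call yields the vector [0, 0, 0, 2, 1]. Counts discard call order and arguments, so a real feature set would likely need more than this.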
 
Fig.4 Malicious behavior detection process
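At prediction time a random forest combines the outputs of its individual decision trees by majority vote. The sketch below shows only that voting step, under the assumption that each already-trained tree can be modeled as a function from a feature vector to a label; training (bootstrap sampling and random feature selection) is omitted, and the class name `ForestVote` is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Majority vote over already-trained trees, each modeled as a function. */
final class ForestVote {
    /**
     * Returns the label predicted by the most trees, or null if there are no trees.
     * On a tie, the label that reached the top count first wins.
     */
    static String predict(List<Function<int[], String>> trees, int[] features) {
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Function<int[], String> tree : trees) {
            String label = tree.apply(features);
            int count = votes.merge(label, 1, Integer::sum);  // new vote total for this label
            if (count > bestCount) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }
}
```

With three hand-written stand-in trees, two voting "benign" and one voting "malicious", the method returns "benign". The same vote works unchanged for a multi-class setting in which the labels are behavior categories.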
| Item | Operating system | CPU model |
|---|---|---|
| Huawei Honor 4A phone | Android 5.1 | Quad-core 1.1 GHz |
| Dell 390 desktop | Windows 7 | i7-6700 CPU 3.4 GHz |
Tab.4 Malware recognition experiment environment
| Method | PD/% | PF/% |
|---|---|---|
| Avira AntiVirus | 92.7 | 0.6 |
| Proposed method | 91.6 | 0.7 |
Tab.5 Malware recognition experiment results
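| Malicious behavior | Nps | Nns |
|---|---|---|
| Malicious fee deduction | 763 | 562 |
| Tariff consumption | 635 | 623 |
| Privacy theft | 1 203 | 1 106 |
| Rogue behavior | 351 | 423 |
| System damage | 576 | 524 |
| Privilege escalation | 241 | 324 |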
Tab.6 Number of malicious behavior samples
| Item | Operating system | CPU model |
|---|---|---|
| Huawei Honor 4A phone | Android 5.1 | Quad-core 1.1 GHz |
| Dell 390 desktop | Windows 7 | i7-6700 CPU 3.4 GHz |
Tab.7 Malicious behavior classification experiment environment
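| Actual value | Predicted: benign behavior | Predicted: malicious behavior |
|---|---|---|
| Benign behavior | TN | FP |
| Malicious behavior | FN | TP |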
Tab.8 Malicious behavior classification confusion matrix
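The entries of the confusion matrix in Tab.8 are the inputs to the usual per-class quality measures. The sketch below assumes that P, R, and F in Tab.9 stand for precision, recall, and the F1 measure with their standard definitions; this page does not define them, so the formulas and the class name `ClassificationMetrics` are assumptions.

```java
/** Standard precision, recall, and F1 computed from confusion-matrix counts. */
final class ClassificationMetrics {
    /** Precision = TP / (TP + FP); yields NaN when there are no positive predictions. */
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    /** Recall = TP / (TP + FN); yields NaN when there are no actual positives. */
    static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    /** F1 = harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }
}
```

As a worked example with made-up counts TP = 90, FP = 10, FN = 30: precision is 90/100 = 0.9, recall is 90/120 = 0.75, and F1 is 2 × 0.9 × 0.75 / 1.65 ≈ 0.818. TN does not enter any of the three measures.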
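| Malicious behavior | P | R | F | Ns |
|---|---|---|---|---|
| Malicious fee deduction | 0.96 | 0.98 | 0.98 | 1 325 |
| Tariff consumption | 0.98 | 0.99 | 0.97 | 1 258 |
| Privacy theft | 0.99 | 0.98 | 0.98 | 2 309 |
| Rogue behavior | 0.95 | 0.97 | 0.94 | 774 |
| System damage | 0.94 | 0.96 | 0.95 | 1 100 |
| Privilege escalation | 0.99 | 0.97 | 0.96 | 565 |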
Tab.9 Malicious behavior classification experiment results