Applied Mathematics-A Journal of Chinese Universities  2021, Vol. 36 Issue (1): 114-127    
    
Analysis method and algorithm design of biological sequence problem based on generalized k-mer vector
LIU Wen-li1,2; WU Qing-biao1
1Department of Mathematics, Zhejiang University, Hangzhou 310027, China.
2Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Zijingang Campus, Hangzhou 310012, China.
Abstract: K-mers can be used to describe biological sequences, and the k-mer distribution
is a tool for solving sequence analysis problems in bioinformatics. The k-mer vector serves as
a representation of the k-mer distribution of a biological sequence. Problems such as
similarity calculation or sequence assembly can be described in the k-mer vector space. This
formulation helps to identify new features of long-standing sequence-based problems in
bioinformatics and to develop new algorithms using the concepts and methods of linear space
theory. In this study, we define the k-mer vector space for generalized biological sequences
and explain the meaning of the corresponding vector operations in the biological context. We
present the vector/matrix form of several common sequence-based problems, including read
quantification, sequence assembly, and pattern detection, and discuss the advantages and
disadvantages of this formulation. We also implement a tool for the sequence assembly problem
based on the k-mer vector method, which demonstrates the practicability and convenience of
this algorithm design strategy.
Key words: vector space, biological sequence, k-mer, algorithm design, analysis method
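
As a rough illustration of the representation described in the abstract, the sketch below builds a plain k-mer count vector for a DNA sequence and compares two sequences in that vector space. The abstract does not give the paper's definition of the generalized k-mer vector, so everything here is an assumption made for illustration: the fixed alphabet ACGT, the lexicographic coordinate order, the hypothetical helper names `kmer_index`, `kmer_vector`, and `cosine_similarity`, and the choice of cosine similarity as the similarity measure.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: DNA alphabet; the paper covers generalized sequences


def kmer_index(k, alphabet=ALPHABET):
    """Assign every possible k-mer a fixed coordinate (lexicographic order)."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def kmer_vector(seq, k, alphabet=ALPHABET):
    """Count vector: entry i is the number of occurrences of the i-th k-mer in seq."""
    index = kmer_index(k, alphabet)
    vec = [0] * len(index)
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:  # skip windows with symbols outside the alphabet, e.g. N
            vec[index[kmer]] += 1
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two k-mer vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


v1 = kmer_vector("ACGTAC", 2)
v2 = kmer_vector("ACGTTT", 2)
similarity = cosine_similarity(v1, v2)

# Vector addition pools the k-mer counts of the two sequences.
pooled = [a + b for a, b in zip(v1, v2)]
```

With k = 2 each vector has 4^2 = 16 coordinates. Adding two vectors gives the k-mer counts of the pooled sequences, which is one simple way a vector operation can carry a biological reading.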
Published: 2021-03-19

Cite this article:

LIU Wen-li, WU Qing-biao. Analysis method and algorithm design of biological sequence problem based on generalized k-mer vector[J]. Applied Mathematics-A Journal of Chinese Universities, 2021, 36(1): 114-127.

Link to this article:

http://www.zjujournals.com/amjcub/CN/
http://www.zjujournals.com/amjcub/CN/Y2021/V36/I1/114
