Applied Mathematics-A Journal of Chinese Universities  2021, Vol. 36 Issue (1): 114-127
    
Analysis method and algorithm design of biological sequence problem based on generalized k-mer vector
LIU Wen-li (1,2); WU Qing-biao (1)
1. Department of Mathematics, Zhejiang University, Hangzhou 310027, China.
2. Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Zijingang Campus, Hangzhou 310012, China.

Abstract  K-mers can be used to describe biological sequences, and the k-mer distribution
is a tool for solving sequence analysis problems in bioinformatics. A k-mer vector can serve as
a representation of the k-mer distribution of a biological sequence. Problems such as
similarity calculation or sequence assembly can then be described in the k-mer vector space. This
helps to identify new features of established sequence-based problems in bioinformatics and to
develop new algorithms using concepts and methods from linear space theory. In this study, we
define the k-mer vector space for generalized biological sequences and explain the meaning of
the corresponding vector operations in the biological context. We present the vector/matrix form
of several common sequence-based problems, including read quantification, sequence assembly,
and pattern detection, and discuss the advantages and disadvantages of this form. We also
implement a tool for the sequence assembly problem based on k-mer vector methods, which
demonstrates the practicability and convenience of this algorithm design strategy.
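
As an illustration of the k-mer vector idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a DNA alphabet (ACGT), represents a sequence by its vector of k-mer counts in lexicographic order, and uses cosine similarity as one possible similarity measure in that vector space. The names ALPHABET, kmer_vector, and cosine_similarity are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import product
import math

# Assumption: a plain DNA alphabet; the paper's "generalized" sequences
# may use a different or larger alphabet.
ALPHABET = "ACGT"


def kmer_vector(seq, k):
    """Return the k-mer count vector of seq.

    The vector has len(ALPHABET)**k entries, one per possible k-mer,
    ordered lexicographically. Windows containing characters outside
    ALPHABET are simply not counted.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    a = kmer_vector("ACGTACGT", 2)
    b = kmer_vector("ACGTTGCA", 2)
    print(cosine_similarity(a, b))
```

Once sequences are mapped to vectors in this way, comparing two sequences reduces to ordinary vector arithmetic, which is the kind of reformulation the abstract refers to.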


Key words: vector space, biological sequence, k-mer, algorithm design, analysis method
Published: 19 March 2021
Cite this article:

LIU Wen-li, WU Qing-biao. Analysis method and algorithm design of biological sequence problem based on generalized k-mer vector. Applied Mathematics-A Journal of Chinese Universities, 2021, 36(1): 114-127.

URL:

http://www.zjujournals.com/amjcub/     OR     http://www.zjujournals.com/amjcub/Y2021/V36/I1/114


