(College of Information Engineering, Central South University of Technology, Changsha 410083, P. R. China)
Abstract: A novel model of genomic data mining and a corresponding pattern discovery algorithm are proposed. The model consists of six units: artificial genome, artificial proteome, evolutionary operation, evolutionary control, pattern matching, and termination judgement. The abstract algebraic structure is described dynamically by lattice set configurations and the corresponding finite state machine operations. Candidate symbol sequences are generated by an evolutionary algorithm guided by symbolic dynamics. The degree of evolution is controlled by a meta-evolution mechanism characterized by rough sets. Pattern matching is carried out by a syntactic pattern recognizer and a grammatical inference procedure. Termination judgement depends on the constraints of the specific problem being solved. The algorithm is a cyclic, population-based search with implicit parallelism. The data structures emphasize coarse-grained symbolic processing, combined with semantics-oriented modular programming. In an application of this artificial life technique, candidate symbol sequences were generated automatically by computer, and "real" amino acid sequences were obtained from among them. The experimental results indicate that the proposed and implemented computational method contributes to building a unified computational theory of bioinformatics at the genomics level and to developing application systems.
Key words: genomics; bioinformatics; evolutionary computation
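
The cyclic structure described in the abstract (generate candidate sequences, apply evolutionary operations, match patterns, test for termination) can be illustrated with a minimal sketch. This is not the authors' implementation. It is a generic evolutionary search over amino acid strings in which a regular expression stands in for the syntactic pattern recognizer and a fixed generation cap stands in for the problem-specific termination constraint. The lattice configurations, symbolic dynamics, and rough-set meta-evolution control are not modeled. All function names, parameter values, and the target string `MKTAYIAK` are hypothetical and chosen only for illustration.

```python
import random
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_sequence(rng, length):
    """Generate one random candidate symbol sequence."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def mutate(rng, seq, rate):
    """Resample each symbol independently with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else ch for ch in seq
    )


def crossover(rng, a, b):
    """One-point crossover of two equal-length sequences (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def score(seq, target):
    """Number of positions at which `seq` agrees with `target`."""
    return sum(1 for x, y in zip(seq, target) if x == y)


def evolve(target, pop_size=60, max_generations=2000, rate=0.05, seed=0):
    """Evolve candidates toward `target` (hypothetical stand-in for a known
    sequence). Returns (best_sequence, generations_used)."""
    rng = random.Random(seed)
    # Assumption: an exact-match regex replaces the syntactic recognizer.
    pattern = re.compile(re.escape(target))
    population = [random_sequence(rng, len(target)) for _ in range(pop_size)]

    for generation in range(max_generations):
        population.sort(key=lambda s: score(s, target), reverse=True)
        # Pattern matching doubles as the termination judgement here.
        if pattern.fullmatch(population[0]):
            return population[0], generation
        # Evolutionary operation: keep the better half, refill with children.
        parents = population[: pop_size // 2]
        children = [
            mutate(rng, crossover(rng, rng.choice(parents), rng.choice(parents)), rate)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children

    # Generation cap reached: return the best candidate found so far.
    population.sort(key=lambda s: score(s, target), reverse=True)
    return population[0], max_generations
```

Calling `evolve("MKTAYIAK", seed=1)` returns the best candidate found together with the number of generations used. Because the better half of the population is carried over unchanged, the best score never decreases from one generation to the next.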


