Bioinformatics Research Group at

Department of Computer Science

University of Nevada, Reno


Utilizing Fuzzy Logic for Gene Sequence Construction from Subsequences and Characteristic Genome Derivation and Assembly

Research Goals and Questions


1) Given a collection of nucleotide sequences from multiple organisms, develop techniques based on fuzzy set theory and related methods (such as fuzzy hidden Markov chains) for assembling the sequences into the original full genome of each source organism.

2) Using the techniques developed for goal 1, develop a general approach for creating a characteristic genome that generalizes the organisms that contributed the sequence data.
To accomplish these goals, the key to sequence matching, assembly, and characteristic-genome creation lies in defining a measure of similarity between two nucleotide sequences. Such a measure, if found, can be used both to assemble sequences into a characteristic genome and to reconstruct an organism's genome from its constituent subsequences. This research will address the following questions:
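As one illustration of what such a measure could look like, the sketch below scores how well the end of one sequence overlaps the start of another and returns a fuzzy membership degree in [0, 1] instead of a yes/no match. The function names, the 0.7 identity floor, the 10-base "full" overlap, and the choice to combine "nearly identical" and "long enough" with the min t-norm (standard fuzzy AND) are all assumptions for illustration, not the group's method.

```python
def ramp(x: float, lo: float, hi: float) -> float:
    """Linear fuzzy membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def overlap_membership(a: str, b: str, min_overlap: int = 3,
                       full_overlap: int = 10) -> tuple[int, float]:
    """Best fuzzy suffix(a)/prefix(b) overlap as (length, membership).

    Hypothetical measure: an overlap is "good" to the degree that it is both
    nearly identical AND long enough; the two memberships are combined with
    min(), the standard fuzzy AND.
    """
    best_len, best_mu = 0, 0.0
    for k in range(max(1, min_overlap), min(len(a), len(b)) + 1):
        matches = sum(x == y for x, y in zip(a[-k:], b[:k]))
        identity_mu = ramp(matches / k, 0.7, 1.0)           # "nearly identical"
        length_mu = ramp(k, min_overlap - 1, full_overlap)  # "long enough"
        mu = min(identity_mu, length_mu)
        if mu > best_mu:
            best_len, best_mu = k, mu
    return best_len, best_mu
```

For example, `overlap_membership("GATTACACCGT", "ACCGTTTAG")` returns `(5, 0.375)`: the five-base overlap ACCGT matches exactly but only partly satisfies "long enough". A greedy assembler could repeatedly merge the pair of sequences with the highest membership; the thresholds here are arbitrary tuning choices.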

3) What does similarity mean for genetic sequences? Specifically, we will examine nucleotide data, classify it, and characterize its attributes, using qualitative methods as a basis for developing quantitative ones. The approach will use metadata to look for commonalities in the described properties of the data. One issue to examine in this phase is that short sequences tend to show high similarity to longer sequences, because a short sequence is likely to occur somewhere within a long one by chance. We will therefore have to determine the minimum sequence length needed to establish uniqueness in similarity matching.
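The length effect can be made concrete under a simple null model. Assuming each base is drawn independently and uniformly (an idealization; real genomes have skewed composition and repeats), a fixed k-mer is expected to appear (L - k + 1) / 4^k times by chance in a sequence of length L. The sketch below finds the smallest k that pushes that expectation below a chosen threshold; the function names and the 0.01 default threshold are illustrative assumptions.

```python
def expected_chance_hits(k: int, genome_length: int) -> float:
    """Expected chance occurrences of one fixed k-mer in a random sequence,
    assuming i.i.d. uniform bases (probability 1/4 each)."""
    positions = max(genome_length - k + 1, 0)
    return positions / 4 ** k


def min_unique_length(genome_length: int, max_expected_hits: float = 0.01) -> int:
    """Smallest k whose expected number of chance hits is below the threshold."""
    if max_expected_hits <= 0:
        raise ValueError("threshold must be positive")
    k = 1
    while expected_chance_hits(k, genome_length) >= max_expected_hits:
        k += 1
    return k
```

For a 5 Mb sequence, `min_unique_length(5_000_000)` returns 15, close to log base 4 of L divided by the threshold. Sequences shorter than this would be expected to match the larger sequence by chance too often to count as unique under this model.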

4) How does one measure similarity? This research will develop methods for measuring similarity quantitatively. We will examine whether the following techniques can be adapted into sequence similarity measures: fuzzy c-means (fuzzy k-means) clustering, measures of spatial centrality, the degree of difference in base pairs, metabolic meta-classification, the number of overlapping base pairs, and key marker (sequence) locations for the spatial registration of sequences. Answering these questions should yield fuzzy similarity functions, a characterization of their morphology, and an assessment of their accuracy under various experimental conditions. In addition, variable factors relevant to similarity will be identified for use in the characteristic equations.
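Of the techniques listed, fuzzy c-means is the most directly codeable. The sketch below applies the standard fuzzy c-means update rules to k-mer frequency profiles, so that each sequence receives a degree of membership in every cluster rather than a single label. Representing sequences as k-mer profiles, the fuzzifier m = 2, and the deterministic initialization from evenly spaced input points are assumptions chosen for illustration; the project does not specify a feature representation.

```python
import math
from itertools import product


def kmer_profile(seq: str, k: int = 2) -> list[float]:
    """Relative frequency of every k-mer over ACGT, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip k-mers containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers]


def fuzzy_c_means(points: list[list[float]], c: int = 2, m: float = 2.0,
                  iters: int = 100, tol: float = 1e-6):
    """Standard fuzzy c-means. Returns (centers, memberships), where
    memberships[i][j] is the degree to which point i belongs to cluster j."""
    n, dim = len(points), len(points[0])
    # Assumption: deterministic start from evenly spaced input points.
    centers = [list(points[j * n // c]) for j in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        new_u = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            if min(d) == 0.0:  # point sits exactly on a center: crisp row
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                s = sum(row)
                row = [r / s for r in row]
            else:
                row = [1.0 / sum((dj / dk) ** (2 / (m - 1)) for dk in d)
                       for dj in d]
            new_u.append(row)
        # Center update: mean of the points weighted by u_ij ** m
        for j in range(c):
            w = [new_u[i][j] ** m for i in range(n)]
            sw = sum(w)
            if sw > 0:
                centers[j] = [sum(w[i] * points[i][t] for i in range(n)) / sw
                              for t in range(dim)]
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

For example, profiling `["ATATATATAT", "TATATATAAT", "GCGCGCGCGC", "CGCGCGGCGC"]` and clustering with `c=2` places the two AT-rich sequences in one cluster and the two GC-rich sequences in the other, each with high membership in its own cluster. A membership near 0.5 would flag a sequence whose similarity to either group is ambiguous, which is the kind of graded answer the fuzzy approach is meant to provide.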

5) How is a generalized genome built? Creating a characteristic genome from m sequences representing n individuals raises the problem that, at a given position, some sequences will carry nucleotides that differ from those of other organisms. Such differences can arise from genetic modification, among other causes. Methods will therefore be developed for deciding which nucleotide to place in the characteristic genome when the data conflict. These decisions might be based on the prevalence of surrounding nucleotides or on other factors.
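A minimal sketch of this step, assuming the input sequences have already been aligned to equal length (alignment itself is out of scope here): each base's fuzzy membership at a position is its share of the sequences, the base with the highest membership wins, and ties are broken by how prevalent each tied base is in the neighboring positions. That tie-break rule, the window size, and the function names are hypothetical stand-ins for the "surrounding nucleotides" idea, not a method the project has specified.

```python
from collections import Counter


def column_memberships(aligned: list[str], i: int) -> dict[str, float]:
    """Fuzzy membership of each base at column i = its share of the sequences."""
    counts = Counter(seq[i] for seq in aligned)
    return {base: n / len(aligned) for base, n in counts.items()}


def fuzzy_consensus(aligned: list[str], window: int = 2) -> tuple[str, list[float]]:
    """Characteristic sequence plus the winning membership (confidence) per column."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    consensus, confidence = [], []
    for i in range(length):
        mu = column_memberships(aligned, i)
        top = max(mu.values())
        tied = sorted(b for b, v in mu.items() if v == top)
        if len(tied) > 1:
            # Hypothetical tie-break: prefer the base most prevalent nearby.
            lo, hi = max(0, i - window), min(length, i + window + 1)
            neighbors = [j for j in range(lo, hi) if j != i]

            def nearby(base: str) -> float:
                return sum(column_memberships(aligned, j).get(base, 0.0)
                           for j in neighbors)

            tied.sort(key=nearby, reverse=True)  # stable: alphabetical among equals
        consensus.append(tied[0])
        confidence.append(top)
    return "".join(consensus), confidence
```

For example, `fuzzy_consensus(["GGA", "GCA"])` returns `("GGA", [1.0, 0.5, 1.0])`: the G/C tie in the middle column is resolved toward G because G is prevalent in the neighboring column. The per-column confidence keeps a record of where the characteristic genome rests on conflicting evidence.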
