CSE GEC WORLD: Bioinformatics

Humans have carried unwanted viral gene segments for many years, and reports suggest that approximately 3-8% of the human genome is composed of viral DNA. With this in view, various viral sequences were downloaded from the NCBI Taxonomy Browser and scanned against the complete genome of Homo sapiens for possible viral inserts. The computational analysis revealed viral segments inserted in both the intron and exon regions of the human genome. Alignments with residues greater than 25 to 30%, identities between 90 and 100%, and sequences located in these regions were considered. Predicting the antigenic regions of a protein is of prime importance in assessing whether parts of a polypeptide chain are exposed or buried. A Hopp-Woods hydrophilicity plot, with the amino acid sequence of a protein on its x-axis and the degree of hydrophobicity or hydrophilicity on its y-axis, was produced in Python using the scipy, matplotlib and numpy modules.
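The Hopp-Woods profile mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual script: the function names `hopp_woods_profile` and `plot_profile` are hypothetical, the window of 6 residues is a commonly used choice assumed here, and the averaging is done in plain Python so that matplotlib is only needed for the optional plot.

```python
# Hopp-Woods hydrophilicity values (Hopp & Woods, 1981); positive = hydrophilic.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hopp_woods_profile(sequence, window=6):
    """Return the mean hydrophilicity of every window along the sequence."""
    values = [HOPP_WOODS[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def plot_profile(sequence, window=6):
    """Plot window start position (x-axis) against mean hydrophilicity (y-axis)."""
    import matplotlib.pyplot as plt  # imported here so the profile works without it

    profile = hopp_woods_profile(sequence, window)
    plt.plot(range(1, len(profile) + 1), profile)
    plt.axhline(0.0, linestyle="--")  # boundary between hydrophilic and hydrophobic
    plt.xlabel("Window start position")
    plt.ylabel("Mean Hopp-Woods hydrophilicity")
    plt.show()
```

Peaks in the profile point to hydrophilic stretches, which are more likely to be exposed and are therefore candidate antigenic regions.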

The work can identify possible viral inserts from known and unknown viruses. It needs a large amount of data and stringent algorithms. However, a manageable sequence of procedures, combining computational and experimental analysis, would make it feasible to analyze almost all viral sequence inserts in various eukaryotes. This would certainly help in developing tools or procedures to combat dreadful diseases caused by virus attacks, and would also contribute greatly to the area of drug discovery.
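One step of such a procedure, keeping only alignments that meet the identity criterion of 90 to 100%, could look like the sketch below. It assumes the alignments are available as BLAST tabular output (`-outfmt 6`), which the text does not state. The function name `filter_hits`, the sample identifiers and the minimum length of 25 aligned positions are likewise illustrative assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str        # viral sequence identifier
    subject: str      # human sequence identifier
    identity: float   # percent identity of the alignment
    length: int       # alignment length


def filter_hits(lines, min_identity=90.0, min_length=25):
    """Keep alignments at or above the identity and length thresholds.

    Assumes the 12-column BLAST tabular layout, where the first four
    columns are qseqid, sseqid, pident and length.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        hit = Hit(fields[0], fields[1], float(fields[2]), int(fields[3]))
        if hit.identity >= min_identity and hit.length >= min_length:
            hits.append(hit)
    return hits


# Made-up example rows, not real results.
sample = [
    "virusA\tchr1\t98.5\t120\t2\t0\t1\t120\t5000\t5119\t1e-50\t210",
    "virusB\tchr7\t82.0\t300\t54\t0\t1\t300\t900\t1199\t1e-40\t180",
    "virusC\tchr3\t100.0\t18\t0\t0\t1\t18\t77\t94\t0.5\t36.2",
]
for hit in filter_hits(sample):
    print(hit.query, hit.subject, hit.identity, hit.length)
```

Mapping the surviving hits to intron or exon coordinates would be a separate step that depends on the gene annotation used.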

· The main objective of this project is to identify viral segments in every human gene.

· This project will identify known and unknown viruses that are present in the human genome.

· The project aims to identify at least 10 known and unknown viruses present in the human genome before it is completed.

· This project deals with the subjects Bioinformatics, Computational Biology and Genetic Algorithms, which are electives for the pre-final and final years of B.Tech and the 1st and 2nd semesters of M.Tech. The project is therefore useful for both undergraduate and postgraduate students in their regular academic activity.

· The project also uses large-scale computational analysis in two programming languages, Perl and Python, for matching the human genome against viral segments.

· Protein sequence databases are categorized as primary, composite or secondary. Primary databases contain more than 300,000 protein sequences and function as a repository for the raw data. Some of the more common repositories, such as SWISS-PROT and PIR International, annotate the sequences and describe the proteins’ functions, domain structures and post-translational modifications.

· Composite databases such as OWL and the NRDB compile and filter sequence data from different primary databases to produce combined non-redundant sets that are more complete than the individual databases and also include protein sequence data from the translated coding regions in DNA sequence databases.

· Next we look at databases of macromolecular structures. The Protein Data Bank, PDB, provides a primary archive of all 3D structures for macromolecules such as proteins, RNA, DNA and various complexes. As the information provided in individual PDB entries can be difficult to extract, PDBsum provides a separate Web page for every structure in the PDB displaying detailed structural analyses, schematic diagrams and data on interactions between different molecules in a given entry.

· Three major databases classify proteins by structure in order to identify structural and evolutionary relationships: CATH, SCOP and FSSP. All comprise a hierarchical structural taxonomy in which groups of proteins increase in similarity at lower levels of the classification tree. In addition, numerous databases focus on particular types of macromolecules. These include the Nucleic Acids Database, NDB, for structures related to nucleic acids, the HIV protease database for HIV-1, HIV-2 and SIV protease structures and their complexes, and ReLiBase for receptor-ligand complexes.
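The way composite databases combine primary sources into a non-redundant set, as described in the list above, can be illustrated with a small sketch. It treats two records as redundant only when their sequences are identical, which is a simplifying assumption. The function name `merge_non_redundant` and the toy records are hypothetical, and real composite databases apply their own, more elaborate rules.

```python
def merge_non_redundant(*databases):
    """Merge (identifier, sequence) records, keeping each distinct sequence once.

    Returns a dict mapping each sequence to the list of identifiers
    under which it appeared, in order of first appearance.
    """
    merged = {}
    for records in databases:
        for identifier, sequence in records:
            key = sequence.upper()  # compare sequences case-insensitively
            merged.setdefault(key, []).append(identifier)
    return merged


# Hypothetical toy records standing in for two primary databases.
db_a = [("A:P1", "MKTAYIAK"), ("A:P2", "MSLLTEVE")]
db_b = [("B:X9", "mktayiak"), ("B:X7", "MGDVEKGK")]

for sequence, identifiers in merge_non_redundant(db_a, db_b).items():
    print(sequence, identifiers)
```

Keeping the list of source identifiers for each sequence preserves the link back to the primary entries, where the annotations live.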

Friday, 15 November 2013
