Humans have
carried unwanted viral gene segments for many years, and reports suggest
that approximately 3-8% of the human genome is composed of viral DNA.
With this in view, various viral sequences were downloaded from the NCBI Taxonomy
Browser and scanned against the complete genome of Homo sapiens for possible viral inserts in the human
genome. The computational analysis revealed viral segments
inserted in both the intron and exon regions of the human genome.
Alignments were considered when they covered more than 25 to 30% of the residues, showed identities
between 90 and 100%, and were located in these regions.
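The selection criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Hit` record, its field names, and the example values are hypothetical, and reading the thresholds as "coverage above 25%, identity from 90 to 100%" is one interpretation of the criteria, not necessarily the exact procedure used in this work.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment between a viral query and the human genome (hypothetical record)."""
    virus: str
    query_length: int      # length of the viral query sequence
    aligned_length: int    # number of query residues covered by the alignment
    identity_pct: float    # percent identical positions within the alignment
    region: str            # annotation of the matched locus, e.g. "intron" or "exon"


def coverage_pct(hit: Hit) -> float:
    """Percentage of the query covered by the alignment."""
    return 100.0 * hit.aligned_length / hit.query_length


def keep_hit(hit: Hit, min_coverage: float = 25.0, min_identity: float = 90.0) -> bool:
    """Apply the coverage, identity, and region criteria (assumed thresholds)."""
    return (
        coverage_pct(hit) > min_coverage
        and min_identity <= hit.identity_pct <= 100.0
        and hit.region in {"intron", "exon"}
    )


# Made-up example hits, for illustration only
hits = [
    Hit("virus_A", 1000, 400, 95.0, "intron"),   # 40% coverage, 95% identity
    Hit("virus_B", 1000, 100, 99.0, "exon"),     # coverage too low
    Hit("virus_C", 1000, 600, 80.0, "exon"),     # identity too low
]
selected = [h.virus for h in hits if keep_hit(h)]
print(selected)
```

Raising `min_coverage` to 30.0 gives the stricter end of the stated range.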
Predicting the antigenic regions of a protein is of prime importance in
assessing whether regions of a polypeptide chain are exposed or buried.
A Hopp-Woods hydrophilicity plot, with the amino acid sequence of a protein on its
x-axis and the degree of hydrophobicity or hydrophilicity on its y-axis, was
generated in Python using the SciPy, Matplotlib and NumPy modules
and is reported here.
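A minimal sketch of such a hydrophilicity profile is shown here. It uses the published Hopp-Woods scale values with a sliding-window average. The function names, the default window of six residues, and the example peptide are assumptions made for illustration, not the code used in this project. The averaging is written in plain Python, and Matplotlib is imported only when the plot is requested.

```python
# Hopp-Woods hydrophilicity values (Hopp & Woods, 1981); positive = hydrophilic.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def hopp_woods_profile(sequence, window=6):
    """Return the mean hydrophilicity of every full window along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [HOPP_WOODS[aa] for aa in seq]  # KeyError on non-standard residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def plot_profile(sequence, window=6):
    """Plot window start position (x) against mean hydrophilicity (y)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    profile = hopp_woods_profile(sequence, window)
    positions = range(1, len(profile) + 1)
    plt.plot(positions, profile)
    plt.axhline(0.0, linewidth=0.5)  # zero line separates hydrophilic from hydrophobic
    plt.xlabel("Window start position")
    plt.ylabel("Mean Hopp-Woods hydrophilicity")
    plt.show()


# Hypothetical 12-residue peptide, for illustration only
profile = hopp_woods_profile("MKKDDERLLVVA")
```

Peaks in the profile mark hydrophilic stretches, which are the candidate exposed (and potentially antigenic) regions. Calling `plot_profile("MKKDDERLLVVA")` draws the corresponding plot.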
The
work can be used to identify possible
viral inserts of known and unknown viruses. It requires large amounts of data and
stringent algorithms. However, a manageable sequence
of computational and experimental procedures would
make it feasible to analyze almost all viral sequence inserts in various
eukaryotes. This would help in developing tools or procedures to
combat dreadful viral diseases and would also
contribute greatly to the area of drug discovery.
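As a toy illustration of the scanning step (comparing a viral sequence against a host sequence), the sketch below indexes all k-mers of the host and reports positions where the viral sequence shares an exact k-mer. The function names, the value k=8, and the two short sequences are assumptions for illustration. A real genome-scale scan would rely on an established alignment tool and would also consider the reverse strand, which this sketch does not.

```python
from collections import defaultdict


def index_kmers(sequence, k):
    """Map every k-mer in the sequence to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def shared_seeds(viral, host, k=8):
    """Return (viral_pos, host_pos) pairs where the same k-mer occurs in both."""
    host_index = index_kmers(host.upper(), k)
    viral = viral.upper()
    seeds = []
    for i in range(len(viral) - k + 1):
        for j in host_index.get(viral[i:i + k], []):
            seeds.append((i, j))
    return seeds


# Made-up sequences: the viral fragment shares a 12-base stretch with the host
host = "TTTTACGTACGGATCCAAAA"
viral = "GGACGTACGGATCC"
seeds = shared_seeds(viral, host, k=8)
print(seeds)
```

Runs of seeds on the same diagonal (constant difference between host and viral position) indicate a longer shared segment that could then be passed to a proper alignment step.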
·
The main objective of this project is to
identify viral segments in every human gene.
·
This project will identify known and unknown
viruses that are present in the human genome.
·
The project aims to identify at least 10 known
and unknown viruses present in the human genome before
completion.
·
This project deals with bioinformatics,
computational biology and genetic algorithms, which are electives for the pre-final
and final years of B.Tech and the 1st and 2nd semesters of
M.Tech. The project is therefore useful to both undergraduate and postgraduate
students in their regular academic work.
·
The project
also uses large-scale computational analysis in two programming
languages, Perl and Python, to match viral segments against the human genome.
·
Protein sequence databases are
categorized as primary, composite or secondary. Primary databases contain more
than 300,000 protein sequences and function as a repository for the raw data.
Some of the more common repositories, such as SWISS-PROT and PIR International,
annotate the sequences as well as describe the proteins’ functions, domain
structures and post-translational modifications.
·
Composite databases such as OWL and
the NRDB compile and filter sequence data from different primary databases to
produce combined non-redundant sets that are more complete than the individual
databases and also include protein sequence data from the translated coding
regions in DNA sequence databases.
·
Next we look at databases of
macromolecular structures. The Protein Data Bank, PDB, provides a primary
archive of all 3D structures for macromolecules such as proteins, RNA, DNA and
various complexes. As the information provided in individual PDB entries can be
difficult to extract, PDBsum provides a
separate Web page for every structure in the PDB displaying detailed structural
analyses, schematic diagrams and data on interactions between different
molecules in a given entry.
·
Three major databases classify
proteins by structure in order to identify structural and evolutionary
relationships: CATH, SCOP and FSSP. All comprise a hierarchical
structural taxonomy in which groups of proteins increase in similarity at lower
levels of the classification tree. In addition, numerous databases focus on
particular types of macromolecules. These include the Nucleic Acids Database, NDB,
for structures related to nucleic acids, the HIV protease database for HIV-1,
HIV-2 and SIV protease structures and their complexes, and ReLiBase for receptor-ligand complexes.
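The composite-database idea described above (merging several primary databases into one non-redundant set) can be sketched in a few lines. This is a simplified illustration only: the function name, the accessions and the sequences are hypothetical, and redundancy is treated here as exact sequence identity, whereas real composite databases may apply more elaborate filtering rules.

```python
def merge_non_redundant(*databases):
    """Merge {accession: sequence} mappings, keeping one entry per distinct sequence."""
    merged = {}  # normalized sequence -> accession first seen with it
    for db in databases:
        for accession, sequence in db.items():
            key = sequence.upper().replace(" ", "")
            merged.setdefault(key, accession)  # later duplicates are ignored
    # Return accession -> normalized sequence for the retained entries
    return {accession: seq for seq, accession in merged.items()}


# Hypothetical entries from two primary databases
primary_a = {"A001": "MKTAYIAK", "A002": "GSHMLEDP"}
primary_b = {"B001": "mktayiak", "B002": "MNNQRKKT"}
combined = merge_non_redundant(primary_a, primary_b)
print(combined)
```

Here the entry B001 is dropped because its sequence duplicates A001 once letter case is normalized, so the combined set holds three entries instead of four.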