BE6056 Bioinformatics & Molecular Modelling

Semester A 2019

Assessment 1

For all questions, illustrate your answers fully, describing what you did at every step and including the output obtained.

You need to include, embedded, within your submitted work all relevant output from online servers, as appropriate, as well as a written dialogue to fully illustrate what work you carried out. For all parts describe how you obtained your data by stating the bioinformatics portal used and the search strategy. Please quote accession numbers of all database files used.

Q1

Hexokinase is an important enzyme which is part of the glycolytic pathway. There are several forms of the enzyme, one of which is highly expressed in cancer cells.

Use the NCBI or EBI portals to retrieve one file for each of the several different forms of hexokinase found in humans. Each file should contain the complete mRNA sequence (it is not necessary to print the sequence out).

See the sequence.txt output file from NCBI, which contains the coding sequences of 3 different forms of hexokinase found in humans.

Compile a table, similar to the one from tutorial 1 (page 6), comparing the sequence elements of the mRNA of each different hexokinase forms that you find. Include extra columns indicating the length of the protein and the receptor sub-type. Comment on your findings.

The files were opened and the complete mRNA sequences checked; 3 different accession numbers were found.

Length of complete mRNA sequence elements
Species	Accession number	5′ UTR	ORF (CDS)	3′ UTR	Total length of mRNA
Mus musculus	L16949	187	188	3026	3071
Helicoverpa armigera	KR780750	181	182	1535	1851
Aedes aegypti	AY705877	234	1620	1621	1911
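The element lengths in a table of this kind can be derived from the CDS coordinates given in the feature table of each GenBank record. The following is a minimal sketch, not the method used in the answer above: the function name `mrna_element_lengths` and the example coordinates are hypothetical, and it assumes a single continuous CDS with 1-based inclusive coordinates that includes the stop codon.

```python
def mrna_element_lengths(cds_start, cds_end, total_length):
    """Return the lengths of the 5' UTR, CDS and 3' UTR, plus the protein length.

    Coordinates are 1-based and inclusive, as in a GenBank feature table
    (for example CDS 101..400). Assumes one continuous CDS that includes
    the stop codon.
    """
    if not (1 <= cds_start <= cds_end <= total_length):
        raise ValueError("CDS coordinates must lie within the mRNA")
    cds_length = cds_end - cds_start + 1
    if cds_length % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return {
        "5' UTR": cds_start - 1,
        "CDS": cds_length,
        "3' UTR": total_length - cds_end,
        "protein (aa)": cds_length // 3 - 1,  # the stop codon is not translated
    }


# Hypothetical record: a 500 nt mRNA with the CDS at positions 101..400
print(mrna_element_lengths(101, 400, 500))
```

The 5′ UTR, CDS and 3′ UTR lengths returned by this function always sum to the total mRNA length, which gives a quick consistency check for each row of a table.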

Retrieve files for two genes of the hexokinase types you retrieved in part (a) and compare the structure of the genes (exon / intron profile). Comment on your findings.

NCBI was used to retrieve the sequences of Mus musculus hexokinase (Hk-1) mRNA, complete cds; Helicoverpa armigera hexokinase (HK) mRNA, complete cds; and Aedes aegypti hexokinase (HK) mRNA, complete cds. The FASTA sequences are shown above.

For all parts describe how you obtained your data by stating the bioinformatics portal used and the search strategy. Accession numbers of all sequence files must be given. Any references used should be cited in your answer. Expect to retrieve about a dozen files in total.

Q1 – No human genes were analysed. Q not understood properly. 4/30

30 marks

Q2

Cytochrome P450s are a group of heme-thiolate monooxygenases. In liver microsomes, this enzyme is involved in an NADPH-dependent electron transport pathway. It oxidizes a variety of structurally unrelated compounds, including steroids, fatty acids, and xenobiotics.

In this question you will explore the relationship of the Cytochrome P450 1A1 sequence for a number of different species.

Using UniProt locate the sequence for the full-length human sequence. Then run a BLAST search for this sequence at the NCBI site (not from UniProt) against the Swiss-Prot protein database and identify 7 other different species of the protein with close similarity to the human sequence (make sure they are full length sequences).

Searching UniProt for Cytochrome P450 1A1 returned the full-length human sequence, accession number P04798. It is a cytochrome P450 monooxygenase involved in the metabolism of multiple endogenous substrates, including fatty acids, steroid hormones and vitamins.

The full-length human sequence from UniProt was then used as the query in an NCBI BLAST search to identify 7 other species.

The BLAST results list seven species, including the human query: Homo sapiens, Macaca mulatta, Macaca fascicularis, Canis lupus familiaris, Felis catus, Balaenoptera acutorostrata and Ovis aries.

Give the Accession number for each protein sequence identified, together with the species. Give the percentage identity for each of the 7 sequences with that of the human sequence. State E values and the length of each sequence.

The percentage identity starts at 100% for the human sequence itself and differs for each of the other species.

The E-values are all 0.0, and the first 3 sequences have the same length.

Species	Accession number	Percentage identity	E-value	Sequence length
Homo sapiens	P04798	100.00%	0.0	512
Macaca mulatta	Q6GUR1	94.34%	0.0	512
Macaca fascicularis	P33616	94.14%	0.0	512
Canis lupus familiaris	P56590	81.89%	0.0	524
Felis catus	Q5KQT7	82.28%	0.0	517
Balaenoptera acutorostrata	Q3LFU0	82.09%	0.0	516
Ovis aries	P56591	80.59%	0.0	519
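BLAST reports percentage identity as the number of identical positions divided by the length of the alignment. A minimal sketch of that calculation follows; the function name `percent_identity` and the toy sequences are hypothetical, and it assumes two sequences that have already been aligned to the same length with `-` marking gaps.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences share a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap (possible in an MSA slice)
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Toy alignment: 6 identical columns out of 8
print(f"{percent_identity('MKTAYIAK', 'MKSAY-AK'):.2f}%")  # 75.00%
```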

For all 8 sequences run a multiple sequence alignment using program Clustal Omega and show the alignment generated.

The first of the 8 sequences is incomplete; part of the sequence is missing.

How many positions along the multiple sequence alignment are fully conserved between species?

In this multiple sequence alignment, 6 sequences are fully conserved between species and all have equal sequence length. Clustal Omega is used to run a multiple sequence alignment, which helps to identify conserved sequence regions across a group of sequences. Conserved regions are functionally important.
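In Clustal Omega output, fully conserved columns are marked with an asterisk (*) in the consensus line beneath the alignment, so the number of fully conserved positions can be counted from those marks or directly from the aligned sequences. A minimal sketch of the second approach is given here; the function name `count_conserved_columns` and the toy alignment are hypothetical, and the sequences are assumed to be aligned to equal length with `-` for gaps.

```python
def count_conserved_columns(aligned_seqs, gap="-"):
    """Count alignment columns where every sequence has the same residue."""
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = 0
    for column in zip(*aligned_seqs):
        # A column is fully conserved if it has no gaps and only one residue type
        if gap not in column and len(set(column)) == 1:
            conserved += 1
    return conserved


# Toy alignment of three sequences, seven columns
toy_alignment = [
    "MKT-AYL",
    "MKTQAYL",
    "MRT-AYL",
]
print(count_conserved_columns(toy_alignment))  # 5
```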

Display both the cladogram and phylogram trees obtained for the aligned sequences. Briefly discuss the evolutionary relationship between the 8 species as indicated by the phylogram and cladograms. Which species is the closest relation to the human species?

According to the phylogram and cladogram, the closest relations to the human sequence are MOUSE, MACMU and MACFA.

Q2 – Human sequence found. Blast run though you should describe what you did more fully in the script. Table of hits found. You are one sequence short. You need to discuss how you ran ClustalOmega not just present the MSA. Number of conserved positions along the sequence not stated. Just the cladogram presented, no phylogram. A little more discussion of the cladogram was expected.

25 marks

Q3

Detecting remote homologs with BLAST and PSI-BLAST.

The NCBI website (http://www.ncbi.nlm.nih.gov) gives the option to run both BLAST and PSI-BLAST for a query protein sequence. For this question you need to use the NCBI website to run both BLAST and PSI-BLAST.

Pairwise Alignments

The enzyme adenosine deaminase (UniProt accession number P00813) and the enzyme guanine deaminase (UniProt accession number P76641) perform a similar function and are remote homologs, both belonging to the SCOP superfamily metallo-dependent hydrolase. The two sequences have a percentage identity of only 15%.

The search was run at NCBI against the UniProtKB/Swiss-Prot database, with the PSI-BLAST algorithm selected.

PSI-BLAST was run for three iterations with guanine deaminase (P76641); the remote homologue with 15% identity was not found among the hits.

NCBI provides a search engine for the life sciences. The web page lists each searchable database; the protein sequence database was selected in this case.

Output from BLAST / PSI-BLAST

Perform a protein-protein BLAST search using the sequence for the adenosine deaminase sequence (UniProt accession number P00813) searching against the UniProtKB/Swiss-Prot database. Search the results for the guanine deaminase enzyme (UniProt accession number P76641). Now repeat using PSI-BLAST and compare your results from those obtained from protein-protein BLAST.
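PSI-BLAST differs from protein-protein BLAST in that it builds a position-specific scoring matrix (PSSM) from the significant hits of each iteration and uses that matrix as the query for the next iteration. The sketch below illustrates the idea only. It is a simplification that assumes a uniform background frequency of 1/20 and simple additive pseudocounts, whereas PSI-BLAST itself uses sequence weighting and more refined pseudocounts; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build a simplified log-odds PSSM, one score dictionary per column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*aligned_seqs):
        # Count standard residues only; gaps and other symbols are ignored
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_sequence(pssm, seq):
    """Sum the position-specific scores for an ungapped sequence."""
    return sum(column[res] for column, res in zip(pssm, seq))


# Toy alignment of hits from a hypothetical first iteration
hits = ["HGD", "HGE", "HAD"]
pssm = build_pssm(hits)
print(score_sequence(pssm, "HGD") > score_sequence(pssm, "KLM"))
```

Because each column carries its own scores, a residue that is conserved across the family is rewarded more strongly than a general substitution matrix would allow, which is why a profile can pick up homologues that a single-sequence search misses.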

Input blast

Discuss what you observe from the BLAST and PSI-BLAST searches. Discuss which of the two search methods proved most effective and why.

Include output as appropriate to illustrate your answer, including the pairwise alignment for the two sequences generated from your work.

In the first screenshot (accession number P00813), the E-value is low and the percentage identity is high. In the second screenshot, the PSI-BLAST output for accession number P76641, the E-value is very high and the percentage identity is low.
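The E-value is the number of alignments with at least the observed score that would be expected by chance, so a low E-value indicates a significant hit and a high E-value an unreliable one. A minimal sketch of the usual relation between bit score and E-value, E = m × n × 2^(−S′), follows. Here m and n are the effective query and database lengths; the function name `expect_value` and all numbers in the example are hypothetical and are not taken from the searches above.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S'), where m and n are the effective query and
    database lengths and S' is the bit score.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers: the same query and database, two different bit scores
strong_hit = expect_value(bit_score=200, query_length=350, database_length=2e8)
weak_hit = expect_value(bit_score=25, query_length=350, database_length=2e8)
print(f"strong hit E = {strong_hit:.3g}")
print(f"weak hit   E = {weak_hit:.3g}")
```

Each extra bit of score halves the E-value, so a weak match to a remote homologue can easily fall below the reporting threshold of a standard search.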

Q3 – you have not done what was asked for in the question re alignments. You have run PSI-Blast but with the wrong sequence and the correct sequence. No match found with distant homologue. No discussion of the merits of the 2 programs. 2/30

30 marks

Q4

Use the two protein domain databases Pfam and SMART to investigate the domain structure of the protein human Integrin beta-1 (Accession code P05556). Discuss the domains located within the sequence using the two different databases and compare and contrast the results found. Discuss the functional role of the different domains located.

Pfam is a large collection of protein families, each represented by multiple sequence alignments. For accession number P05556, Pfam reports the following domains: PSI_integrin (residues 25-76), Integrin_beta (138-382), I-EGF_1 (466-495), EGF_2 (599-630), Integrin_B_tail (640-728) and Integrin_b_cyt (752-796).

For P05556, SMART reports 7 domains: PSI (residues 26-76), INB (34-464), VWA (142-372), EGF_like (559-591), Integrin_B_tail (640-728) and Integrin_b_cyt (752-798).
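The two sets of domain boundaries quoted above can be compared systematically by computing how many residues each pair of domains shares. The sketch below uses the coordinates from the answer; it assumes the second list is the SMART output, and the helper names `overlap` and `compare_domains` are hypothetical.

```python
# Domain boundaries (start, end), 1-based and inclusive, as quoted in the answer
pfam = {
    "PSI_integrin": (25, 76),
    "Integrin_beta": (138, 382),
    "I-EGF_1": (466, 495),
    "EGF_2": (599, 630),
    "Integrin_B_tail": (640, 728),
    "Integrin_b_cyt": (752, 796),
}
smart = {  # assumption: the second list in the answer is the SMART output
    "PSI": (26, 76),
    "INB": (34, 464),
    "VWA": (142, 372),
    "EGF_like": (559, 591),
    "Integrin_B_tail": (640, 728),
    "Integrin_b_cyt": (752, 798),
}


def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare_domains(first, second):
    """List (name_in_first, name_in_second, shared_residues) for overlapping pairs."""
    return [
        (name_a, name_b, overlap(range_a, range_b))
        for name_a, range_a in first.items()
        for name_b, range_b in second.items()
        if overlap(range_a, range_b) > 0
    ]


for pfam_name, smart_name, shared in compare_domains(pfam, smart):
    print(f"{pfam_name:16s} {smart_name:16s} {shared:4d} residues shared")
```

Pairs that share no residues, such as EGF_2 (599-630) and EGF_like (559-591), do not appear in the output, which highlights where the two databases disagree.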

Q4 – Pfam and SMART Domain databases used and domains identified. You haven't discussed their functional roles. 10/14

15 marks

Submit the coursework online via Turnitin by 3pm on 8th November 2019.
