Computational Bioinformatics Assignment 1 Solution

Question 1




Generating pseudo-genomic sequences.




 
(a) Create a sequence ACTGACTG... of length 400.

(b) Generate a random string of nucleotides of length 400 with equal probabilities.

(c) Generate a random string of nucleotides of length 600 with equal probabilities of nucleotides in every position that is not a multiple of 3, and with p(a) = 0.5, p(c) = 0.25, p(t) = 0.15 (and hence p(g) = 0.10) in every position that is a multiple of 3.
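One possible way to build the three sequences in base R is sketched below. The variable names (`seq_a`, `seq_b`, `seq_c`, `p3`) and the seed are illustrative choices, not part of the assignment. Two assumptions are made: p(g) = 0.10 is taken as the remainder so that the probabilities sum to 1, and lower-case letters are used to match the a, c, t, g notation in Question 2.

```r
set.seed(1)  # arbitrary seed, used only so the sketch is reproducible
nuc <- c("a", "c", "g", "t")

# (a) Periodic sequence actgactg... of length 400, one nucleotide per element.
seq_a <- rep(c("a", "c", "t", "g"), length.out = 400)

# (b) Random sequence of length 400, all four nucleotides equally likely.
seq_b <- sample(nuc, 400, replace = TRUE)

# (c) Random sequence of length 600: uniform everywhere, then positions
#     3, 6, 9, ... are redrawn from the skewed distribution.
n <- 600
seq_c <- sample(nuc, n, replace = TRUE)
third <- seq(3, n, by = 3)
p3 <- c(a = 0.50, c = 0.25, g = 0.10, t = 0.15)  # p(g) assumed as the remainder
seq_c[third] <- sample(names(p3), length(third), replace = TRUE, prob = p3)

# Collapse to a single string if a character string is preferred.
str_a <- paste(seq_a, collapse = "")
```

Keeping each sequence as a vector of single characters makes positional indexing straightforward; `paste(..., collapse = "")` gives the string form when needed.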




Question 2




The package seqinr.




 
Install the package on your computer (or in your directory, if you are using the departmental server).




 
Read the package documentation. Try the command lseqinr().




 
Get data files 1 and 2 from https://www.ncbi.nlm.nih.gov/nuccore/257787102?report=fasta&log$=seqview (choose save as FASTA file) and http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?tool=portal&db=nuccore&val=11497621&dopt=fasta&sendto=on&log$=seqview&extrafeat=0&maxplex=1




 
Read file 1 (FASTA format). Output the following statistics for the files:




 
Percentages of a,c,t,g.




 
A table of the distribution of dimers (i.e. overlapping pairs of adjacent nucleotides). E.g., the segment acc has 1 ac and 1 cc.




 
A table of the distribution of trimers (also called codons), counted without overlap; so acctcg has 1 acc and 1 tcg.
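The three statistics above can be sketched in base R on the toy segment acctcg from the examples. This is a minimal illustration, not the intended solution: the names `s`, `pct`, `dimer_tab` and `trimer_tab` are made up here. With seqinr loaded, the vector `s` would instead come from `read.fasta()`, and `count()` with its `wordsize` argument offers a packaged alternative to the hand-rolled tables.

```r
# Toy sequence from the examples above, one nucleotide per element.
# Assumes the sequence has at least 3 nucleotides.
s <- c("a", "c", "c", "t", "c", "g")

# Percentages of a, c, g, t (the factor levels keep zero counts visible).
pct <- 100 * table(factor(s, levels = c("a", "c", "g", "t"))) / length(s)

# Overlapping dimers: pair each nucleotide with its successor.
dimers <- paste0(s[-length(s)], s[-1])
dimer_tab <- table(dimers)

# Non-overlapping trimers: windows starting at positions 1, 4, 7, ...
# Any incomplete trailing window is dropped.
starts <- seq(1, length(s) - 2, by = 3)
trimers <- paste0(s[starts], s[starts + 1], s[starts + 2])
trimer_tab <- table(trimers)

print(pct)
print(dimer_tab)
print(trimer_tab)
```

On this segment the dimer table contains ac and cc once each (along with ct, tc and cg), and the trimer table contains exactly acc and tcg, which agrees with the examples in the question.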




Question 3




Writing simple functions in R.



 
Write an R function that takes two indices as input and extracts the nucleotides between those indices (e.g. the inputs 10, 15 should result in nucleotides 10 through 15, inclusive, being extracted). The segment should then be converted to an indicator sequence for the nucleotide g (it has a 1 in places where g occurs and 0 everywhere else). Plot the discrete Fourier transform of this indicator sequence (plot the magnitude of the Fourier coefficients only).




 
Use your function on one long protein-coding region and one long non-coding region from data file 2, as well as the sequence created in Q1(c). There will be an annotation file for file 2 that you can use to identify these. Report any significant finding from these plots.
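A minimal sketch of such a function in base R follows. The function name `g_spectrum` and the `draw` switch are hypothetical, and the sequence is assumed to be a vector of lower-case single characters (as seqinr's `read.fasta()` returns for DNA). `fft()` and `Mod()` are base R.

```r
# Extract s[from..to] (inclusive), build the indicator sequence for "g",
# and return (and optionally plot) the magnitude of its DFT.
g_spectrum <- function(s, from, to, draw = TRUE) {
  stopifnot(from >= 1, to <= length(s), from <= to)
  seg <- s[from:to]               # nucleotides from..to, inclusive
  ind <- as.numeric(seg == "g")   # 1 where g occurs, 0 elsewhere
  mag <- Mod(fft(ind))            # magnitudes of the Fourier coefficients
  if (draw) {
    plot(seq_along(mag) - 1, mag, type = "h",
         xlab = "Frequency index k", ylab = "|X[k]|",
         main = "DFT magnitude of the g indicator sequence")
  }
  invisible(mag)
}

# Check on the periodic sequence actgactg... of length 400: the indicator
# has period 4, so the only nonzero coefficients sit at k = 0, 100, 200, 300.
periodic <- rep(c("a", "c", "t", "g"), length.out = 400)
mag <- g_spectrum(periodic, 1, 400, draw = FALSE)
peaks <- which(mag > 1e-8) - 1    # zero-based frequency indices of the peaks
print(peaks)
```

The same reasoning suggests what to look for in the plots: a sequence of length N whose g frequency varies with period 3, as in Q1(c), would be expected to show a peak near k = N/3 in addition to the k = 0 term.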




