Homework #2 (100 points)

Assigned: 1/17/08 ---> Due: 1/29/08

There are three parts to this homework.

Write up each of the algorithms as pseudocode: a series of logical steps, written in plain English, needed to complete the task. Test each algorithm by stepping through it by hand on a few sample sequences to make sure it is correct. You can reuse some of the "code" you generate in more than one part of this homework, so read all of the parts before you start to answer any of them. In every part, be sure to provide a plain English explanation of what you are doing. This can take the form of comments within the pseudocode or a separate section.

Write up all the answers in a single Microsoft Word document. FORMAT: Please use Courier or Monaco font, 10 point size. Use TAB settings of 0.5" for each indentation.

  1. Describe an algorithm to find all positions of exact matches of length L between a query sequence, Q letters long, and a target sequence, T letters long. Note that different queries and targets will have different values for Q and for T, while L remains constant. You have a maximum of two pages for this section. (30 points)
  2. Describe the same algorithm as above, but allow a user-specified number of mismatches (M). Note that M cannot be greater than L, and that a match becomes meaningless if M is too large relative to L. Choose a reasonable maximum for M: what is the largest percentage of mismatches you will allow? You have a maximum of three pages for this section. (30 points)
  3. Describe an algorithm to determine the similarity score of two sequences. You will first have to define similarity and include provisions for matches, mismatches, and gaps in the alignment. The sequences might be of different lengths. For this assignment, you can begin by assuming that you are given two sequences that have already been aligned, starting at the first letter of each sequence. Your scoring algorithm should work for both DNA and protein sequences and should produce a score indicating how similar the two sequences are to each other, given the alignment provided. You have a maximum of three pages for this section. (40 points)
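
For orientation only, here is a minimal Python sketch of one possible reading of the three parts. It is not the expected answer, and your submission must still be written in plain English. The function names, the 0-based positions, the brute-force window comparison, the "-" gap character, and the scoring values (match +1, mismatch -1, gap -2) are all assumptions made for illustration; none of them is specified by the assignment.

```python
def find_matches(query, target, L, M=0):
    """Return (query_pos, target_pos) pairs, 0-based, where the length-L
    windows of query and target differ in at most M positions.
    M=0 gives exact matches (part 1); M>0 allows mismatches (part 2)."""
    hits = []
    for i in range(len(query) - L + 1):        # every window start in the query
        for j in range(len(target) - L + 1):   # every window start in the target
            mismatches = 0
            for k in range(L):                 # compare the windows letter by letter
                if query[i + k] != target[j + k]:
                    mismatches += 1
                    if mismatches > M:         # too many mismatches, stop early
                        break
            if mismatches <= M:
                hits.append((i, j))
    return hits


def score_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Score two sequences already aligned from their first letters (part 3).
    A '-' in either sequence is treated as a gap (an assumed convention).
    Letters left over in the longer sequence are also charged as gaps."""
    score = 0
    for x, y in zip(a, b):                     # walk the aligned columns
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    score += gap * abs(len(a) - len(b))        # unpaired tail of the longer sequence
    return score


if __name__ == "__main__":
    print(find_matches("ACGT", "TACGA", 3))        # exact matches only
    print(find_matches("ACGT", "TACGA", 3, M=1))   # allow one mismatch
    print(score_alignment("AC-GT", "ACTGA"))
```

Because the comparison is done on plain characters, the same sketch works for DNA and protein letters alike; a real protein scoring scheme would usually replace the single mismatch value with a substitution table, which is a design choice left to you.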

The file should be identified with your name in the format LastName_FirstName_HW2. Compress the file with either Zip or StuffIt and submit it through Blackboard.

Please hand in your algorithm in plain English; do not use any computer languages such as Java, C++, or Perl. What I am looking for is a simple step-by-step algorithm in standard English. If you are not sure, please ask me what to do.

Instructions for submitting homework.

Bio 39/139 Home Page

This page was last modified on Tue, Jan 8, 2008, 4:25:43 PM