$24
Exercise 1: DFA & NFA (1+1+1=3 Theory)
Draw an NFA over = fA,C,G,Tg that accepts all strings s 2 that t to the pattern P = AT*G*TC. The * symbol can be replaced by any sequence s0; s00 2 . For example, if s0 = ACCGT and s00 = A , then P = ATACCGTGATC.
Draw a DFA that accepts the same language (i.e. the same set of strings).
If M is an NFA that recognizes language C, does swapping the accept and non-
accept states in M necessarily yield a new NFA that recognizes C? Explain why or why not.
Exercise 2: Another DFA (1+1=2 Theory)
Draw a DFA over = fA,Cg that accepts all strings s 2 that match the pattern P = A10C10. That is, ten times A followed by ten times C.
Does a DFA that accepts all strings that match P = AnCn for any n 1 exist? Explain why or why not.
Exercise 3: Shift-And Algorithm (2+2=4 Theory, 3 Programming)
A wildcard character # is a special character which matches any single character. Example: the pattern ATT#CG matches ATTTCG, ATTCCG, ATTGCG, ATTACG. Describe how the Shift-And algorithm can be modi ed to handle such wildcard characters.
In multiple string matching, we are given a text T and set of patterns
P = fp1; p2; p3; :::; pmg
The goal is to nd all matches between any of the patterns and the text.
Describe how the Shift-And algorithm can be modi ed so as to solve the multiple string matching problem.
Implement the Shift-And algorithm for multiple patterns. Your program should take two parameters.
the name of an input le with the set of patterns. Patterns are line separated.
the name of an input le with the text T to be searched.
For each pattern, the output should contain a new line, start with the name of the pattern and followed by a comma-separated list of all positions in the text where the pattern occurrences end. Positions should be 0-based, i.e., counting starts at 0. Example: Assume a le input text.txt exists and contains the text GGAAGGAA and also a le patterns.txt exists and contains the pattern AA, GG and TT. Then your program should work as follows:
1
$ . / program p a t t e r n s . t x t i n p u t
t e x t . t x t
AA 3,7
3 GG 1,5
4 TT
Remarks:
There are 12 points to be earned on each assignment sheet.
50% of programing points and 50% of theoretical exercises are necessary to take the exam.
You are allowed to work in groups of two.
Hand in your solutions on paper (except for source code) by putting it in the letter box of Tobias Marschall in E2.1 (ground oor).
Programming code is to be sent as a tar.gz package by mail to: { aryan.3264@gmail.com
Source code is only considered if
{ it is in one of the languages Python, C, C++, or Java, { it is reasonably documented, commented, and readable, { command-lines for compiling and calling are provided, { compilation does not fail, and
{ executables are named according to:
lastname1 lastname2 assignment1 exercise3
For each programing task, there will be three input les. One is for you for testing and two are for us for grading. The latter two are kept secret and points are awarded based on whether your program computes the right answer for these two input les.
Do not forget to mention your names and matriculation numbers on your solutions!
Copying between groups will result in zero points for all involved groups!