----------------------------------------------------------------------
Purpose:
* Demonstrate (elementary) understanding of Regular Expressions
and how to use them in various useful languages.
----------------------------------------------------------------------
Background:
* The input file will have multiple proposed tokens on each
line. There might also be lines with no proposed tokens
and/or with only extra whitespace.
* The proposed tokens will be separated by whitespace, which is
to be ignored.
* Your program will consider successive tokens from the input
file and classify them as 'EffPea', 'Stir', 'Ent', or
'does not match'.
* An "EffPea" is one or more vowels 'a', 'e', 'i', 'o', 'u',
'A', 'E', 'I', 'O', 'U', followed by a left brace '{',
followed by a right parenthesis ')', followed by one or more
hexadecimal digits '0' through '9', 'a' through 'f', 'A'
through 'F'.
* A "Stir" is a right brace '}' followed by zero or more
lowercase letters 'a' through 'z' and decimal digits '0'
through '9', followed by a left parenthesis '('.
* An "Ent" is an at sign '@', followed by one or more decimal
digits '0' through '9' or uppercase letters 'R' through 'W',
followed by an octothorpe '#'.
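As an illustration only, the three definitions above could be
transcribed into Python regular expressions roughly as follows.
This is a sketch under stated assumptions, not the reference
solution: the names EFFPEA, STIR, ENT, and classify() are
hypothetical, and the octothorpe is taken to be the character '#'.
re.fullmatch() is used so that the entire token has to match, not
just a prefix of it.

```python
import re

# Hypothetical names; each pattern is a direct transcription of the
# corresponding token description.
EFFPEA = re.compile(r'[aeiouAEIOU]+\{\)[0-9a-fA-F]+')  # vowels, '{', ')', hex digits
STIR = re.compile(r'\}[a-z0-9]*\(')                    # '}', lowercase/digits, '('
ENT = re.compile(r'@[0-9R-W]+#')                       # '@', digits or R-W, '#'


def classify(token):
    """Return 'EffPea', 'Stir', 'Ent', or None when nothing matches."""
    for name, pattern in (('EffPea', EFFPEA), ('Stir', STIR), ('Ent', ENT)):
        # fullmatch() requires the whole token to match the pattern.
        if pattern.fullmatch(token):
            return name
    return None
```

A token such as '}(a' is rejected here because fullmatch() does not
accept the trailing 'a'; with re.match() it would be wrongly
accepted as a Stir.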
----------------------------------------------------------------------
Examples:
a{)0 -- legal EffPea
eieio{)000AAAfff -- legal EffPea
aAeEiIoOuU{)0123456789abcdefABCDEF -- legal EffPea
ba{)9 -- illegal EffPea, b is not a vowel
u){b -- illegal EffPea, ){ is not {)
{)a -- illegal EffPea, no vowel in front of {)
}q( -- legal Stir
}r1s23( -- legal Stir
}( -- legal Stir
{great( -- illegal Stir, { is not }
}(a -- illegal Stir, a is after (
{Wrong( -- illegal Stir, { is not } and W is uppercase
@RTW# -- legal Ent
@0S9V# -- legal Ent
@999# -- legal Ent
834# -- illegal Ent, no @ at front
@tvw# -- illegal Ent, tvw are lowercase
@ABC# -- illegal Ent, ABC are not in R through W
----------------------------------------------------------------------
Tasks:
1. Download HMWK_03_dalioba.zip from Blackboard.
2. Unzip the file somewhere convenient.
3. Change 'dalioba' in the name of the directory to your NetID.
(Your NetID is three letters followed by four or five digits.
The directory name will now be something like
'hmwk_03_abc1234', with YOUR NetID instead of 'abc1234'.)
4. Look in that directory.
5. Change the header lines in the skeleton files
hmwk_03.cs / .java / .py.
-- Line 1: Family name first, then a comma, then
personal name.
-- Line 2: Your NetID.
-- Line 3: The date you edited the file.
6. Run the files you just changed with the provided
'inputdata.txt' as the input file.
7. Observe the following output (it will be the same no matter
which language you picked):
processing tokens from inputdata.txt ...
a{)0< is the proposed token.
eieio{)000AAAfff< is the proposed token.
aAeEiIoOuU{)0123456789abcdefABCDEF< is the proposed token.
ba{)9< is the proposed token.
u){b< is the proposed token.
{)a< is the proposed token.
}q(< is the proposed token.
}r1s23(< is the proposed token.
}(< is the proposed token.
{great(< is the proposed token.
}(a< is the proposed token.
{Wrong(< is the proposed token.
@RTW#< is the proposed token.
@0S9V#< is the proposed token.
@999#< is the proposed token.
834#< is the proposed token.
@tvw#< is the proposed token.
@ABC#< is the proposed token.
8. Now, change the contents of the processToken() function in each
of the hmwk_03.cs, .java, and .py files to use the regular
expression support of the corresponding language so that the
following output is generated for the 'inputdata.txt' test
case file.
processing tokens from inputdata.txt ...
a{)0< matches EffPea.
eieio{)000AAAfff< matches EffPea.
aAeEiIoOuU{)0123456789abcdefABCDEF< matches EffPea.
ba{)9< does not match.
u){b< does not match.
{)a< does not match.
}q(< matches Stir.
}r1s23(< matches Stir.
}(< matches Stir.
{great(< does not match.
}(a< does not match.
{Wrong(< does not match.
@RTW#< matches Ent.
@0S9V#< matches Ent.
@999#< matches Ent.
834#< does not match.
@tvw#< does not match.
@ABC#< does not match.
9. You should get the same output for each of the three
languages. Make your output match this format EXACTLY since
when your solutions are tested, their output will be checked
using diff.
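If diff or fc is not at hand, a comparable character-by-character
check can be sketched in Python with the standard difflib module.
The function name diff_report() is hypothetical; it takes the
expected and actual output as strings and returns the unified-diff
lines, so an empty list means the two are identical.

```python
import difflib


def diff_report(expected_text, actual_text):
    """Return unified-diff lines; an empty list means an exact match."""
    # keepends=True keeps the line terminators, so a missing or extra
    # newline also shows up as a difference.
    expected = expected_text.splitlines(keepends=True)
    actual = actual_text.splitlines(keepends=True)
    return list(difflib.unified_diff(expected, actual,
                                     fromfile='expected', tofile='actual'))
```

One possible use is to read 'outputdata.txt' and the captured
output of a program into two strings, call diff_report() on them,
and print whatever lines come back.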
----------------------------------------------------------------------
Submission:
Make a zipfile of your 'hmwk_03_abc1234' directory (where
'abc1234' is replaced with YOUR NetID) and submit it on Blackboard
as your results for this assignment. Your submission should be a
zipfile that has exactly one item in it, a directory named
'hmwk_03_abc1234' (where 'abc1234' is YOUR NetID). Inside that
directory should be three source files, hmwk_03.cs, hmwk_03.java,
and hmwk_03.py.
Your submission will be run on another file of test data.
That file will have 24 possible tokens and your solutions will
score 1/2 point for each token that generates the correct message.
Therefore, the maximum possible score for this homework assignment
is 36 points (12 + 12 + 12).
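The packaging rules can be checked mechanically before submitting.
The following Python sketch uses the standard zipfile module; the
name check_submission() and its return convention (a list of
problem descriptions) are hypothetical, and the zip file name it
expects is the one given in the Hints section. It is a convenience
check, not the grader's procedure.

```python
import os
import zipfile

# The three source files the submission directory must contain.
SOURCES = ('hmwk_03.cs', 'hmwk_03.java', 'hmwk_03.py')


def check_submission(zip_path, netid):
    """Return a list of packaging problems; an empty list means none found."""
    top = f'hmwk_03_{netid}'
    problems = []
    if os.path.basename(zip_path) != top + '.zip':
        problems.append(f'zip file should be named {top}.zip')
    if not zipfile.is_zipfile(zip_path):
        return problems + ['not a zip file']
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    # Every entry must live under the single top-level directory.
    top_level = {name.split('/', 1)[0] for name in names}
    if top_level != {top}:
        problems.append(f'expected exactly one top-level item named {top}')
    for source in SOURCES:
        if f'{top}/{source}' not in names:
            problems.append(f'missing {top}/{source}')
    return problems
```

Calling check_submission('hmwk_03_abc1234.zip', 'abc1234') on a
correctly packaged file should return an empty list.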
----------------------------------------------------------------------
Hints:
1. Ensure your programs compile and run correctly. Not
compiling or not generating the correct output will cost you
points.
Ensure your output messages match the format shown above when
you change the processToken() function. The output is going
to be checked by a program, so it has to match EXACTLY.
After you write your programs, use diff or fc to compare
your output to the supplied 'outputdata.txt'. It must match
EXACTLY or you will be penalized points.
('EXACTLY' means character-by-character the same: no
differences in spacing, wording, punctuation, capitalization,
and so forth. Check your output against the 'outputdata.txt'
file!)
2. Ensure that you update the three header lines in each of the
source files with YOUR name (family name first, then a comma,
then your personal name), YOUR NetID, and the date you edit
the file.
Not updating the header lines properly will cost you points.
3. DO NOT change anything in the 'Main' (C#) or 'main' (Java,
Python) functions. Those routines will pump the proposed
tokens into the processToken() function for you.
Your programs will be tested from the command line. If they
do not run correctly when run that way, you will score
ZERO points.
4. You might use some lines of static code aside from changing
the contents of processToken(). (This will depend on how you
decide to do the regular expressions.)
5. Ensure you use the regular expression support of the
language. Programs that do not do all of their matching
using the regular expression support of the corresponding
language will score ZERO points.
6. These programs are not complex. The processToken() routine
in the C# reference solution is 14 lines of code, including
some static data. For Java, it's 12 lines of code. For
Python, it's 12 lines of code, including some static data.
If you find yourself writing lots more code than this, you're
probably going down the wrong path.
7. After you write your regular expressions, make up some test
cases of your own to ensure that your REs really match the
descriptions given above. The test cases in 'inputdata.txt'
are useful, but they are NOT comprehensive. Make up some
more of your own. This is so important that the first three
persons who send me an email with twelve additional (and
different) test tokens will score a bonus point. Three each
should be legal tokens for EffPea, Stir, and Ent and three
should be illegal tokens (that is, they do not match any
category). Think hard because to score the bonus point your
twelve proposed tokens have to be different from those anyone
else has already sent me. The illegal tokens should be CLOSE
to the token definitions, but not quite correct. No proposed
test token should be more than ten characters long.
8. Ensure your submission to Blackboard is packaged EXACTLY as
described above.
-- Your submission should be a ZIP FILE (not a tar, rar, gz,
or any other kind of compressed file).
-- The zip file should be named 'hmwk_03_abc1234.zip' (with
'abc1234' replaced with YOUR NetID).
-- This zip file should have ONE item in it, a directory
named 'hmwk_03_abc1234' (with 'abc1234' replaced with
YOUR NetID).
-- Your source files should be in that directory. The
source files should be named hmwk_03.cs / .java / .py.
Submissions in the wrong format score ZERO points.
9. After you submit your zip file on Blackboard, download it
from Blackboard and check that your submission is in the
proper format, that the programs run and print the correct
output, and that you updated the header lines correctly in
each of the source files.
10. Are you CERTAIN you complied with all of these nit-picking
instructions? Really? Maybe you ought to check just one
more time. :)
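For the bonus in hint 7, the mechanical rules on a proposed test
set (three tokens per category, all different, none longer than
ten characters, no embedded whitespace) can be checked with a
short Python sketch. The function name check_test_set() and the
category label 'illegal' are hypothetical. The sketch does not
check whether a token really is legal for its category, nor
whether someone else has already sent it in.

```python
# 'illegal' is a hypothetical label for tokens that match no category.
CATEGORIES = ('EffPea', 'Stir', 'Ent', 'illegal')


def check_test_set(tokens_by_category, max_len=10, per_category=3):
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    seen = set()
    for category in CATEGORIES:
        tokens = tokens_by_category.get(category, [])
        if len(tokens) != per_category:
            problems.append(
                f'{category}: expected {per_category} tokens, got {len(tokens)}')
        for token in tokens:
            if len(token) > max_len:
                problems.append(f'{token!r} is longer than {max_len} characters')
            # Tokens are separated by whitespace, so they cannot contain any.
            if not token or any(ch.isspace() for ch in token):
                problems.append(f'{token!r} is empty or contains whitespace')
            if token in seen:
                problems.append(f'{token!r} appears more than once')
            seen.add(token)
    return problems
```

The function takes a dictionary that maps each of the four
category labels to a list of three tokens.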
----------------------------------------------------------------------