----------------------------------------------------------------------
Purpose:
* Demonstrate (elementary) understanding of Regular Expressions
and how to use them in various useful languages.
----------------------------------------------------------------------
Background:
* The input file will have multiple proposed tokens on each
line. There might also be lines with no proposed tokens
(empty or whitespace-only) and lines with extra whitespace.
* The proposed tokens will be separated by whitespace, which is
to be ignored.
* Your program will consider successive tokens from the input
file and classify them as 'GeePea', 'Shake', 'Orc', or
'does not match'.
* A GeePea is an odd number of 'g' or 'G' letters followed by
one or more exclamation points '!' and/or question marks '?'
(in any order), followed by 'PEA' if they were only exclamation
points, 'pea' if they were only question marks, and nothing else
if they were a mixture of both.
(FYI: "an odd number" means 1 or 3 or 5 or 7 or ..., which
can also be stated as 2n+1, where n >= 0.)
* A Shake is an ampersand '&', a plus '+', or a slash '/',
followed by an even number of lowercase letters 'a' through 'z',
followed by an ampersand '&', a plus '+', or a slash '/', BUT IT CANNOT END WITH
THE SAME CHARACTER WITH WHICH IT STARTED. That is, if it
begins with '&', it must end with '+' or '/', and so forth.
(FYI: "an even number" means 0 or 2 or 4 or 6 or ..., which
can also be stated as 2n, where n >= 0.)
* An Orc is a ' '. followed by zero or more letters 'r' through
'w' or 'R' through 'W', followed by an ampersand '&' when the
letters are lowercase, an asterisk '*' when the letters are
uppercase, or an at sign '@' when there are no letters at all.
Mixing lowercase and uppercase letters is not allowed.
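One way to express the three definitions above as regular expressions is sketched below in C++ (std::regex with its default ECMAScript grammar). It is not the reference solution: the function names isGeePea, isShake, and isOrc are hypothetical, and because the character that must begin an Orc is not legible in the Orc definition, '%' is used as a stand-in for it. Each pattern is applied with std::regex_match, which succeeds only when the whole token matches, so no ^ or $ anchors are needed.

```cpp
#include <regex>
#include <string>

// GeePea: one 'g'/'G', then zero or more PAIRS of them (so an odd count),
// then exactly one of these endings:
//   !+PEA                  only exclamation points, then PEA
//   \?+pea                 only question marks, then pea
//   [!?]*(!\?|\?!)[!?]*    both marks, with nothing after them
// A run of '!'/'?' contains both marks exactly when some '!' sits next to
// some '?', which is what the middle group (!\?|\?!) requires.
bool isGeePea(const std::string& token) {
    static const std::regex re(
        R"([gG]([gG]{2})*(!+PEA|\?+pea|[!?]*(!\?|\?!)[!?]*))");
    return std::regex_match(token, re);
}

// Shake: one alternative per opening character, each listing only the
// closing characters that differ from it; ([a-z]{2})* gives an even
// number of lowercase letters, including zero.
bool isShake(const std::string& token) {
    static const std::regex re(
        R"(&([a-z]{2})*[+/]|\+([a-z]{2})*[&/]|/([a-z]{2})*[&+])");
    return std::regex_match(token, re);
}

// Orc: the leading character, then one or more r-w followed by '&',
// one or more R-W followed by '*', or no letters followed by '@'.
// PLACEHOLDER: '%' is a stand-in for the Orc's leading character.
bool isOrc(const std::string& token) {
    static const std::regex re(R"(%([r-w]+&|[R-W]+\*|@))");
    return std::regex_match(token, re);
}
```

The Shake pattern spells out the three starting characters separately instead of relying on a backreference, which keeps it within POSIX extended regular expressions as well. In C, the same patterns should work with regcomp() and REG_EXTENDED, provided each one is wrapped in ^(...)$, since regexec() reports a match anywhere in the string.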
----------------------------------------------------------------------
Examples:
g!PEA -- legal GeePea
gGgGG!!!PEA -- legal GeePea
GGG?pea -- legal GeePea
GGG?!? -- legal GeePea
gG!PEA -- illegal GeePea, not an odd number of g/G letters
ggg!Pea -- illegal GeePea, should have been PEA at end
GgGGg?!?pea -- illegal GeePea, should have had no pea at end
gggGGG -- illegal GeePea, need !/? marks (and maybe pea/PEA)
&abcd/ -- legal Shake
/mnop+ -- legal Shake
+gxhyiz& -- legal Shake
&abcd& -- illegal Shake, begins and ends with same character
/mop+ -- illegal Shake, odd number of letters
+GxHyIz& -- illegal Shake, not all lowercase letters
rtv& -- legal Orc
WTUSU* -- legal Orc
@ -- legal Orc
rrr& -- illegal Orc, no at front
UVUV@ -- illegal Orc, should be * at end
rStU& -- illegal Orc, can't mix lowercase and uppercase
----------------------------------------------------------------------
Tasks:
1. Download HMWK_04_dalioba.zip from Blackboard.
2. Unzip the file somewhere convenient.
3. Change 'dalioba' in the name of the directory to your NetID.
(Your NetID is three letters followed by four or five digits.
The directory name will now be something like
'hmwk_04_abc1234', with YOUR NetID instead of 'abc1234'.)
4. Look in that directory.
5. Change the header lines in the skeleton files
hmwk_04.c and hmwk_04.cc:
-- Line 1: Family name first, then a comma, then
personal name.
-- Line 2: Your NetID.
-- Line 3: The date you edited the file.
6. Compile and run the programs you just changed with the provided
'inputdata.txt' as the input file.
7. Observe the following output (it will be the same for both
programs):
processing tokens from inputdata.txt ...
g!PEA< is the proposed token.
gGgGG!!!PEA< is the proposed token.
GGG?pea< is the proposed token.
GGG?!?< is the proposed token.
gG!PEA< is the proposed token.
ggg!Pea< is the proposed token.
GgGGg?!?pea< is the proposed token.
gggGGG< is the proposed token.
&abcd/< is the proposed token.
/mnop+< is the proposed token.
+gxhyiz&< is the proposed token.
&abcd&< is the proposed token.
/mop+< is the proposed token.
+GxHyIz&< is the proposed token.
rtv&< is the proposed token.
WTUSU*< is the proposed token.
@< is the proposed token.
rrr&< is the proposed token.
UVUV@< is the proposed token.
rStU&< is the proposed token.
8. Now change the contents of the processToken() function in each
of hmwk_04.c and hmwk_04.cc to use the regular expression
support of the corresponding language so that the following
output is generated for the 'inputdata.txt' test case file.
processing tokens from inputdata.txt ...
g!PEA< matches GeePea.
gGgGG!!!PEA< matches GeePea.
GGG?pea< matches GeePea.
GGG?!?< matches GeePea.
gG!PEA< does not match.
ggg!Pea< does not match.
GgGGg?!?pea< does not match.
gggGGG< does not match.
&abcd/< matches Shake.
/mnop+< matches Shake.
+gxhyiz&< matches Shake.
&abcd&< does not match.
/mop+< does not match.
+GxHyIz&< does not match.
rtv&< matches Orc.
WTUSU*< matches Orc.
@< matches Orc.
rrr&< does not match.
UVUV@< does not match.
rStU&< does not match.
9. You should get the same output for each of the two languages.
Make your output match this format EXACTLY since when your
solutions are tested, their output will be checked using
diff.
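Steps 8 and 9 come down to classifying each token and printing one line for it. A minimal C++ sketch of such a processToken() follows. It assumes the skeleton passes the token as a std::string (the real signature may differ), tokenMessage() is a hypothetical helper, and '%' is a stand-in for the Orc's leading character, which is not legible in the Orc definition. The message format follows the sample output in step 8; check it against 'outputdata.txt' with diff, since that file is what the output is compared with.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: returns the line processToken() should print,
// following the format shown in step 8.
std::string tokenMessage(const std::string& token) {
    struct Kind {
        const char* name;
        std::regex re;
    };
    // Compiled once, on the first call, then reused for every token.
    static const Kind kinds[] = {
        {"GeePea", std::regex(R"([gG]([gG]{2})*(!+PEA|\?+pea|[!?]*(!\?|\?!)[!?]*))")},
        {"Shake", std::regex(R"(&([a-z]{2})*[+/]|\+([a-z]{2})*[&/]|/([a-z]{2})*[&+])")},
        // PLACEHOLDER: '%' stands in for the Orc's leading character.
        {"Orc", std::regex(R"(%([r-w]+&|[R-W]+\*|@))")},
    };
    for (const Kind& kind : kinds) {
        // regex_match succeeds only if the entire token matches.
        if (std::regex_match(token, kind.re)) {
            return token + "< matches " + kind.name + ".";
        }
    }
    return token + "< does not match.";
}

// Assumed signature; the skeleton's processToken() may take a char*
// instead of a std::string.
void processToken(const std::string& token) {
    std::cout << tokenMessage(token) << '\n';
}
```

Holding the compiled expressions in a function-local static array means each pattern is compiled once, on the first call, rather than once per token.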
----------------------------------------------------------------------
Submission:
Make a zipfile of your 'hmwk_04_abc1234' directory (where
'abc1234' is replaced with YOUR NetID) and submit it on Blackboard
as your results for this assignment. Your submission should be a
zipfile that has exactly one item in it, a directory named
'hmwk_04_abc1234' (where 'abc1234' is YOUR NetID). Inside that
directory should be two source files, hmwk_04.c and hmwk_04.cc.
Your submission will be run on another file of test data.
That file will have 24 proposed tokens, and each of your two
programs will score 1/2 point for each token for which it
generates the correct message. Therefore, the maximum possible
score for this homework assignment is 24 points (12 per program).
----------------------------------------------------------------------
Hints:
1. Ensure your programs compile and run correctly. Not
compiling or not generating the correct output will cost you
points.
Ensure your output messages match the format shown above when
you change the processToken() function. The output is going
to be checked by a program, so it has to match EXACTLY.
After you write your programs, use diff or fc to compare
your output to the supplied 'outputdata.txt'. It must match
EXACTLY or you will lose points.
('EXACTLY' means character-by-character the same: no
differences in spacing, no changes in wording, no changes
in punctuation, no changes in capitalization, and so forth.
Check your output against the 'outputdata.txt' file!)
2. Ensure that you update the three header lines in each of the
source files with YOUR name (family name first, then a comma,
then your personal name), YOUR NetID, and the date you edit
the file.
Not updating the header lines properly will cost you points.
3. DO NOT change anything in the main() routine in the C++ case.
You might want to put some initialization code at the top of
the main() routine in the C case (depending on how you do the
processing) but DO NOT change anything else in that routine.
Your programs will be tested from the command line. If they
do not run correctly when run that way, you will score
ZERO points.
4. You might need a few lines of static code outside of
processToken() in addition to changing its contents. (This
will depend on how you decide to do the regular expressions.)
5. Ensure you use the regular expression support of the
language. Programs that do not do all of their matching
using the regular expression support of the corresponding
language will score ZERO points.
6. These programs are not complex. The processToken() routine
in the C reference solution is 12 lines of code. There are
three additional lines of static data and 12 lines of
initialization code at the beginning of the C main function.
For C++, the processToken() routine is 15 lines of code,
including three lines of static declarations.
If you find yourself writing lots more code than this in
either the C or C++ case, you're probably going down the
wrong path.
7. After you write your regular expressions, make up some test
cases of your own to ensure that your REs really match the
descriptions given above. The test cases in 'inputdata.txt'
are useful, but they are NOT comprehensive. Make up some
more of your own.
8. Ensure your submission to Blackboard is packaged EXACTLY as
described above.
-- Your submission should be a ZIP FILE (not a tar, rar, gz,
or any other kind of compressed file).
-- The zip file should be named 'hmwk_04_abc1234.zip' (with
'abc1234' replaced with YOUR NetID).
-- This zip file should have ONE item in it, a directory
named 'hmwk_04_abc1234' (with 'abc1234' replaced with
YOUR NetID).
-- Your source files should be in that directory. The
source files should be named hmwk_04.c and hmwk_04.cc.
Submissions in the wrong format score ZERO points.
9. After you submit your zip file on Blackboard, download it
from Blackboard and check that your submission is in the
proper format, that the programs run and print the correct
output, and that you updated the header lines correctly in
each of the source files.
10. Are you CERTAIN you complied with all of these nit-picking
instructions? Really? Maybe you ought to check just one
more time. :)
----------------------------------------------------------------------