
CS Implementing a Lexical Analyzer (Scanner) for MailScript


1 Introduction

In this homework, you will implement a scanner for the MailScript language using flex. Your scanner will be used to produce the tokens of a MailScript program. MailScript is a scripting language intended for e-mail automation.

MailScript is loosely based on AppleScript, a scripting language that facilitates automated control over scriptable Mac applications. MailScript uses a block structure. Each block starts with the keywords "Mail from", followed by the e-mail address of the user and a colon symbol, and ends with the keywords "end Mail". The keyword "send" is used to send e-mails to certain users. Inside a block, one can also create a sub-block called "schedule" and specify a date and time for the e-mails to be sent. Variables can be defined with "set" both inside and outside of blocks. An example MailScript file is given below:

set Message ("Welcome to CS305.")
set Name ("Deniz")

Mail from derya@mail.com:
    send ["Hello!"] to [("Ayse", ayse@mail.com), (mehmet@mail.com), ("Mehmet", mehmet@mail.com)]
    schedule @ [03/10/2021, 16:00]:
        send [Message] to [("Beril", beril@mail.com.tr)]
        send ["Thank you
very much."] to [(Name, deniz@mail.com.tr)]
    end schedule
end Mail

Mail from derya@sabanciuniv.edu:
    schedule @ [02/10/2021, 16:00]:
        send ["Good morning!"] to [(ali@mail.com), ("Ferhat Kaya", ferhat@mail.com), ("Ali", ali@mail.com)]
    end schedule
end Mail

Mail from derya@mail.com:
    schedule @ [03/10/2021, 04:00]:
        send ["How are you?"] to [("Omer", omer@mail.com)]
    end schedule
end Mail

In a MailScript program, there may be keywords (e.g. send, set, ...), values (e.g. e-mail addresses, dates, ...), identifiers (programmer-defined names for variables, etc.) and punctuation symbols (like , and :). Your scanner will catch these language constructs (introduced in Sections 2, 3 and 4) and print out the token names together with their positions (explained in Section 5) in the input file. See Section 6 for an example of the required output. The following sections provide detailed information on the homework; please read the entire document carefully before starting your work.







2 Keywords

Below is the list of keywords to be implemented, together with the corresponding token names.

Lexeme        Token        Lexeme          Token
Mail          tMAIL        end Mail        tENDMAIL
schedule      tSCHEDULE    end schedule    tENDSCH
send          tSEND        to              tTO
from          tFROM        set             tSET

MailScript is case-sensitive. Only the lexemes given above are used for the keywords.
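To illustrate the case-sensitivity rule, the plain C sketch below maps a lexeme to its keyword token name. The names keyword_token and KEYWORDS are hypothetical and not part of the assignment; in the flex file itself the keywords would normally be written directly as patterns. The sketch also assumes that the two-word keywords are written with exactly one space between the words, which the handout does not state explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Keyword table from Section 2 (lexeme -> token name). */
struct keyword {
    const char *lexeme;
    const char *token;
};

static const struct keyword KEYWORDS[] = {
    {"Mail", "tMAIL"},         {"end Mail", "tENDMAIL"},
    {"schedule", "tSCHEDULE"}, {"end schedule", "tENDSCH"},
    {"send", "tSEND"},         {"to", "tTO"},
    {"from", "tFROM"},         {"set", "tSET"},
};

/* Returns the token name if `lexeme` is exactly one of the keywords.
   strcmp is case-sensitive, so "mail" or "SEND" return NULL and would
   be scanned as identifiers instead. */
const char *keyword_token(const char *lexeme) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (strcmp(KEYWORDS[i].lexeme, lexeme) == 0) {
            return KEYWORDS[i].token;
        }
    }
    return NULL;
}
```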


3 Operators & Punctuation Symbols

Below is the list of operators and punctuation symbols to be implemented, together with the corresponding token names.

Lexeme   Token     Lexeme   Token
,        tCOMMA    :        tCOLON
(        tLPR      )        tRPR
[        tLBR      ]        tRBR
@        tAT
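The symbol table above amounts to a single-character lookup, sketched here in C with a hypothetical helper punct_token. Note that @ is both the tAT symbol and part of every e-mail address; in flex, the longest-match rule decides which one applies, which is why a lone @ surrounded by spaces (as in Ali @ from in Figure 2) is reported as tAT.

```c
#include <stddef.h>
#include <string.h>

/* Section 3 symbols and their token names, kept in matching order. */
static const char SYMBOLS[] = ",:()[]@";
static const char *const SYMBOL_TOKENS[] = {
    "tCOMMA", "tCOLON", "tLPR", "tRPR", "tLBR", "tRBR", "tAT",
};

/* Returns the token name for a punctuation character, or NULL. */
const char *punct_token(char c) {
    /* strchr would also "find" the terminating '\0', so reject it first. */
    if (c == '\0') return NULL;
    const char *p = strchr(SYMBOLS, c);
    return p ? SYMBOL_TOKENS[p - SYMBOLS] : NULL;
}
```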











4 Identifiers & Values

You need to implement identifiers and values (strings, e-mail addresses, dates and times). Here are the rules you need to pay attention to:



    • An identifier consists of any combination of letters, digits and the underscore character. However, it cannot start with a digit.

    • The token name for an identifier is tIDENT.

    • Anything inside a pair of quotation marks is considered a string. However, a string cannot itself contain a quotation mark. The empty string (an opening quote immediately followed by a closing quote) is also a string. The token name for a string is tSTRING.

    • An e-mail address is made up of a local-part, an @ sign and a domain. It can be represented as local-part@domain.

    • The local-part can only contain uppercase and lowercase Latin letters, digits, hyphens (-), underscores (_) and dots, provided that a dot is neither the first nor the last character of the local-part and that dots do not appear consecutively. For example, the following lexemes would NOT be recognized as an e-mail address: .example@mail.com, example.@mail.com, ex..ample@mail.com.

    • The domain should consist of 2 or 3 dot-separated labels: xxx.xxx or xxx.xxx.xxx. Each label can contain uppercase and lowercase Latin letters, digits and hyphens (-), provided that a hyphen is neither the first nor the last character of the label. For example, the following lexemes would NOT be recognized as an e-mail address: example@-mail.com, example@mail-.com.

    • The token name for an e-mail address is tADDRESS. Some examples are given below:


example@mail.com

john.smith@new-mail.com.tr

User--123@a-really-long-domain.name

12345@12345.12345



    • A date will be given in the following format: DD/MM/YYYY (e.g. 11/10/2021). All day, month and year slots can take any digit. Hence the following example would also be considered a date variable: 45/98/0290. The token name for a date variable is tDATE.


    • A time variable will be given in the following format: HH:MM (e.g. 16:30). All hour and minute slots can take any digit. Hence the following example would also be considered a time variable: 25:90. The token name for a time variable is tTIME.

    • Any character that is not a whitespace character and cannot be recognized as part of the lexeme of a token should be printed out together with an error message (the exact format is given in Section 6).
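The rules above can be checked on complete candidate lexemes with the plain C sketch below, which may help when writing your own test files. It is a validator, not a scanner: in the flex file these rules become regular expressions, and flex's longest-match rule decides how an input such as .example@mail.com is split. All function names are hypothetical, and isalnum/isdigit are assumed to run in the default "C" locale, where they accept only ASCII letters and digits.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Identifier: letters, digits and '_', not starting with a digit. */
bool is_ident(const char *s) {
    if (*s == '\0' || isdigit((unsigned char)*s)) return false;
    for (; *s != '\0'; s++) {
        if (!isalnum((unsigned char)*s) && *s != '_') return false;
    }
    return true;
}

/* Matches s against a template in which 'D' stands for any digit and
   every other character must appear literally. */
static bool matches_template(const char *s, const char *tmpl) {
    if (strlen(s) != strlen(tmpl)) return false;
    for (size_t i = 0; tmpl[i] != '\0'; i++) {
        bool ok = (tmpl[i] == 'D') ? isdigit((unsigned char)s[i]) != 0
                                   : s[i] == tmpl[i];
        if (!ok) return false;
    }
    return true;
}

bool is_date(const char *s) { return matches_template(s, "DD/DD/DDDD"); }
bool is_time(const char *s) { return matches_template(s, "DD:DD"); }

/* Local-part: letters, digits, '-', '_' and '.', where a dot may not be
   first, last, or directly followed by another dot. */
static bool is_local_part(const char *s, size_t n) {
    if (n == 0 || s[0] == '.' || s[n - 1] == '.') return false;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isalnum(c) && c != '-' && c != '_' && c != '.') return false;
        /* safe: a dot is never the last character, so i + 1 < n */
        if (c == '.' && s[i + 1] == '.') return false;
    }
    return true;
}

/* Domain label: letters, digits and '-', not starting or ending with '-'. */
static bool is_label(const char *s, size_t n) {
    if (n == 0 || s[0] == '-' || s[n - 1] == '-') return false;
    for (size_t i = 0; i < n; i++) {
        if (!isalnum((unsigned char)s[i]) && s[i] != '-') return false;
    }
    return true;
}

/* E-mail address: local-part@domain, with 2 or 3 dot-separated labels. */
bool is_email(const char *s) {
    const char *at = strchr(s, '@');
    if (at == NULL || strchr(at + 1, '@') != NULL) return false;
    if (!is_local_part(s, (size_t)(at - s))) return false;

    int labels = 0;
    const char *start = at + 1;
    for (;;) {
        const char *dot = strchr(start, '.');
        size_t len = dot ? (size_t)(dot - start) : strlen(start);
        if (!is_label(start, len)) return false;
        labels++;
        if (dot == NULL) break;
        start = dot + 1;
    }
    return labels == 2 || labels == 3;
}
```

All examples listed in the rules above behave as stated: the four sample addresses are accepted, and the five counter-examples are rejected.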


5 Positions

You must keep track of the position information for the tokens. For each token that is reported, the line number on which the lexeme of the token starts is considered to be the position of the token. See Section 6 for how the position information is reported together with the token names.
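Because a string may span several lines (as in the first example program), the reported line has to be recorded before the newlines inside the lexeme are counted. A minimal C sketch of that bookkeeping, assuming a hypothetical helper consume_lexeme that is called for every matched lexeme, including skipped whitespace (e.g. consume_lexeme(yytext, yyleng) in each flex action):

```c
#include <stddef.h>

/* Line on which the next lexeme will start (hypothetical global). */
static int current_line = 1;

/* Call once for every matched lexeme, including skipped whitespace.
   Returns the line on which the lexeme starts, then advances the counter
   past any newlines inside it (e.g. in a multi-line string). */
int consume_lexeme(const char *lexeme, size_t len) {
    int start_line = current_line;
    for (size_t i = 0; i < len; i++) {
        if (lexeme[i] == '\n') current_line++;
    }
    return start_line;
}
```

This sketch counts newlines by hand instead of relying on flex's %option yylineno, so the starting line of a multi-line lexeme is captured explicitly.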

6 Input, Output, and Example Execution

Assume that your executable scanner (generated by passing your flex program through flex and then compiling the generated lex.yy.c with gcc) is named MSScanner. We will then test your scanner on a number of input files using the following command line:

MSScanner < test17.ms

In response, your scanner should print out a separate line for each token it catches in the input file (test17.ms, given in Figure 1). The output format for each kind of token is given below:








Token                                          Output
identifier, date, time, string, e-mail addr.   <row><space><token name><space>(<lexeme>)
for all other tokens                           <row><space><token name>
for illegal characters                         <row><space>ILLEGAL CHARACTER<space>(<lexeme>)
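The three output formats in the table above can be produced by one formatting helper. The C sketch below is hypothetical: format_token writes into a static buffer (so it is not reentrant and truncates very long lexemes), and it assumes the flex pattern for strings also matches the surrounding quotation marks, so the tSTRING lexeme passed in still contains them and they are stripped here.

```c
#include <stdio.h>
#include <string.h>

/* Builds one output line:
     <row> <token name>              when lexeme is NULL
     <row> <token name> (<lexeme>)   otherwise
   For tSTRING the lexeme is assumed to include both quotation marks,
   which are dropped from the output. The result lives in a static
   buffer that is overwritten on every call. */
const char *format_token(int row, const char *token, const char *lexeme) {
    static char line[1024];
    if (lexeme == NULL) {
        snprintf(line, sizeof line, "%d %s", row, token);
    } else if (strcmp(token, "tSTRING") == 0) {
        int inner = (int)strlen(lexeme) - 2;   /* length without the quotes */
        snprintf(line, sizeof line, "%d %s (%.*s)", row, token,
                 inner > 0 ? inner : 0, lexeme + 1);
    } else {
        snprintf(line, sizeof line, "%d %s (%s)", row, token, lexeme);
    }
    return line;
}
```

In a flex rule action one might then write printf("%s\n", format_token(row, "tDATE", yytext)); where row is the starting line of the lexeme.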






Here, <row> gives the location (line number) of the first character of the lexeme of the token, and <token name> is the token name for the current item. <lexeme> displays the lexeme of tokens of type identifier (tIDENT), date (tDATE), time (tTIME), string (tSTRING) and e-mail address (tADDRESS), and <space> corresponds to a single space. Note that the lexeme of a tSTRING token must be printed without the quotation marks, as shown in the examples below. For example, let us assume that test17.ms has the following content:





Mail from beril@mail.com:
    set News ("Hi, I will not be joining today’s meeting.")

    schedule @ [19/12/2021, 08:30]:
        send [News] to [("Tugce", tugce@company-mail.com)]
    end schedule

end Mail



Figure 1: An example MailScript program: test17.ms

Then the output of your scanner must be:

1 tMAIL
1 tFROM
1 tADDRESS (beril@mail.com)
1 tCOLON
2 tSET
2 tIDENT (News)
2 tLPR
2 tSTRING (Hi, I will not be joining today’s meeting.)
2 tRPR
4 tSCHEDULE
4 tAT
4 tLBR
4 tDATE (19/12/2021)
4 tCOMMA
4 tTIME (08:30)
4 tRBR
4 tCOLON
5 tSEND
5 tLBR
5 tIDENT (News)
5 tRBR
5 tTO
5 tLBR
5 tLPR
5 tSTRING (Tugce)
5 tCOMMA
5 tADDRESS (tugce@company-mail.com)
5 tRPR
5 tRBR
6 tENDSCH
8 tENDMAIL

Note that the content of a test file need not be a complete or correct MailScript program. If the content of a test file is the following:




::schedule ]Ali @ from Mail
example@mail.edu
send
99/99/9999
.example@mail.com
example@-mail.com



Figure 2: An example MailScript program: test18.ms

Then your scanner should process it without complaint and output the following information:


1 tCOLON
1 tCOLON
1 tSCHEDULE
1 tRBR
1 tIDENT (Ali)
1 tAT
1 tFROM
1 tMAIL
2 tADDRESS (example@mail.edu)
3 tSEND
4 tDATE (99/99/9999)
5 ILLEGAL CHARACTER (.)
5 tADDRESS (example@mail.com)
6 tIDENT (example)
6 tAT
6 ILLEGAL CHARACTER (-)
6 tIDENT (mail)
6 ILLEGAL CHARACTER (.)
6 tIDENT (com)


7 How to Submit

Submit only your flex file (without zipping it) on SUCourse. Your flex file must be named:

username-hw1.flx

where username is your SUCourse username.


8 Notes

        ◦ Important: Name your file as instructed and do not zip it. [-10 points otherwise]

        ◦ Do not copy-paste MailScript program fragments from this document as your test cases. Copying and pasting from a PDF can introduce unrecognizable characters. Instead, all MailScript code fragments that appear in this document are provided as a text file for you to use.


        ◦ Make sure you print the token names exactly as specified. You will lose points otherwise.

        ◦ No homework will be accepted if it is not submitted using SUCourse+.

        ◦ You may get help from our TA or from your friends. However, you must write your flex file yourself.

        ◦ Start working on the homework immediately.

        ◦ If you develop your code or create your test files on your own computer (not on flow.sabanciuniv.edu), there can be incompatibilities once you transfer them to flow.sabanciuniv.edu. Since grading will be done automatically on flow.sabanciuniv.edu, we strongly encourage you to develop on flow.sabanciuniv.edu, or at least to test your code there before submitting it. If you choose not to test your implementation on flow.sabanciuniv.edu, you accept the risk of incompatibility: even after spending hours on the homework, you can easily get 0 because of such incompatibilities.

        ◦ LATE SUBMISSION POLICY:

Late submission is allowed subject to the following conditions:

            ▪ Your homework grade will be computed by multiplying your score on the test cases by a "submission time factor" (STF).

            ▪ If you submit on time (i.e. before the deadline), your STF is 1, so you lose nothing.

            ▪ If you submit late, you lose 0.01 of your STF for every 5 minutes of delay.

            ▪ No homework will be accepted more than 500 minutes after the deadline.

            ▪ SUCourse+’s timestamp will be used for the STF computation.

            ▪ If you submit multiple times, the time of your last submission will be used.
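A small C sketch of the STF rule, working in hundredths so that no floating-point rounding creeps in. The policy does not say how a partial 5-minute interval is counted; this sketch ASSUMES that only completed 5-minute intervals are penalized, and stf_hundredths is a hypothetical name.

```c
/* Submission time factor in hundredths (100 means STF = 1.00).
   ASSUMPTION: only completed 5-minute intervals of delay are penalized;
   the policy does not say how partial intervals are rounded.
   Returns -1 if the submission is too late to be accepted. */
int stf_hundredths(int minutes_late) {
    if (minutes_late <= 0) return 100;   /* on time: STF = 1 */
    if (minutes_late > 500) return -1;   /* more than 500 minutes late: not accepted */
    return 100 - minutes_late / 5;       /* -0.01 per completed 5 minutes */
}
```

Under this reading, a test score of 80 submitted 60 minutes late would become 80 × 88 / 100 = 70.4, and a submission exactly 500 minutes late would already have an STF of 0.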






