Starting from:
$30

$24

Huffman Coding – 100 course points

The purpose of this assignment is to practice your understanding of the tree data structure.

READ the assignment as soon as it comes out! You need time to understand the assignment and to answer the many questions that will arise as you read the description and the code provided.

Refer to our Programming Assignments FAQ for instructions on how to install VSCode, how to use the command line and ho to submit your assignments.



Overview

The goal of this assignment is to implement a form of data compression. That is, given some data, we want to express the same information using less space. For this project, we will specifcally focus on compressing text fles, so we must frst understand how computers represent text internally.

Recall that computers store data as a sequence of bytes. A byte consists of eight bits, and it represents a value between 0 and 255 inclusive. To represent English text, we need a way of assigning each English letter, punctuation symbol, special character, etc. to a sequence of eight bits (a value from 0 to

255). This mapping is provided by the ASCII encoding, which is shown in the table below. Notice that ASCII only uses 128 out of the 256 possible values that a byte can store.





















For instance, consider the text “A b ????”. Using the table above, we can see that this is represented as the following sequence of bytes: “65 32 98 32 63 63 63 63”. Note that the space counts as a character, and its value in the ASCII encoding is 32.

If we write out the binary string for each character according to the table above and concatenate them together, we get “01000001001000000110001000100000001111110011111100111111 Storing our original string with the ASCII encoding requires 8*8 = 64 bits. There are 8 characters in the text “A b ????” and each character is represented by a byte that is 8 bits long.

Now, imagine if we weren’t forced to use eight bits for every character, and we could instead use the binary encoding “? = 0, [space] = 10, A = 110, b = 111”. Then our string “A b ????” would become “11010111100000”. This is only 14 bits, signifcantly smaller than the 64 bits that ASCII requires. Also notice that none of these codes are a prefx of any others, so there is no ambiguity when decoding. Here, we compressed our string by fnding a different encoding for the characters that minimized the number of bits we needed to use. This is the essence of the Huffman coding algorithm, which is described in detail in the next section.

Intuitively, not only do we want to avoid wasting space encoding characters that don’t appear very often, but we want to make sure that the characters which appear the most often receive the shortest codes. As you will see, Huffman coding is a way to do just that.



Huffman Coding
Huffman invented an algorithm that constructs the code called the Huffman code. Click here for the intuition video.


Algorithm to build the Huffman Tree. Use this algorithm in your code. Click here for video explaining how to build a tree, encode and decode.

    1. Start two empty queues: Source and Target

    2. Create a node for each character present in the input fle, each node contains the character and its occurrence probability.
    3. Enqueue the nodes in the Source queue in increasing order of occurrence probability.
    4. Repeat until the Source queue is empty and the Target queue has only one node.
        1. Dequeue from either queue or both the two nodes with the smallest occurrence probability. If the front node of Source and Target have the same occurrence probability, dequeue from Source frst.
        2. Create a new node whose character is null and occurrence probability is the sum of the occurrence probabilities of the two dequeued nodes. Add the two dequeued nodes as children: the frst dequeued node as the left child and the second dequeued node as the right child.
        3. Enqueue the new node into Target.



Clarifcation notes:

Step 4.1 – do the following procedure twice: compare the probability occurrences of the front nodes of Source and Target. If they are equal or if Source is less, dequeue Source. If the Target is less, dequeue Target.

Step 4.2 – the frst dequeued node and second dequeued node should be left and right children respectively.
Implementation

Overview of Files

CharFreq class, which houses a Character object “character” representing a certain ASCII character, and a double “probOcc” representing its probability of occurrence (value between 0 and 1 showing its frequency). These objects are implemented to compare primarily by probOcc, then by character if those are equal. Note that “character” can be null. Getters and setters are provided. Do not edit this class.

Queue class, which functions as a simple generic queue. It implements isEmpty(), size(), enqueue(), dequeue(), and peek(). Do not edit this class.
TreeNode class, which houses a CharFreq object “data” representing a certain character and its frequency, and TreeNodes “left” and “right” representing the left and right subtrees of the binary tree. Getters and setters are provided. Do not edit this class.
Driver class, which you can run to test any of your methods interactively. Feel free to edit this class, as it is provided only to help you test. It is not submitted and it is not used to grade your code.
StdIn and StdOut, which are used by the driver, provided methods, and some of your implemented methods as well.
Do not edit these classes.

HuffmanCoding class, which contains some provided methods in addition to annotated method signatures for all the methods you are expected to fll in. You will write your solutions in this fle, and it is the fle which will be submitted for grading. It contains instance variables
fleName, sortedCharFreqList, huffmanRoot and encodings, which must be set by your methods.

Multiple text fles which contain input data, and can be read by the driver as test cases. These fles, as well as the fles used for grading are guaranteed to be ASCII only. Feel free to edit them or even make new ones to help test your code. They are not submitted.



HuffmanCoding.java

NOTE: You are allowed (encouraged, even) to make helper methods in your HuffmanCoding.java fle to be used in your graded methods. Just make sure that they are created with the private keyword. Do not add new imports.

Methods provided to you:
    1. writeBitString


This method takes in a fle name and a string consisting of the characters ‘1’ and ‘0’, and writes the string to the fle.
You must use this provided method to write your encoding to your output fle, and must not try to write your string to a fle.
Note that it does not actually write the characters ‘1’ and ‘0’, and actually writes in bits.
The fle name given does not need to exist yet, and if it doesn’t the method will create a new fle. If the fle exists, the method will overwrite it. Do not edit this method.

    2. readBitString

This method takes in a fle name containing an encoded message, and returns a string consisting of the characters ‘1’ and ‘0’.
You must use this provided method to recover your encoded string from your input fle, and must not try to read the encoded fle yourself.
Note that it reads the fle byte by byte and converts the bits back into characters.
The given fle name must exist, and it must have been written to by writeBitString already. Do not edit this method.


Implementation Notes

YOU MAY only update the methods makeSortedList(), makeTree(), makeEncodings(), encode() and decode(). DO NOT add any instance variables to the HuffmanCoding class.

DO NOT add any public methods to the HuffmanCoding class.
DO NOT use System.exit() in your code.
DO NOT remove the package statement
from HuffmanCoding.java
YOU MAY add private methods to the HuffmanCoding class.



Methods to be implemented by you:

    1. makeSortedList

Implement this method to read the fle referenced by flename character by character, and store a sorted ArrayList of CharFreq objects, sorted by frequency, in sortedCharFreqList. Characters that do not appear in the input fle will not appear in your ArrayList.
Notice that your provided code begins by setting the fle with StdIn. You can now use methods

like StdIn.hasNextChar() and StdIn.readChar() which will operate on the fle as if it was standard input.
Also notice that there are only 128 ASCII values. This means that you can keep track of the number of occurrences of each character in an array of size
    128. You can use a char as an array index, and it will automatically convert to the corresponding
ASCII int value. You can convert an ASCII int value “num” back into its corresponding char with (char) num.
The Huffman Coding algorithm does not work when there is only 1 distinct character. For this specifc case, you must add a different character
with probOcc 0 to your ArrayList, so you can build a valid tree and encode properly later. For this assignment, simply add the character with ASCII value one more than the distinct character. If you are already at ASCII value 127, wrap around to ASCII
    0. DO NOT add more than one of these, and also DO NOT add any characters with frequency 0 in any normal input case.
Because the CharFreq object has been implemented to compare based on probOcc primarily, you can simply use Collections.sort(list) before returning your fnal ArrayList. You do not need to implement your own sorting method.
Below is an example of running the driver to help test this method.















    2. makeTree

Implement this method to use sortedCharFreqList and store the root of a valid Huffman Coding tree in huffmanRoot.
You will be using the TreeNode class to represent one node of your Huffman Coding tree. It contains a CharFreq object as its data, and references to the left and right.
You will be using the provided Queue class to code the Huffman Coding process. You must use the Huffman Coding algorithm outlined above.
TreeNodes which do not have any children represent encodings for characters. These nodes of your Huffman Coding tree must contain both a “character” and a “probOcc” in their CharFreq object. TreeNodes which have at least one child do not represent encodings for characters. These nodes of your Huffman Coding tree must contain a null “character”, and their “probOcc” must be the sum of their children. The root TreeNode will have a “probOcc” of 1.0.

Below is an example of running the driver to help test this method.





















    3. makeEncodings

Implement this method to use
huffmanRoot and store a String array of size 128 containing a String of 1’s and 0’s for every character in encodings. Characters not present in the Huffman Coding tree will have their spots in the array left null. Remember that going to the left child of your Huffman Coding tree represents adding a ‘0’ to your encoding for a character, and going to the right child represents adding a ‘1’. The encoding for a character is simply given by the path to get to that node from the root. Remember that only TreeNodes with no children contain a character, and only the paths to these TreeNodes will receive an encoding.
You can convert a Character object to its int ASCII code by casting it with (int).
Below is an example of running the driver to help test this method.










    4. encode

Implement this method to use encodings, an input fle containing some text we want to encode, and an output fle we want to encode into. It will write the compressed encoding of the input fle into the output fle.
Notice that your provided code begins by setting the fle with StdIn. You can now use methods like StdIn.hasNextChar() and StdIn.readChar() which will operate on the fle as if it was standard input. You are to create a String of ones and zeros which represents your encoding of the input text fle using your encodings array. The last line of this method must use the writeBitString method to write this String to the fle in bits. DO NOT try to write to the fle manually.
If the fle was successfully compressed, your output fle will have a signifcantly smaller fle size than the original text fle. Feel free to open your fle explorer and check!
Below is an example of running the driver to help test this method.

















Check out the encoded fle in your fle explorer. It will be signifcantly smaller than the original!



    5. decode

Implement this method to take in your encoded fle and use huffmanRoot, and print out the decoding. If
done correctly, the decoding will be the same as the contents of the text fle used for encoding.

Notice that your provided code begins by setting the output fle with StdOut. You can now use methods like StdOut.println() and StdOut.print() which will operate on the decodings fle as if it was standard output.
You must start your method using the provided readBitString method in order to get the string of ones and zeros from the encoded fle. DO NOT try to read the encoded fle manually.
You must then use your tree and the procedure outlined above to decode the bit string into characters, according to the tree’s encoding scheme. Below is an example of running the driver to help test this method.






















VSCode Extensions
The decoded output will be an exact copy of
yourYou inputcaninstallfle. VSCode extension packs for Java. Take a look at this tutorial. We suggest:

Extension Pack for Java

Project Manager for Java
Debugger for Java

Importing VSCode Project

    1. Download the zip fle from Autolab Attachments.

    2. Unzip the fle by double clicking it.
    3. Open VSCode
Import the folder to a workspace through File > Open Folder

Executing and Debugging
You can run your program through VSCode or you can use the Terminal to compile and execute. We suggest running through VSCode because it will give you the option to debug.

How to debug your code
If you choose the Terminal, from Huffman directory/folder:
to compile: javac -d bin src/huffman/*.java
to execute: java -cp bin huffman.Driver

Before submission

Collaboration policy. Read our collaboration policy here.

Submitting the assignment. Submit

HuffmanCoding.java separately via the web submission system called Autolab. To do this, click the Assignments link from the course website; click the Submit link for that assignment.

Getting help

If anything is unclear, don’t hesitate to drop by offce hours or post a question on Piazza.

Find instructors and head TAs offce hours here

Find tutors offce hours on Canvas -> Tutoring -> RU CATS In addition to offce hours we have the CAVE (Collaborative Academic Versatile Environment), a community space staffed with lab assistants which are undergraduate students further along the CS major to answer questions.



Problem by Ishaan Ivaturi




Connect with Rutgers


Rutgers Home

Rutgers Today

myRutgers

Academic Calendar

Calendar of Events

SAS Events

Explore SAS


Departments & Degree-Granting Programs

Other Instructional Programs
Majors & Minors

Research Programs, Centers, & Institutes

International Programs

Division of Life Sciences

Explore CS


We are Hiring!

Research

News

Events

Resources

More products