In this question you will explore modeling discrete probability distributions and their entropy. Entropy provides a measure of the uncertainty in a probability distribution. You are not expected to have any prior knowledge of entropy. In this question you may use the Java utility method Arrays.toString, and may also wish to use a sorting method such as Arrays.sort. You may need to consult the JDK documentation for these. You are advised to read all question parts before beginning. Note that this question assumes throughout that input strings contain only lowercase letters a-z. Create a public class Entropy.
In the class Entropy, implement the public static method: int[] charCount(String s)
that returns an int array containing the frequency of each character occurring in the String s. The length of the returned array should equal the number of distinct characters in s. It should return null if s is null or empty. Assume the string contains only lowercase letters. Return the counts sorted in alphabetical order of the corresponding characters. (Correct counts in an incorrect order will receive partial credit.)
Expected behavior: charCount("abbc") should return { 1, 2, 1 }. charCount("xxxa") should return { 1, 3 }.
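One possible approach to the counting step is sketched below. It tallies letters into a fixed 26-slot array, which yields alphabetical order without calling a sort. The class name CharCountSketch is hypothetical and used only for illustration; it is not the required Entropy class, and other approaches (for example, sorting the characters first) would work equally well.

```java
import java.util.Arrays;

// Hypothetical illustration class; the assignment itself requires a class named Entropy.
class CharCountSketch {

    // Returns the count of each distinct letter in s, ordered alphabetically by letter.
    // Assumes s contains only lowercase letters a-z, as the question states.
    static int[] charCount(String s) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        int[] tally = new int[26]; // one slot per letter a-z
        int distinct = 0;
        for (char ch : s.toCharArray()) {
            if (tally[ch - 'a']++ == 0) {
                distinct++; // first time this letter has been seen
            }
        }
        // Copy only the non-zero tallies; walking a..z keeps alphabetical order.
        int[] counts = new int[distinct];
        int k = 0;
        for (int t : tally) {
            if (t > 0) {
                counts[k++] = t;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(charCount("abbc"))); // [1, 2, 1]
        System.out.println(Arrays.toString(charCount("xxxa"))); // [1, 3]
    }
}
```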
In the class Entropy, implement the public static method: double[] normalize(int[] c)
that returns a normalized array of double probabilities corresponding to the specified int array of counts c. If the ith element of the input array is denoted $c_i$, then return the array P where
$$P_i = \frac{c_i}{\sum_j c_j}$$
That is, divide each element in the input array by the sum of the input elements, and return the result. It should return null if the input is null or empty.
Expected behavior: normalize(new int[] {2, 1, 1}) should return { 0.5, 0.25, 0.25 }.
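The division step can be sketched as follows. This is a minimal illustration under the assumption that the counts are positive, so the sum is non-zero; the class name NormalizeSketch is hypothetical. The cast to double matters, because dividing two int values in Java would truncate every result to 0.

```java
import java.util.Arrays;

// Hypothetical illustration class; the assignment itself requires a class named Entropy.
class NormalizeSketch {

    // Divides each count by the total so that the results sum to 1.
    // Assumes the counts are positive, so the sum is never zero.
    static double[] normalize(int[] c) {
        if (c == null || c.length == 0) {
            return null;
        }
        long sum = 0;
        for (int v : c) {
            sum += v;
        }
        double[] p = new double[c.length];
        for (int i = 0; i < c.length; i++) {
            p[i] = (double) c[i] / sum; // cast first to avoid integer division
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new int[] {2, 1, 1}))); // [0.5, 0.25, 0.25]
    }
}
```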
You can verify that with these routines, applying normalize to the output of charCount will estimate the probability of each symbol in the input string.
Expected behavior: normalize(charCount("abbc")) should return { 0.25, 0.5, 0.25 }.
In the class Entropy, implement the public static method: double entropyOf(double[] p)
The entropyOf method should compute the entropy of the double array p, which is assumed to represent a probability vector. If $p_i$ represents the ith element of the input array p, then it should compute:
$$H(p) = -\sum_i p_i \ln p_i$$
That is, it should return the negative sum of each input element multiplied by its (base e) logarithm. Assume that $0 < p_i < 1$ for all i, so there are no numerical issues.
Expected behavior: entropyOf(new double[] {0.5, 0.25, 0.25}) should compute -0.5 log 0.5 - 0.25 log 0.25 - 0.25 log 0.25, where log is the natural logarithm, and return approximately 1.0397.
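A minimal sketch of the entropy sum is given below. It relies on Math.log, which is the natural (base e) logarithm in Java. The class name EntropyOfSketch is hypothetical, and the sketch assumes every probability lies strictly between 0 and 1, as the question states.

```java
// Hypothetical illustration class; the assignment itself requires a class named Entropy.
class EntropyOfSketch {

    // Returns the negative sum of p_i * ln(p_i) over all elements of p.
    // Assumes 0 < p_i < 1 for every element, so Math.log never sees zero.
    static double entropyOf(double[] p) {
        double sum = 0.0;
        for (double pi : p) {
            sum += pi * Math.log(pi); // Math.log is the base-e logarithm
        }
        return -sum;
    }

    public static void main(String[] args) {
        double h = entropyOf(new double[] {0.5, 0.25, 0.25});
        System.out.println(h); // approximately 1.0397
    }
}
```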
In the class Entropy, implement the public static method: int[][] charCountArray(String[] a)
This method should count the frequencies of characters in each string in array a and return an int array of arrays containing the counts for each input string. The returned array of arrays should have as many rows as there are strings in the input array. Each row will contain a character frequency count for the corresponding input string. Unlike charCount, charCountArray should only count those characters that exist in exactly one input string. That is, each row returned should contain the counts of characters that are unique to the corresponding input string. Any distinct character present in more than one input string should be excluded from the counts of all input strings. The input strings should again be assumed to contain only lowercase letters, and the resulting counts should again be sorted in alphabetical order of the corresponding characters. Assume there is at least one unique character in each input.
Expected behavior: charCountArray(new String[] {"abbcccxx","bbccyzdd"}) should return { {1, 2}, {2, 1, 1} } because 'b' and 'c' occur in multiple inputs and so are excluded from counting; 'a' and 'x' occur only in the first input, with frequencies 1 and 2 respectively; and 'd', 'y' and 'z' occur only in the second input, with frequencies 2, 1 and 1 respectively.
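The exclusion rule can be sketched in two passes: first tally each string separately while recording how many strings contain each letter, then keep only the letters found in exactly one string. The class name CharCountArraySketch and the helper array stringsContaining are hypothetical names chosen for illustration, and the sketch assumes a non-null input array of non-null lowercase strings, since the question does not specify null handling here.

```java
import java.util.Arrays;

// Hypothetical illustration class; the assignment itself requires a class named Entropy.
class CharCountArraySketch {

    // For each input string, returns the counts of letters that appear in no other
    // input string, ordered alphabetically by letter.
    // Assumes a and its elements are non-null and contain only a-z.
    static int[][] charCountArray(String[] a) {
        int[][] tally = new int[a.length][26];  // per-string letter counts
        int[] stringsContaining = new int[26];  // number of strings that contain each letter
        for (int i = 0; i < a.length; i++) {
            for (char ch : a[i].toCharArray()) {
                if (tally[i][ch - 'a']++ == 0) {
                    stringsContaining[ch - 'a']++; // first sighting of this letter in string i
                }
            }
        }
        int[][] result = new int[a.length][];
        for (int i = 0; i < a.length; i++) {
            // First count the surviving letters so the row can be sized exactly.
            int size = 0;
            for (int j = 0; j < 26; j++) {
                if (tally[i][j] > 0 && stringsContaining[j] == 1) {
                    size++;
                }
            }
            result[i] = new int[size];
            int k = 0;
            for (int j = 0; j < 26; j++) {
                if (tally[i][j] > 0 && stringsContaining[j] == 1) {
                    result[i][k++] = tally[i][j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] r = charCountArray(new String[] {"abbcccxx", "bbccyzdd"});
        System.out.println(Arrays.deepToString(r)); // [[1, 2], [2, 1, 1]]
    }
}
```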
In the class Entropy, write a main method. The main method should assume that two strings are presented as command line arguments, and print on five lines:
The character probabilities corresponding to the 1st argument. Use charCount and normalize, and use Arrays.toString to convert the result to a formatted String for printing.
The entropy of the character probabilities in the 1st argument and in the 2nd argument, one line each.
The entropy of the unique characters in the 1st argument and in the 2nd argument, one line each (use charCountArray, normalize and entropyOf). Use at least 3 decimal places.
Expected behavior: The output should use formatting shown in the example below.
Invoking java Entropy hello world should produce:
Character Probabilities in hello : [0.2, 0.2, 0.4, 0.2]
Entropy of hello : 1.332
Entropy of world : 1.609
Entropy of unique chars in hello : 0.693
Entropy of unique chars in world : 1.099
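The line formats in the example above could be produced along the following lines. This sketch covers only the formatting step and takes already-computed values as input. The class name OutputFormatSketch and its three helper methods are hypothetical, and the use of Locale.ROOT is an assumption made so that the decimal separator is a period regardless of system settings.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration class showing only the output formatting.
class OutputFormatSketch {

    // Formats the probability line using Arrays.toString, as the question suggests.
    static String probabilityLine(String word, double[] p) {
        return "Character Probabilities in " + word + " : " + Arrays.toString(p);
    }

    // Formats an entropy value to 3 decimal places.
    static String entropyLine(String word, double h) {
        return String.format(Locale.ROOT, "Entropy of %s : %.3f", word, h);
    }

    // Formats the entropy of the characters unique to one word.
    static String uniqueEntropyLine(String word, double h) {
        return String.format(Locale.ROOT, "Entropy of unique chars in %s : %.3f", word, h);
    }

    public static void main(String[] args) {
        // Values below are supplied by hand for illustration; a full solution
        // would compute them from the command line arguments.
        System.out.println(probabilityLine("hello", new double[] {0.2, 0.2, 0.4, 0.2}));
        System.out.println(entropyLine("world", Math.log(5)));       // five equally likely letters
        System.out.println(uniqueEntropyLine("hello", Math.log(2))); // two equally likely letters
    }
}
```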
The file you must submit for this assignment is: Entropy.java (do not submit the .class file). Late submissions will still be accepted until 5:00 PM, May 15th (Tuesday). However, there will be a 20% penalty.