
INTRODUCTION TO AI HOMEWORK 6 Solution

OVERVIEW

In this assignment we will explore decision-making methods and different machine learning techniques.

 

 

EXERCISE 1 [UTILITY THEORY - 3 PTS]

Suppose you are looking to buy a used car. You can either go to the dealer immediately and make a decision, or you can first pay a website for more information about the quality of each car. If you don't pay the website, you'll buy a good quality used car with probability 0.5 and a poor quality car with probability 0.5. If you pay for information, however, you'll buy a good quality car with probability 0.8 and a poor quality car with probability 0.2. A good quality car will cost $1000 in maintenance over the next year, but a poor quality car will cost $5000 in maintenance. How much money is the information worth in expectation?
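
A quick way to sanity-check the expectation here is to compare the expected maintenance cost under each scenario. The snippet below is a minimal Python sketch of that comparison; the variable names are illustrative, and the probabilities and costs come from the problem statement:

# Sketch: expected maintenance cost with and without the website's information.
cost_good, cost_poor = 1000, 5000          # one-year maintenance costs

def expected_cost(p_good):
    """Expected maintenance cost given the probability of buying a good car."""
    return p_good * cost_good + (1 - p_good) * cost_poor

no_info = expected_cost(0.5)     # buy without the extra information
with_info = expected_cost(0.8)   # buy after paying the website

# The information is worth the reduction in expected cost it provides.
print(no_info - with_info)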

 

 

EXERCISE 2 [DECISION TREE - 28 PTS]

A company is trying to learn the set of conditions under which its new off-road vehicle can operate reliably. In certain conditions, the engine has a tendency to sputter and struggle, but in other conditions it operates well. The company has collected the following data:

Slope    Terrain   Temperature   Humidity   Reliable?

Steep    Mud       Hot           High       No

Steep    Mud       Hot           Low        No

Steep    Dirt      Hot           Low        No

Steep    Dirt      Cold          Low        Yes

Steep    Grass     Hot           Low        No

Steep    Grass     Cold          Low        Yes

Gentle   Mud       Cold          High       No

Gentle   Mud       Cold          Low        No

Gentle   Dirt      Hot           Low        Yes

Gentle   Dirt      Hot           High       No

Gentle   Grass     Cold          Low        Yes

Gentle   Grass     Hot           Low        Yes

 

 

1.   (2 pts) Given the categories in the table, how many distinct environments are possible? (An environment is a tuple of slope, terrain, temperature, and humidity.)

2.   (8 pts) Calculate the information gain on engine reliability of initially splitting on each of slope, terrain, temperature, and humidity. Show your calculations. Preserve at least 2 decimal places of accuracy. (A small helper sketch for this computation appears after this list.)

3.   (10 pts) Construct and draw the entire decision tree for this problem, using maximum information gain to choose the decision nodes at each level. If two potential nodes are tied in their information gain score, break the tie by the order of the attributes in the table, left to right. This tree should perfectly classify engine reliability.

4.   (6 pts) Draw a different tree with 4 decision nodes that perfectly classifies engine reliability (hint: split on temperature first; no need to use information gain here). Why didn't we find this smaller tree in the previous problem?

5.   (2 pts) Suppose someone gives you a pre-made decision tree T for this problem, and that this tree misclassifies exactly 4 environments out of all possible environments. Unaware of this, you decide to test the accuracy of T by selecting a subset of 12 distinct environments from this domain uniformly at random and classifying them with T. What is the probability that T will perfectly classify all of the selected examples?
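
For part 2, recall that the information gain of an attribute is the entropy of the label minus the expected entropy of the label after splitting on that attribute. The following is a minimal Python sketch of that computation on the table above; the list-of-tuples encoding and the helper names are illustrative, not part of the assignment:

from collections import Counter
from math import log2

# (Slope, Terrain, Temperature, Humidity, Reliable?) rows from the table above
data = [
    ("Steep", "Mud", "Hot", "High", "No"),     ("Steep", "Mud", "Hot", "Low", "No"),
    ("Steep", "Dirt", "Hot", "Low", "No"),     ("Steep", "Dirt", "Cold", "Low", "Yes"),
    ("Steep", "Grass", "Hot", "Low", "No"),    ("Steep", "Grass", "Cold", "Low", "Yes"),
    ("Gentle", "Mud", "Cold", "High", "No"),   ("Gentle", "Mud", "Cold", "Low", "No"),
    ("Gentle", "Dirt", "Hot", "Low", "Yes"),   ("Gentle", "Dirt", "Hot", "High", "No"),
    ("Gentle", "Grass", "Cold", "Low", "Yes"), ("Gentle", "Grass", "Hot", "Low", "Yes"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_gain(rows, attr_index):
    """Information gain from splitting the rows on the attribute at attr_index."""
    labels = [r[-1] for r in rows]
    gain = entropy(labels)
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

for i, name in enumerate(["Slope", "Terrain", "Temperature", "Humidity"]):
    print(name, round(info_gain(data, i), 2))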

 

 

EXERCISE 3 [NEURAL NET - 8 PTS]

1.   (3 pts) For the single-layer perceptron network below, find weights (w0, w1, w2) that implement the function (A AND B). That is, Z will be activated if and only if the expression (A AND B) is true for the values of the boolean inputs A and B. When specifying weights, use integers for w1 and w2, and w0 should be an integer plus 0.5 (e.g., ..., -1.5, -0.5, 0.5, 1.5, ...). Assume we are using ReLU as the activation function g(x).

 

2.   (5 pts) The image below depicts a simple neural network with three input nodes, one output node, and no hidden nodes. We have w1 = 0.7, w2 = 0.3, w3 = 0.9. The activation of the output node Z is equal to X1*w1 + X2*w2 + X3*w3 (no bias term and no additional activation function like sigmoid or ReLU). Suppose that our input is X1 = 2, X2 = -1, X3 = 0, and that the gradient of the evaluation function with respect to Z is 2. Calculate the updated weights w1, w2, and w3 after one round of backpropagation with learning rate 0.05.
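
Since Z is a linear function of the inputs, the chain rule gives dL/dw_i = (dL/dZ) * X_i for each weight, and one backpropagation step moves each weight against that gradient. Below is a minimal sketch of such an update using the numbers from part 2; it assumes the evaluation function is being minimized with plain gradient descent, which is an assumption about the sign convention rather than something stated in the problem:

# Sketch: one gradient step for a linear output node Z = X1*w1 + X2*w2 + X3*w3.
w = [0.7, 0.3, 0.9]      # w1, w2, w3
x = [2, -1, 0]           # X1, X2, X3
dL_dZ = 2                # gradient of the evaluation function with respect to Z
lr = 0.05                # learning rate

# Chain rule: dL/dw_i = dL/dZ * dZ/dw_i = dL_dZ * x_i
w_updated = [w_i - lr * dL_dZ * x_i for w_i, x_i in zip(w, x)]
print(w_updated)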

 

 

 

EXERCISE 4 [POMDPS AND MDPS - 25 PTS]

[Figure: a four-state grid environment with states a, b, c, and d; states b, c, and d appear in a row.]

Consider the simple 4-state environment above. You are not sure exactly where you are, but your belief state is <p(a), p(b), p(c), p(d)> = <0.5, 0.125, 0.25, 0.125>. In any state you can attempt to go north, south, east, or west. You have a noisy sensor that detects how many walls are adjacent to you, with a probability of 0.8 of making a correct observation. For example, if you are in state a, then with probability 0.8 you will receive a percept saying there are 3 adjacent walls (the correct observation). Unfortunately, your actuator is not working perfectly. For any action, there is a

●    0.7 chance that you move in the given direction.

●    0.1 chance that you move 90 degrees clockwise instead

●    0.1 chance that you move 90 degrees counterclockwise instead

●    0.05 chance that you move in the opposite direction

●    0.05 chance that you don’t move at all

1.   (2 pts) For each state a, b, c, d, list the possible resulting states of going north starting in that state.

2.   (8 pts) Suppose you take a single action, "north," and receive a percept saying there are 3 adjacent walls. Calculate the resulting belief state. Show all work.
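
The belief update in part 2 follows the standard POMDP filtering equation: b'(s') is proportional to P(o | s') times the sum over s of P(s' | s, a) * b(s). The sketch below implements that equation generically; the transition and observation dictionaries are placeholders you would fill in from the figure, the wall-count sensor model, and the actuator probabilities above, not values given here:

# Sketch: generic belief update b'(s') ∝ P(o | s') * sum_s P(s' | s, a) * b(s).
# T[(s, a)] maps each successor state to its probability; O[s] is the probability
# of the received percept in state s. Both are placeholders to be filled in.

def belief_update(belief, action, T, O):
    states = list(belief)
    # Prediction step: push the belief through the transition model.
    predicted = {s2: sum(T[(s, action)].get(s2, 0.0) * belief[s] for s in states)
                 for s2 in states}
    # Correction step: weight by the observation likelihood, then normalize.
    unnormalized = {s2: O[s2] * predicted[s2] for s2 in states}
    z = sum(unnormalized.values())
    return {s2: p / z for s2, p in unnormalized.items()}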

 

Coding MDPs (15 pts)

 

Implement the value iteration algorithm in mdp.py. Your program should take an environment file as an argument and print out the utilities of each state, accurate to at least 3 decimal places. Use the value iteration algorithm to calculate the utilities (Figure 17.4 in the book).

 

 

 

The available actions in any environment are moving north, south, east, and west. Any action can be taken from any non-terminal state, but hitting a wall causes you to remain in the same place. Moving into a terminal state ends a run.

 

Below is an example 4-by-3 environment (Figure 17.1 in the book). The agent starts from cell (1,1). Cell (2,2), colored in gray, represents the wall. The agent stops interacting with the environment when it reaches one of the goal states (cell (4,3) and cell (4,2)), marked +1 or -1. In this environment, the "intended" outcome occurs with probability 0.8, but with probability 0.1 the agent moves 90 degrees clockwise, and with probability 0.1 the agent moves 90 degrees counterclockwise of the "intended" direction.

 

The format of environment files is as follows: the first line specifies the discount factor gamma. The second line specifies the "cost of living" (a negative reward) for all non-terminal states. The third line specifies the transition probabilities in the following order: going in the intended direction, going 90 degrees clockwise, going the opposite direction as intended, and going 90 degrees counterclockwise of the intended direction (there is no possibility of remaining stationary here). The rest of the file contains the grid (with no trailing newlines). A star '*' represents a state, 'x' represents a wall, and numbers represent the values of terminal states. Columns are separated by spaces, and rows are separated by newlines. All grids are rectangular.

 

An example environment file (which describes the above example environment):

 

1

-0.04

0.8 0.1 0 0.1

* * * 1

* x * -1

* * * *

 

When the command python firstname_lastname_mdp.py testfile.txt is run, your program should print out a grid in the same format as the input grid, but with the stars replaced by utilities. The output from the above file (see Figure 17.3 in the book) should be

 

0.812 0.868 0.918 1

0.762 x 0.660 -1

0.705 0.655 0.611 0.388
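
As a starting point, the core of value iteration is the repeated Bellman update U(s) <- R(s) + gamma * max over actions of the sum over s' of P(s' | s, a) * U(s'). The sketch below shows that loop in isolation; the transitions function, the terminal test, and the convergence tolerance are stand-ins for whatever representation you build while parsing the environment file, not part of the required interface:

# Sketch of the value-iteration loop (Bellman updates) in isolation.
# transitions(state, action) is assumed to return a list of (prob, next_state)
# pairs built from the grid and the transition probabilities in the file.

def value_iteration(states, actions, transitions, reward, terminal, gamma, tol=1e-6):
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if terminal(s):
                new_u = reward(s)          # a terminal state's utility is its value
            else:
                new_u = reward(s) + gamma * max(
                    sum(p * U[s2] for p, s2 in transitions(s, a)) for a in actions)
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u
        if delta < tol:                    # stop once utilities stop changing
            return U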

 

We will run your code on several test files to ensure accuracy. Rename your mdp.py to firstname_lastname_mdp.py before you submit on Canvas!

EXERCISE 5 [CAR RISK FACTOR CLASSIFICATION - 18 PTS]

In this exercise, we will study the performance of a decision tree classifier on a dataset about 1985 Auto Imports. The dataset consists of three types of entities: (a) the specification of a car in terms of various characteristics, e.g. make, fuel-type, etc., (b) its assigned insurance risk rating, (c) its normalized losses in use as compared to other cars. Please read info.txt for the details of this dataset.    

 

We will try to predict whether a car is risky or not using the specification of the car. We will treat the specification attributes of a car as a feature vector and treat the risk factor as a label. The given risk factor in the dataset is in the range [-3, 3]. In util.py, we convert the risk factor to a binary label by using a threshold thrs = 0. If y[i] > thrs, we consider it a positive sample (i.e., it is risky); otherwise, we treat it as a negative sample (i.e., it is not risky). The problem is then converted to a binary classification problem. If any attribute in the specification is missing (replaced by '?' in the dataset), we reject the sample. The data preprocessing function is provided for you in util.py.

 

Both the sklearn and pandas libraries are installed on CAEN. Run module load python when you first log in to a CAEN machine. Rename your decision_tree.py to firstname_lastname_decision_tree.py before you submit on Canvas!

 

1.   (8 pts) Implement decision_tree(X_train, y_train, X_test, y_test, max_depth) in decision_tree.py based on the specification provided in the skeleton code. You can create any helper functions as you see fit. You should use the sklearn.tree.DecisionTreeClassifier class. Set criterion='entropy' to use information gain to select the best split. Set random_state=1 to control the randomness for reproducibility. For a decision tree classifier, max_depth is a hyper-parameter that determines the maximum number of attributes the model is going to use for each prediction (up to the number of available features in the dataset). Here we use max_depth=20 as our default max_depth.

 

What is the accuracy of your decision tree classifier on the Auto dataset? To measure this, we repeat the training and testing procedure 50 times and take the average accuracy across these 50 iterations. For each iteration, use one generated train/test split from the function q1_train_test_split(X, y), train your decision tree classifier using the training data, and test it on the testing data. Report your average accuracy.
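
A minimal sketch of what decision_tree might look like, assuming the skeleton expects the function to return the accuracy on the test split (check the skeleton's docstring for the exact contract):

from sklearn.tree import DecisionTreeClassifier

def decision_tree(X_train, y_train, X_test, y_test, max_depth):
    """Train a decision tree and return its accuracy on the test data (sketch)."""
    clf = DecisionTreeClassifier(criterion='entropy', max_depth=max_depth,
                                 random_state=1)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)   # fraction of test samples classified correctly

For the 50-iteration average, call q1_train_test_split(X, y) to generate a fresh split on each iteration, pass its pieces to decision_tree with max_depth=20, and average the 50 returned accuracies (the exact order in which q1_train_test_split returns its pieces is defined in the skeleton).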

 

2.   (10 pts) [Hyper-parameter Selection]

In machine learning, model parameters are variables that get adjusted by training on existing data, while hyper-parameters are variables that describe the training process itself. Different models may have different hyper-parameters, and some simple models may have no hyper-parameters at all. For a decision tree classifier, the hyper-parameters include

 

●    max_depth: The maximum depth of the tree.

●    max_features: The number of features to consider when looking for the best split.

●    min_samples_leaf: The minimum number of samples (training points) required to be at a leaf node.

●    ...

Now we want you to find the best max_depth for this decision tree. We use 5-fold cross-validation to select the best max_depth. Implement the function cross_val(X_train, y_train). For each max_depth in {1, 2, 3, 4, ..., len(features)}, randomly partition the training data into 5 equal-sized subsets (we do this for you using sklearn.model_selection.KFold). Set random_state=1 to control the randomness for reproducibility. Train and validate on these subsets by cross-validation to find the average accuracy for each max_depth. For cross-validation, treat one subset as the validation data and the other 4 subsets as the training data. Repeat this until each of the 5 subsets has served as the validation data. Average the results from the 5 folds to produce a single estimate. Note that you cannot use the function sklearn.model_selection.cross_val_score. (A sketch of this loop appears below.)

 

Which max_depth value appears to work best? Report the best max_depth you find for this problem. Analyze the trend you observe in the average accuracy as you vary max_depth, and briefly explain why you think there is such a trend.
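
A minimal sketch of the cross_val loop described above, assuming it should return the average validation accuracy for each candidate max_depth and that KFold is used with shuffle=True so that random_state=1 takes effect (both assumptions; the skeleton defines the actual contract):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def cross_val(X_train, y_train):
    """5-fold cross-validation over candidate max_depth values (sketch)."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    kf = KFold(n_splits=5, shuffle=True, random_state=1)
    avg_acc = {}
    for depth in range(1, X_train.shape[1] + 1):   # max_depth = 1 .. number of features
        scores = []
        for train_idx, val_idx in kf.split(X_train):
            clf = DecisionTreeClassifier(criterion='entropy', max_depth=depth,
                                         random_state=1)
            clf.fit(X_train[train_idx], y_train[train_idx])
            scores.append(clf.score(X_train[val_idx], y_train[val_idx]))
        avg_acc[depth] = sum(scores) / len(scores)
    return avg_acc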

 

 

EXERCISE 6 [LINEAR REGRESSION AND GRADIENT DESCENT - 18 PTS]

In machine learning, regression is the problem of learning a model to approximate an unknown function. In order to learn a model, we must have training data, consisting of input points and their known function values. More precisely, suppose we have n pairs of d-dimensional inputs and their corresponding outputs. Let X be a d x n matrix whose columns are the data vectors x_1, ..., x_n, and let y be an n x 1 vector whose entries correspond to the true function outputs of the data vectors. A simple linear model for this scenario is

y ≈ X^T w

where w is a d x 1 vector of model weights, and T is the transpose operator.

 

Our goal is to learn a best-fit set of weights w for given X and y, so that X^T w ≈ y. To measure fitness, we use the following regularized loss function:

L(w) = ||X^T w - y||^2 + λ ||w||^2

Note that ||·|| is the Euclidean norm. Our objective is to find a w such that this loss function is minimized.

 

The gradient of the loss function with respect to w is

∇_w L(w) = 2 X (X^T w - y) + 2 λ w

Setting this equal to the zero vector, the closed-form solution for the optimal w can be obtained:

w = (X X^T + λ I)^(-1) X y

where I is the identity matrix.
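
Translating the closed form above into numpy is direct. The sketch below assumes the regularization coefficient is passed in as an argument (called lam here), which may differ from the actual signature in the regression.py skeleton:

import numpy as np

def exact_solution(X, y, lam=0.0):
    """Closed-form solution w = (X X^T + lam*I)^(-1) X y (sketch).

    X is d x n with data vectors as columns, y has length n; the name and
    default value of lam are assumptions, not taken from the skeleton.
    """
    d = X.shape[0]
    A = np.dot(X, X.T) + lam * np.eye(d)    # X X^T + lam * I
    return np.dot(np.linalg.inv(A), np.dot(X, y))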

 

Rename your regression.py to firstname_lastname_regression.py before you submit on Canvas!

 

1.   (4 pts) Implement the function exact_solution in the provided python file regression.py (which you should of course rename in the usual way). The function takes numpy matrices X and y and returns the best-fit w. Use built-in numpy functions to perform the matrix calculations. (You can use numpy.linalg.inv for matrix inverses, numpy.eye for identity matrices, and numpy.dot for ordinary matrix multiplication, as well as other numpy functions.)

2.   (6 pts) Implement the function gradient_descent. Update w step by step until convergence, with a given learning rate α. Use the change in w between successive iterations falling below a small threshold as the stopping criterion. The update rule is w ← w - α ∇_w L(w) (a sketch of such a loop appears at the end of this exercise). You can test your implementation against your exact solution by typing

                        python regression.py 

3.   (8 pts) You are provided with online_shares.csv, which contains numerical attributes (features) of various online articles and how often each was shared. The task is to learn a model to predict the number of shares of an article given the features. Run your exact solution function on the provided online_shares dataset by typing

                        python regression.py dataset/online_shares.csv

Report the root mean squared error (RMSE) and standard deviation. Does the linear model fit this data well? Why or why not? Provide a table of the learned weights w associated with each feature. Which 5 features were most important to the model, according to the weights?
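
For reference, here is a minimal sketch of the gradient-descent loop for part 2, using the gradient derived earlier; the learning rate, tolerance, iteration cap, and the way lambda is supplied are illustrative placeholders rather than the assignment's prescribed values:

import numpy as np

def gradient_descent(X, y, lam=0.0, alpha=1e-4, tol=1e-6, max_iters=100000):
    """Minimize ||X^T w - y||^2 + lam*||w||^2 by gradient descent (sketch).

    alpha, tol, and max_iters are placeholders, not the assignment's values.
    """
    w = np.zeros(X.shape[0])
    for _ in range(max_iters):
        grad = 2 * np.dot(X, np.dot(X.T, w) - y) + 2 * lam * w
        w_next = w - alpha * grad
        if np.linalg.norm(w_next - w) < tol:   # stop when w has essentially converged
            return w_next
        w = w_next
    return w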
