Programming User-User Collaborative Filtering Solution

In this assignment, you will implement a user-user collaborative filter for LensKit.

LensKit provides a flexible implementation of user-user collaborative filtering, but for this assignment we would like you to implement it (mostly) from scratch.

Specifically, we’re going to have you build a model-free user-user collaborative filtering scorer that predicts a target user’s movie rating for a target item by going through the following process:

1. First, you will adjust each user’s rating vector by subtracting that user’s mean rating from each of their ratings (this corrects for the fact that some users think 5 stars is anything worth seeing and others think 3 stars is very good).

2. Next, you will identify the set of other users who have rated the target item and who have a history of rating items similarly to the target user; specifically, we’ll limit this set to the 30 users with the highest cosine similarity between their adjusted rating vectors and the target user’s adjusted rating vector. This similarity is the cosine of the angle between the two vectors; it is highest when both users have rated the same items and have given those items the same ratings (this won’t be perfectly the case here, since we’re predicting for unrated items).

3. Then you will combine the mean-adjusted ratings from these “neighbor” users, weighted by their cosine similarity with the target user — i.e., the more similar the other user’s ratings, the more their rating of the target item influences the prediction for the target user.

4. Finally, re-adjust the prediction back to the target user’s original rating scale by adding the target user’s mean rating back into the prediction.

Once you've written code to do this, the program will give either specific predictions or predictions for top-10 recommended unrated items for the selected users; you do not have to code this part as it's already built into LensKit.
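As a rough illustration of steps 1 and 2 above, the following sketch mean-centers two rating vectors and computes their cosine similarity. It uses plain `java.util.Map` objects (item ID to rating) rather than LensKit's own vector types, and the class and method names (`MeanCenteredCosine`, `meanCenter`, `cosine`) are hypothetical, not part of the project template. It also assumes that items a user has not rated contribute nothing to the dot product, while each norm is taken over that user's full set of rated items.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical names, plain Java maps instead of LensKit types.
final class MeanCenteredCosine {

    /** Step 1: subtract the user's mean rating from each of their ratings. */
    static Map<Long, Double> meanCenter(Map<Long, Double> ratings) {
        double sum = 0.0;
        for (double r : ratings.values()) {
            sum += r;
        }
        double mean = ratings.isEmpty() ? 0.0 : sum / ratings.size();

        Map<Long, Double> centered = new HashMap<>();
        for (Map.Entry<Long, Double> e : ratings.entrySet()) {
            centered.put(e.getKey(), e.getValue() - mean);
        }
        return centered;
    }

    /** Euclidean (L2) norm of a sparse vector. */
    static double norm(Map<Long, Double> vec) {
        double sumSq = 0.0;
        for (double x : vec.values()) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    /** Step 2: cosine similarity between two (already mean-centered) vectors. */
    static double cosine(Map<Long, Double> u, Map<Long, Double> v) {
        double dot = 0.0;
        for (Map.Entry<Long, Double> e : u.entrySet()) {
            Double other = v.get(e.getKey());
            if (other != null) {          // only items rated by both users contribute
                dot += e.getValue() * other;
            }
        }
        double denom = norm(u) * norm(v);
        // Assumption: a user whose centered vector is all zeros has similarity 0.
        return denom == 0.0 ? 0.0 : dot / denom;
    }
}
```

For example, a user who rated items 1 and 2 as 5 and 3 and a user who rated them 4 and 2 both have the centered vector (1, -1), so their similarity is 1 even though their raw ratings differ.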

Start by downloading the project template. This is a Gradle project; you can import it into your IDE directly (IntelliJ users can open the build.gradle file as a project). The template contains a `SimpleUserUserItemScorer` class that you need to finish implementing, along with the Gradle files to build and run it.

## Downloads and Resources

- Project template (on course website)

- [LensKit for Learning website](http://mooc.lenskit.org) (links to relevant documentation and the LensKit tutorial video)

Additionally, you will need:

- [Java](http://java.oracle.com) — download the Java 8 JDK. On Linux, install the OpenJDK 'devel' package (you will need the devel package to have the compiler).

- A development environment

## Basic Requirements

Implement scoring in this class as follows:

-   Use user-user collaborative filtering.

-   Compute user similarities by taking the cosine between the users' mean-centered rating vectors (that is, subtract each user's mean rating from their rating vector, and compute the cosine between those two vectors). LensKit's `Vectors` class can help you with this; it provides functions to compute dot products, Euclidean norms, and means.

-   For each item's score, use the 30 most similar users who have rated the item and whose similarity to the target user is positive.

-   Refuse to score items if there are not at least 2 neighbors to contribute to the item's score.

-   Use mean-centering to normalize ratings for scoring. That is, compute the weighted average of each neighbor $v$'s offset from average ($r_{v,i} - \mu_v$), then add the user's average rating $\mu_u$. In the formula below, $N(u;i)$ is the set of neighbors of $u$ who have rated $i$ and $cos(u,v)$ is the cosine similarity between the mean-centered rating vectors for users $u$ and $v$:

    $$p_{u,i} = \mu_u + \frac{\sum_{v \in N(u;i)} cos(u,v) (r_{v,i} - \mu_v)}{\sum_{v \in N(u;i)} |cos(u,v)|}$$

-   Remember, cosine similarity is defined as follows:

    $$cos(u,v) = \frac{\vec u \cdot \vec v}{\|\vec u\|_2 \|\vec v\|_2} = \frac{\sum_i u_i v_i}{\sqrt{\sum_i u^2_i} \sqrt{\sum_i v^2_i}}$$

-   Do not use any code or classes from the `lenskit-knn` module; we want you to code this yourself.
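The neighbor-selection and scoring rules above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the reference solution: the `NeighborhoodPrediction` class, its `Neighbor` helper, and the `predict` method are hypothetical names, and the sketch assumes the similarities and mean-centered offsets ($r_{v,i} - \mu_v$) have already been computed for every user who rated the target item. Returning `null` stands in for "refuse to score".

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, no LensKit classes involved.
final class NeighborhoodPrediction {
    static final int NEIGHBORHOOD_SIZE = 30; // use at most the 30 most similar users
    static final int MIN_NEIGHBORS = 2;      // refuse to score with fewer than this

    /** A user who rated the target item: cos(u,v) and the offset r_{v,i} - mu_v. */
    static final class Neighbor {
        final double similarity;
        final double offset;

        Neighbor(double similarity, double offset) {
            this.similarity = similarity;
            this.offset = offset;
        }
    }

    /** Returns p_{u,i}, or null when there are not enough usable neighbors. */
    static Double predict(double targetMean, List<Neighbor> candidates) {
        // Keep only neighbors with strictly positive similarity.
        List<Neighbor> usable = new ArrayList<>();
        for (Neighbor n : candidates) {
            if (n.similarity > 0) {
                usable.add(n);
            }
        }
        // Sort by similarity, most similar first, and keep the top 30.
        usable.sort((a, b) -> Double.compare(b.similarity, a.similarity));
        if (usable.size() > NEIGHBORHOOD_SIZE) {
            usable = usable.subList(0, NEIGHBORHOOD_SIZE);
        }
        if (usable.size() < MIN_NEIGHBORS) {
            return null;
        }
        // Similarity-weighted average of offsets, shifted back by the target user's mean.
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Neighbor n : usable) {
            weightedSum += n.similarity * n.offset;
            totalWeight += Math.abs(n.similarity);
        }
        return targetMean + weightedSum / totalWeight;
    }
}
```

As a worked case, take $\mu_u = 3.5$ and three candidates with (similarity, offset) pairs (0.8, 1.0), (0.4, -0.5), and (-0.9, 2.0). The third is dropped because its similarity is not positive, so the prediction is $3.5 + (0.8 \cdot 1.0 + 0.4 \cdot (-0.5)) / (0.8 + 0.4) = 3.5 + 0.6 / 1.2 = 4.0$.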

## Running the Recommender

You can run the recommender by using Gradle; the `predict` task will generate predictions for the user specified with `userId` and the items specified with `itemIds` (see next section for examples). `recommend` will produce top-10 recommendations for a user.

User-user CF has an interesting penchant for recommending really obscure things. We've also provided a configuration for a hybrid recommender that blends the collaborative filtering output with popularity information to prefer more popular items. To run this version of your recommender, use `recommendBlended`.

All recommender-running tasks will send debug output to a log file under `build`.

## Example Output

Command:

    ./gradlew predict -PuserId=320 -PitemIds=260,153,527,588

Output:

```
predictions for user 320:
153 (Batman Forever (1995)): 2.841
260 (Star Wars: Episode IV - A New Hope (1977)): 4.549
527 (Schindler's List (1993)): 4.319
588 (Aladdin (1992)): 3.554
```

Command:

    ./gradlew recommend -PuserId=320

Output:

```
recommendations for user 320:
858 (Godfather, The (1972)): 4.562
2360 (Celebration, The (Festen) (1998)): 4.556
318 (Shawshank Redemption, The (1994)): 4.556
8638 (Before Sunset (2004)): 4.512
7371 (Dogville (2003)): 4.511
922 (Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)): 4.503
1217 (Ran (1985)): 4.497
44555 (Lives of Others, The (Das leben der Anderen) (2006)): 4.491
2859 (Stop Making Sense (1984)): 4.486
1089 (Reservoir Dogs (1992)): 4.479
```

Command:

    ./gradlew recommendBlended -PuserId=320

Output:

```
recommendations for user 320:
318 (Shawshank Redemption, The (1994)): 0.999
858 (Godfather, The (1972)): 0.999
58559 (Dark Knight, The (2008)): 0.995
1089 (Reservoir Dogs (1992)): 0.995
7153 (Lord of the Rings: The Return of the King, The (2003)): 0.994
1258 (Shining, The (1980)): 0.989
1210 (Star Wars: Episode VI - Return of the Jedi (1983)): 0.989
79132 (Inception (2010)): 0.988
1080 (Monty Python's Life of Brian (1979)): 0.986
4973 (Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)): 0.982
```

## Submitting

As with the Course 1 assignments, you will submit a compiled `jar` file to be graded.

To create this file, please use the pre-created archive functionality in the Gradle build:

    ./gradlew prepareSubmission

This will ensure that your submission contains all required files. It will produce a submission file in `build/distributions`.

Submit the `uu-submission.jar` file to Coursera for grading.

## Grading

Your grade will be based on your output over randomly selected users:

- 75% for the scorer ordering items correctly

- 25% for computing the correct scores (within an error threshold)

## Further Exploration

Try different similarity functions and normalization strategies to see what difference they make in user predictions.