Homework #5 Solution

Starting from:

~~$30~~

$24

Home

Introduction

The goal of this assignment is to become familiar with low-level POSIX

threads, multi-threading safety, concurrency guarantees, and networking.

The overall objective is to implement a simple multi-threaded

transactional object store. As you will probably find this somewhat

difficult, to grease the way we have provided you with a design for the server,

as well as binary object files for almost all the modules. This means that you

can build a functioning server without initially facing too much

complexity. In each step of the assignment, you will replace one of our

binary modules with one built from your own source code. If you succeed

in replacing all of our modules, you will have completed your own

version of the server.

It is probably best if you work on the modules in roughly the order

indicated below. Turn in as many modules as you have been able to finish

and have confidence in. Don't submit incomplete modules or modules

that don't function at some level, as these will negatively impact

the ability of the code to be compiled or to pass tests.

### Takeaways

After completing this homework, you should:

* Have a basic understanding of socket programming

* Understand thread execution, locks, and semaphores

* Have an advanced understanding of POSIX threads

* Have some insight into the design of concurrent data structures

* Have enhanced your C programming abilities

## Hints and Tips

* We strongly recommend you check the return codes of all system

calls. This will help you catch errors.

* **BEAT UP YOUR OWN CODE!** Throw lots of concurrent calls at your

data structure libraries to ensure safety.

* Your code should **NEVER** crash. We will be deducting points for

each time your program crashes during grading. Make sure your code

handles invalid usage gracefully.

* You should make use of the macros in `debug.h`. You would never

expect a library to print arbitrary statements as it could interfere

with the program using the library. **FOLLOW THIS CONVENTION!**

`make debug` is your friend.

:scream: **DO NOT** modify any of the header files provided to you in the base code.

These have to remain unmodified so that the modules can interoperate correctly.

We will replace these header files with the original versions during grading.

You are of course welcome to create your own header files that contain anything

you wish.

:nerd: When writing your program, try to comment as much as possible

and stay consistent with your formatting.

## Helpful Resources

### Textbook Readings

You should make sure that you understand the material covered in

chapters **11.4** and **12** of **Computer Systems: A Programmer's

Perspective 3rd Edition** before starting this assignment. These

chapters cover networking and concurrency in great detail and will be

an invaluable resource for this assignment.

### pthread Man Pages

The pthread man pages can be easily accessed through your terminal.

However, [this opengroup.org site](http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread.h.html)

provides a list of all the available functions. The same list is also

available for [semaphores](http://pubs.opengroup.org/onlinepubs/7908799/xsh/semaphore.h.html).

## Getting Started

Fetch and merge the base code for `hw5` as described in `hw0`. You can

find it at this link: https://gitlab02.cs.stonybrook.edu/cse320/hw5.

Remember to use the `--stategy-option=theirs` flag for the `git merge`

command to avoid merge conflicts in the Gitlab CI file.

## The Xacto Server and Protocol: Overview

The "Xacto" server implements a simple transactional object store,

designed for use in a network environment. For the purposes of this

assignment, an object store is essentially a hash map that maps

**keys** to **values**, where both keys and values can be arbitrary data.

There are two basic operations on the store: `PUT` and `GET`.

As you might expect, the `PUT` operation associates a key with a value,

and the `GET` operation retrieves the value associated with a key.

Although values may be null, keys may not. The null value is used as

the default value associated with a key if no `PUT` has been performed

on that key.

A client wishing to make use of the object store first makes a network

connection to the server. Upon connection, a **transaction** is

created by the server. All operations that the client performs during

this session will execute in the scope of that transaction.

The client issues a series of `PUT` and `GET` requests to the server,

receiving a response from each one. Upon completing the series of

requests, the client issues a `COMMIT` request. If all is well, the

transaction **commits** and its effects on the store are made permanent.

Due to interference of transactions being executed concurrently by

other clients, however, it is possible for a transaction to **abort**

rather than committing. The effects of an aborted transaction are

erased from the store and it is as if the transaction had never happened.

A transaction may abort at any time up until it has successfully committed;

in particular the response from any `PUT` or `GET` request might

indicate that the transaction has aborted. When a transaction has

aborted, the server closes the client connection. The client may

then open a new connection and retry the operations in the context of

a new transaction.

The Xacto server architecture is that of a multi-threaded network

server. When the server is started, a **master** thread sets up a

socket on which to listen for connections from clients. When a

connection is accepted, a **client service thread** is started to

handle requests sent by the client over that connection. The client

service thread executes a service loop in which it repeatedly receives

a **request packet** sent by the client, performs the request, and

sends a **reply** packet that indicates the result.

Besides request and reply packets, there are also **data** packets

that are used to send key and value data between the client and

server.

:nerd: One of the basic tenets of network programming is that a

network connection can be broken at any time and the parties using

such a connection must be able to handle this situation. In the

present context, the client's connection to the Xacto server may

be broken at any time, either as a result of explicit action by the

client or for other reasons. When disconnection of the client is

noticed by the client service thread, the client's transaction

is aborted and the client service thread terminates.

### The Base Code

Here is the structure of the base code:

```

hw5

├── include

│ ├── client_registry.h

│ ├── data.h

│ ├── debug.h

│ ├── protocol.h

│ ├── server.h

│ ├── store.h

│ └── transaction.h

├── lib

│ └── xacto.a

├── Makefile

├── src

│ └── main.c

├── tests

│ └── xacto_tests.c

└── util

└── client

```

The base code consists of header files that define module interfaces,

a library `xacto.a` containing binary object code for our

implementations of the modules, and a source code file `main.c` that

contains containing a stub for function `main()`. The `Makefile` is

designed to compile any existing source code files and then link them

against the provided library. The result is that any modules for

which you provide source code will be included in the final

executable, but modules for which no source code is provided will be

pulled in from the library.

The `util` directory contains an executable for a simple test client.

When you run this program it will print a help message that

summarizes the commands. The client program understands command-line

options `-h <hostname`, `-p <port`, and `-q`. The `-p` option is

required; the others are options. The `-q` flag tells the client to

run without prompting; this is useful if you want to run the client

with the standard input redirected from a file.

## Task I: Server Initialization

When the base code is compiled and run, it will print out a message

saying that the server will not function until `main()` is

implemented. This is your first task. The `main()` function will

need to do the following things:

- Obtain the port number to be used by the server from the command-line

arguments. The port number is to be supplied by the required option

`-p <port`.

- Install a `SIGHUP` handler so that clean termination of the server can

be achieved by sending it a `SIGHUP`. Note that you need to use

`sigaction()` rather than `signal()`, as the behavior of the latter is

not well-defined in a multithreaded context.

- Set up the server socket and enter a loop to accept connections

on this socket. For each connection, a thread should be started to

run function `xacto_client_service()`.

These things should be relatively straightforward to accomplish, given the

information presented in class and in the textbook. If you do them properly,

the server should function and accept connections on the specified port,

and you should be able to connect to the server using the test client.

Note that if you build the server using `make debug`, then the binaries

we have supplied will produce a fairly extensive debugging trace of what

they are doing. This, together with the specifications in this document

and in the header files, will likely be invaluable to you in understanding

the desired behavior of the various modules.

## Task II: Send and Receive Functions

The header file `include/protocol.h` defines the format of the packets

used in the Xacto network protocol. The concept of a protocol is an

important one to understand. A protocol creates a standard for

communication so that any program implementing a protocol will be able

to connect and operate with any other program implementing the same

protocol. Any client should work with any server if they both

implement the same protocol correctly. In the Xacto protocol,

clients and servers exchange **packets** with each other. Each packet

has two parts: a fixed-size part that describes the packet, and an

optional **payload** that can carry arbitrary data. The fixed-size

part of a packet always has the same size and format, which is given

by the `xacto_packet` structure; however the payload can be of arbitrary

size. One of the fields in the fixed-size part tells how long the payload is.

- The function `proto_send_packet` is used to send a packet over a

network connection. The `fd` argument is the file descriptor of a

socket over which the packet is to be sent. The `pkt` argument is a

pointer to the fixed-size part of the packet. The `data` argument is a

pointer to the data payload, if there is one, otherwise it is `NULL`.

The `proto_send_packet` assumes that multi-byte fields in the packet

passed to it are in **host byte order**, which is the normal way that

values are stored in variables. However, as byte ordering

(i.e. "endianness") differs between computers, before sending the

packet it is necessary to convert any multi-byte fields to **network

byte order**. This can be done using, e.g., the `htonl()` and related

functions described in the Linux man pages. Once the header has been

converted to network byte order, the `write()` system call is used to

write the header to the "wire" (i.e. the network connection). If the

length field of the header specifies a nonzero payload length, then an

additional `write()` call is used to write the payload data to the

wire.

- The function `proto_recv_packet()` reverses the procedure in order to

receive a packet. It first uses the `read()` system call to read a

fixed-size packet header from the wire. After converting multi-byte fields

in the header from network byte order to host byte order (see the man

page for `ntohl()`), if the length field of the header is nonzero then

an additional `read()` is used to read the payload from the wire. The

header and payload are stored using pointers supplied by the caller.

**NOTE:** Remember that it is always possible for `read()` and `write()`

to read or write fewer bytes than requested. You must check for and

handle these "short count" situations.

Implement these functions in a file `protocol.c`. If you do it

correctly, the server should function as before.

## Task III: Client Registry

You probably noticed the initialization of the `client_registry`

variable in `main()` and the use of the `creg_wait_for_empty()`

function in `terminate()`. The client registry provides a way of

keeping track of the number of client connections that currently exist,

and to allow a "master" thread to forcibly shut down all of the

connections and to await the termination of all server threads

before finally terminating itself.

The functions provided by a client registry are specified in the

`client_registry.h` header file. Provide implementations for these

functions in a file `src/client_registry.c`. Note that these functions

need to be thread-safe, so synchronization will be required. Use a

mutex to protect access to the thread counter data. Use a semaphore

to perform the required blocking in the `creg_wait_for_empty()`

function. To shut down a client connection, use the `shutdown()`

function described in Section 2 of the Linux manual pages.

Implementing the client registry should be a fairly easy warm-up

exercise in concurrent programming. If you do it correctly, the

Xacto server should still shut down cleanly in response to SIGHUP

using your version.

**Note:** You should test your client registry separately from the

server. Create test threads that rapidly call `creg_register()` and

`creg_unregister()` methods concurrently and then check that a call to the

`creg_wait_for_empty()` function blocks until the number of registered

clients reaches zero, and then returns.

## Task IV: Data Module

The file `data.h` describes **blobs**, which represent reference-counted

chunks of arbitrary data, **keys**, which are blobs packaged together

with a hash code so that they can be used as keys in a hash table,

and **versions**, which are used to represent versions of data in the

transactional store.

A possibly unfamiliar (but essential) concept here is the notion of a

**reference count**.

The basic idea of reference counting is for an object to maintain

a field that counts the total number of pointers to it that exist.

Each time a pointer is created (or copied) the reference count is

increased. Each time a pointer is discarded, the reference count

is decreased. When the reference count reaches zero, there are

no outstanding pointers to the object and it can be freed.

The reason we need to use reference counting for various objects

is because pointers to those objects can be stored in other objects,

where they can persist long after the original context in which

they were created has terminated. Eventually, we need to be able to

determine when the last pointer to an object has been discarded,

so that we can free the object and not leak storage.

In a reference counting scheme, the object to which reference counting

is to be applied has to provide two functions: one for increasing

the reference count and another for decreasing it. The function

for decreasing the reference count has the responsibility of

freeing the object when the reference count reaches zero.

In addition, the code that uses a reference-counted object has

to be carefully written so that whenever a pointer is created

the method to increase the reference count is called, and whenever

a pointer is discarded the method to decrease the reference count

is called.

Note that, in a multi-threaded setting, reference counts are shared

between threads and therefore need to be protected by mutexes if

they are to work reliably.

With the above information in mind, the implementation of this module

should be relatively straightforward.

## Task V: Client Service Thread

Next, you should implement the thread function that performs service

for a client. This function is called `xacto_client_service`, and

you should implement it in the `src/server.c` file.

The `xacto_client_service` function is invoked as the thread function

for a thread that is created to service a client connection.

The argument is a pointer to the integer file descriptor to be used

to communicate with the client. Once this file descriptor has been

retrieved, the storage it occupied needs to be freed.

The thread must then become detached, so that it does not have to be

explicitly reaped, and it must register the client file descriptor with

the client registry. Next, a transaction should be created to be used

as the context for carrying out client requests.

Finally, the thread should enter a service loop in which it repeatedly

receives a request packet sent by the client, carries out the request,

and sends a reply packet, followed (in the case of a reply to a `GET` request)

by a data packet that contains the result. If as a result of carrying

out any of the requests, the transaction commits or aborts, then

(after sending the required reply to the current request) the service

loop should end and the client service thread terminate, closing the

client connection.

## Task VI: Transaction Manager

Probably the next easiest module to implement is the transaction manager,

which is responsible for creating, committing, and aborting transactions.

The various functions that have to be implemented are specified in

`transaction.h` header file.

For transactions, the function `trans_ref()` is used to increase

the reference count on a transaction and the function `trans_unref()`

is used to decrease the reference count. The reference count itself is

maintained in the `refcnt` field of the `transaction` structure.

Because it is accessed concurrently, it needs to be protected by a mutex.

When transaction is created by the client service thread, it initially

has a reference count of one.

The transaction gets passed to various other functions, including operations

on the store. If the store needs to save a reference to a transaction

(*e.g.* to record the transaction as the "creator" of a version of

some data), then it will increase the reference count before doing so

to account for the saved pointer.

Ultimately, the transaction will be committed via a call to

`trans_commit` or aborted via a call to `trans_abort`.

These two calls consume a reference to the transaction

(*i.e.* they decrement the reference count), so the caller should not

continue to use the transaction subsequently unless an additional

reference was previously created.

A transaction is only ever committed by a call to `trans_commit` made

by the client service thread in response to a `COMMIT` request received

from the client. Once `trans_commit` has been called, it could be

that the last reference to the transaction has been consumed and

the transaction has been freed, so the transaction object should not

referenced again after that (unless another reference has previously

been obtained).

Calls to operations on the store may cause a transaction to abort,

but in that case they arrange to increase the reference count before doing

so, so that the net value of the reference count will not be decreased

and client service thread will be able to check the status of a

transaction after performing a store operation to see whether or not

the transaction has aborted.

One other complication in the implementation of transactions is

the notion of "dependency". As a result of performing an operation on

the store, a transaction may become **dependent** on one or more other

transactions. Before a transaction may commit, it must wait for all

the transactions on which it depends (*i.e.* the transactions in its

**dependency set**) to commit. If any of the transactions in the dependency

set aborts, then the transaction itself must abort.

In order to manage dependencies, the transaction manager module provides

a function `trans_add_dependency`, which is used to add one transaction

(the "dependee") to the dependency set of another (the "dependent").

Adding a dependency is done by allocating a "dependency" structure,

storing a reference to the dependee transaction in that structure,

and adding the structure to the linked list of dependencies for the dependent

transaction, after checking to make sure that there is not already an entry

for the same dependee in the dependency list. The reference count of the

dependee transaction must increased to account for the stored pointer.

When a transaction attempts to commit, it must first call `sem_wait`

on the semaphore in the `sem` field of each of the transactions in its

dependency set. This will block the transaction until the transaction

on which it depends has committed and done `sem_post` on that semaphore.

Since a transaction can exist in the dependency set of more than one

other transaction, in order to know how many times to call `sem_post`

it has to rely on the transaction calling `sem_wait` to have first

incremented the `waitcnt` field on the transaction for which it is

waiting, so that `waitcnt` always correctly reflects the number of

transactions waiting on `sem_wait`. Of course, the `waitcnt` field

of a transaction must only be accessed while the mutex lock on that

transaction is held.

Once `sem_wait` has been called on all transactions in the dependency

set, a transaction can check the status of these transactions to see

if any have aborted. If so, then committing is not possible, and the

dependent transaction must abort instead. If all of the dependee

transactions have committed, then the dependent transaction can move

to the "committed" state.

## Task VII: Transactional Store

If you have made it this far, then you can try to implement the

transactional store module itself. This is probably the most technical

of the various modules, so you probably need to have a good understanding

of what is going on before you attempt the implementation.

The functions for the store are specified in the `store.h` header

file, where some further details of how the store works are given.

Besides the fairly straightforward initialization and finalization

functions, the store provides functions `store_put` and `store_get`,

which in the end are rather similar to each other.

Each function uses the hash code of the key passed as argument

to identify a particular "bucket" in the hash table.

The bucket is then searched to find a map entry whose key is equal

to the given key. If an entry is not found, one is created and

inserted into the proper bucket.

Once the map entry for the given key has been located (or created),

then both `store_put` and `store_get` work by inserting a new

"version" into the list of versions contained in that map entry.

The concept of a version, and the rules by which version lists

are maintained, are detailed in `store.h`.

As a result of these rules, it is possible for a transaction calling

`store_put` or `store_get` to either abort or become dependent

on other transactions. The implementations of these functions have

to arrange for these actions to be carried out when necessary.

Note that the only time versions are removed from the transactional

store is during the "garbage collection" phase carried out at the

beginning of a `store_put` or `store_get` operation.

Upon completion of an operation, there will in general be "extra"

versions in the store that will get cleaned up on the next call

to `store_put` or `store_get`. You will want to be careful to

manage the reference counts on the various objects so that you

don't end up with any blobs or transactions that hang around

forever, even when they are not reachable from anything in the store,

and that you don't attempt to decrease any reference count below

zero. If you do this all correctly, then when the server terminates

(as a result of `SIGHUP`), all objects will be cleanly freed.

## Submission Instructions

Make sure your hw5 directory looks similarly to the way it did

initially and that your homework compiles (be sure to try compiling

both with and without "debug").

Note that you should omit any source files for modules that you did not

complete, and that you might have some source and header files in addition

to those shown. You are also, of course, encouraged to create Criterion

tests for your code.

The test file we have provided contains some code to start a server and

attempt to connect to it. It will probably be useful while you are

working on `main.c`.

It would definitely be a good idea to use `valgrind` to check your program

for memory and file descriptor leaks. Keeping track of allocated objects

and making sure to free them is one of the more challenging aspects of this

assignment.

To submit, run `git submit hw5`.