Starting from:
$35

$29

Assignment 3 A Trivial UNIX File System Solution


Objective

This programming assignment is intended to give you experience in writing a simple UNIX-like file system.

The UNIX-like File System

A file system stores data files persistently on disk, and provides a unified way to access and retrieve these files. It keeps a fair amount of information about each file in a data structure which is essential to isolate and identify data. In this assignment, you will write a program in C or C++ (whichever you prefer) which simulates a trivial file system. This program mounts a file system that resides on a virtual disk (i.e., a file acting as the disk) and checks if it is consistent. It subsequently reads commands from an input file and executes them by simulating the file system operations.

Disk Layout

The file system you will develop in this assignment resides on a disk that is 128 KB in size. The size of each block is 1 KB (1024 bytes). The first block is called the superblock and contains the list of free blocks (free-space list) and 126 index nodes (inodes). The remaining 127 blocks store data blocks of the files on your file system. A file consists of one or multiple blocks, hence its size can only be multiples of a kilobyte. Users can create or delete files and directories, and arbitrarily change the size of a file provided that the disk or the inodes are not full.

Superblock



free-space list
inode[0]
inode[1]

inode[125]
#











Block #
0
1
2
3



124
125
126
127



Data blocks


The free-space list is implemented as a bit vector occupying the first 128 bits (16 bytes) in the superblock. Each block is represented by a bit which indicates whether the corresponding block is free or not; 0 indicates that the block is free, and 1 indicates that the block is not available to use. The first bit represents the state of the first block, i.e., the superblock, and the last bit represents the state of the last data block.

Each inode contains information about a file or a directory. In this file system, an inode contains six pieces of information, namely the name of the file (or directory), the size of the file, the index of the first block allocated to the file, the state of the inode, the type of the inode, and the index of the inode of the parent directory. The inode structure is defined below:

typedef struct { char name[5]; uint8_t used_size; uint8_t start_block; uint8_t dir_parent;

} Inode;


    • Name of the file/directory, not null terminated

    • Inode's state and size of the file/directory

    • Index of the start file block

    • Inode's type and index of the parent inode

1
We describe the file attributes stored in this structure below:

    • name: represents the name of the file or directory, limited to 5 characters only. Hence, it is possible to have a name like "image", which leaves no space for the '\0', although filenames under 5 characters must end with a null terminator '\0'. Moreover, the name of the file must never start or end with a white space, and all leading and ending spaces must be ignored (for example, " docs " should be changed to "docs" which is a valid name).

    • used size: represents the state of the inode followed by the size of the file. The first bit of this variable indicates the state of the inode, where 1 means the current inode is in use and 0 means it is free. The other 7 bits represent the file size (in blocks). If the inode pertains to a directory, the size would be zero.


the inode’s state




#







7!this inode is in use and occupies 26 contiguous blocks.
1
0
0
1
1
0
1
0


"







the file size


    • start block: represents the index of the first block on the disk allocated to the file. In this file system, each file must be allocated a number of contiguous blocks. If the inode pertains to a directory, then this variable would be zero.

    • dir parent: represents two pieces of information, the inode’s type and parent. The first bit of this variable indicates the type of the inode, where 1 means that it pertains to a directory and 0 means that it pertains to a file. The other 7 bits indicate the index of the parent inode in the superblock (which can be between 0 and 125 inclusive). The index is set to 127 for files and directories located in the root directory. Note that the index cannot be set to 126.


the inode’s type




#







7!this inode pertains to a file located in the root directory.
0
1
1
1
1
1
1
1


"




Index of the parent inode


Simulating Basic Operations

In this assignment, you will simulate a number of basic operations on your file system. Your program must load the superblock into the memory without loading the file blocks, unless it is necessary. You will have to write the superblock and the file blocks back to the file that emulates the disk immediately after running a command.

Your program must read commands from a file and perform the corresponding file system operations (described in the next section). If an undefined command is detected in the input file, it should print an error message to stderr with the name of the input file and the line number where the error occurred (see the example below), then continue running the command in the next line. The format of the error message is:

Command Error: <input file name>, <line number>

For example

Command Error: command_file, 10

suggests that the command in Line 10 of the input file command file is not recognized or cannot be parsed. More specifically, this error message must be printed in the following cases: invalid command, wrong number of


2
arguments passed to the command, unexpected arguments (e.g., when the length of the filename is greater than 5), and invalid index of the block number (when it is outside [1, 127]).

The file system will use a buffer of 1 KB (1024 bytes) to hold data that is read from or must be written to a block within a file. This buffer needs to be initialized to zero when your program starts. The program must also keep track of the current working directory.

The commands that may appear in the input file are as follows, noting that all filenames must be found in the current working directory:

    • M - Mount the file system residing on the disk (results in the invocation of fs mount) Usage: M <disk name>
Description: Mounts the file system for the given virtual disk (file) and sets the current working directory to root.

    • C - Create file (results in the invocation of fs create) Usage: C <file name> <file size>
Description: Creates a new file with the provided name in the current working directory.

    • D - Delete file (results in the invocation of fs delete) Usage: D <file name>
Description: Deletes the specified file from the current working directory.

    • R - Read file (results in the invocation of fs read) Usage: R <file name> <block number>
Description: Reads the block number-th block from the specified file into the buffer.

    • W - Write file (results in the invocation of fs write) Usage: W <file name> <block number>
Description: Writes the data in the buffer to the block number-th block of the specified file.

    • B - Update buffer (results in the invocation of fs buff) Usage: B <new buffer characters>
Description: Updates the buffer with the provided characters. Up to 1024 characters can be provided. If fewer characters are provided, the remaining bytes (at the end of the buffer) are set to 0.

    • L - List files (results in the invocation of fs ls) Usage: L
Description: Lists all the files/directories in the current working directory, including . and ..

    • E - Change files size (results in the invocation of fs resize) Usage: E <file name> <new size>
Description: Changes the size of the given file. If the file size is reduced, the extra blocks must be deleted (zeroed out).

    • O - Defragment the disk (results in the invocation of fs defrag) Usage: O
Description: Defragments the disk, moving used blocks toward the superblock while maintaining the file data. As a result of performing defragmentation, contiguous free blocks can be created.

    • Y - Change the current working directory (results in the invocation of fs cd) Usage: Y <directory name>
Description: Updates the current working directory to the provided directory. This new directory can be either a subdirectory in the current working directory or the parent of the current working directory.






3
File System Implementation

You will implement the following functions to handle the commands described in the previous section. Errors must be checked and reported in the same order as they are listed for each function.
    • void fs_mount(char *name)

Mounts the file system residing on the virtual disk with the specified name. The mounting process involves loading the superblock of the file system, but before doing this, you should check if there exists a file (i.e., a virtual disk) with the given name in the current working directory. Print the following error to stderr if this file does not exist:

Error: Cannot find disk <disk name>

The next step is to check the consistency of the file system. We say a file system is inconsistent when an arbitrary number of bits in its superblock are changed accidentally. Your program should read through the file system and perform the following consistency checks:

        1. Blocks that are marked free in the free-space list cannot be allocated to any file. Similarly, blocks marked in use in the free-space list must be allocated to exactly one file.

        2. The name of every file/directory must be unique in each directory.

        3. If the state of an inode is free, all bits in this inode must be zero. Otherwise, the name attribute stored in the inode must have at least one bit that is not zero.

        4. The start block of every inode that is marked as a file must have a value between 1 and 127 inclusive.

        5. The size and start block of an inode that is marked as a directory must be zero.

        6. For every inode, the index of its parent inode cannot be 126. Moreover, if the index of the parent inode is between 0 and 125 inclusive, then the parent inode must be in use and marked as a directory.

If the file system is inconsistent, you must print the following error to stderr:

Error: File system in <disk name> is inconsistent (error code: <number>)

where <number> identifies the type of inconsistency (as listed above) and can be from 1 to 6. Note that you must check the file system inconsistency in the same order as above, and only report the first inconsistency you detect. Hence, if a file system is inconsistent for several different reasons, you only print the smallest error code.

Your program should not mount a file system that fails the consistency check. In this case, you should continue to use the last file system that was successfully mounted for running the next commands. If no file system was successfully mounted before, you should print the following error to stderr for each command that comes after the mount command until you read another mount command that can mount a file system successfully:

Error: No file system is mounted

Note that you should report this error only if the command is valid but no file system is mounted. If the command is not valid, you should just print the command error (described earlier).

If the disk exists and the residing file system is consistent, you can proceed to loading the superblock and set the current working directory to the root directory. Do not flush the buffer when mounting a file system.

    • void fs_create(char name[5], int size)

Creates a new file or directory in the current working directory with the given name and the given number of blocks, and stores the attributes in the first available inode. A size of zero means that the user is creating a directory. If no inode is available, you must print the following error to stderr:

4
Error: Superblock in disk <disk name> is full, cannot create <file name>

You must make sure the new file or directory have a unique name within the current working directory, and the name cannot be . or .. which is reserved. You can have multiple files with the same name only if they are located in different directories. If there already exists a file or directory with the same name under the current working directory, then print the following error to stderr:

Error: File or directory <file name> already exists

Recall that the file must be allocated a number of contiguous blocks, so your program must find the first set of contiguous blocks that can be allocated to the file by scanning data blocks from 1 to 127. Print the following error to stderr if there is not enough contiguous free blocks to allocate to the new file:

Error: Cannot allocate <file size> on <disk name>

You should check the availability of a free inode, then the uniqueness of the file/directory name, and finally the availability of enough contiguous free blocks to fit the new file’s data blocks.

    • void fs_delete(char name[5])

Deletes the file or directory with the given name in the current working directory. If the name represents a directory, your program should recursively delete all files and directories within this directory. For every file or directory that is deleted, you must zero out the corresponding inode and data block. Do not shift other inodes or file data blocks after deletion. If the specified file or directory is not found in the current working directory, print the following error to stderr:

Error: File or directory <file name> does not exist

    • void fs_read(char name[5], int block_num)

Opens the file with the given name and reads the block num-th block of the file into the buffer. If no such file exists or the given name corresponds to a directory under the current working directory, the following error must be printed to stderr:

Error: File <file name> does not exist

If the block num is not in the range of [0, size-1], where size is the number of blocks allocated to the file, print the following error to stderr:

Error: <file name> does not have block <block_num>

    • void fs_write(char name[5], int block_num)

Opens the file with the given name and writes the content of the buffer to the block num-th block of the file. If no such file exists or the given name corresponds to a directory under the current working directory, the following error must be printed to stderr:

Error: File <file name> does not exist

If the block num is not in the range of [0, size-1], where size is the number of blocks allocated to the file, print the following error to stderr:

Error: <file name> does not have block <block_num>

    • void fs_buff(uint8_t buff[1024])

Flushes the buffer by setting it to zero and writes the new bytes into the buffer. No errors must be handled in this function.

    • void fs_ls(void)

Lists all files and directories that exist in the current directory, including special directories . and .. which represent the current working directory and the parent directory of the current working directory, respectively. You must print the result to stdout. Always print . and .. as the first and second rows


5
of the result, respectively, and then all files and directories according to the indices of their corresponding inodes. For files, you must print the size of the file as well. For directories, you must show the number of files and directories that exist in this directory (do not do this recursively). If the current directory is the root directory of the virtual disk, then .. and . both represent the current directory.

Suppose the disk has the following structure:

root

file1 (3 KB)

 dir1

 dir11

file2 (6 KB)

file3 (2 KB)

file4 (19 KB)

file5 (7 KB)

file6 (4 KB)

file7 (38 KB)

 dir2


If the current working directory is root, then your program should output:

    • 6

.. 6 file1 3 KB

dir1 4 file7 38 KB

dir22

where each output row is formatted as follows if it is for a file:

printf("%-5s %3d KB\n", name, size);

and as follows if it is for a directory:

printf("%-5s %3d\n", name, num_of_children);

In the above example, dir1 contains 4 items because there are 3 directories, namely dir11, ., .., and 1 file which is file6. If the current working directory is dir1, then your program should output:

    • 4

.. 6 dir11 6 file6 4 KB

You do not need to handle any errors in this function.

    • void fs_resize(char name[5], int new_size)

Changes the size of the file with the given name to new size. If no such file exists in the current working directory or the name corresponds to a directory rather than a file, your program should print the following error message to stderr:

Error: File <file name> does not exist



6
If the new size is greater than the current size of the file, you need to allocated more blocks to this file. You must keep the start block fixed and add new data blocks to the end such that the file’s data blocks are still contiguous. If there are enough free blocks after the last block of this file, change the size in the inode to new size. Otherwise, you must try to move the file so that you can increase the file size to the new size, and copy all data in the previous blocks to the new blocks. The new starting block must be at the first available one that has enough space for the resized file. If there are not enough contiguous free blocks on the disk to fit the file with its new size, print the following error to stderr and do not update the file size:


Error: File <file name> cannot expand to size <new size>

If the new size is smaller than the current size, then delete and zero out blocks from the tail of the block sequence allocated to this file. The starting block is not moved when decreasing a files size. Finally, change the size attribute in the inode to the new size.


    • void fs_defrag(void)

Re-organizes the file blocks such that there is no free block between the used blocks, and between the superblock and the used blocks. To this end, starting with the file that has the smallest start block, the start block of every file must be moved over to the smallest numbered data block that is free.

For example, if the blocks are in this state before defragmentation:


inode[4]    inode[2]    inode[0]    inode[3]




they will be in the following state after this operation:

inode[4]    inode[2]    inode[0]    inode[3]




When moving blocks, the data must be copied over to the new blocks. Hence, files will contain the same data as before. You do not need to handle any errors in this function.

    • void fs_cd(char name[5])

Changes the current working directory to a directory with the specified name in the current working directory. This directory can be ., .., or any directory the user created on the disk. If the specified directory does not exist in the current working directory (i.e., the name does not exist or it points to a file rather than a directory), print the following error message to stderr:

Error: Directory <directory name> does not exist You can assume that the given name has no slash at the end.

Starter Code

Download a3-starter-code.zip from eClass; it contains the starter code for this assignment. The starter code includes a program to create and format a virtual disk, a header file, and a small number of input files with the expected stdout and stderr outputs and resulting disk images. You might modify the header file provided with the starter code.

To create a file that emulates the disk and format it, run the following command:

$ ./create_fs disk0


7
This command will create a 128 KB file in your current directory which acts as the disk for your file system. The name of this file is disk0 in the above example. This program also “formats” your disk by initializing the superblock and inodes.

Note that your file system must store data persistently on disk, making sure that when you request the data again, you get what you put there in the first place. Thus, if you terminate your program and then run it at a later time, all files in your file system must be intact.

Input file

Your program should read commands from an input file, run them one at a time, printing out the result of each command. The input file follows a strict format: each line contains a command and a number of arguments.

The name of this input file is passed to your program, fs, as a command line argument:

$ ./fs <input_file>

Deliverables

Submit your assignment as a single compressed archive file (fs-sim.zip or fs-sim.tar.gz) containing:

    1. A custom Makefile with at least these targets:

        (a) the main target fs which simply links all object files and creates an executable file called fs.

        (b) the target compile which compiles your code and produces the object file(s)

        (c) the target clean which removes the objects and executable files

        (d) the target compress which creates the compressed archive for submission, this should not compress and binary (executable or object) files

    2. A plain text or markdown document, called readme.md, which explains your design choices, lists the system calls you used to implement each function, and elaborates on how you tested your implementation. Make sure you cite all sources that contributed to your assignment in this file. You may use a markup language, such as Markdown, to format this plain text file.

    3. All files required to build your project including the starter code.

Misc. Notes

    • This assignment must be completed individually without consultation with anyone besides the instructor and TAs.

    • You can write the file system simulator program in C or C++. You must compile the program using gcc or g++. No warnings should be returned when your code is compiled with -Wall flags. Also your code must not print anything extra for debugging.

    • Check your code for memory leaks. You may use the Valgrind tool suite: valgrind --tool=memcheck --leak-check=yes ./fs

    • You can use your own machine to write the code, but you must make sure that it compiles and runs on the Linux lab machines (e.g., ugXX.cs.ualberta.ca where XX is a number between 00 and 34).

    • When developing and testing your program, make sure that you clean up all processes before you logout of a workstation. You can use the ps command to list the processes you created and the kill command to terminate them.

8

More products