$24
Figure 3: Simplified Duplicate Inode Block Layout
The command will be run with the following format:
$ ./ext2sutils dup FS_IMAGE SOURCE DEST
FS_IMAGE will be a valid ext2s filesystem image.
SOURCE always refers to regular files and has two possible formats:
1. inode_no: Just a valid inode number. With this format, SOURCE is the inode number of the file to be duplicated.
2. /abs/path/to/filename: An absolute path to the file starting from the root directory /. To simplify your implementation, the separator is guaranteed to be a single slash, there will never be multiple consecutive slashes.
8
DEST also has two possible formats:
1. dir_inode/target_name: The left side of the / is a directory inode, and the right side is the name of the destination duplicated file.
2. /abs/path/to/target_name: An absolute path, same properties as with SOURCE. The last compo-nent of the path is always the name of destination file and never just a directory. Intermediate directories in the path will always exist.
Some examples for extra clarification:
• dup file at inode 15 into directory inode 2 (root) with name file.txt $ ./ext2sutils dup fs.img 15 2/file.txt
• dup file hw3.c under /folder/ into directory inode 14 with name code.c $ ./ext2sutils dup myext2.img /folder/hw3.c 14/code.c
• dup file at inode 88 into directory /home/torag/docs/ with name recovered.bin $ ./ext2sutils dup example.img 88 /home/torag/docs/recovered.bin
• dup file hr.avi under directory /y/2020 into /y/2021 with name nostalgia.avi $ ./ext2sutils dup example.img /y/2020/hr.avi /y/2021/nostalgia.avi
To duplicate the file you need to:
1. In case there are absolute paths, determine the inode of the source file and target directory via path traversal. Otherwise, they’re already given.
2. Allocate an inode for the new file.
3. Duplicate the metadata in the previous inode into the new one. Some fields have to be set differently though!
4. Increment reference counts for the file blocks.
5. Insert an entry with the new inode and target name into the target directory. Most of the time you should be able to find space for your entry inside one of the existing allocated blocks. Sometimes however there won’t be enough space for your entry; in which case you have to allocate a new data block and add it to the directory inode. You are expected to handle this case.
Your program should print the allocated inode number as well as the data block allocated for the directory if one has been allocated (-1 otherwise).
Example output where inode 17 was allocated and the directory entry fit inside an existing block:
17
-1
Another one where inode 126 was allocated and a new data block 1257 was allocated for the entry:
126
1257
9
The output alone is of course not enough, you should make sure to update the necessary filesystem structures correctly and that your filesystem remains consistent. The new file should be visible under the target directory, share blocks with the original file and you should not allocate any extra unnecessary blocks or corrupt other files. As a sharing test, you can try mounting the filesystem as ext2 and modifying one of the two files. Changes in one should be reflected in the other if the blocks have been shared.
Resulting images are not expected to match bitwise. Obviously, things like timestamps will be different. And there may be multiple valid positions for inserting an entry into a directory without allocating a new block; all of which will be accepted.
Important: Many implementations use smart heuristics when allocating blocks and inodes. You shall not. When allocating a new block or inode, simply go through each block group and their bitmaps in order and allocate the first free one you find. You must use this approach for your outputs to match.
4.2 rm - removing files (40 pts)
The next utility you will write is rm, which is pretty much the rm you’re used to, except it’s on ext2s. The command will be run with the following format:
$ ./ext2sutils rm FS_IMAGE DEST
FS_IMAGE and DEST are the same as with dup. DEST can be either a dir_inode/target_name pair or an /absolute/path/to/target and will again always be a regular file.
To remove a file entry from a directory, you need to do the following:
1. Find the inode of the target directory through traversal if given an absolute path. Otherwise it’s already given.
2. Find the target entry in the directory data blocks. This will give you the inode of the target file.
3. You also have to remove the directory entry you’ve found. This is simple linked list removal: the previous entry will have to be made to point to the next one. For the first entry case, simply set the inode field to 0 to indicate that the entry should be skipped. If the entry was the only one on the data block, you should not deallocate the data block.
4. Time to move on to the inode. First of all, you should decrement the hard link count of the inode. If there are still hard links left, we can stop here. Whew! If there are no hard links left, we’ll have to delete the inode. Uh oh...
5. Deallocate the inode from the bitmap.
6. Then, go through the blocks of the file and decrease their reference counts. If a block’s reference count becomes zero after decrementing, also deallocate that block from the bitmap.
The program should once again output two lines. The first line will contain the inode of the target file. The next line will contain the block numbers of all deallocated blocks (those who’ve reached a reference count of zero and have been deallocated from bitmaps, not all blocks). If no blocks were deallocated, print -1. The order of the blocks does not matter; the line will be sorted during grading.
Some hypothetical runs for clarification:
10
• remove file useless.bin under directory inode 17
• the target inode turns out to be 25 and no blocks are deallocated $ ./ext2sutils rm fs.img 17/useless.bin
25 -1
• remove file keys.txt under /secret/very_hidden/.sn34ky
• the target inode is 257 and seven blocks get deallocated
• ./ext2sutils rm myfs.img /secret/very_hidden/.sn34ky/keys.txt 257
251 327 328 326 325 87 441
Similar considerations as before apply: Just the output is not enough! The filesystem should remain con-sistent, the entry should be removed from the directory, no extra blocks should be allocated or deallocated and other files should not be touched.
• Bonus Opportunities
5.1 Indirect Block Support (20 pts)
For the base part of the homework, supporting files (including directories) only having direct blocks is enough. If your program can also handle files and directories containing indirect blocks (up to triple indirect), you will get bonus points.
Please note that indirect blocks are not different from data blocks in terms of being shared; they are also subject to being reference counted and will be duplicated rather than copied when using dup.
Under some circumstances, you may have to allocate multiple blocks when you need a new data block for a directory entry when performing dup. As an example, consider the case where all direct and single indirect blocks are full and double indirect has not been allocated yet. In this case, you will have to allocate two indirect blocks along with the data block for a total of 3 and you should print all three block numbers in the second line of the output. Their order does not matter, and which block gets assigned to which level does not matter either as long as the structure is correct.
Remember that files can have holes in ext2, which can raise some questions about how to allocate new data blocks for directories especially when indirect blocks are involved. Do not worry about that and assume that your directories will not have file holes.
5.2 share - sharing file blocks between separate files (20 pts)
The best part of having a file system in which we can share blocks is... Sharing blocks of course! Wouldn’t it be great to be able to reduce disk usage by combining equal blocks of different files? Time to do it!
$ ./ext2sutils share FS_IMAGE FILES...
The FILES... argument represents one or more files, all of which will be given as absolute paths. The goal is going through all the data and indirect blocks in all the files, and sharing those who contain the same data. When multiple blocks containing the same data are found, the one with the smallest block number shall be kept and referenced by the others. All the other blocks found containing the same data shall have their reference counts decremented and possibly be deallocated. Since this is a command that’s useful for large files, your implementation should support indirect blocks for this bonus.
11
Your output will contain one line for each shared block group. The first number in the line will contain the smallest block (that all the others will reference), and the rest of the line will contain pairs of the form block_no:ref_count; the block number of each block containing the same data and how many times they have been referenced. The order of the lines and the order of the pairs in each line does not matter. If no blocks containing shared data are discovered, the output will be empty.
As an example, consider the following run and scenario:
$ ./ext2sutils share e2.img /A /B /C /D
Let’s say you find that block 3 in A, block 13 in B and block 12 in D contain the same data, all of which appear once. Then, you find that block 48 appearing four times in B and block 9 appearing once in C contain the same data. Finally, you also find that block 24 appearing three times in D and block 32 appearing once in D contain the same data. The following would be a valid output:
3 3:1 12:1 13:1
9 9:1 48:4
24 24:3 32:1
A non-sorted order is also valid:
9 9:1 48:4
24 32:1 24:3
3 13:1 3:1 12:1
In the end, references to block 12 and 13 will be replaced by references to block 3. The reference count of 3 will increase by two; 12 and 13’s reference count will decrease by one and they will be deallocated if the counter reaches zero. References to block 48 will be replaced by references to block 9: block 9 will have four more references and 48 four less references. Something similar happens for block 24 and 32.
Please note that all blocks are considered together as a pool of blocks, we do not care about which block belongs to which file at all. Indirect blocks are also considered for sharing, just like data blocks. Once again, the filesystem has to remain consistent after the operation, and blocks should actually become shared, with removed blocks having their reference counts decreased and possibly being deallocated. Unrelated parts of the filesystem should not be corrupted during the operation.
As we will be working with large files, time complexity is important. A naive O(n2) double loop over the blocks is not going to cut it for large amounts of blocks! Use of proper data structures to achieve an O(n) or O(n log n) complexity is critical. Example images and runs with time limits will be provided later.
• Specifications
◦ Your code must be written in C or C++.
◦ Your implementation will be compiled and evaluated on the ineks, so you should make sure that your code works on them.
◦ Implementing only the inode_no and dir_inode/target_name formats will net you half points from dup and rm. You will get full points if you also implement absolute paths.
◦ You are free to use any standard libraries in your code.
12
• You are supposed to read the filesystem data structures into memory, modify them and write them back to the image. Mounting the filesystem in your code is forbidden; do not run any other executables from your code using things like system() or exec*(). This includes robin which is provided to help you test.
• The ext2fs.h header file is provided for your convenience. You are free to include it, modify it, remove it or do whatever you want with it.
• We have a zero tolerance policy against cheating. All the code you submit must be your own work. Sharing code with your friends, using code from the internet or previous years’ homeworks are all considered plagiarism and strictly forbidden.
• Follow the course page on ODTUClass and COW for possible updates and clarifications.
• Please ask your questions on COW instead of sending an email for questions that do not contain code or solutions, so that all may benefit.
• Tips and Tricks
◦ The given steps for implementing dup and rm are meant to guide you, but do not contain every single detail. As an example, the free block counts in the super block and block group have to be updated when one is allocated or deallocated. You have to find out and take care of these details. e2fsck should help with this.
◦ Loading the filesystem image into memory is an excellent candidate for using mmap.
◦ There are many operations involving traversing the blocks of an inode. Trying to come up with a configurable generic method to traverse them would be helpful, especially if you want to support indirect blocks.
◦ The data structures contain many unsigned 32-bit and 16-bit fields. Be very careful with overflows and negative values. Such bugs are hard to find and fix.
◦ Always check the consistency of the filesystem after modifying images to catch problems early. Backup your images.
◦ Do not just ignore all "multiply-claimed blocks" warnings from e2fsck; some may be legitimate programming errors on your part.
• Submission
Submission will be done via ODTUClass. Create a gzipped tarball file named hw3.tar.gz that contains all your source code files together with your Makefile. Your archive file should not contain any subfolders. Your code should compile and your executable should run with the following command sequence:
• tar -xf hw3.tar.gz
• make all
• ./ext2sutils
If there is a mistake in any of the 3 steps mentioned above, you will lose 10 points.
Late Submission: Remember that you had two free late days given to you at the beginning of the semester which you can use if you haven’t yet. Otherwise, a penalty of 5 (late days)2 will be applied to your final grade.
13
• Appendix - More ext2s and robin Details
9.1 Consistency Checking
There are essentially three types of blocks that are valid in ext2s:
• File blocks: These blocks are the ones shared between files. They are referenced at least once and should be referenced as many times as they appear in file inodes.
• Special blocks: These are blocks like the superblock, block bitmap, inode table etc. They are not referenced from any inode since they are not part of a file, but are still allocated. Their reference counts should be one.
• Free blocks: These blocks are not referenced from any file nor allocated. They are free blocks, ready to be allocated. Their reference counts should be zero.
robin check will check all the blocks of the filesystem and report any block not matching one of the above three cases as problematic. Then it’s up to you to figure out the bug.
9.2 The Special .block_refmap File
When an ext2 filesystem is converted into ext2s via robin convert, an extra inode is allocated and the refmap blocks are indexed as data blocks for this new file, which is added to the root directory as
".block_refmap".
This prevents e2fsck from complaining about having found allocated blocks that do not belong to any files. Also, being able to see the refmaps directly through a file is useful for debugging; you can mount the filesystem and then check or modify the refmaps through this file. All the refmaps appears contigously in the file, even though they are in different block groups on the filesystem. Note that the refmap in the last block group can have many unused entries, just like bitmaps. The refmaps will always take 32 blocks per block group.
Since your code will not be mounting the filesystem, it should modify the refmaps directly by getting their positions from the block group descriptors.
14