
Phase 2 Solution

1. Set up cgroup controls to be passed into the program using command-line arguments






The file sr_container.c has an array ‘cgroups’ that holds all the cgroup controls for the newly created container. This array holds structs of type “cgroups_control”. Ideally, the ‘cgroups’ array has one entry of this struct per cgroup control (memory, cpu, cpuset, blkio). Within this struct there is a double pointer ‘settings’ that points to a collection of structs of type ‘cgroup_setting’. Each of these holds a setting specific to one cgroup controller.




Ex: cgroup-controller: memory

cgroup-settings: memory.limit_in_bytes, memory.kmem.limit_in_bytes, tasks




So you must fill in the cgroups array with ‘cgroups_control’ elements, each of which carries a list of relevant settings as shown above. The given code has an example for the ‘blkio’ controller. Note that every control must include the ‘tasks’ setting to ensure the process is added to the tasks list of that cgroup.




You must update the main() of the given program to support more flags. These flags let the user of the program set cgroup controls when running it, and you must fill in the above array with the corresponding values. Look at how arguments are currently handled in main() and extend that logic to fetch the additional flags and update the array. The flags to be supported are given as comments in the template code. Note that the 4th flag was changed from blkio-weight to memory.







In addition to the cgroup controls, an additional flag must be supported to provide the program with a hostname. The value of this flag must be stored in the ‘hostname’ attribute of the ‘child_config’ struct created at the beginning of main().







2. Implement the child process creation logic







Fill in the missing portion of the code in main() [in sr_container.c] to create a child process with namespace isolation for the following namespaces: Network, Cgroup, PID, IPC, Mount, UTS (don’t add the User namespace). See lines 171 – 186.

3. Changing root using pivot_root()







Complete the method switch_child_root() in the sr_container_helpers.c file using the pivot_root() system call. For info on the arguments to use with pivot_root(), refer to: http://man7.org/linux/man-pages/man2/pivot_root.2.html




4. Setting capabilities to the container







For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list).




In recent kernel versions, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. With this feature the kernel can control which privileges a traditional superuser process is allowed.




Capabilities basically subdivide the property of being “root”. We can restrict certain kinds of access for a process even though it has root privileges. For example, we may allow a process to configure network devices (CAP_NET_ADMIN) but disallow reading all files (CAP_DAC_OVERRIDE). However, not all of the properties of being root are subdivided into capabilities. Some properties remain accessible even after dropping capabilities.




Read here for more info: http://man7.org/linux/man-pages/man7/capabilities.7.html (You can complete the assignment with just the description in this handout.)







In this assignment we want some of these harmful or unnecessary capabilities to be disabled in our SRContainer. The capabilities that must be disabled are:




CAP_AUDIT_CONTROL, CAP_AUDIT_READ, CAP_AUDIT_WRITE, CAP_BLOCK_SUSPEND, CAP_DAC_READ_SEARCH, CAP_FSETID, CAP_IPC_LOCK, CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_MKNOD, CAP_SETFCAP, CAP_SYSLOG, CAP_SYS_ADMIN, CAP_SYS_BOOT, CAP_SYS_MODULE, CAP_SYS_NICE, CAP_SYS_RAWIO, CAP_SYS_RESOURCE, CAP_SYS_TIME, CAP_WAKE_ALARM

Disabling capabilities involves 2 steps:




Dropping the said capability from the ambient capability set of the process.

Clearing the said capability from the inheritable capability set of the process.




You can read more about the different capability sets of a process in the man page. However, the description that follows should be sufficient to complete the assignment.




Ex: Say you want to disable the capabilities CAP_MKNOD and CAP_SYS_BOOT:

Use prctl() to drop the capabilities from the AMBIENT set

Use cap_get_proc() to get the capability sets of the process

Use cap_set_flag() to clear the capabilities from the INHERITABLE set

Use cap_set_proc() to set the cleared set back to the process




Use the approach shown above to complete the setup_child_capabilities() method in sr_container_helpers.c.




You can test if this works by simply running “mknod <SOME_NAME> p”. If the capabilities have been set properly then this should fail.







To test if the capabilities were set properly you can do the following:




Copy the binary 'capsh' found inside the [/sbin] folder of the docker container into the [/sbin] folder of the 'rootfs' you downloaded to run containers.

cp /sbin/capsh $ROOTFS/sbin/

Now if you run 'capsh --print' [inside our SNR_CONTAINER] without this method implemented (i.e., capabilities not being filtered), the output for [Bounding set] will list many capabilities.

But after properly implementing this method (filtering capabilities), if you run the same command inside your SNR_CONTAINER container you will see a smaller set of capabilities for [Bounding set].

5. Disabling system calls inside a container







In addition to disabling capabilities, we also want to restrict processes inside our SRContainer from using certain system calls that can possibly lead to a vulnerable state. Seccomp is a kernel feature that can be used to achieve this. It lets you control which system calls a process and all its children have access to. It also lets you set the action to take (kill the process, raise a signal, just allow it, etc.) when a process tries to execute such a system call. The intent is to let untrusted processes use the resources provided by the kernel with restricted access, without abusing them.




In this assignment we will use this seccomp kernel feature to limit the system calls allowed to the processes within our SRContainer. Support for the seccomp feature is provided by the libseccomp library.




The idea behind instigating this system-call restriction is as follows:




Create a system-call filtering context with a default behavior for all system calls.

Set up filters on this context for certain system calls that must be handled differently.

Set any attributes that apply to the created seccomp context.

Load the newly configured context into the kernel.

Release any memory allocated for the seccomp context that was just configured. This does not affect the context that was loaded into the kernel.



See the detailed description below of each of these steps.




(Trust me you can just use this as a one-to-one template to finish this part of the assignment)




STEP-1:







STEP-2:




In the last example (2nd image) we want to capture calls to unshare() only if the ‘CLONE_NEWUSER’ flag is used.

So in the call to seccomp_rule_add(), the 4th argument says we want “one” argument match on the call to unshare().

The 5th argument of seccomp_rule_add() describes what this match is.




SCMP_A0(SCMP_CMP_MASKED_EQ, CLONE_NEWUSER, CLONE_NEWUSER)

SCMP_A0 - Matches on the 0th argument of unshare(). If it were SCMP_A1, the match would be on the 1st argument. Notice that arguments are 0-indexed, like arrays in C.

SCMP_CMP_MASKED_EQ - The comparison is not a one-to-one match but a check on a MASKED argument. This is because the argument to unshare() can be an OR of many flags: CLONE_FS | CLONE_FILES | CLONE_VM | CLONE_NEWUSER.

3rd Argument (the first CLONE_NEWUSER): The mask applied to the argument for validation

4th Argument (the second CLONE_NEWUSER): What the masked argument must be equal to




Similarly, you can write rules that match on specific arguments of a system call when filtering.




STEP-3:

Set the filter attribute value of SCMP_FLTATR_CTL_NNP.

STEP-4:

Load the created context into the kernel. You can simply re-use Step-3 and 4 as is.




Your task (should you choose to accept it) is to:




Complete the method setup_syscall_filters() in the sr_container_helpers.c file to STOP our SRContainer from invoking the following system calls. Any process that attempts to run these system calls must be killed (SCMP_ACT_KILL SCMP_FAIL). All other system calls must be allowed.




ptrace



mbind



migrate_pages



move_pages



unshare (Only restrict if the CLONE_NEWUSER flag is used)

clone (Only restrict if the CLONE_NEWUSER flag is used)

chmod (Only restrict if the S_ISUID or S_ISGID flags are used for the “mode” argument)



You can test if this works by simply writing a C program which tries to use one of the system calls above.




Instructions to copy your code into the host-container environment.




You must first copy the template code folder “A3Template” to the cs310 server.




scp -r <path_in_your_pc>/A3Template <socs_uname>@cs310.cs.mcgill.ca:~







You can compile the program by simply running ‘make container’ with the given Makefile, or use the complete ‘gcc’ command:




gcc -o SNR_CONTAINER -g -Wall -Werror sr_container.c sr_container_helpers.c sr_container_utils.c -lseccomp -lcap







Then, you must copy the built executable into your own docker-container environment, that is, the container you created with ‘docker run’ for Phase-1.




docker cp ~/A3Template/SNR_CONTAINER <container_name>:/home







Now, if you go into your container using:




docker exec -it <container_name> /bin/bash




You should see your executable and you can run it with the correct flags.




Do not copy your entire A3Template code into your docker container. Only copy the built executable.










What to submit:




sr_container.c (with your changes)

sr_container_helpers.c (with your changes)




No need to submit any other files since you will not have to change them.







Rubric (This Phase accounts for 40%)




1. Setting up cgroups/hostname flags: 7%
2. Implementing child process logic: 7%
3. Proper usage of pivot_root(): 6%
4. Implementing capabilities: 10%
5. Implementing syscall filtering: 10%
