The goal of this tutorial is to show how to program efficiently in the C programming language, since both required courses, COMP 206 and COMP 310, use the language without a proper introduction to its tooling. It is suggested, but not required, to learn a proper console-based text editor (e.g. vim or emacs) so that you can both code and use the tools in the same terminal.
Probably one of the most crucial aspects of coding efficiently (in any language), but also the one I do not want to elaborate on, is adopting a consistent coding style. Inconsistent code is hard to read and write, which invariably leads to a higher bug count, which translates to more time wasted on debugging. No respectable company will tolerate a bad coding style, so you had better start now.
If you do not know what style to adopt, I suggest one that has survived 30 years in a codebase of more than 10 million lines of code: the Linux kernel coding style, which should scale well to all your C projects.
The two main compilers available on CS systems are gcc and clang, which conveniently have almost the same interface. This means the options discussed below apply to both compilers. It is good practice to make sure your code compiles cleanly with more than one compiler, as different compilers might generate complementary warnings.
There are a number of options you should always pass to your compiler to get the most out of it:
-pedantic makes the compiler warn about code that is not standard C, such as uses of non-portable compiler extensions, so that code which compiles cleanly should compile fine on any C compiler.
-Wall enables a large set of commonly useful warnings (despite its name, not all of them).
-Wextra enables extra warnings not covered by -Wall.
-g generates debugging information, which lets tools such as valgrind and gdb refer to your source files and line numbers.
The final command looks something like this:
$ gcc -g -pedantic -Wall -Wextra *.c
You should then fix your code until there are no errors or warnings left. It might be tempting to ignore some warnings, but at the very least they are most likely a symptom of bad coding practices.
Note that it is considered good practice to code in a top-down fashion: start by writing the bare minimum needed to get a successful compile, then fill in the previously defined empty functions as you implement the specific functionalities of your program.
This has the added advantage of not leaving you with a tremendous number of errors and warnings once everything is written, and it ensures you fix any obvious design flaws early on.
Both the C and C++ programming languages have a peculiarity that makes them very different from a language like Java: a program might compile successfully while being invalid.
More specifically, there are syntactically valid programs for which the compiler will happily generate an incorrect binary, with or without warnings. We say that such a program contains undefined behaviour.
A lot of students have difficulty grasping this concept, but it is impossible (or at least extremely inefficient) to try to reason logically about the flow of a program which contains undefined behaviour.
Take the following incorrect code as an example:
#include <stdio.h>

int main(void)
{
	printf("This might or might not print anything...");

	int *a = 0x0;

	*a = 0;

	return 0;
}
In the example above, the program is invalid because it tries to write to memory that was never allocated. Even though the print statement comes before the invalid memory access, at least on my machine nothing is printed and the program immediately yields a segmentation fault.
If there is one thing to take away from this example, it is that making sure the program is well defined comes before debugging its logic. Another takeaway is that debugging a C program with print statements is highly inefficient and error-prone.
valgrind is the most useful tool when programming in C. By default it runs its Memcheck tool, a memory checker that will warn about pretty much every invalid use of memory, and then some.
For the purpose of this tutorial, we will say that your C program is expected to be valid (at least with high probability) if you compiled it with the compiler options described above, fixed all warnings and errors returned by both gcc and clang, and then fixed all errors and warnings returned by valgrind's memory checker.
Let us run valgrind on the code snippet above:
$ clang -g -pedantic -Wall -Wextra *.c # Note the absence of any warning...
$ valgrind ./a.out
==29785== Memcheck, a memory error detector
==29785== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==29785== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==29785== Command: ./a.out
==29785==
==29785== Invalid write of size 4
==29785== at 0x40053E: main (main.c:9)
==29785== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==29785==
==29785==
==29785== Process terminating with default action of signal 11 (SIGSEGV)
==29785== Access not within mapped region at address 0x0
==29785== at 0x40053E: main (main.c:9)
==29785== If you believe this happened as a result of a stack
==29785== overflow in your program's main thread (unlikely but
==29785== possible), you can try to increase the size of the
==29785== main thread stack using the --main-stacksize= flag.
==29785== The main thread stack size used in this run was 8388608.
This might or might not print anything...==29785==
==29785== HEAP SUMMARY:
==29785== in use at exit: 0 bytes in 0 blocks
==29785== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==29785==
==29785== All heap blocks were freed -- no leaks are possible
==29785==
==29785== For counts of detected and suppressed errors, rerun with: -v
==29785== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
A lot of noise, but the important part is this:
==29785== Invalid write of size 4
==29785== at 0x40053E: main (main.c:9)
==29785== Address 0x0 is not stack'd, malloc'd or (recently) free'd
It tells you exactly what is wrong (an invalid write), where it happened (file main.c, line 9) and why it is wrong (the address 0x0 was never allocated for this program). At the end there is a summary of heap usage, which lets you know whether your program has a memory leak (memory that was allocated but never freed), as well as the total number of errors encountered.
Here the print statement did produce output, but only because valgrind runs the code in a way that makes it more deterministic for its analysis. Because running your code in valgrind can behave differently from running the program as-is, it might also uncover portability bugs.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	printf("This is guaranteed to print something!");

	int *a = malloc(sizeof(int));

	/* malloc() can fail, and writing through a NULL pointer is undefined */
	if (a == NULL)
		return 1;
	*a = 0;
	free(a);

	return 0;
}
Fixing the program as above and running valgrind again confirms that it is now a valid C program:
$ clang -g -pedantic -Wall -Wextra *.c # Note the absence of any warning...
$ valgrind ./a.out
==3457== Memcheck, a memory error detector
==3457== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==3457== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==3457== Command: ./a.out
==3457==
This is guaranteed to print something!==3457==
==3457== HEAP SUMMARY:
==3457== in use at exit: 0 bytes in 0 blocks
==3457== total heap usage: 2 allocs, 2 frees, 1,028 bytes allocated
==3457==
==3457== All heap blocks were freed -- no leaks are possible
==3457==
==3457== For counts of detected and suppressed errors, rerun with: -v
==3457== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
As an exercise, remove the free(a); line from the code and see how it affects the output of valgrind.
If your program is well coded and generates no warnings or errors from either compiler or from the memory checker, but its behaviour is still unexpected, then you most likely have a logical error in your code.
If that happens, you will want to use a debugger. gdb is the most common C debugger in the UNIX world, and I will try to show you how to use it as concisely as possible. First, let's fire it up:
$ clang -O0 -ggdb -pedantic -Wall -Wextra *.c
$ gdb -tui ./a.out
First, note the changes to the commands:
-ggdb replaces -g to generate debugging information tailored for use in gdb.
-O0 disables all compiler optimizations. This ensures the code flow will be as expected and that no code will be optimized out by the compiler.
-tui makes gdb show the code we are working on (assuming it was compiled with -g or, preferably, -ggdb).
Now here is a summary of the most useful commands:
Command | Abbreviation | Description |
run | r | Run the program. Arguments can be specified. |
break | b | Will pause the program execution at the line or function name specified. |
watch | wa | Will pause the program execution and notify you when the given symbol's value changes. |
continue | c | Assuming the program execution is paused, it will resume it. |
next | n | When paused, this command executes the next line of code. |
step | s | Same as next but will "enter" functions. |
until | u | Like next but will run loops to completion. |
print | p | Display the current value of the given symbol. |
backtrace | bt | Will yield a trace of all the function calls that led to that point in the program. |
quit | q | Exit GDB. |