You have been using libraries in this course, but we haven’t discussed them in detail or looked at a more advanced use of this programming and development feature: how to build your own C library. Why would you want to do this? And how would you do it? In a general sense, libraries are collections of precompiled functions written to be reused by other programmers. In your case, imagine that three different programmers were designing and coding up TinySearch. Once they completed the overall design, they might conclude that a number of common functions could be reused by each component being coded up: for example, all the code in the lab6/src/util directory. These common functions in util could be built as a library that each programmer links either as a static library (lib.a) or, better, as a shared object (lib.so) that is dynamically shared. We will discuss these static and dynamic options, and as part of Lab6 you will create a static library from the functions in the util directory and link to that library at compile time. This will require you to build a library and then change your makefile so that the crawler, indexer and query engine are built against that library. Understanding and being able to build a library is another one of those skills you need to get your cs50 hackers badge.
We plan to learn the following from today’s lecture:
Libraries consist of a set of related functions that perform a common task; for example, the standard C library, ‘libc.a’, is automatically linked into your programs by the “gcc” compiler and can typically be found at /usr/lib/libc.a (the exact location varies between systems). Standard system libraries are usually found in the /lib and /usr/lib/ directories. Check out those directories. The standard C library is linked by default; for any other library, gcc, or more specifically its linker, needs to be told which libraries to search.
There are a number of conventions for naming libraries and telling the compiler where to find them that we will discuss in this lecture. A library filename always starts with lib. The last part of the name determines what type of library it is:
.a: static, traditional libraries. Applications link to these libraries of object code.
.so: dynamically linked shared object libraries. These libraries can either be linked in at runtime but statically aware (the linker knows about them when the program is built), or loaded during execution by the dynamic link loader.
The way to view a static library is that it is linked by the linker and included in the executable code. So if 10 applications link in a static library, each application’s resulting binary includes its own copy of the library code it references. This leads to large executable files. To address this, people use shared libraries. These libraries provide the same functions as their static counterparts, but the code for those functions is not directly included in the resulting executable. Rather, all 10 applications access a single copy of the library that is shared between them, even while they are all executing at the same time. There is some operating system magic to make this happen safely, but it is a foundation of modern computing.
As in the case of stdio.h, a library has to provide a header file, included in the source code, that declares the function prototypes and variables the library provides. Many times you need access to functions that are not included in the standard C library. For example, the code below uses functions provided by the math library (i.e., libm.a) and declared in math.h, which it includes as a header file.
The trig.c code comes from another nice book on Linux Programming by Masters and Blum
trig.c
The trig program relies on the math library libm.a, so it is necessary to tell the linker explicitly to include that library. Note that on Linux machines the math library ships as part of the GLIBC package, and some toolchains will resolve these calls without the extra flag, so you may find the program links without it. But the point is that you can’t rely on that for the many C functions that are provided by external libraries. Therefore, you should tell gcc which library to include, as below:
$ gcc -o trig trig.c -lm
The -lm option tells gcc to search the system-provided math library (libm). It will look in /lib and /usr/lib to find the library, as discussed above. The general form of the option is -llibrary: gcc searches for the library named library when linking, which is actually a file named liblibrary.a. Here the “m” names the library libm.a.
The following is a tutorial on building libraries, with a focus on the Tiny Search Engine. This tutorial is largely adapted from http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html.
We will not cover building shared dynamic libraries in the lecture below, but if you are interested, check out that tutorial, which provides a nice set of simple examples.
Your crawler code is probably factored into two components, namely crawler and util: the former contains the main control logic of the crawler, and the latter is more of a general-purpose toolbox that you use in your crawler and in other components of your Tiny Search Engine. So, wouldn’t it be cool if you could make your util into a C library and just link to it when you need its functions, instead of having to copy over all the util C files as you work on different components of your search engine? This short tutorial is going to show you just how to do that.
We will be using the crawler sample solution code as our example to walk through how to build its util into a static C library and how to link to it to create the crawler binary executable.
The crawler code is divided into two directories, crawler and util; we want to build util into a C library.
Navigate into the util directory and compile all the .c source files into .o object files by supplying the -c flag:
$ gcc -Wall -c *.c
You should see a number of .o object files created corresponding to the .c source files.
Next, we issue:
$ ar -cvq libtseutil.a *.o
which takes all the .o object files generated in the previous step and packages them into a single .a static library file, named lib[xxxxx].a. In our case, since we are making the util library for the tiny search engine (tse), we named it libtseutil.a. Notice that the name must start with lib and end with the .a extension. (ar is the Linux archiver utility; man it to learn about it and its options.)
The ar command does the heavy lifting: it creates and maintains library archives. Take a look at the man pages for the detailed meaning of the flags; in brief:
-c Whenever an archive is created, an informational message to that effect is written to standard error.
If the -c option is specified, ar creates the archive silently.
-v Provide verbose output.
-q Quickly append the specified files to the archive. If the archive does not exist a new archive file is
created.
We have successfully built our own C library file and we are ready to link it to build our crawler binary. But before that, we can also take an optional step to view what files our library contains.
You can do
$ ar -t libtseutil.a
to see what files this library includes. The output lists the .o files in the library file, one per line; for the sample solution there are four.
Now we navigate to the crawler directory and build our executable binary. There are two ways of doing this; we can do
$ gcc -o crawler crawler.c list.c ../util/libtseutil.a
in which we directly specify the path to the library file; or we can do
$ gcc -o crawler crawler.c list.c -L../util/ -ltseutil
in which we specify the directory containing the library file with the -L flag and specify that we want to link to our TinySearchEngine util library with the switch -ltseutil. Pay close attention to the spelling of this switch and the name we gave to our library file: the switch is formed by cutting the lib prefix and the .a extension off the library name and sticking a -l in front of what’s left.
In lab6 we want you to create a static library as above and link it into the three main components if needed. The following steps serve to outline a general approach to do this:
where UTILLIB might point to the name of the static library, and UTIL[HC] to the header and source files in your util directory. You don’t have to worry about changing back into your component directory from the util directory, because every line of a Makefile recipe runs in a separate shell process.
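The point about not needing to cd back can be seen directly in the shell. This sketch mimics a single recipe line with a parenthesized subshell, on the assumption that make hands each recipe line to its own shell; the directory name cddemo is hypothetical.

```shell
mkdir -p cddemo/util
start_dir=$(pwd)

# The cd happens inside a child shell, much like one recipe line run by make.
(cd cddemo/util && echo "inside: $(pwd)")

# Back in the parent shell, nothing has moved.
echo "after:  $(pwd)"
[ "$(pwd)" = "$start_dir" ] && echo "working directory unchanged"
```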
An example snippet of the Makefile for crawler could be as follows:
This is beyond a hint. You work it out.
NOTE: Once you have made a library, you cannot debug into it just by setting the debug flags in the component’s Makefile. If the library’s object files were built without those flags, the symbols are not available for the functions in tseutil, so you cannot inspect data or set break points, etc. The best thing is to make sure your util code is debugged before creating libtseutil.a. Don’t create the library while you have bugs.
Note to lecturer: run example in l23/util. Then go to lab6/src/util and lab6/src/crawler and run and discuss the Makefile.