Introduction to awk programming 2016
Welcome to the web page for the course "Introduction to awk programming".
The lecture notes, as well as a list of the material covered, can be found at the end of the page. An abstract of the course can also be found further down.
The course will take place from the 15th to the 17th of August 2016 at Heidelberg University. We will meet in room 3.103 (PC-Pool 1) on the third floor of the Mathematikon (INF 205). The course is structured as a full-day course running from 9:30am till about 5pm each day (with a one-hour lunch break in between).
Dealing with large numbers of plain text files is a frequent task when
running scientific calculations or simulations. For example, one wants to
read a part of a file, do some processing on it and send the result off
to another program for plotting. Often these tasks are very similar, but
at the same time highly specific to the particular application or
problem at hand, such that writing a single-use program in a high-level
language like Java hardly ever makes much sense: The
development time is just too high. On the other end of the scale are
simple shell scripts. But with them even simple data manipulation
sometimes becomes extremely complicated, or the script simply does not
scale up and takes forever to work on bigger data sets.
Data-driven languages like awk sit on a middle ground here:
scripts are as easy to code as plain shell scripts, but are well-suited
for processing textual data in all kinds of ways. One should note, however,
that awk is not extremely general. Following the UNIX
philosophy it can do only one thing, but it does that thing well. To make
proper use of awk one hence needs to consider it in the context of a
UNIX-like operating system.
In the first part of the course we will thus start by revising some
concepts which are common to many UNIX programs and also prominent in
awk, like regular expressions. Afterwards we will discuss basic
awk scripts and core awk features like
- ways to define how data sits in the input file
- extracting and printing data
- control statements
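To give a rough idea of what these features look like in practice, here is a minimal sketch run from the shell. The sample data, its colon-separated layout and the variable names are assumptions made up for this illustration and are not taken from the course material.

```shell
# Hypothetical sample data: run name, temperature, pressure, separated by colons.
data='run1:300:1.0
run2:350:1.5
run3:400:2.5'

# Defining how data sits in the input: -F sets the field separator to ":".
# Extracting and printing: $1 and $3 are the first and third field of each line.
columns=$(printf '%s\n' "$data" | awk -F: '{ print $1, $3 }')
printf '%s\n' "$columns"

# A condition in front of the action selects lines; the END block runs once
# after all input has been read.
total=$(printf '%s\n' "$data" | awk -F: '$2 >= 350 { sum += $3 } END { print "sum:", sum }')
printf '%s\n' "$total"

# Control statement: an if/else inside the action labels each line.
labels=$(printf '%s\n' "$data" | awk -F: '{ if ($2 > 300) { print $1, "hot" } else { print $1, "cold" } }')
printf '%s\n' "$labels"
```

The results are stored in shell variables only so that they can be reused. In everyday use one would typically pipe the awk output straight into the next program.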
If there is time left, we will also look at some advanced topics, like calculations with arbitrary precision.
This course complements the bash course, which was offered in August 2015.
After the course you will be able to
- enumerate different ways to define the structure of an input file in awk,
- parse a structured input file and access individual values for post-processing,
- use regular expressions to search for text in a file,
- find and extract a well-defined part of a large file without relying on the exact position of this part,
- use awk to perform simple checks on text (like checking for repeated words) in fewer than five lines of code.
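Two of these targets can be sketched in a few lines. The marker lines (BEGIN RESULTS, END RESULTS), the sample text and the variable names below are hypothetical and chosen only for this illustration. The repeated-word check is deliberately simple and only looks for repeats within a single line.

```shell
# Hypothetical program output; the marker lines are made up for this sketch.
log='setup done
BEGIN RESULTS
energy -1.25
energy -1.50
END RESULTS
cleanup'

# Range pattern: the action applies to every line from the first marker to the
# second one, wherever that block sits in the file. Inside the block, a regular
# expression keeps only the lines starting with "energy".
energies=$(printf '%s\n' "$log" | awk '/^BEGIN RESULTS/,/^END RESULTS/ { if ($0 ~ /^energy/) print $2 }')
printf '%s\n' "$energies"

# Repeated words: compare each field with the one before it and report the
# line number (NR) together with the repeated word.
text='this is is a test
of the the checker'
repeats=$(printf '%s\n' "$text" | awk '{ for (i = 2; i <= NF; i++) if ($i == $(i-1)) print NR ": " $i }')
printf '%s\n' "$repeats"
```

A repeat that spans a line break (last word of one line, first word of the next) is not caught by this version. Handling that case would require remembering the last field of the previous line.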
- Familiarity with a UNIX-like operating system like GNU/Linux and the terminal is assumed.
- Basic knowledge of the UNIX command grep is assumed. You should, for example, know how to use grep to search for a word in a file.
- It is not assumed, but highly recommended, that participants have previous experience with programming or scripting in a UNIX-like operating system.
|Course files (including notes, resources and example files)|
|Solutions to the exercises (pdf with comments)|
|Solution script files|
Both the lecture notes and the script examples are managed
in a public git repository on GitHub.
For the most recent version of the material (including corrected errors
and other updates) you should refer to this repository
or to the DOI
Feel free to cite this DOI in case you find the course material useful for your work.