Stick this in /usr/man/man5/ if you like it.
programming - a point of introduction to fundamentals of programming
PAGE DATE 19980122
This file is intended to provide a beginning point for learning computer
programming. Examples assume a unix environment. Explanations are
intended to be practical, as opposed to theoretically or historically
precise. Basic end-user skills are assumed. Topics introduced include
Boolean logic, flow control, conditionals, and binary numbers. These
concepts are fundamental to programming, and they are not as complicated
as they sound. The summary includes a brief survey of programming tools
for unix. This file is crawling with personal opinion, some of which is
quite non-standard. Don't flame me, write a better version.
The programming language used for examples is dcon, which can be
obtained as C sourcecode ( heh, as if you knew what the heck THAT was :o)
at e.g. ftp://sunsite.unc.edu/pub/Linux/system/serial/dcon0.97.tgz
and installs easily in Linux.
To pick a practical starting point for a discussion of programming,
consider a simple electronic calculator. This will give us a reference
point for explaining what a computer really is. This will also allow us to
introduce some of the things you should know about "2 + 2" when you're
programming that you don't need to know about when you're calculating. A
lot of terms will be introduced or used in a programming sense in the next
few paragraphs. If you find this boring, hang in there. Getting a computer
to do what YOU want it to is very satisfying. Also, knowing the basics of
programming makes many things clearer from the "end-user" point of view.
A calculator can perform the type of data processing we call arithmetic.
Using the buttons on the calculator you can input a number or two and
instruct it to perform a calculation on the numbers you have input, and it
will display a result. You can then instruct the calculator to perform
some other calculation. Even some of the very simplest calculators will
allow you to use the answer from a calculation in the next calculation,
since the number in the display is also stored in memory. In this way the
previous calculation is preserved in the current state of the calculator.
On a Hewlett-Packard 10b business calculator, the button sequence 41 /
7 = will produce a display of 5.857143. (Actually, the HP handheld has a
dash-and-2-dots divide symbol.) The same calculation in the xcalc
calculator emulator for X Windows gives an answer of 5.8571429. Not only
do these two highly regarded tools disagree, but neither is perfectly
correct. Oh dear! Since 41 and 7 are integers, there is a very simple way
to give a precise and correct answer to the calculation 41 / 7. In
integer arithmetic 41 / 7 equals "5 remainder 6". The remainder is also
known as the modulo. The integer representation is strictly correct and
precise, but very inconvenient, since the answer is two distinct numbers.
Also, "5 remainder 6" may not really express what we want to know about 41
/ 7. A rational number, such as 5.857143, is a better answer to
"Approximately how big is a one-seventh slice of 41?", whereas integers
better answer the question "How many whole sevens are there in 41?".
Our two example calculators are geared for the rational numbers. When you
input a 12 to the HP or to xcalc they store it in an internal
representation for rational numbers something like +12.00000. On the other
hand, the unix bc calculator utility prefers integers. If you type
bc<return> and then tell bc 41 / 7<return> bc will answer with 5 . The
modulo can be obtained separately with the modulo operator, e.g. 41 % 7 .
The internal representation of rational numbers is slightly different in
the HP calculator and in xcalc. bc's integers are very different, and
integer operations are actually very different from operations on
rationals. There is an internal commonality between all these calculating
tools, however. Basic electronic digital circuits don't have zero through
nine. The basic circuits don't have decimal points or exponents. Digital
circuitry works on ones and zeros. Any data a computer can hold, from
numbers to strings of text characters to pictures to phone calls to
whatever, must at some point be converted to ones and zeros. A single
piece of digital data, which may be either a zero or a one, is called a
bit. If a bit is being used to represent part of a number then it is one
binary digit, which is what "bit" is short for.
All our example numbers were calculated using no other digits besides one
and zero. Humans have 20 digits, in two sets of ten. Everyday numbers are
base 10, also known as decimal. Ten was chosen instead of 20 because of
the difficulty of counting on your toes. With just one and zero we can
have a base 2 number system, known as binary. The conversion from the
decimal numbers you input, to the binary form it was calculated in, and
back to the form your answer was displayed as, was hidden from you. In the
context of a calculator that's the way you want it; you want to think
about the kind of numbers you're working with. Unlike a calculator, a
computer might be asked to manipulate any kind of data.
Numbers of any size can be represented by bits in binary, just as numbers
of any size can be represented by 0 through 9 in decimal. A group of 8
bits is called a byte. There are 256 possible settings of 8 bits, from
00000000 to 11111111.
The points I hope I've made by now are: all digital computation is performed
on ones and zeros, any data can be represented by ones and zeros, arithmetic
is just one of many types of data processing, and even the simplest arithmetic
can be performed a variety of ways.
A computer can perform any kind of data processing; it is not limited to
arithmetic. Computing vaguely resembles a subset of thinking. However,
thinking is rooted in the motives, emotions and experiences of the
individual, things computers do not have. Computers can do well at
strictly confined tasks, such as playing chess, but the most powerful
computers currently in existence can only emulate the behavior of a
microscopic quantity of biological braincells.
There are several things a computer needs to be able to do that a
calculator doesn't. A computer must be able to perform operations on
individual bits. It must be able to perform certain operations
conditionally, i.e. it must have some form of "IF". In other words, it
must be able to "decide" whether or not to do something based on some
condition. Also a computer must be able to do things repetitively. These
features of computers; general data manipulations, conditionals, and
repetition; are the difference between calculations and computations, or
formulas and algorithms. The distinction between computers and calculators
dates from the days when both "calculators" and "computers" were typically
human beings, perhaps aided with a slide rule or mechanical adding machine.
It's not a matter of power or speed, it's a matter of versatility. There
are many types of data manipulations that simply can't be done without
conditional execution of steps, for instance.
I ran abruptly into this difference myself once with a "programmable
calculator" I used to have. It could store and sequentially execute a
program of 500 or so instructions, instructions in this case being
calculator button strokes. It therefore had the repetition capability of a
computer, but it didn't have conditional execution; it had no "if". The
task I was unable to get it to do for me without conditionals was to
convert an azimuth to a bearing. An azimuth is an angle between 0 and 360,
where zero degrees is north. A bearing is an angle between 0 and
90 in a quadrant. The azimuth 30 is the bearing N30E, i.e. "north 30
east" in the northeast quadrant. As far as I can tell, that conversion
requires a "decision" about what quadrant we find our azimuth in that
requires a conditional, and so a calculator simply can't do it.
Here's a reminder of some terms I've snuck into the discussion so far that
were used in a computing sense....
data input display memory
state integer modulo bit
binary byte conditionals emulator
algorithm internal representation instruction
dcon is a very simple programming language for unix. It's the simplest,
easiest to learn programming language I've seen that still is fairly
useful. It allows you to edit instructions to the computer into a text
file, and then you can execute the file, thus running the program you have
written. dcon has just enough functionality that we can write a simple
program that will demonstrate conditionals, repetition, and non-arithmetic
data operations. dcon also has simple user input and output facilities,
which are not strictly a requirement for something to be a computer, but
are a practical necessity, particularly for a tutorial. Please install
dcon if you haven't yet.
Put the following two lines in a text file as the first two lines....
#!/usr/sbin/dcon
print "Hello World! \n"
Then chmod the file to executable and run it. If dcon is in /usr/sbin
you should get a cheery response from your new executable. Don't go any
further until you do. Move dcon or change the first line to find it if you
have to. Don't ever get any more complicated than you have to when
programming. Otherwise you will write unfixable code.
OK, it says hello? Every time? Great. You've written a unix command.
It's a useless unix command, and a very common useless unix command, but
you now have a toehold on programming. Your program runs. Now change the
text string so you can say you've written something besides "Hello World".
The first line, #!/usr/sbin/dcon, tells unix to run the given interpreter,
dcon, on the rest of the file. dcon is an interpreted programming
language, as are Perl, Lisp, BASIC, and the various unix shells, among
others.
The second line is an instruction to print a string of characters to the
user's terminal. Characters are given to the dcon print instruction
within double quotes. The \n is the unix standard print command escape
for "print a newline", or carriage return and linefeed. Character
terminal IO (input/output) still uses the terminology of paper teletypes.
The second line is very similar to the C or Perl instructions for the
same action. You can remove the \n to see the difference without it.
Now add a few more print lines to your program, like....
print "Hello World!\n"
print "I'm so proud"
print " to be here!\n\n\n\n"
print "Yeeeee HAW!\n"
...and run it. You can see several things from this. We've gotten clever
with \n. We use ! too much, like the yahoos we are. And most important,
our program is executed line by line from top to bottom. This is the flow
of control of the program. Computers do things one at a time, not all at
once, and top to bottom is the default order. The crucial repetition and
conditional features of a computer are based on changing this order. We
change the order of execution with flow control instructions.
dcon has a tiny set of flow control instructions. It has labels, goto,
gosub and return. We'll use a label and a goto to write an idiotic endless
loop, which will demonstrate only the repetition capability of
programming. Change your proggy to this.....
#!/usr/sbin/dcon
:chorus
print "habba hooba ya! \n "
print "Ain't THIS a party!?!?! \n "
goto chorus
Make sure there's a \n in the print statements inside the loop, i.e.
between :chorus and goto chorus.
When you run it now you'll have to do ^c to kill it. Do so.
Ah, the power of repetition. The little cybercretin you just created
will chant "habba hooba ya" forever if you let it.
Perhaps the use of a conditional would cause something a bit more
interesting to happen. Let's also input something, for the conditional to
test.
#!/usr/sbin/dcon
:top
print "Enter some text\n"
input $a
let $b=$b+$a
print $b"\n"
if len($b)>100 goto theend
goto top
:theend
print $b"\n"
print "OK, 100 character limit reached. I quit.\n"
This little program has one goto making a loop to :top, and one goto that
breaks us out of this loop if we've accumulated more than 100 charachters
in our string.
The program decides how many times we have to loop back to :top to get a
string of more than 100 characters. In fancier languages this would be done
with a do--until construct. dcon can be commented with #, rem or // .
Here's the same thing with comments.
:top # this is a label, a branch target
print "Enter some text\n" # tell the user what to input
input $a # get user input as string var. $a
let $b=$b+$a # + is "append" with strings. $b grows
print $b"\n" # print our current accumulated string
if len($b)>100 goto theend # break out if our string is >100 bytes
goto top # unconditional branch, making our loop
:theend # branch target for escaping the loop
print $b"\n" # print $b the last time
print "OK, 100 character limit reached. I quit.\n"
OK, we've got repetition and conditionals going on. Now let's see what
kind of bit-twiddling we can do in dcon. This one just churns some
numbers.
:top # top of endless loop
let a=a+1 # increment a
print a" " # print current increment value
print a&16776961"\n" # pr increment and increment ANDed with 16776961
goto top # loop
The & operator is the dcon bitwise Boolean Logic AND operation. You may
not have known you wanted to learn this. You do. It's a very fancy name
but what it does couldn't be any simpler. Computer logic is Boolean logic,
named after a guy named Boole who invented it. AND is one of a few crucial
Boolean functions, along with NOT, OR and XOR.
Let's say we have two bits named a and b that we input to AND. AND will
give an answer of 1 if and only if a and b are both 1. If either or both
of a and b are 0, then a AND b = 0. With 2 input bits there are only 4
possible cases of AND, so we'll show them all in this little table...
a b aANDb
0 0 0
0 1 0
1 0 0
1 1 1
When we use dcon's & operator on two numbers the two numbers are ANDed
on a bitwise basis, i.e. each significance-aligned pair of bits in the two
arguments is ANDed and the answer is thus the bitwise AND of the two
inputs. The number 16776961 in our example is
111111111111111100000001 in binary. That's our bitmask. When we & our
bitmask with some other number, the 1 bits in the other argument will
become 0's in the result at each bit position where the mask is zero. This
makes our example program do a funny sort of counting. Check it out. This
illustrates that dcon gives us bit-resolution control of integers.
You want to be familiar with the few boolean functions, because they're
simple, and because they're what computers do well. They therefore pop up
over and over again. OR is 1 if a OR b is 1....
a b aORb
0 0 0
0 1 1
1 0 1
1 1 1
XOR is the exclusive-OR function, which is best described by the table.
The dcon XOR operator is ^ .
a b aXORb
0 0 0
0 1 1
1 0 1
1 1 0
NOT is a unary function; it takes one argument. dcon doesn't seem to have
it as a bitwise operator, but it bears mention anyway.
A function like addition or AND that takes two arguments is called a
binary function. This is not the same as a binary number, so I stalled on
introducing that use of the term.
Everything computers do is made out of these few simple operators.
dcon is an extremely simple programming language. It was developed to
write modem scripts. The author of dcon tends to refer to it as a utility.
All programming languages are instruction sets for some virtual computer.
dcon was kept simple by implementing a very typical virtual machine, i.e.
it's similar to a generic microprocessor. It's kind of like interpreted
assembly language. dcon's flow control instructions are very similar to
the native machine language of a microprocessor. It's very generic, and so
is good for introducing fundamentals.
I hope you now are familiar with the programming connotations of these
terms....
programming language interpreter
flow control endless loop
Boolean Logic argument
bitmask unary function
Terms I should also cover in an intro........
subroutine hexadecimal octal
The number 16776961 is 111111111111111100000001 in binary. Binary doesn't
convert very sensibly to base 10, as you can see. Often data is
represented in base 8 or 16, octal and hexadecimal, because these number
bases map to consistent chunks of bits in a binary number. dcon doesn't do
hexadecimal or binary input or display of integers, so I used dc to figure
what decimal number my desired bitmask is. 16776961 is FFFF01 in
hexadecimal. The hexadecimal digits are 0 1 2 3 4 5 6 7 8 9 a b c d e f .
It takes exactly 4 bits to represent 16 things. That's why FFFF01 is a
better representation of our bitmask than the decimal version. Each hex
digit maps to four adjacent bits, and in a simple case like this you can
perhaps do the conversion in your head.
Octal is base 8, and the octal digits 0 through 7 always map to aligned
sets of 3 bits, which is why the chmod command can accept octal digits for
file permissions.
Subroutines are modularized sections of a program. Subroutines are created
in dcon with labels ( the : prefix), gosub, and return instructions. Subroutines
save resources if a single routine gets called from several places in a
program. Again, the dcon mechanism ( for subroutines) doesn't add much to
what microprocessors do to support subroutines. The thing that makes a
subroutine modular is the return instruction. The return instruction sends
program flow back to where the subroutine was called from, i.e. after a
return the next instruction is the one just after the calling gosub.( "The
stack" you may have heard about elsewhere in unix is the data structure,
maintained by the CPU, that keeps track of which gosub we'll be returning to.)
Hopefully looking at some code will help clarify. The power of subroutines
depends on their use in other code, so this example is a bit lengthy....
:top                  # top of the main loop
let i=i+1             # i is an increment counter
gosub subrou          # call subrou on every iteration
if i/3*3=i gosub by3  # is i a multiple of 3?
if i=30 exit 0        # 30 repetitions should do
goto top              # loop
exit 1                # if flow hits this something is broke
:subrou               # prints on every iteration
print i" Call me?\n"
print "if there's more text here each call saves more typing\n"
return                # back to just after the calling gosub
:by3                  # this will print if i is a multiple of 3
print " AND me?\n"
return                # back to just after the calling gosub
The goto instruction gets a bad rap. There is the idea rampant in the
computer world that producing manageable, debuggable programs is largely a
matter of avoiding using goto. Most other programming languages provide
other flow control constructs such as for-next, do-while, et al. These
other constructs are actually made out of simple branches, the machine
equivalent of gotos.
The real key to keeping things from getting out of hand is to keep things
in comprehensible-sized modules. Subroutines are a big help with this.
Computers allow you to bust problems into manageable chunks. If at any
time you have to keep track of more than two or three branch targets (
goto labels) you need to simplify. Subroutines can be debugged ( tested
for correctness) individually, particularly if they don't mess with data
in ways that are too general.
Other means of programming....
The actual instructions a computer chip accepts and understands are far
removed from the ones in a dcon script. Scripts are characters in a text
file. A CPU chip gets its actual instructions from RAM, and they must be
in the exact form the CPU understands. Actual machine code is an
unintelligible stream of binary integers. The dcon script is interpreted
by the dcon executable every time it is run, and thereby converted to
actual instructions to the hardware. This is convenient, but an
interpreted program runs CPU-inefficiently, since the interpreter
spends the great bulk of its CPU time interpreting the given
instructions, rather than executing what they result in.
An assembler is a tool for programming a CPU in its native instruction
set. An assembly language is a translation of the native instructions of a
CPU into a just slightly less cryptic form, e.g. JSR for the CPU's integer
code for "jump to subroutine", the machine equivalent of gosub. There's
usually a one-to-one mapping of machine instructions to assembly language
mnemonics for them.
An assembler takes a text file of assembly language instructions and
assembles it into an executable binary suitable for direct execution by
the CPU. In unix the linker comes into play at this point. A binary
executable can be archived in a library to be used by other programs, or
made into a stand-alone command.
With mnemonics like JSR for machine instructions, and other rudimentary
conveniences like named variables and being able to input numbers in
hexadecimal instead of as ones and zeros, the process of writing assembly
language is still tedious. Also, assembly language is CPU-specific.
Different CPUs have different instructions, and thus different assembly
languages.
Compilers are like assemblers, but with greater translation
capability. Compilers do not restrict themselves to the native
instructions of the CPU. FORTRAN, COBOL, C, C++, Ada, and Pascal are
compiled languages. A compiler takes a text file and makes an executable file, but
through a more complex process than assembly.
A particular programming
language, such as C, is an implementation of a particular virtual
computer. The virtual computer implemented by C has a variety of flow
control constructs, a variety of data manipulation features, and syntactic
and other features that can be built from machine instructions, but are
not native to a particular CPU. A single instruction to the C virtual
machine may result in hundreds of machine instructions when compiled.
Languages like C are called high-level languages compared to assembler for
this reason. Having a high-level language produce hundreds of instructions
for you based on simple directives from you is nice. More important,
however, is the fact that the abstraction of the compiler allows code to
be written in C and compiled on a variety of CPUs, sometimes unchanged.
For example, the dcon interpreter is a single C sourcecode file. The
author of dcon wrote dcon in 1989 on some unix, and was able to compile it
in Linux in 1996 with no major modifications. If he had written it in
assembler that would not have been possible. unix is written mostly in C
for this reason, and unix's portability is key to its success. This is
also why C is often called a systems programming language. The C virtual
machine is easily implemented on most CPUs, so C can be compiled to code
that performs well enough for the core of a system.
There are other species of programming language as well. Some implement
virtual machines totally unlike a typical CPU, but very few completely
hide you from the concepts shown above with dcon. For example, macro
languages like m4 implement repetition via recursion, which is a whole
nother ball of wax. My personal favorite programming language is Forth,
which is a stack-based compiling interpreter, but is not as widely useful
in unix as C is, since there's all that existing C code.
I didn't use bash for the examples because bash is geared for handling
files, and thus not the best for talking about the nuts and bolts of
computation.
Further reading....
dcon.doc and other dcon distribution files
The usenet comp.lang. hierarchy
the comp.unix.programming FAQ
the man page for your language of curiosity.
Sections 2 and 3 of the manpages are all C.
Rick Hohensee http://cqi.com/~humbubba ri...@capaccess.com