START OF WEEK 6 of Linux

<<< FILE TIMESTAMP >>>
 -The up-to-date file is: /usr/courses/cps393/dwoit/courseNotes/
 -If you are viewing your own copy, check that it is up to date
 -If you are viewing from a link in the Course Outline, be aware it may be outdated.

<<< ARGUMENT PROCESSING >>>
---------------------------
commands often have options of form -x, with some options requiring
additional information
   eg) ls -F -t
   eg) wc -l fname
   eg) gcc -o xzy -c blah.c

#!/bin/bash
#Source: argp
#process arguments which can be -a, -b or -f
#arg -f expects filename after it
#e.g.,  argp -b -f myfile -a
while [ "$1" ] ; do
   case $1 in
     -a | -b)  echo arg is -a or -b
               #process these options
               ;;
     -f)  #process option -f filename
          if [ "`echo $2 | cut -c1`" = "-" -o -z "$2" ] ; then
               echo "missing filename for -f"
          elif [ "$2" ] ; then
               # process -f filename
               echo arg is -f $2
               shift
          fi
          ;;
     *)   echo "$1 invalid argument"
          ;;
   esac
   shift
done
exit 0

<<< GETOPTS >>>
----------------
getopts cmd can parse args of a shell pgm.
getopts reads arguments 1 by 1 & puts each in c (you name it)
? is assigned to c if arg is not in the list you provide

#!/bin/bash
#Source: gops
#parses command line arguments using the getopts command
#   getopts list var
#list is a list of valid options, e.g., below they
#  are a,b,d or o
#the ":" after any char in list indicates that option
#  requires an argument.
#so valid usage of gops is:
#   gops -o dog
#   gops -o "dog mouse"
#   gops -d "abce" -b
#   gops -a -o dog -b
#   gops -ab -o dog -d mouse
#invalid are:
#   gops -o
#   gops -x
#in these cases, error messages are printed automatically and $c is
#set to ? so that it matches the last case (so both exit 1)
while getopts abd:o: c        #when done returns false
do
   case $c in
     a | b)  #do some processing here
             echo "in case a|b, option is $c"
             ;;
     d)      # $OPTARG is now whatever came
             # after the -d
             echo "in case d, option is $c arg is $OPTARG "
             ;;
     o)      # $OPTARG is now whatever came
             # after the -o
             echo "in case o, option is $c arg is $OPTARG "
             ;;
     \?)     #error msg is automatically written
             #by the getopts command
             exit 1;;         #STOPS gops and sets $? to 1
   esac
done
exit 0
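Often a script takes options first and then filenames. getopts keeps the
index of the next unprocessed argument in $OPTIND, so after the loop you
can shift the parsed options away and work with whatever arguments remain.
The script below is a minimal sketch, not one of the course files (the
name gops2 and its options -v and -o are made up for illustration):

#!/bin/bash
#Source: gops2  (hypothetical example)
#Usage:  gops2 [-v] [-o outname] file1 [file2 ...]
verbose=no
out=default
while getopts vo: c
do
   case $c in
     v)   verbose=yes ;;
     o)   out=$OPTARG ;;        # word following -o
     \?)  exit 1 ;;             # getopts already printed an error msg
   esac
done
shift $((OPTIND-1))             # discard the options getopts consumed
echo "verbose=$verbose out=$out"
echo "remaining (file) arguments: $@"
exit 0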
<<< AWK (A PROGRAMMING LANGUAGE) >>>

Creators: Aho, Weinberger, Kernighan. AWK!!

What does it do?
  Can search (similar to grep) and replace (similar to sed), but can do
  both at once AND has the functionality of a programming language, such
  as math, logic operations, formatted printing, arrays, if-else,
  loops, etc.

Why not just use, say, Python instead then? Because awk was designed
specifically for the task of being part command-line app and part
programming language. It's possible to insert Python in the middle of a
pipeline (using python -c), but this can get ugly. Awk was developed
specifically for this, and has many default values so that its commands
can be very terse (undefined variables default to zero, automatic math
on strings, etc.)

Basic Idea:
  -each line of a file is called a "record"
  -each record is a list of words called "fields"
  -awk reads the file record by record
  -for each record
      -manipulate it
      -output the resulting record
  -do special tasks before/after processing records

Usage:
     awk -f awkfile [file1 file2 ... filen]
  or
     awk commands [file1 file2 ... filen]

  awkfile contains the awk program
  file1...filen are concatenated to become the input
  if no files, then input is from stdin
  commands are awk commands (as would be in awkfile)

Form of awk program (awkfile above, or in quotes on cmd line):

  BEGIN   { action }    -executed before processing any records
          { action }    -executed for each record
  pattern { action }    -executed for records matching pattern
  END     { action }    -executed after all records processed

Flow of awk pgm:
  BEGIN actions are executed
  for each record
      awk considers all given actions, and executes the ones that apply
  END actions are executed

Patterns: either
  -regular expression enclosed in //
  -logical expression, involving boolean operators

Variables:
  $0    whole line (record)
  $1    first field in line (record)
  $2    second field in line, etc.
  FS    input field separator (defaults to ONE OR MORE blanks and tabs)
  (more variables are below)

e.g. p1.awk:
  #Usage awk -f p1.awk [fnames]
  #try using file d1 as in:  awk -f p1.awk d1
  #Program to print records without modifying them
  #This is same as {print} since no argument defaults to $0
  {print $0}

e.g. p2.awk:
  #Usage awk -f p2.awk [fnames]
  #try using file d1 as in:  awk -f p2.awk d1
  #Program to print only field 2 and 3 of each record
  #Strings "F2" and "F3" are printed before appropriate field
  {print "F2:",$2,"F3:",$3}

e.g. p3.awk:
  #Usage awk -f p3.awk [fnames]
  #e.g.  awk -f p3.awk p3.awk.data
  #print only those records where $5 is = 0 and $6 < 8
  $5==0 && $6 < 8   {print $0}

e.g. p4.awk:
  #Usage awk -f p4.awk [fnames]
  #e.g.  awk -f p4.awk p4.in2
  #print record followed by sum of its first 3 fields
  #for those records starting with a number (followed by any char)
  #also print total of all first 3 fields for each printed record
  #note: non-number fields treated as zero.
  BEGIN      {t=0}
  /^[0-9]./  {ft=$1+$2+$3; t=t+ft; print $0, ft}
  END        {print "Total:",t}

For input file p4.in2:
  just a string here
  1 2 3
  this is a 99 string, too
  4 2 1
  99 101 200
  xxx yyy 12zzz
  wwwww aaaa

Get output:
  1 2 3 6
  4 2 1 7
  99 101 200 400
  Total: 413
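The pattern part can also mix a regex on the whole record with a numeric
test on a field, the way p3.awk and p4.awk do separately. A minimal
sketch (not one of the course files; the data file and its layout are
assumed):

  #print records that contain "dog" AND whose 3rd field is less than 5,
  #then report how many such records there were
  BEGIN            {n=0}
  /dog/ && $3 < 5  {n=n+1; print $0}
  END              {print "matching records:", n}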
e.g. p5.awk:
  #Usage awk -f p5.awk [fnames]
  #print a histogram for number of "a" grades, "b" grades, etc
  #assume record is:  LName SID Mark
  #only include records that have a sid in field 2
  BEGIN {print "Histogram of marks:"}
  $2~/[0-9]+/ {
        #~ is "contains", "+" is one-or-more of previous
        #(remember "*" is zero-or-more)
        t=t+1
        if ($3>=80) ++a
        if ($3>=70&&$3<80) ++b
        if ($3>=60&&$3<70) ++c
        if ($3>=50&&$3<60) ++d
        if ($3<50) ++f
  }
  END {
        #need printf statements because print always puts a newline
        #every time it is executed. So instead of AAAA would get each on
        #a new line within the for loop
        for (i=1;i<=a;i++) printf "%s", "A"
        printf "\n"
        for (i=1;i<=b;i++) printf "%s", "B"
        printf "\n"
        for (i=1;i<=c;i++) printf "%s", "C"
        printf "\n"
        for (i=1;i<=d;i++) printf "%s", "D"
        printf "\n"
        for (i=1;i<=f;i++) printf "%s", "F"
        printf "\n"
        print "Total records processed:", t
  }

For input file p5.in:
  This is a file of student marks
  This file has 34 lines in total and thus 31 marks
  Dong 9876543 50
  Chin 8866547 42
  Abrams 2222542 55
  Smith 2222542 58
  Derk 2222542 59
  Mock 2222542 71
  Ellis 2222542 70
  Adams 2222542 73
  Bik 2222542 75
  Allan 2222542 76
  Grand 2222542 76
  Malone 2222542 79
  Catts 098sd 66
  Bong 2222542 79
  Xzing 2222542 79
  Dor 2222542 80
  Doggs 098sd 66
  OConnor 2222542 81
  Quinn 2222542 82
  Chan 2222542 82
  Wong 2222542 86
  Oh 2222542 90
  Li 9865542 69
  Mo 9865522 79
  Singh 2222542 91
  Mak 2222542 92
  Li 2222542 93
  Lin 2222542 96
  Lau 2222542 92
  Wang 2222542 99
  Ng 2222542 12

Get output:
  Histogram of marks:
  AAAAAAAAAAAA
  BBBBBBBBBB
  CCC
  DDDD
  FF
  Total records processed: 31

e.g. p5a.awk:
  #Usage awk -f p5a.awk [fnames]
  #Program to print only lines containing one or more
  #of pattern dog cat bird.  Note {print} same as {print $0}
  /dog|cat|bird/ {print}
  #or this
  #/dog/ || /cat/ || /bird/  {print}

e.g. p5b.awk:
  #Usage awk -f p5b.awk [fnames]
  #Program to print lines containing pattern dog
  #and print "bird line" for lines containing bird
  /dog/  {print}
  /bird/ {print "bird line"}

e.g. p5c.awk:
  #Usage awk -f p5c.awk [fnames]
  #Program to print only lines containing all
  #patterns dog cat bird.
  /dog/ && /cat/ && /bird/ {print}

e.g. p5d.awk:
  #Usage awk -f p5d.awk [fnames]
  #Program to print only lines containing both
  #patterns cat and bird and has $1 < 4.
  /cat/ && /bird/ && $1 < 4 {print}

FIELD SEPARATOR
  defaults to one or more spaces and/or tabs

e.g. p6.awk:
  #Usage awk -f p6.awk [fnames]
  #Program to print only field 2 and 3 of each record
  #Note awk's Field Separator defaults to one or more
  # spaces and/or tabs
  #try this for stdin (note 2nd line has some tabs):
  #1 2 3 4
  #1       2   3       4
  {print $2,$3}

This changes the field separator to a SINGLE comma:

e.g. p6comma.awk:
  #Usage awk -f p6comma.awk [fnames]
  #Program to print only field 2 and 3 of each record
  #Field separator is changed to a single comma
  BEGIN {FS=","}
  {print $2,$3}

For input file p6comma.in:
  aaa bbbb, cccc,dddd, e f g ,ABC,,DEF
  a b c ,d,e,,f,g,,h,i

Get output:
  cccc dddd
  d e

The Field Separator can be any string, even a regex!
  FS="\t+"            #one or more tabs
  FS="[[:space:]]+"   #moot, since this is essentially the default
  FS="a[0-9]*bc"      #strings like abc, a8bc, a482bc, etc.

Other Variables defined by awk:
  RS        input record separator (default is newline)
  NF        number of fields in current record
  FNR       number of records read so far from current input file
  NR        number of records read so far from all input files
  FILENAME  name of current input file

e.g. print first input line:
  #Usage awk -f p7line1.awk [fnames]
  #prints only line 1 of input and then stops processing
  NR==1{print;exit}

Can use own variables in awk:

e.g. print a specific line of input using own variable line:
  #Usage awk -f p7line3.awk [fnames]
  BEGIN {line=3;}
  NR==line{print;exit}

Comparison Operations Include:
  ==   !=   <   >   <=   >=

String Operations:
  string~pattern    true if string contains a match for regex pattern
                    e.g., $2~/^abc/    true if $2 starts with "abc"
                          $2!~/^abc/   true if $2 does not start with "abc"
  length(s)         number of chars in s
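NF is handy together with the match operators for simple sanity checks.
A minimal sketch (not one of the course files; it assumes records are
supposed to have exactly 3 fields, as in the p5.awk marks file):

  #print the last field of every record ($NF is the field whose number is NF)
  #and complain about any record that does not have exactly 3 fields
           {print "record", NR, "last field:", $NF}
  NF != 3  {print "   record", NR, "has", NF, "fields, expected 3"}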
If a string is used in an arithmetic expression, it is automatically
converted to a number. If it does not make sense as a number, it is
converted to 0.

e.g.,
  #Usage awk -f p8.awk [fnames]
  #turning strings into numbers
  BEGIN {
    x=(4 + "26") + "hello"
    y=("4" "26") - "25"
    print "x:",x,"y:",y
  }

  > awk -f p8.awk
  x: 30 y: 401
  >

Printf: works like the C printf. e.g.:
  x=0
  printf("ab %d cd %3.2f %d\n",5,2.1,x)
prints:
  ab 5 cd 2.10 0

Can also use awk without an awkfile, by putting actions on the command
line (protect them with quotes).

e.g., If try1 is:
  r1f1 r1f2 r1f3 r1f4 0 9
  r2f1 r2f2 r2f3 r2f4 1 20
  r3f1 r3f2 r3f3 r3f4 0 -5
then:
  /home/dwoit> awk '$5==0 && $6 < 8 {print $0}' try1
gives output:
  r3f1 r3f2 r3f3 r3f4 0 -5

e.g.,
  awk 'BEGIN{x=0;printf("ab %d cd %3.2f %d\n",5,2.1,x)}'
gives output
  ab 5 cd 2.10 0

Can use awk in Shell Programs:
 -But wait... what is $2 then?
 -does $2 refer to the shell program's second CLA, or awk's second field?
 -it depends how you quote it, since the shell substitutes inside "" but
  not inside '': '$2' gets passed through to awk (awk's $2), whereas
  "$2" gets substituted by the shell before being passed to awk.

e.g.,
  #!/bin/bash
  #Source p20.sh
  #prints lines of file p20.infile for which both of these are true
  #  -line contains the string that is the shell program's $1
  #  -second field of the line (awk's $2) is less than X, where X
  #   is the shell program's $2
  var1=$1
  var2=$2
  #Either line below works OK. One uses variables var[12] and other uses $[12]
  #cat p20.infile | awk "/$var1/"' && $2 < '"$var2"' {print $0 }'
  cat p20.infile | awk "/$1/"' && $2 < '"$2"' {print $0 }'

  > cat p20.infile
  SAME 2
  MEOW 5
  WOOF 3
  SAME 3
  SAME 3
  MEOW 5
  MEOW 3
  > p20.sh MEOW 4
  MEOW 3
  >

Lots of other abilities:
  command-line arguments (like C)
  arrays
  if-else
  while-loops, for-loops
The man-page for awk is a good reference

HMWK:
  use awk to process a file where most records have the format:
      LastName FirstName SID a1 a2 t1 t2
  where SID is a 9-digit student-id-number; a1 and a2 are marks for
  assignments 1 and 2, each out of 15%; t1 and t2 are marks for tests 1
  and 2, each out of 35%.
  Note that normal regex works in patterns, so to match $3 being 9 digits
  you can use:   $3 ~ /^[0-9]{9}$/
  and to match one or more digits:  [0-9]+
  Any records in the file not matching the above format are simply printed.
  Records that do match are also printed; however, they are followed by
  the student's final mark in the course (in percent).
  Once all students have been processed, their course average should be
  printed.
  Modify the above to print an "average" line containing averages for
  a1, a2, t1, t2 and final mark.
  Modify the above to use an if-statement appropriately. Here is an
  example of one:
      if ($2=="F"){
           print $1 "Fahrenheit"
      } else {
           print "there is no F in second field"
      }

HMWK:
  what is a shell pipeline that uses history, awk, sort, uniq, and head
  to list the 10 commands you use most?
  Useful options for sort are -r and -n, and for uniq, -c.

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Remaining is NOT IN FALL 2023
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

<<< FILE DESCRIPTORS >>>
-----------------
-indexes into "file descriptor table" (maintained by O/S)
   - 1 table per process
   - up to 20 entries called file pointers (file descriptors),
     indexes into FD table
   - FD table: for each file opened:
       - how opened r/w/a
       - current position in file
       - ptr to file contents
   - when file opened, entry put in FD table
   - FD 0 stdin
     FD 1 stdout
     FD 2 stderr
     FD 3-20 can be associated with files by user
   - open FDs (including stdin) are inherited by child process

[FD table diagram: before redirection vs. after]

  cmd > file2
    - close stdout (FD 1)
    - open file file2 (generate ptr)
    - put ptr in slot 1 of FDT
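To see FDs 1 and 2 acting independently, and a user FD (3 and up) being
attached to a file, the commands below are a minimal sketch (the file
names out.ok, out.err, out.fd3 and the missing file nosuchfile are made
up for illustration):

  # stdout (FD 1) and stderr (FD 2) sent to different files;
  # nosuchfile does not exist, so ls writes a complaint on stderr
  ls . nosuchfile 1>out.ok 2>out.err

  # attach FD 3 to a file for the duration of one command;
  # inside, >&3 writes to wherever FD 3 points
  ( echo "written via FD 3" >&3 ) 3>out.fd3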
Shell programs can read from and write to FDs other than 0, 1 and 2,
e.g., to read a file line by line and only print non-blank lines.
A small example first:

#!/bin/bash
#Source: mycat
#Usage:  mycat 4<infile 6>outfile
# shell pgm reads one line of the file assd. with FD4 and
# writes that line twice to stdout & once to the file assd. with FD6
first=`head -1 <&4`
echo $first >&1
echo $first >&6
echo $first
exit 0

Can read 2 or more files at once:

input files:
  addrs:      phn:
    add1        ph1
    add2        ph2
    add3

output file out:
  Name:    n1
  Address: add1
  Phone:   ph1

  Name:    n2
  Address: add2
  Phone:   ph2

  Name:    n3
  Address: add3
  Phone:

#!/bin/bash
#Source adr
#reads from FD 3 & 4 and outputs to FD 5
#Usage: adr 3<addrs 4<phn 5>out
#note could do read add <&3 instead of add=`line<&3`
#NOTE remove comments from this if pasting
while echo -n "Enter name: "
      read name                #^d stops it
do
   read add <&3                #stdin redirected so once read a line
   read ph  <&4                #its gone, therefore 2nd time thru,
                               #line 2 is read
   cat >&5 <<EOF
Name:    $name
Address: $add
Phone:   $ph

EOF
done
exit 0

> adr 3<addrs 4<phn 5>out
Enter name: n1
Enter name: n2
Enter name: n3
Enter name: n4        <--what does this do to the out file?

<<< EXEC >>>
Open files within a shell pgm: the exec stmt

#!/bin/bash
#Source: addr1
#Usage: addr1                 or
#       addr1 addrs           or
#       addr1 addrs phn       or
#       addr1 addrs phn out
#if any args not specified, defaults are used
exec 3<${1:-addrs}     # if $1 has a value, use it;
exec 4<${2:-phn}       # if $1 has no value, use the default (addrs);
exec 5>${3:-out}       # likewise for $2 and $3
while read add <&3
      read ph  <&4     #loop stops when FD 4 runs out of lines
do
   echo -n "Enter name: "
   read name
   echo " Name:    $name" >&5
   echo " Address: $add"  >&5
   echo " Phone:   $ph"   >&5
done
exit 0

END OF WEEK 6 (UNIX)

May do u6Lab now