START OF WEEK 6 of Linux

<<< FILE TIMESTAMP >>>
 -The up-to-date file is: /usr/courses/cps393/dwoit/courseNotes/
 -If you are viewing your own copy, check that it is up to date
 -If you are viewing from a link in the Course Outline, be aware it may be outdated.

<<< ARGUMENT PROCESSING >>>
---------------------------
commands often have options of form -x, with some options requiring
additional information
   eg) ls -F -t
   eg) wc -l fname
   eg) gcc -o xzy -c blah.c

#!/bin/bash
#Source: argp
#process arguments which can be -a, -b or -f
#arg -f expects filename after it
#e.g.,  argp -b -f myfile -a
while [ "$1" ] ; do
   case $1 in
     -a | -b)  echo arg is -a or -b
               #process these options
               ;;
     -f)  #process option -f filename
          if [ "`echo $2 | cut -c1`" = "-" -o -z "$2" ] ; then
               echo "missing filename for -f"
          elif [ "$2" ] ; then
               # process -f filename
               echo arg is -f $2
               shift
          fi
          ;;
     *)   echo "$1 invalid argument"
          ;;
   esac
   shift
done
exit 0

<<< GETOPTS >>>
----------------
getopts cmd can parse args of a shell pgm.
getopts reads arguments 1 by 1 & puts each in c (you name it)
? is assigned to c if arg is not in the list you provide

#!/bin/bash
#Source: gops
#parses command line arguments using the getopts command
#   getopts list var
#list is a list of valid options, e.g., below they
#  are a,b,d or o
#the ":" after any char in list indicates that option
#  requires an argument.
#so valid usage of gops is:
#   gops -o dog
#   gops -o "dog mouse"
#   gops -d "abce" -b
#   gops -a -o dog -b
#   gops -ab -o dog -d mouse
#invalid are:
#   gops -o
#   gops -x
#in these cases, error messages are printed automatically and $c is
#set to ? so that it matches the last case (so both exit 1)
while getopts abd:o: c        #when done returns false
do
   case $c in
     a | b)  #do some processing here
             echo "in case a|b, option is $c"
             ;;
     d)      # $OPTARG is now whatever came
             # after the -d
             echo "in case d, option is $c arg is $OPTARG "
             ;;
     o)      # $OPTARG is now whatever came
             # after the -o
             echo "in case o, option is $c arg is $OPTARG "
             ;;
     \?)     #error msg is automatically written
             #by the getopts command
             exit 1;;         #STOPS gops and sets $? to 1
   esac
done
exit 0
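Often a script takes options first and then filenames. getopts keeps the
index of the next unprocessed argument in $OPTIND, so after the loop you
can shift the parsed options away and work with whatever arguments remain.
The script below is a minimal sketch, not one of the course files (the
name gops2 and its options -v and -o are made up for illustration):

#!/bin/bash
#Source: gops2  (hypothetical example)
#Usage:  gops2 [-v] [-o outname] file1 [file2 ...]
verbose=no
out=default
while getopts vo: c
do
   case $c in
     v)   verbose=yes ;;
     o)   out=$OPTARG ;;        # word following -o
     \?)  exit 1 ;;             # getopts already printed an error msg
   esac
done
shift $((OPTIND-1))             # discard the options getopts consumed
echo "verbose=$verbose out=$out"
echo "remaining (file) arguments: $@"
exit 0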
<<< AWK (A PROGRAMMING LANGUAGE) >>>

Creators: Aho, Weinberger, Kernighan. AWK!!

What does it do?
  Can search (similar to grep) and replace (similar to sed), but can do
  both at once AND has the functionality of a programming language, such
  as math, logic operations, formatted printing, arrays, if-else,
  loops, etc.

Why not just use, say, Python instead then? Because awk was designed
specifically for the task of being part command-line app and part
programming language. It's possible to insert Python in the middle of a
pipeline (using python -c), but this can get ugly. Awk was developed
specifically for this, and has many default values so that its commands
can be very terse (undefined variables default to zero, automatic math
on strings, etc.)

Basic Idea:
  -each line of a file is called a "record"
  -each record is a list of words called "fields"
  -awk reads the file record by record
  -for each record
      -manipulate it
      -output the resulting record
  -do special tasks before/after processing records

Usage:
     awk -f awkfile [file1 file2 ... filen]
  or
     awk commands [file1 file2 ... filen]

  awkfile contains the awk program
  file1...filen are concatenated to become the input
  if no files, then input is from stdin
  commands are awk commands (as would be in awkfile)

Form of awk program (awkfile above, or in quotes on cmd line):

  BEGIN   { action }    -executed before processing any records
          { action }    -executed for each record
  pattern { action }    -executed for records matching pattern
  END     { action }    -executed after all records processed

Flow of awk pgm:
  BEGIN actions are executed
  for each record
      awk considers all given actions, and executes the ones that apply
  END actions are executed

Patterns: either
  -regular expression enclosed in //
  -logical expression, involving boolean operators

Variables:
  $0    whole line (record)
  $1    first field in line (record)
  $2    second field in line, etc.
  FS    input field separator (defaults to ONE OR MORE blanks and tabs)
  (more variables are below)

e.g. p1.awk:
  #Usage awk -f p1.awk [fnames]
  #try using file d1 as in:  awk -f p1.awk d1
  #Program to print records without modifying them
  #This is same as {print} since no argument defaults to $0
  {print $0}

e.g. p2.awk:
  #Usage awk -f p2.awk [fnames]
  #try using file d1 as in:  awk -f p2.awk d1
  #Program to print only field 2 and 3 of each record
  #Strings "F2" and "F3" are printed before appropriate field
  {print "F2:",$2,"F3:",$3}

e.g. p3.awk:
  #Usage awk -f p3.awk [fnames]
  #e.g.  awk -f p3.awk p3.awk.data
  #print only those records where $5 is = 0 and $6 < 8
  $5==0 && $6 < 8   {print $0}

e.g. p4.awk:
  #Usage awk -f p4.awk [fnames]
  #e.g.  awk -f p4.awk p4.in2
  #print record followed by sum of its first 3 fields
  #for those records starting with a number (followed by any char)
  #also print total of all first 3 fields for each printed record
  #note: non-number fields treated as zero.
  BEGIN      {t=0}
  /^[0-9]./  {ft=$1+$2+$3; t=t+ft; print $0, ft}
  END        {print "Total:",t}

For input file p4.in2:
  just a string here
  1 2 3
  this is a 99 string, too
  4 2 1
  99 101 200
  xxx yyy 12zzz
  wwwww aaaa

Get output:
  1 2 3 6
  4 2 1 7
  99 101 200 400
  Total: 413
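The pattern part can also mix a regex on the whole record with a numeric
test on a field, the way p3.awk and p4.awk do separately. A minimal
sketch (not one of the course files; the data file and its layout are
assumed):

  #print records that contain "dog" AND whose 3rd field is less than 5,
  #then report how many such records there were
  BEGIN            {n=0}
  /dog/ && $3 < 5  {n=n+1; print $0}
  END              {print "matching records:", n}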
e.g. p5.awk:
  #Usage awk -f p5.awk [fnames]
  #print a histogram for number of "a" grades, "b" grades, etc
  #assume record is:  LName SID Mark
  #only include records that have a sid in field 2
  BEGIN {print "Histogram of marks:"}
  $2~/[0-9]+/ {
        #~ is "contains", "+" is one-or-more of previous
        #(remember "*" is zero-or-more)
        t=t+1
        if ($3>=80) ++a
        if ($3>=70&&$3<80) ++b
        if ($3>=60&&$3<70) ++c
        if ($3>=50&&$3<60) ++d
        if ($3<50) ++f
  }
  END {
        #need printf statements because print always puts a newline
        #every time it is executed. So instead of AAAA would get each on
        #a new line within the for loop
        for (i=1;i<=a;i++) printf "%s", "A"
        printf "\n"
        for (i=1;i<=b;i++) printf "%s", "B"
        printf "\n"
        for (i=1;i<=c;i++) printf "%s", "C"
        printf "\n"
        for (i=1;i<=d;i++) printf "%s", "D"
        printf "\n"
        for (i=1;i<=f;i++) printf "%s", "F"
        printf "\n"
        print "Total records processed:", t
  }

For input file p5.in:
  This is a file of student marks
  This file has 34 lines in total and thus 31 marks
  Dong 9876543 50
  Chin 8866547 42
  Abrams 2222542 55
  Smith 2222542 58
  Derk 2222542 59
  Mock 2222542 71
  Ellis 2222542 70
  Adams 2222542 73
  Bik 2222542 75
  Allan 2222542 76
  Grand 2222542 76
  Malone 2222542 79
  Catts 098sd 66
  Bong 2222542 79
  Xzing 2222542 79
  Dor 2222542 80
  Doggs 098sd 66
  OConnor 2222542 81
  Quinn 2222542 82
  Chan 2222542 82
  Wong 2222542 86
  Oh 2222542 90
  Li 9865542 69
  Mo 9865522 79
  Singh 2222542 91
  Mak 2222542 92
  Li 2222542 93
  Lin 2222542 96
  Lau 2222542 92
  Wang 2222542 99
  Ng 2222542 12

Get output:
  Histogram of marks:
  AAAAAAAAAAAA
  BBBBBBBBBB
  CCC
  DDDD
  FF
  Total records processed: 31

e.g. p5a.awk:
  #Usage awk -f p5a.awk [fnames]
  #Program to print only lines containing one or more
  #of pattern dog cat bird.  Note {print} same as {print $0}
  /dog|cat|bird/ {print}
  #or this
  #/dog/ || /cat/ || /bird/  {print}

e.g. p5b.awk:
  #Usage awk -f p5b.awk [fnames]
  #Program to print lines containing pattern dog
  #and print "bird line" for lines containing bird
  /dog/  {print}
  /bird/ {print "bird line"}

e.g. p5c.awk:
  #Usage awk -f p5c.awk [fnames]
  #Program to print only lines containing all
  #patterns dog cat bird.
  /dog/ && /cat/ && /bird/ {print}

e.g. p5d.awk:
  #Usage awk -f p5d.awk [fnames]
  #Program to print only lines containing both
  #patterns cat and bird and has $1 < 4.
  /cat/ && /bird/ && $1 < 4 {print}

FIELD SEPARATOR
  defaults to one or more spaces and/or tabs

e.g. p6.awk:
  #Usage awk -f p6.awk [fnames]
  #Program to print only field 2 and 3 of each record
  #Note awk's Field Separator defaults to one or more
  # spaces and/or tabs
  #try this for stdin (note 2nd line has some tabs):
  #1 2 3 4
  #1       2   3       4
  {print $2,$3}

This changes the field separator to a SINGLE comma:

e.g. p6comma.awk:
  #Usage awk -f p6comma.awk [fnames]
  #Program to print only field 2 and 3 of each record
  #Field separator is changed to a single comma
  BEGIN {FS=","}
  {print $2,$3}

For input file p6comma.in:
  aaa bbbb, cccc,dddd, e f g ,ABC,,DEF
  a b c ,d,e,,f,g,,h,i

Get output:
  cccc dddd
  d e

The Field Separator can be any string, even a regex!
  FS="\t+"            #one or more tabs
  FS="[[:space:]]+"   #moot, since this is essentially the default
  FS="a[0-9]*bc"      #strings like abc, a8bc, a482bc, etc.

Other Variables defined by awk:
  RS        input record separator (default is newline)
  NF        number of fields in current record
  FNR       number of records read so far from current input file
  NR        number of records read so far from all input files
  FILENAME  name of current input file

e.g. print first input line:
  #Usage awk -f p7line1.awk [fnames]
  #prints only line 1 of input and then stops processing
  NR==1{print;exit}

Can use own variables in awk:

e.g. print a specific line of input using own variable line:
  #Usage awk -f p7line3.awk [fnames]
  BEGIN {line=3;}
  NR==line{print;exit}

Comparison Operations Include:
  ==   !=   <   >   <=   >=

String Operations:
  string~pattern    true if string contains a match for regex pattern
                    e.g., $2~/^abc/    true if $2 starts with "abc"
                          $2!~/^abc/   true if $2 does not start with "abc"
  length(s)         number of chars in s
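NF is handy together with the match operators for simple sanity checks.
A minimal sketch (not one of the course files; it assumes records are
supposed to have exactly 3 fields, as in the p5.awk marks file):

  #print the last field of every record ($NF is the field whose number is NF)
  #and complain about any record that does not have exactly 3 fields
           {print "record", NR, "last field:", $NF}
  NF != 3  {print "   record", NR, "has", NF, "fields, expected 3"}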
If a string is used in an arithmetic expression, it is automatically
converted to a number. If it does not make sense as a number, it is
converted to 0.

e.g.,
  #Usage awk -f p8.awk [fnames]
  #turning strings into numbers
  BEGIN {
    x=(4 + "26") + "hello"
    y=("4" "26") - "25"
    print "x:",x,"y:",y
  }

  > awk -f p8.awk
  x: 30 y: 401
  >

Printf: works like the C printf. e.g.:
  x=0
  printf("ab %d cd %3.2f %d\n",5,2.1,x)
prints:
  ab 5 cd 2.10 0

Can also use awk without an awkfile, by putting actions on the command
line (protect them with quotes).

e.g., If try1 is:
  r1f1 r1f2 r1f3 r1f4 0 9
  r2f1 r2f2 r2f3 r2f4 1 20
  r3f1 r3f2 r3f3 r3f4 0 -5
then:
  /home/dwoit> awk '$5==0 && $6 < 8 {print $0}' try1
gives output:
  r3f1 r3f2 r3f3 r3f4 0 -5

e.g.,
  awk 'BEGIN{x=0;printf("ab %d cd %3.2f %d\n",5,2.1,x)}'
gives output
  ab 5 cd 2.10 0

Can use awk in Shell Programs:
 -But wait... what is $2 then?
 -does $2 refer to the shell program's second CLA, or awk's second field?
 -it depends how you quote it, since the shell substitutes inside "" but
  not inside '': '$2' gets passed through to awk (awk's $2), whereas
  "$2" gets substituted by the shell before being passed to awk.

e.g.,
  #!/bin/bash
  #Source p20.sh
  #prints lines of file p20.infile for which both of these are true
  #  -line contains the string that is the shell program's $1
  #  -second field of the line (awk's $2) is less than X, where X
  #   is the shell program's $2
  var1=$1
  var2=$2
  #Either line below works OK. One uses variables var[12] and other uses $[12]
  #cat p20.infile | awk "/$var1/"' && $2 < '"$var2"' {print $0 }'
  cat p20.infile | awk "/$1/"' && $2 < '"$2"' {print $0 }'

  > cat p20.infile
  SAME 2
  MEOW 5
  WOOF 3
  SAME 3
  SAME 3
  MEOW 5
  MEOW 3
  > p20.sh MEOW 4
  MEOW 3
  >

Lots of other abilities:
  command-line arguments (like C)
  arrays
  if-else
  while-loops, for-loops
The man-page for awk is a good reference

HMWK:
  use awk to process a file where most records have the format:
      LastName FirstName SID a1 a2 t1 t2
  where SID is a 9-digit student-id-number; a1 and a2 are marks for
  assignments 1 and 2, each out of 15%; t1 and t2 are marks for tests 1
  and 2, each out of 35%.
  Note that normal regex works in patterns, so to match $3 being 9 digits
  you can use:   $3 ~ /^[0-9]{9}$/
  and to match one or more digits:  [0-9]+
  Any records in the file not matching the above format are simply printed.
  Records that do match are also printed; however, they are followed by
  the student's final mark in the course (in percent).
  Once all students have been processed, their course average should be
  printed.
  Modify the above to print an "average" line containing averages for
  a1, a2, t1, t2 and final mark.
  Modify the above to use an if-statement appropriately. Here is an
  example of one:
      if ($2=="F"){
           print $1 "Fahrenheit"
      } else {
           print "there is no F in second field"
      }

HMWK:
  what is a shell pipeline that uses history, awk, sort, uniq, and head
  to list the 10 commands you use most?
  Useful options for sort are -r and -n, and for uniq, -c.

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Remaining is NOT IN FALL 2023
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

<<< FILE DESCRIPTORS >>>
-----------------
-indexes into "file descriptor table" (maintained by O/S)
   - 1 table per process
   - up to 20 entries called file pointers (file descriptors),
     indexes into FD table
   - FD table: for each file opened:
       - how opened r/w/a
       - current position in file
       - ptr to file contents
   - when file opened, entry put in FD table
   - FD 0 stdin
     FD 1 stdout
     FD 2 stderr
     FD 3-20 can be associated with files by user
   - open FDs (including stdin) are inherited by child process

[FD table diagram: before redirection vs. after]

  cmd > file2
    - close stdout (FD 1)
    - open file file2 (generate ptr)
    - put ptr in slot 1 of FDT
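To see FDs 1 and 2 acting independently, and a user FD (3 and up) being
attached to a file, the commands below are a minimal sketch (the file
names out.ok, out.err, out.fd3 and the missing file nosuchfile are made
up for illustration):

  # stdout (FD 1) and stderr (FD 2) sent to different files;
  # nosuchfile does not exist, so ls writes a complaint on stderr
  ls . nosuchfile 1>out.ok 2>out.err

  # attach FD 3 to a file for the duration of one command;
  # inside, >&3 writes to wherever FD 3 points
  ( echo "written via FD 3" >&3 ) 3>out.fd3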
Shell programs can read from and write to FDs other than 0, 1 and 2,
e.g., to read a file line by line and only print non-blank lines.
A small example first:

#!/bin/bash
#Source: mycat
#Usage:  mycat 4<infile 6>outfile
# shell pgm reads one line of the file assd. with FD4 and
# writes that line twice to stdout & once to the file assd. with FD6
first=`head -1 <&4`
echo $first >&1
echo $first >&6
echo $first
exit 0

Can read 2 or more files at once:

input files:
  addrs:      phn:
    add1        ph1
    add2        ph2
    add3

output file out:
  Name:    n1
  Address: add1
  Phone:   ph1

  Name:    n2
  Address: add2
  Phone:   ph2

  Name:    n3
  Address: add3
  Phone:

#!/bin/bash
#Source adr
#reads from FD 3 & 4 and outputs to FD 5
#Usage: adr 3<addrs 4<phn 5>out
#note could do read add <&3 instead of add=`line<&3`
#NOTE remove comments from this if pasting
while echo -n "Enter name: "
      read name                #^d stops it
do
   read add <&3                #stdin redirected so once read a line
   read ph  <&4                #its gone, therefore 2nd time thru,
                               #line 2 is read
   cat >&5 <<EOF
Name:    $name
Address: $add
Phone:   $ph

EOF
done
exit 0

> adr 3<addrs 4<phn 5>out
Enter name: n1
Enter name: n2
Enter name: n3
Enter name: n4        <--what does this do to the out file?

<<< EXEC >>>
Open files within a shell pgm: the exec stmt

#!/bin/bash
#Source: addr1
#Usage: addr1                 or
#       addr1 addrs           or
#       addr1 addrs phn       or
#       addr1 addrs phn out
#if any args not specified, defaults are used
exec 3<${1:-addrs}     # if $1 has a value, use it;
exec 4<${2:-phn}       # if $1 has no value, use the default (addrs);
exec 5>${3:-out}       # likewise for $2 and $3
while read add <&3
      read ph  <&4     #loop stops when FD 4 runs out of lines
do
   echo -n "Enter name: "
   read name
   echo " Name:    $name" >&5
   echo " Address: $add"  >&5
   echo " Phone:   $ph"   >&5
done
exit 0

END OF WEEK 6 (UNIX)

May do u6Lab now