Saturday, November 12, 2016

Automating small tasks


Doughnut Script
In life we do small tasks over and over, thinking, this is too small to automate. After all, it may take 10 minutes to automate it, and it only takes 10 seconds to do it.

Recently, I have been doing some testing on systemd, the init system for Linux, and I have been logging into a DUT (Device Under Test) over a console cable. Using the console allows one to bring the network interfaces up and down and still remain connected to the DUT.

It didn't take long to log in via the console, start screen, log into the DUT, then su to root, about 10 seconds total. Then, type the commands to run the test, log out, and quit screen.

I had been doing this for a couple of months when I thought, hey, maybe I should sit down and automate this. Not because I was tired of logging in manually, but because I wanted to write a larger regression script for bugs I had raised against systemd.

Creating a doughnut script

So I spent 10 minutes writing and debugging a console login doughnut script*. Then I copied the script, and started adding my regression lines (in the middle). Since this is systemd, the regression test doesn't ever seem to pass, but at least I can quickly determine what is broken.
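For readers who don't use expect-lite, the doughnut shape can be sketched in Python. This is only an analogy under my own assumptions: the function name doughnut and the step strings are made up for illustration, and nothing here talks to a real console.

```python
from contextlib import contextmanager

@contextmanager
def doughnut(log):
    """Set up, hand control to the caller (the hole), then always clean up."""
    log.append("setup: connect and log in")            # hypothetical setup step
    try:
        yield log                                      # the middle of the doughnut
    finally:
        log.append("teardown: log out and disconnect") # runs even if the middle fails

steps = []
with doughnut(steps) as session:
    session.append("middle: run regression lines")

print(steps)
```

The try/finally is the point of the shape: whatever happens in the middle, the cleanup still runs.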

Because I am never sure of the state of the DUT, the serial login script does prompt detection to determine the next step in logging in. It does this with:
# set user prompt
*/.*: /
# detect prompt
>>^M
>
+$prompt1=\n.*(\$ |login:|word:)

It determines prompts the same way you would with your eyes: it presses <return> and "sees" what is returned. If it returns a prompt ($), then we are already logged in. If it asks for a login, then the script logs in.
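The decision logic can be sketched in Python using the same regex alternatives as the capture line above. The function name next_step and the action strings are hypothetical, and Python's re module is standing in for expect-lite's regex engine, so treat this as a sketch rather than what the script actually does internally.

```python
import re

# Same alternatives as the expect-lite capture: shell prompt, login prompt, password prompt
PROMPT_RE = re.compile(r"\n.*(\$ |login:|word:)")

# Hypothetical mapping from the detected prompt to the next action
ACTIONS = {
    "$ ": "run commands",
    "login:": "send username",
    "word:": "send password",
}

def next_step(console_output):
    """Return the next action based on what came back after pressing return."""
    m = PROMPT_RE.search(console_output)
    if m is None:
        return "wait"          # nothing recognizable yet
    return ACTIONS[m.group(1)]

print(next_step("\r\nuser@dut:~$ "))
print(next_step("\r\ndut login: "))
print(next_step("\r\nPassword: "))
```

Matching on "word:" rather than "Password:" is the trick in the original pattern: it doesn't care whether the DUT capitalizes the prompt.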

Code Reuse

By putting the actual serial login in an include file, it is easily added to any script going forward. Suddenly the 10 minute time investment is paying off. But save yourself 10 minutes, and get the scripts from my GitHub examples directory (look for serial_login.elt, serial_con.inc, and serial_discon.inc).

I find myself using the serial console script all the time, and it takes less than a second to log in. Automating little repetitive tasks can make life easier and faster, truly automation for the rest of us.

* a doughnut script is a script that sets things up, pauses (with *INTERACT), then breaks things down and cleans up.
** doughnut image by Evan-Amos - Own work, Public Domain

Thursday, January 7, 2016

Demystifying Regex Redux with 9 simple terms

Can I have your number?
When I wrote demystifying regex with 7 simple terms a while ago, I left out a couple of really useful regex terms. So I guess this would have to be re-written as 9 Simple Terms.

Regex is one of those things that when you need it, you need it. But it is from the 80's and cryptic. Most regex expressions you see are too complex, and hard to follow. In this post, I'll show you a couple more terms to help you keep it simple and to a minimum, while allowing you to tap the power of regex.

expect-lite has very good support for regex meta characters (the ones that start with a backslash "\"). As a quick review of the 7 terms, there are:
  • Repeats: * and +
  • Meta characters: \d, \w, \n, \t
  • Or: |
But there are a couple of regex meta characters which I have found useful in addition to the 7 above, when skipping over some columnar info to get that column you want to validate (or capture into a variable).
\s is whitespace (space, tab, or newline)
\S is not whitespace
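Here is a small Python sketch of skipping columns with \s and \S. Python's re module is a stand-in for expect-lite here (an assumption on my part, though these two meta characters behave the same way in both as far as I know), and the variable names are mine.

```python
import re

route_output = """Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0
"""

# Start at a known destination, skip the gateway column with \S+,
# then capture the genmask column in parentheses.
m = re.search(r"169\.254\.0\.0\s+\S+\s+(\S+)", route_output)
genmask = m.group(1)
print(genmask)
```

Each \s+\S+ pair skips one column, so counting pairs is all it takes to reach the column you care about.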

Working with the example from demystifying regex with 7 simple terms:
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0


You could match the first IP address:
>route -n
<\d+.\d+.\d+.\d+ 
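A quick way to try that pattern outside of expect-lite is Python's re module (a stand-in I am assuming behaves the same for these terms). The sketch also shows that the unescaped dots are looser than they look: escaping them as \. is stricter, at the cost of a slightly busier regex. The names loose and strict are mine.

```python
import re

route_output = """Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
"""

loose = r"\d+.\d+.\d+.\d+"       # the pattern as written above: dot is "any character"
strict = r"\d+\.\d+\.\d+\.\d+"   # dots escaped: only a literal "." will do

first_ip = re.search(loose, route_output).group(0)
print(first_ip)

# The loose form also accepts a plain run of digits; the strict form does not
loose_hit = re.search(loose, "1234567") is not None
strict_hit = re.search(strict, "1234567") is not None
print(loose_hit, strict_hit)
```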


But what if you wanted to match the default route metric (the 100 on the last line) rather than the first IP address? You could use:
#validate metric
>route -n
<0.0.0.0\s+\d+.\d+.\d+.\d+\s+0.0.0.0\s+UG\s+100\s+\d\s+\d\s+eth0

This introduces a new meta character, \s, which is a space (or any white space). But it starts to look complex. Keeping it simple, using the non-space \S and backtracking from a known point (in this example 'eth0') will simplify the regex some:
#validate metric
>route -n
<100\s+\S+\s+\S+\s+eth0
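To see why the shorter form is worth it, here is a Python sketch comparing the two regexes against the original default route line and against a made-up variant where the Ref column holds two digits. The changed line is hypothetical, and Python's re is again standing in for expect-lite.

```python
import re

long_re = r"0.0.0.0\s+\d+.\d+.\d+.\d+\s+0.0.0.0\s+UG\s+100\s+\d\s+\d\s+eth0"
short_re = r"100\s+\S+\s+\S+\s+eth0"

original = "0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0"
# Hypothetical output where Ref is 12 instead of 0
changed = "0.0.0.0         10.1.1.1        0.0.0.0         UG    100    12       0 eth0"

results = {
    "long/original": re.search(long_re, original) is not None,
    "short/original": re.search(short_re, original) is not None,
    "long/changed": re.search(long_re, changed) is not None,
    "short/changed": re.search(short_re, changed) is not None,
}
print(results)
```

The long regex insists on a single digit (\d) in the Ref column, so it stops matching the moment that column grows. The \S+ version doesn't care what is in the columns it skips.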

It is better, but could be simpler by keying off of the flags column.
#validate metric of default route
>route -n
<UG\s+100.+eth0

This uses the space meta-character, \s, and uses another meta-character mentioned in the fine print of the original post, the dot, or '.' which matches any character. It is good to use the dot sparingly, as it can often match more than you would expect. But in this example, because there is only one default route (it is after all, only IPv4*), and it is always on the last line, it is pretty safe to use the dot.
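The "matches more than you would expect" risk can be sketched in Python with a hypothetical output where the default route is listed first. I am assuming here that expect-lite's regex engine lets the dot cross newlines (it evaluates a blob of text), and I emulate that with re.DOTALL, since Python's dot stops at a newline by default.

```python
import re

# Hypothetical ordering: default route first, another route after it
reordered = (
    "0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0\n"
    "10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0\n"
)

pattern = r"UG\s+100.+eth0"

one_line = re.search(pattern, reordered).group(0)                # dot stops at newline
across_lines = re.search(pattern, reordered, re.DOTALL).group(0) # dot crosses newline

print(repr(one_line))
print(repr(across_lines))
```

With the dot allowed to cross newlines, the greedy .+ runs all the way to the last eth0 in the blob. The validation still passes, but it matched far more than the one line you meant.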

To see just what expect-lite did match, use the *EXP_INFO directive on the CLI or earlier in your script.

Regex Guidelines

It is a good idea to keep regex as simple as possible. As we saw above, it is easy to create complex regex, but that leads to challenges in maintaining code later. Every time someone has to debug the script, they have to figure out what the regex is doing. Shorter, simpler regex will always win out.

Regex has the concept of anchors (^,$), but I haven't included them in the 9 simple terms for a couple of reasons:

  • Anchors don't work as you would expect in expect-lite. One would expect that you could use an anchor at the beginning of a line, but expect-lite doesn't evaluate output on a line by line basis, but rather as a blob of text which includes new-lines. Therefore, if you need to "anchor" your regex, do something like '\n169.254.0.0'. Regexes with anchors tend to not be simple or short.
  • I have seen regexes where the entire line is described, from beginning of the line to the end of the line, with anchors at each end. This almost always makes a very complex and brittle regex. A change in the column width can break these kinds of regexes. Rather, it is much easier, and less code intensive, to do a sparse validation of output using simple regexes (as shown in the example above).
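The first point is easy to reproduce in Python, whose re module also treats ^ as "start of the whole text" unless told otherwise. This is a stand-in for expect-lite, not its actual internals, and the blob below is a trimmed-down version of the route output.

```python
import re

# Output arrives as one blob of text with embedded newlines
blob = (
    "10.1.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0\n"
    "169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 eth0\n"
)

anchored = re.search(r"^169\.254\.0\.0", blob)           # ^ only matches at the start of the blob
newline_prefixed = re.search(r"\n169\.254\.0\.0", blob)  # "anchor" on the preceding newline

print(anchored is None)
print(newline_prefixed is not None)
```

One caveat with the \n trick: it needs a newline before the text, so it will not match if the line you want happens to be the very first thing in the blob.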


Not everyone is a regex expert. Plan on helping the next person who looks at your script by writing a comment about what the regex is doing. And if you are lucky enough to be the next person to look at your code, then you will be thankful that you wrote your future-self a note.

Recap the 9 terms

To recap, and give you a single place to look for a reference, the 9 terms are the single character meta-characters:

  • \d  is a digit (0-9)
  • \w  is a word character (a letter, digit, or underscore)
  • \n  is a new line (think of it as a carriage return)
  • \t  is a tab
  • \s is white space (including \t and \n)
  • \S is a non-space (any letter, number, symbol)
  • . is any character (use this sparingly**)


And the repeat characters which are modifiers to the terms above:

  • *  repeats 0 or more times
  • +  repeats 1 or more times


And the regex OR term, |
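If you want to poke at the terms yourself, here is a small Python table of each one with a sample input and what I expect it to match. The sample strings are made up, and Python's re is a stand-in for expect-lite's regex engine.

```python
import re

# (pattern, sample text, expected first match or None)
examples = [
    (r"\d+", "mtu 1500", "1500"),              # digits
    (r"\w+", "eth0: up", "eth0"),              # word characters
    (r"a\nb", "a\nb", "a\nb"),                 # new line
    (r"a\tb", "a\tb", "a\tb"),                 # tab
    (r"UG\s+100", "UG \t 100", "UG \t 100"),   # white space, including tab
    (r"\S+", "  10.1.1.1  ", "10.1.1.1"),      # non-space
    (r"ab*", "a", "a"),                        # * allows zero repeats
    (r"ab+", "a", None),                       # + needs at least one
    (r"UG|UH", "flags UH", "UH"),              # OR
]

results = []
for pattern, text, expected in examples:
    m = re.search(pattern, text)
    found = m.group(0) if m else None
    results.append(found == expected)
    print(f"{pattern!r:12} on {text!r:14} -> {found!r}")
```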

The power of 9

You can still use only the original 7 regex terms and accomplish 90% of what you need. The additional 2 meta-characters just give you a bit more control over matching. And for those of us with a finite memory, it is still fewer than the fingers on two hands.


* IPv6 can often have multiple default routes, and the metric becomes very important in determining which one is used.
** the regex dot is extra credit