Support scripts for bulk testing TMMS on a FAME cluster.

Supporting environment variables:
---------------------------------

NETNAME
	The virtual bridge aginst which nodes are PXE booted.

FAM
	The file that acts like global NVM, set it up with fallocate.

Commands (alphabetical)
-----------------------

attachnode
	Bulk starts of FAME nodes have their consoles attached to a
	GNU screen(1) session named <NETNAME>.  This command takes
	a number from 1-40 and attaches to that console.

lxc-fixdev DEPRECATED
	Used after starting an LXC container the first time to tweak a few
	things for FAME (especially libvirt bridges and vmdebootstrap)

octets2str
	Pass the client id octet stream from dhcpdump(1M) to see the
	ASCII string.

setnodes DEPRECATED; use 'tm-manifest setnode <list> or "all" manname
	Takes a list of nodes or the word "all" and a manifest name
	(previously uploaded via "tm-manifest put ...").   Bind the
	specified nodes to that manifest.  Waits until all nodes
	show a final "ready" status.

startnode.arm
	Takes a number from 1-40 and PXE boots that node into the
	FAME setup specified by the environment variable NETNAME.

startnodes
	Takes a list of numbers 1-40 or the word "all" and gang starts
	a PXE boot for each.  The console goes into a screen session
	called NETNAME with a window named nodeXX.  Automatically
	invokes waitnodes.

waitnodes
	Takes a list of numbers 1-40 or the word "all" and waits for
	all the nodes on NETNAME to respond to sshd probes.  This
	can be called manually but is usually automatically invoked
	by startnodes.

Helper directories
------------------

dotssh/
	Minimal config files for a $HOME/.ssh directory to support
	phrase-less ssh (log, direct command execution, or ansible).
	Needs the "l4tm_pubkey" specified in the active manifest
	of a given node.

manifests/
	Sample manifests.  The "bender.json" only specifies a single
	extra package but that loads about eight dependent packages.

Setting up a new FAME system or LXC container
---------------------------------------------

1. Install L4TM on a fresh system or clone an L4TM LXC container.

2. apt-get install python3 git libvirt-bin virsh tm-librarian jq screen

3. Obtain a copy of TMCF for your TM instance and put it at /etc/tmconfig

   An easy way is through the librarian.  Copy these lines to "fame40.ini":

   [global]
   node_count = 40
   book_size_bytes = 8M
   nvm_size_per_node = 512B
   
   Then run "book_register.py -j fame40.ini > /tmp/tmconfig"

   finally "sudo mv /tmp/tmconfig /etc"

4. Set up a virtual bridge for the nodes to share.  The easiest way is
   through libvirt/virsh.   

   a. Pick a network/bridge name.  My favorite is "fame_arm" but we'll call
      it YOURNET.
   b. Pick a subnet for the nodes.  It needs to be unique on your system.
      libvirt default has already claimed 192.168.122.0/24.  I suggest you
      use 10.10.10.0/254 or pick a different RFC 1918 network.
   c. Put this in a file called YOURNET.xml:

	<network>
  	<name>YOURNET</name>
  	<forward mode='nat'>
    		<nat>
      			<port start='1024' end='65535'/>
    		</nat>
  	</forward>
  	<bridge name='YOURNET' stp='off' delay='0'/>
  	<ip address='10.10.10.254' netmask='255.255.255.0' />
	</network>

      I recommend the last legal address of your chosen network as the
      IP address assigned to the bridge.  From the node's point of view
      this is the ToRMS IP address.

    d. Adjust "YOURNET" in your file and name, then 

       sudo virsh net-define YOURNET.xml
       sudo virsh net-autostart YOURNET
       sudo virsh net-start YOURNET

    e. The manifest_api will clean up dnsmasq, you don't need to worry
       about it any more.

    f. export NETNAME=YOURNET

    g. mkdir /etc/qemu

    h edit /etc/qemu/bridge.conf and add a single line:

      allow YOURNET

5. Obtain a copy of the manifesting service:

   a. As of 21 September I believe the repo is refreshed frequently
      enough to install from there.  If your designated host is L4TM,
      
      apt-get install tm-manifesting

   b. As an alternative, take it from the git repo:

      cd $HOME
      git clone https://github.hpe.com/hpelinux/manifesting.git

6. cd /where/ever/manifesting.  

   a. for apt-get install (step 5a) that's
   
      /usr/lib/python3/dist-packages/tm-manfesting

   b. for git clone (step 5b) that's wherever you put it, ie,

      $HOME/manifesting
   
   Follow the instructions in that README.  NOTE: as of 21 September that
   advice has not yet been verified against an apt-get install.
   
   As a result you should have an /etc/tmms file referencing your virtual
   bridge.

   The remaining instructions ass-u-me you use the suggested location
   /var/lib/tmms as MANIFESTING_ROOT in /etc/tmms.

7. a. In a fresh window start "sudo tm-manifest-server --verbose".  It will
      show a live access log.

   b. In another window, search for running dnsmasq processes.  You should
      see one using the config file /var/lib/tmms/dnsmasq/YOURNET.conf

      If you don't see that, your /etc/tmms PXE_INTERFACE does not match
      your host's network configuration.  Or something worse :-)

8. tm-manifest listnodes should reflect /etc/tmconfig

EXERCISING THE MANIFESTING SERVICE AND BOOTING FAME NODES
=========================================================

1. On your "torms" host, as root, edit /etc/resolv.conf to add a new line:

   nameserver w.x.y.z

   There can be up to three "nameserver" lines.  This one must be first.

   w.x.y.z needs to be the last legal IP address of the subnet you 
   specified in the TM_DOMAIN value of /etc/tmms.   If you used something
   like 10.10.10.0/24, the address you want is 10.10.10.254.   If you
   don't understand why, read about CIDR broadcast address and legal
   ranges.  Or apt-get install ipcalc and use it.

   If your host gets its address from DHCP, /etc/resolv.conf will get
   overwritten when it renews its lease.  You can prevent that with chattr
   and the immutable bit.  FYI.

2. tm-manifest put something.json

   There are sample manifest JSON files in 
   
   	manifesting/unittests/QAscripts/manifests

   "empty.json" is a nice safe place to boot your first set of nodes.

   Verify with "tm-manifest list"

3. Using FAM and LFS (Librarian File System)

   If you don't want to use /lfs (Librarian File System) then 
   
   	export FAM=NONE
   
   and go to the next step.

   To use FAME and LFS you need a file on your host for FAM.  If you use 8M
   books, 512 books per node, 40 nodes, that's 160G of space.  IVSHMEM has
   to be a power of two so you need a 256G file.

   a. $ fallocate -l 256G /var/cache/FAM

   b. export FAM=/var/cache/FAM

4. tm-manifest setnode all manname

    This will bind all nodes to the manifest named "manname".

    To wait for all the bindings to complete,

    tm-manifest waitnode all

5. startnode.arm 1

   The number is an arbitrary node.  This will take over the terminal window
   with the boot console.   Approximate elapsed sub-times:

   0 seconds		startnode.arm X

   15 more seconds	boot menu with default selected and countdown running

   30 more seconds	first output from downloaded kernel, ending with
   			message "Trying to unpack root file system"

   30 more seconds	More output as kernel boots up to networking

   10 more seconds	DHCP performed for kernel

   10 more seconds	Login prompt

   The only ways to release the terminal window:

   a. Log in as root/iforgot to the "node" at the prompt, then shutdown -h 0
   b. Open a new term window on the host and kill the qemu-system-aarch64

6. startnodes all

   Will start a "screen" session to hide all the consoles.   It will also
   enter "waitnodes" to let you know when they're ready.

   Once they're up you should be able to "ssh l4tm@nodeXX" as the l4tm
   user.  Password is iforgot, sudo is enabled.

   Or attach to the console (screen window) with "attachnode XX".
