Thursday, June 20, 2013

Ansible overview - implementing ntp.


Ansible == control all now to go to the pub sooner.  

Or break it all in an instant and cry lots.  It's all up to you :)


s/chef/ansible/g

Why Ansible and not something else?

Automating sysadmin work is nothing new; we've all been doing that for ages.  Automation tools are nothing new either, and there are plenty around to choose from.  After a failed attempt at puppet i tried ansible, having heard about it at #lca2013.  I liked ansible because it felt as comfortable to use as ssh and vi, but had the power of idempotence, parallel execution, platform independence and operating system agnosticism, and it was simple for me to understand.  It might not work for you :) but if you do try it, please let me know how you get on.

So, ansible is a tool which lets you perform actions on servers.  They can be single actions, or multiple actions based on a definition called a playbook. You write a playbook (don’t worry, it’s not hard) where you define what needs to be done and you execute it with ansible. There’s no server-side software needed (well, except for python) and there’s hardly any configuration. By managing your servers with ansible you not only make sure their configuration is always identical and you always have the right tools installed, but you also have your server configuration documented at all times.

How to install ansible

Ansible is a python app, and it can be installed using the python package manager, "pip"; have a look at http://www.ansibleworks.com/docs/gettingstarted.html.

Architecture and authentication

Ansible works by connecting to your nodes via SSH (or locally) and pushing out small programs, called “Ansible Modules” to them. These programs are written to be resource models of the desired state of the system. Ansible then executes these modules over SSH, and deletes them when finished. Your library of modules can reside on any machine, and there are no servers, daemons, or databases required. Typically you’ll work with your favorite terminal program, a text editor, and probably a version control system to keep track of changes to your content.

Bootstrapping a host for ansible.

As mentioned earlier, ansible isn't a client/server model, in that it has no server daemon nor clients.  It just runs over ssh from your machine to the remote host.  The remote hosts may need python-simplejson installed.  They also need to allow ssh logins via ssh keys, which you can set up via:

$ ssh-copy-id remotehost
$ ssh-copy-id root@remotehost
$ ssh root@remotehost apt-get install python-simplejson

You can also have ansible use sudo instead of root@, however that option is not explored in this doco.  And yes, somebody worked out how you can bootstrap a host for ansible, using ansible ..

Hosts definition

For ansible to work, you need a hosts definition. The default host definition is located in /etc/ansible/hosts. You can use a custom hosts definition (located outside /etc, for example) by defining them elsewhere and passing the -i [host-file location] parameter to the ansible-playbook command when running a playbook. We’re just going to work with /etc/ansible/hosts for now.

Now let’s open up the empty hosts file and put the following contents in it:
[kvmhosts]
thylacine
feathertail
bettong
kaluta
#lycosa

What we’ve done here, is define a group named ‘kvmhosts’. A group can be used in a playbook to run the playbook against a number of hosts; you can't run ansible against a host if it's not mentioned in this file.  We’ve added some hosts to that group by their hostname, but you can use IP addresses as well; either way they need to be reachable and bootstrapped as discussed earlier. You can also define just one host here and you don’t even need a group. Multiple groups are OK too, of course.  Even groups of groups.  It’s all up to you and it all happens in this file.
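To make the group idea a bit more concrete, here is a minimal Python sketch of how a hosts file like the one above could be read into groups.  It is only an illustration and not how ansible itself parses its inventory; the helper name parse_hosts is made up, and it only understands [group] headers, plain host lines and # comments.

```python
from collections import defaultdict

# The same hosts file as above, including the commented-out host.
HOSTS_FILE = """\
[kvmhosts]
thylacine
feathertail
bettong
kaluta
#lycosa
"""


def parse_hosts(text):
    """Return {group: [hosts]} for a simple INI-style hosts file.

    Hypothetical helper: it skips blank lines and '#' comments, and files
    any host listed before the first [group] header under 'ungrouped'.
    """
    groups = defaultdict(list)
    current = "ungrouped"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups[current]  # register the group even if it stays empty
            continue
        groups[current].append(line)
    return dict(groups)


if __name__ == "__main__":
    print(parse_hosts(HOSTS_FILE))
```

The commented-out lycosa never makes it into the group, which is the same reason ansible would leave it alone.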

Task execution with ./ansible

So let's say we want to make sure all our kvmhosts have the correct time.  How can we check the time on all the hosts at once?  We can run ansible with the command module, call date and see what is returned;

$ ansible kvmhosts -a 'date'
bettong | success | rc=0 >>
Mon Jun 17 11:42:34 EST 2013

kvmtest | success | rc=0 >>
Mon Jun 17 10:46:02 EST 2013

feathertail | success | rc=0 >>
Mon Jun 17 11:46:02 EST 2013

thylacine | success | rc=0 >>
Mon Jun 17 10:44:02 EST 2013

kaluta | success | rc=0 >>
Mon Jun 17 11:46:11 EST 2013

Notice we didn't actually say "run the command module"?  If we don't specify a module name explicitly, then ansible assumes that's the one you want; but there are dozens of others.  If you don't specify a user to connect as, then ansible will assume you mean to connect as yourself.
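Eyeballing five timestamps is fine, but with more hosts you might want to script the comparison.  Here is a rough Python sketch that parses output in the "host | success | rc=0 >>" format shown above and reports the gap between the fastest and slowest clock.  The function names are made up, and it assumes each header is followed by exactly one line of date output.  The hosts aren't all polled at exactly the same instant, so treat the result as approximate.

```python
from datetime import datetime

# The output from the ansible run above, pasted in verbatim.
OUTPUT = """\
bettong | success | rc=0 >>
Mon Jun 17 11:42:34 EST 2013

kvmtest | success | rc=0 >>
Mon Jun 17 10:46:02 EST 2013

feathertail | success | rc=0 >>
Mon Jun 17 11:46:02 EST 2013

thylacine | success | rc=0 >>
Mon Jun 17 10:44:02 EST 2013

kaluta | success | rc=0 >>
Mon Jun 17 11:46:11 EST 2013
"""


def parse_dates(text):
    """Map each host to the datetime it reported.

    Assumes every 'host | success | rc=0 >>' header is followed by one
    line of `date` output; the timezone word is dropped before parsing.
    """
    times = {}
    host = None
    for line in text.splitlines():
        if line.endswith(">>") and "|" in line:
            host = line.split("|")[0].strip()
        elif line.strip() and host is not None:
            parts = line.split()
            del parts[4]  # drop the timezone abbreviation (e.g. EST)
            times[host] = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
            host = None
    return times


def clock_spread(times):
    """Return the gap between the latest and earliest clock as a timedelta."""
    return max(times.values()) - min(times.values())


if __name__ == "__main__":
    print(clock_spread(parse_dates(OUTPUT)))
```

For the sample above the spread comes out at a bit over an hour, which is a pretty good hint that something needs fixing.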

Ansible modules

So there is the command module (like we used above) which just allows us to run 'date' against a group, but what if we want to do something more interesting?  Enter ansible modules.  Don't worry though, you don't need to know how they work internally, just that they do, and there's lots of them at http://www.ansibleworks.com/docs/modules.html.

How about we want to make sure the httpd is running on all our webservers?  easy;
$ ansible webservers -u root -m service -a "name=httpd state=running"

Or that htop is installed on all your debian machines;
$ ansible debianservers -u root -m apt -a "pkg=htop state=present"

Or create a symlink;
$ ansible debianservers -u root -m file -a "src=/file/to/link/to dest=/path/to/symlink owner=foo group=foo state=link"

You can do any *single task* you like in this way.  There is no program control or audit or any real repeatability in this mode, but sometimes this is all you might need.

Writing your first playbook

Obviously however, doing anything significant takes more than one task.  In a playbook you define what "tasks" ansible should take to get the server in the state you want. Only what you define gets done.  In ansible, tasks are meant to be idempotent, which means that they only change what needs to be changed and otherwise do nothing (raw commands run via the command module are the obvious exception).  So you can run a playbook against a host repeatedly, and after the first iteration, nothing will be changed.  Or add a host to a group, run the playbook against the group, and only the new host gets configured etc.
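If idempotence still sounds a bit abstract, the pattern is "check first, change only if needed, and report whether anything changed".  Here is a small Python sketch of that idea.  It is not ansible code; ensure_file_content is a made-up helper and the "server timelord" line is just sample content.

```python
import os
import tempfile


def ensure_file_content(path, wanted):
    """Make `path` contain `wanted`; return True only if a change was made.

    Hypothetical helper illustrating the check-then-change pattern.
    """
    try:
        with open(path) as handle:
            if handle.read() == wanted:
                return False  # already in the desired state: do nothing
    except FileNotFoundError:
        pass  # a missing file simply counts as "needs changing"
    with open(path, "w") as handle:
        handle.write(wanted)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        conf = os.path.join(workdir, "ntp.conf")
        # First run has to write the file, so it reports a change.
        print(ensure_file_content(conf, "server timelord\n"))
        # Second run finds nothing to do.
        print(ensure_file_content(conf, "server timelord\n"))
```

Run it as often as you like; after the first pass the file is left alone, which is the behaviour you get from a well-behaved ansible task.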

Playbooks are executed with ./ansible-playbook.  Playbooks are written in YAML, which means they are fussy about indenting and syntax and their special markers.  It's a bit annoying.

Making NTP work 

Let's imagine the time on all our kvm hosts is wrong.  We want to use ansible to make sure they all have NTP installed and that it's configured sanely and running etc.  Doing this manually would take a good number of steps which have to be repeated on each host; when we have multiple tasks like this, that's when we write an ansible playbook. When we get a new host, or change the config somehow, we just run it again and voila! it's done.

So let's imagine that we need to complete the following in order to get NTP working on a debian host;

  1. Do an apt-get update.
  2. Install the ntp package
  3. Unless this host is called timelord, configure NTP to point to timelord
  4. Configure the timezone files appropriately.
  5. Restart NTP
  6. Wait awhile and check to see it's working OK.

For any given host, if any of these steps fail then we'd want to stop configuring it to see any error message(s) etc.

ntp.yml

An ansible playbook to complete those steps looks largely like a script written in plain english.

If you have a look below, you can see that this playbook is able to be run against all hosts, and will connect to each of them as the root user.

##
# Demo Ansible playbook for installing and configuring NTP
#
---
- hosts: all
  name: install, configure and start NTP
  user: root
  tasks:
    # Ensure the ntp package is installed
    - name: Ensure the ntp package is installed
      action: apt update_cache=yes pkg=ntp state=installed

    # Our local time server is called timelord.  all servers need to point to that.
    - name: EXCLUDING timelord, copy over our local ntp.conf
      when_string: $ansible_hostname != 'timelord'
      action: copy src=files/etc-ntp.conf dest=/etc/ntp.conf mode=644

    # Update /etc/localtime with our copy of usr-share-zoneinfo-Australia-Hobart
    - name: Update /etc/localtime
      action: copy src=files/usr-share-zoneinfo-Australia-Hobart dest=/etc/localtime mode=644

    # Update /etc/timezone to show Australia/Hobart
    - name: Update /etc/timezone
      action: copy src=files/etc-timezone-hobart dest=/etc/timezone mode=644

    # Restart ntp to apply new ntp.conf
    - name: Restart ntp to apply new ntp.conf
      action: service name=ntp state=restarted

And it is executed by simply running;
$ ansible-playbook ntp.yml --limit kvmhosts

If a task fails on a given host, then execution of the remaining tasks on that host would be stopped.  You would be able to see what happened and fix it etc.
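That stop-on-failure behaviour can be sketched in a few lines of Python.  This is only a toy model with made-up names (run_playbook, install_ntp, restart_ntp); ansible itself works through each task across all the hosts in parallel, whereas this sketch simply goes host by host.

```python
def run_playbook(hosts, tasks):
    """Run each task in order on every host, dropping a host at its first failure.

    `tasks` is a list of (name, function) pairs; each function takes a host
    name and returns True on success.  All names here are hypothetical.
    """
    results = {}
    for host in hosts:
        results[host] = []
        for name, task in tasks:
            ok = task(host)
            results[host].append((name, "ok" if ok else "failed"))
            if not ok:
                break  # stop this host only; the other hosts carry on
    return results


if __name__ == "__main__":
    def install_ntp(host):
        return host != "kaluta"  # pretend the install fails on one host

    def restart_ntp(host):
        return True

    tasks = [("install ntp", install_ntp), ("restart ntp", restart_ntp)]
    for host, log in run_playbook(["bettong", "kaluta"], tasks).items():
        print(host, log)
```

In this pretend run bettong gets both tasks, while kaluta stops after the failed install and never sees the restart.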

And finally, check your work and make sure NTP is indeed up and running by using ansible in task execution mode like we did earlier;

$ ansible kvmhosts -m command -a 'date'
bettong | success | rc=0 >>
Mon Jun 17 14:22:20 EST 2013

kaluta | success | rc=0 >>
Mon Jun 17 14:22:20 EST 2013

kvmtest | success | rc=0 >>
Mon Jun 17 14:22:20 EST 2013

thylacine | success | rc=0 >>
Mon Jun 17 14:22:20 EST 2013

feathertail | success | rc=0 >>
Mon Jun 17 14:22:20 EST 2013

Facts and logic and all the rest of it

This document shows the basics of ansible, and doesn't explore some of the more interesting aspects, such as conditional processing (ie when_string: $ansible_hostname != 'timelord'), or how to manage playbook execution between different host groups (hosts: all:!deprecatedmachines) and so on.

There is a lot of documentation on the ansible website, so if you're tempted, then please do have a look.

Internet resources and thanks to;


Tuesday, April 16, 2013

ABC Dig Radio now playing list

Dig. The link between the music i love, and the wide wide world of amazing.

They play awesome songs and i love it, and part of that awesome is the total absence of DJs rabbiting on all day.  A side effect however is that there is nobody to mention who just played that awesome song you just heard.  Enter the ABC dig music now playing list.  There are lots of ways to get it;
  1. Straight off their homepage ;) http://abcdigmusic.net.au/
  2. From their mobile app
  3. From the @ABCDigMusicNow twitter feed
  4. Or from any number of tools which parse http://abcdigmusic.net.au/player-data.php
But I couldn't find a command line tool to do the job, so i wrote one in python.  It's pretty basic, and looks like this;
$ ./abc-dig-music.py 
Now: 'When A Woman Loves A Man' by Paul Kelly from 'Spring And Fall' (2012) on Univeral Music
Was: 'You Come Through' by PJ Harvey from 'Uh Huh Her' (2004) on Island/UMA
But I can't claim much of it is my own work, I'm just capitalizing on the good work already done by Ian Wienand in his Dig Jazz Applet.  But like all awesome people, he GPLd his code :)  thanks mate.

Thursday, March 07, 2013

Return codes from system commands in Perl

So say I'm running a Perl script which then calls some system command, let's say 'echo blah'.  Running that returns three things to us;
  1. the actual output (stdout) which would be "blah",
  2. any error output (stderr) which would be "" ie; NULL or empty,
  3. and the return code which would be "0";
So until now I had to choose which one of those i care about most and use the appropriate system executable handler to suit; let me explain;

my $result = `echo blah`; # captures "blah" into $result
my $result = system("echo blah"); # captures the return value of echo into $result
..and i stopped paying attention about there.

Suffice to say, no method i knew about allowed me to capture them all at the same time. Until i discovered IO::CaptureOutput.  Check out this foo;
$ cat ./perl.system.calls.sh 
#!/usr/bin/perl
use strict;
use IO::CaptureOutput qw/capture_exec/;

my @cmd= "echo blah";

my ($stdout, $stderr, $success, $exit_code) = capture_exec( @cmd ); chomp($stdout);

print "running '@cmd'\n";
print "stdout = $stdout\n";
print "stderr = $stderr\n";
print "success = $success\n";
print "exit_code = $exit_code\n";

$ ./perl.system.calls.sh 
running 'echo blah'
stdout = blah
stderr = 
success = 1
exit_code = 0

So now i can trap these in my code and handle them each as i see fit.  It means better error handling and software that runs (and stops!) how and when one might want it to.  And that's cool.
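As a side note, and purely for comparison, here is roughly the same trick in Python using the standard subprocess module.  The function is named capture_exec only to mirror the Perl above; it is not part of any library.

```python
import subprocess
import sys


def capture_exec(cmd):
    """Run `cmd` (a list of arguments) and return (stdout, stderr, success, exit_code)."""
    done = subprocess.run(cmd, capture_output=True, text=True)
    return done.stdout, done.stderr, done.returncode == 0, done.returncode


if __name__ == "__main__":
    # A command that works, then one that complains on stderr and exits non-zero.
    failing = [sys.executable, "-c", "import sys; sys.exit('boom')"]
    for cmd in (["echo", "blah"], failing):
        stdout, stderr, success, exit_code = capture_exec(cmd)
        print("stdout =", stdout.rstrip("\n"))
        print("stderr =", stderr.rstrip("\n"))
        print("success =", success)
        print("exit_code =", exit_code)
```

Same idea as before: all three results come back from the one call, so you can decide what to do with each.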

Wednesday, March 06, 2013

Migrating VMs on KVM without shared storage

So you have some virtual machine hosts running the Linux virtualisation software KVM on a current release of Ubuntu.  One of the hosts needs some urgent maintenance requiring an outage soon, but that host has several business critical virtual machines on it.  So you need to migrate those vms to the other host.

That's cool though because virsh allows us to migrate virtual machines to different hosts, assuming they are sharing the same storage pool.  But our hosts aren't.  Ahh but the current version of virsh allows us to migrate virtual machines which aren't using shared storage.  Sweet.  But is an even reasonably recent version of virsh available for Ubuntu?  No.  Poop.  Compile the new kvm from source?  Migrate to Debian over the weekend?  Spend zillions on vmware?  Ummmmmm.. no, no and..  no (funnily enough).

But we can do custom hack foo code (and so can you)!! :)

$ ./manage.vm.sh
This command needs to be run as root (use sudo)

$ sudo ./manage.vm.sh
usage: manage.vm.sh list: List all guests on the current host
usage: manage.vm.sh backup vmname: Shutdown, backup & restart a guest
usage: manage.vm.sh migrate vmname <destination host>: Shutdown, copy, define remote, destroy local & startup remote

$ sudo ./manage.vm.sh migrate ansible feathertail
Considering migrating ansible to feathertail
 ..testing to see if we have the correct permissions on the remote host
 ..yay! connectivity with remote host established
Parsing KVM configuration for ansible
 ..assuming config file is at /etc/libvirt/qemu/ansible.xml
 ..attempting to determine datafile location
 ..ansible is configured to use /var/lib/libvirt/images/ansible/tmpfyi1SJ.qcow2
Checking to see if the guest is running
  !!WARNING   guest is still running, initiating shutdown.  Is an outage OK? .. ..enter L8 to verify:  L8
 ..acknowledging verification commencified
 ..shutting down, please wait  :) :) :) ansible now not running

Are you sure you want to proceed migrating ansible to feathertail? ..enter Q2 to verify: Q2

..and so on.  It actually works.  Does cool stuff like;
  • Automatically parses config files from /etc/libvirt/qemu/
  • Will migrate a vm with multiple vm datafiles
  • Renames local .xml and data files before undefining the vm.  Read; roll-back.
  • Prompts before shutting down a vm and again before migrating it.
  • Copies .xml and data file(s) remotely via rsync.
  • Defines the vm on the remote host and starts it.
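The "parse the config to find the datafiles" step is the fiddly bit, so here is a minimal Python sketch of one way to do it.  This is not the code from manage.vm.sh; the function name disk_files is made up, and the XML is a heavily trimmed, made-up example of a libvirt domain definition that reuses the image path from the output above.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up domain definition; real files under /etc/libvirt/qemu/
# carry far more detail than this.
SAMPLE_XML = """\
<domain type='kvm'>
  <name>ansible</name>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/ansible/tmpfyi1SJ.qcow2'/>
    </disk>
    <disk type='file' device='cdrom'>
      <target dev='hdc' bus='ide'/>
    </disk>
  </devices>
</domain>
"""


def disk_files(domain_xml):
    """Return the paths of all file-backed disks in a libvirt domain XML string."""
    root = ET.fromstring(domain_xml)
    paths = []
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        if source is not None and source.get("file"):
            paths.append(source.get("file"))
    return paths


if __name__ == "__main__":
    print(disk_files(SAMPLE_XML))
```

A vm with several datafiles just yields a longer list, and an empty cdrom drive with no source is skipped.  Note that a cdrom with an ISO attached would be listed too, so a real script may want to filter on the device attribute.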
Anyhow have a play and if you like it or have feedback or whatever, please let me know.

Logging the connection status of your ADSL router

Awhile back learning perl coincided with some issues i had with my ADSL.

  • Random disconnects
  • Unknown uptimes and connection speeds
  • Unreliable connection speeds
  • Unable to reset the device via the command line
  • And a general feeling of being ripped off by the ISP.

So i hit it with the Perl hammer and produced a 24k monolith which solves all that and more but will probably never work for anyone else :)  If that sounds like a challenge, then have a look at adsl-status on github.

$ ./adsl-status.sh
ADSL2 synced @~ 657KB/s (avg 662, max 747). Up 24238 mins (avg 8562, max 89341); 16 day 19 hr 57 min 21 sec.

$ ./adsl-status.sh --help
usage: adsl.sync[.dev].sh [--verbose|--silent|--help|--debug] | [--reset]

$ ./adsl-status.sh --verbose
verbose, adsl: clean_text_split = Annex Type ADSL2 -- Upstream 967600 -- Downstream 5374560 -- Elaspsed Time 16 day 19 hr 59 min 14 sec

verbose, adsl: adsl_annex_type = ADSL2
verbose, adsl: connection_uptime_in_seconds = 1454354
verbose, adsl: down_bits_per_second / up_bits_per_second (bps) = 5374560/967600
verbose, adsl: down_kilobits_per_second / up_kilobits_per_second (kbs) = 5375/968
verbose, adsl: down_megabits_per_second / up_megabits_per_second (mbs) ~ 5.5/1
verbose, adsl: down_kilobytes_per_second / up_kilobytes_per_second (KB/s) ~ 657/119
yada yada..
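The arithmetic behind those verbose lines is simple enough to sketch in Python.  This is not the code from adsl-status; the function names are made up, and rounding up to the next whole KB/s is an assumption inferred from the numbers in the sample output above.

```python
import math
import re

# Seconds in each unit used by the router's "Elapsed Time" string.
UNIT_SECONDS = {"day": 86400, "hr": 3600, "min": 60, "sec": 1}


def elapsed_to_seconds(text):
    """Convert a string like '16 day 19 hr 59 min 14 sec' into seconds."""
    total = 0
    for amount, unit in re.findall(r"(\d+)\s+(day|hr|min|sec)", text):
        total += int(amount) * UNIT_SECONDS[unit]
    return total


def bps_to_kilobytes_per_second(bps):
    """Convert bits/s to KB/s (1 KB = 1024 bytes), rounding up.

    The rounding direction is an assumption based on the sample output.
    """
    return math.ceil(bps / 8 / 1024)


if __name__ == "__main__":
    print(elapsed_to_seconds("16 day 19 hr 59 min 14 sec"))
    print(bps_to_kilobytes_per_second(5374560), bps_to_kilobytes_per_second(967600))
```

Fed the figures from the verbose run, this gives 1454354 seconds of uptime and 657/119 KB/s, matching what the script reported.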

Starting out with github on Ubuntu

OK.  Ten mile high view.  Sometimes I write code, but a lot of the code I write is too monolithic to post to my blog.  Seems to me I should use some kind of publicly accessible revision control system to check in the code, and then link to that from my blog.  We already use subversion at work and some of my code lives there but allowing public access to that makes no sense.  Sometimes code lives at home and that never even sees the work SVN system.  What would be cool is some system which allows me to control code from wherever i am, and yet allow public access to anyone.  Enter GIT and Github.

So if all the cool kids are using GIT, how do we get it installed, running and working so i can actually link to it?  Github have a very nice help system and i used that to do what i have done here;
  • Installed git on my ubuntu machine
  • created an account on github
  • logged into github
  • on github created a new repository called hooliowobbits/testing
  • on my machine i ran
    $ git clone https://github.com/hooliowobbits/testing.git
    Cloning into 'testing'...
    remote: Counting objects: 3, done.
    remote: Total 3 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    $ cd testing/
    $ ls -lha 
    total 8.0K
    drwxr-xr-x 8 hoolio datarw 4.0K Mar  6 11:39 .git
    -rw-r--r-- 1 hoolio datarw   28 Mar  6 11:39 README.md
    $ cat README.md
    testing
    =======
    sandbox foo
    $ echo blah > blah.txt
    $ git add blah.txt
    $ git commit blah.txt
    [master 5f1522c] this is just a note i added when i first typed git commit blah
     1 file changed, 1 insertion(+)
     create mode 100644 blah.txt
    $ git push
    Username for 'https://github.com': hooliowobbits
    Password for 'https://hooliowobbits@github.com':
    To https://github.com/hooliowobbits/testing.git
       2ebbe5e..5f1522c  master -> master
    
    
  • Then i visited github again and i see my blah.txt sitting there :) 
Right now I can't presume to know much more about GIT and Github than this; but clearly this opens up a world of possibilities.  Suddenly with one command on my machine the world can see my code, use it, comment on it, fork it, whatever.

Now, let's code!

Internode quota from the Ubuntu command line

I have an Internode ADSL connection and I'm running Ubuntu Server (12.04). I wanted to be able to check my internet quota from the command line and i found a perl script to help do that, but there weren't quite enough instructions there for me to make it work. It wasn't too hard to fix though:

$ wget http://zwitterion.org/software/internode-quota-check/internode-quota-check
$ chmod +x internode-quota-check
$ mv internode-quota-check internode-quota-check.sh
$ sudo apt-get install libwww-mechanize-perl libreadonly-perl
$
$ ./internode-quota-check.sh man
you don't seem to have a ~/.fetchmailrc, so I'll prompt you.
To avoid extra dependencies, your password will be echoed.
Username: juliusroberts
Password: passwordhere
Run this command to create a ~/.fetchmailrc file:
echo '# poll mail.internode.on.net user "juliusroberts" password "passwordhere"' >> ~/.fetchmailrc
$ echo '# poll mail.internode.on.net user "juliusroberts" password "passwordhere"'>> ~/.fetchmailrc
$
$ ./internode-quota-check.sh
juliusroberts: 132.032 GB (88.0%) and 13.6 days (48.5%) left on 150 GB, 24 Mb/s plan.
$

So I then went and added  ./internode-quota-check.sh to the bottom of my ~/.bashrc file.  So now when i login to my server i see straight away how much internets i have left, yay :)

Monday, March 04, 2013

Use perl to check spamhaus status in Nagios

We had an issue where something on our internal network was tripping a SMTP spam filter at spamhaus.org.  We thought we fixed it once only to be bitten again a few months later when payslips from our payroll system were bouncing (bad).  As well as actually investigating the root cause, we created a nagios check to check spamhaus programmatically.  Creating a custom nagios check is well documented on the nagios website.

#!/usr/bin/perl
#
# Quick perl script to check spamhaus to see if we're blocked again, see https://rt.wilderness.org.au:444/rt/Ticket/Display.html?id=73259
#
# This script returns values consistent with the nagios return code specification at http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN76
# 0     OK          The plugin was able to check the service and it appeared to be functioning properly
# 1     Warning     The plugin was able to check the service, but it appeared to be above some "warning" threshold or did not appear to be working properly
# 2     Critical    The plugin detected that either the service was not running or it was above some "critical" threshold
# 3     Unknown     Invalid command line arguments were supplied to the plugin or low-level failures internal to the plugin 
#                       (such as unable to fork, or open a tcp socket) that prevent it from performing the specified operation. Higher-level errors 
#                       (such as name resolution errors, socket timeouts, etc) are outside of the control of plugins and should generally NOT be reported as UNKNOWN states.

use strict;
my $our_external_ip = "xxx.xxx.xxx.xxx"; 
my $exit_value=3;

# run the wget command and save its output to the $results variable.
my $results=`/usr/bin/wget --random-wait -U mozilla -O /tmp/spamcheck.dat -o /tmp/spamcheck.log http://www.spamhaus.org/query/ip/$our_external_ip && grep $our_external_ip /tmp/spamcheck.dat`; 
chomp($results); 
#print $results."\n";

# check to see if we score RED ie; we're on a blocklist
if ($results =~ m/red/) {
    print "ALERT: Block found.  Check http://www.spamhaus.org/query/ip/$our_external_ip\n";
    $exit_value=2;
} 

# if we got this far, we saw no RED.  Check to see we at least get one GREEN.
elsif ($results =~ m/green/) {
    print "OK; We got a green and no red.\n";
    $exit_value=0; 
} 

# ALERT: We got no red AND no green; therefore there must be some issue somewhere!
else {
    print "ALERT: No valid return codes detected; page-load/dns/internet/scripting issue?\n";
    $exit_value=3; 
}

#print "\nPerl hopes it's returning a \$exit_value of $exit_value\n\n";
exit $exit_value;

Note that /tmp/spamcheck.* will need to be globally writable.

That then results in a nice web gui telling us all isn't well (again).  Happy with my work i told my boss who said "i don't like it, i want it to say OK", to which i replied, "i can do that ;)", and now you can too.  Happy automation :)




Thursday, February 28, 2013

All your servers are belong to Ansible.

So say you've got a few *NIX servers of various flavors, a dozen or two; it takes a day or so to add a new one to production, installing ntp, configuring your custom software repositories, configuring the various accounts it might need, installing ssh, adding it to the backup system, to the monitoring system etc etc.  Say you're getting over it.  Say you need to change your software repositories or your admin ssh keys.  Familiar?  enter server automation:


We'd heard about chef, and tried puppet; the uber configuration management system which is great, so they say; on the 3rd or 4th incarnation :)  We wanted something a bit simpler, something that avoided the monolithic client/server model, could be run anywhere (with git) and which used the SSH key auth we were already using.  It had to be able to manage groups of machines in a logical "idempotent" way.  Idempotent means you can apply a play which says "make it like thus" and if nothing needs to change, nothing is changed.  You can apply it again (and again) and not break anything.

So anyhow we found all of that in a free open source software project called Ansible;

"Orchestrate From Above.
Most software does not run on a single machine.
Ansible parallelizes complex multi-tier rollouts across app servers, databases, monitoring servers, and load balancers.."

After following the doco I had it up and running managing NTP on 25 servers within a day.  A good percentage of that was spent sorting out root ssh access (although sudo is ok too) and finding out what NTP is actually packaged as on centos vs ubuntu vs debian etc.

We've now moved on to managing users and ssh keys with Ansible and i can see this making a very significant difference for us.