Sunday, March 24, 2013

Installing Git and Gitolite on a Raspberry PI

If you want to give Git a try at home (and are not interested in GitHub), installing Gitolite on a Raspberry PI is one of the best options you have, even better than installing git on a Synology NAS (in the case you do not want to tweak your Synology device).
A Raspberry PI in a plastic case.

All you have to do is to:
  • order a Raspberry PI model B (less than € 35) and some accessories. Here you have a list of compatible items you can use with a PI. In addition to my PI, I bought:
    • a USB power supply (~ € 7-12; beware: output must be at least 1A)
    • a clear moulded plastic case to house the PI (~ € 6)
    • a Transcend TS32GSDHC10E 32 GB SDHC class 10 memory card (~ € 24; a smaller card would do, but initially I wanted enough space to host an artifacts repository)
    • an Edimax EW-7811UN USB Nano Wireless adapter, 150 Mbps (~ € 12; required only if you want to place your PI out of reach of an Ethernet plug)
  • enable SSH access to your PI (Gitolite or not, this is something you will do during the first setup under Raspbian Wheezy ...)
  • install git and perl on your PI using the usual apt-get install commands
  • create a git account with useradd
  • follow the installation instructions of Gitolite as you can find them on the GitHub repository (a minimal sketch of these steps follows this list).
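
For the record, a minimal sketch of these steps (user name, key file name, and paths are assumptions; the Gitolite documentation remains the authoritative reference):

sudo apt-get install git perl
sudo adduser --disabled-password git
# your public key must be available on the PI, e.g.:
scp ~/.ssh/id_rsa.pub pi@raspberrypi:/tmp/bob.pub
# then, logged in as the git user on the PI:
git clone https://github.com/sitaramc/gitolite
mkdir -p $HOME/bin
gitolite/install -to $HOME/bin
$HOME/bin/gitolite setup -pk /tmp/bob.pub
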
Gitolite enables you to set up a server of git repositories on a dedicated host. Access control is based on users' public SSH keys, and access to each repository can be controlled per user or group of users.

There is no need for a GUI to administer a Gitolite server; a particular git repository, gitolite-admin (that you have to clone on your workstation like any other repository), is used to create and configure repositories, and to grant rights to users.

The initial conf/gitolite.conf file of the gitolite-admin project gives at first sight an idea of the main principle of gitolite:

repo gitolite-admin
    RW+     =   bob

repo testing
    RW+     =   @all

You are bob; you have installed gitolite on the PI (and as you gave your public key bob.pub during installation, you are granted administration rights on the gitolite-admin repository). Every known user (@all) is granted full git access to the testing repository.

If you want to add a new repository named myrepo1 to your gitolite server, you just have to clone the gitolite-admin repository on your development workstation, and add the following lines to conf/gitolite.conf:

repo myrepo1
    RW+     =   bob
    R       =   alice

Alice being a new "read only" player, you also have to add her public key named alice.pub into the keydir directory of the gitolite-admin project.

Add, commit, and push; et voilà ! Your new repository myrepo1 is clonable by Bob and Alice.
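
For the record, the full admin round-trip from your workstation looks like this (the pi host alias and the key location are assumptions):

git clone git@pi:gitolite-admin
cd gitolite-admin
# edit conf/gitolite.conf to add the myrepo1 stanza shown above
cp /path/to/alice.pub keydir/alice.pub
git add conf keydir
git commit -m 'add myrepo1, grant alice read access'
git push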

Instructions for deleting a repository can be found here.

Even if you do not use your Synology device to host a git server, you can still use it to back up your Raspberry's Gitolite repositories with rsync. That is another story ...
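
Just to give the flavour, something along these lines run from the Synology side (paths are assumptions; Gitolite stores its repositories under ~/repositories by default):

rsync -az --delete git@raspberrypi:repositories/ /volume1/backup/gitolite-repositories/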

Saturday, March 23, 2013

Environment variables and Maven

Recently, due to security restrictions, like others, I lost the VPN access to my office.

I was still able to access our source code repository from the outside, but the VPN provided me with remote access to our internal artifacts repository (Apache Archiva), and thus the same conditions to develop remotely (compile, deploy, release) as I have at work.

The workaround I found to continue to work nearly as before was the following:

1) Install Archiva on my workstation. It was a good occasion for me to use Puppet for that.

2) Import (tar cf / xf) the required artifacts (those that cannot be rebuilt, what I call "postulate artifacts") into this Archiva instance.

3) Use environment variables pointing to an artifact repository in my settings.xml (local to each development environment) and in all of our projects' pom.xml files (in SVN). For instance:
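
Something like the following mirror entry (a sketch: the exact Archiva repository path is an assumption, adapt it to your instance):

<mirror>
  <id>archiva</id>
  <mirrorOf>*</mirrorOf>
  <name>Local Archiva instance</name>
  <!-- the internal repository path is assumed; A_HOST and A_PORT come from the environment -->
  <url>http://${env.A_HOST}:${env.A_PORT}/archiva/repository/internal/</url>
</mirror>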

Obviously, A_HOST and A_PORT must be defined in all development environments (CLI, Maven, Jenkins, IDEs), but this is not a big deal.
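
For instance, in a shell profile (the values below assume a local standalone Archiva instance on its usual port):

export A_HOST=localhost
export A_PORT=8080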

The only thing I lost compared to the VPN way was access to THE company repository, which allowed me to deploy or release versions remotely, but this is no surprise ...

Monday, March 11, 2013

Installing ElasticSearch using Puppet

I am currently working on two great technologies: Puppet and ElasticSearch.

The purpose of this post is to describe a Puppet module which deploys ElasticSearch and selected plugins as a service in Linux or Mac-OS environments.
You should try ElasticSearch ...

I am not an expert in these technologies; what I describe here is just a snapshot of what I am able to do at a given point in time (March 2013) on my Puppet & ElasticSearch learning curve...

This kind of work has already been done many times, and now it's my turn !

Limitations

This module has been tested against OpenSuse 12.2 and Mac-OS 10.6.8, with ElasticSearch 0.20.5 and the latest version of the ElasticSearch service wrapper as I found it on GitHub around mid-February 2013.

I tested the resulting ElasticSearch installations in the following ways (a quick check is sketched after this list):
  • batch indexing on the OpenSuse (master) node (180 million documents distributed over 8 shards)
  • replication on Mac-OS nodes (puppetized or not)
  • search on all types of covered nodes
  • usage of the head plugin
  • upgrading / downgrading in place with the 0.20.5 and 0.20.6 versions of ElasticSearch
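
For instance, a quick way to eyeball a node after one of these operations (host name taken from the usage example below):

curl 'http://mahina.your.domain:9200/_cluster/health?pretty=true'
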
No warranty beyond that; this is still unfinished work !

Context

  • The Puppet Master is hosted on a Raspberry PI model B running Raspbian Wheezy (Puppet: 2.7.18-2, Ruby: 1.8.7). No compilation of Puppet from source code; I used what comes with the distribution (the same goes for OpenSuse 12.2). Modules are managed using git, thanks to Git and my Synology DiskStation.
  • Puppet (2.7.18) and Facter (1.6.11) on Mac-OS come from the Mac-OS puppet download site. It is on purpose that I did not use the latest versions available in each case: my intention was to keep the puppet agents at the same level as the master. Installing these disk images is not sufficient to launch the puppet agent at Mac-OS start-up; do not forget to read this (a sketch of the missing step follows this list).
  • The version of the puppet agent on OpenSuse 12.2 is 2.7.6-4.1.2. It runs under Ruby 1.9.3-2.2.1. Talking about Ruby 1.9.3 and Puppet, I remember that this post helped me a lot to get my client cert signed by the puppet master.
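
From memory, the missing step boils down to loading a launchd job for the agent; the plist label below is the one Puppet shipped at the time and should be treated as an assumption:

# once a suitable plist is installed under /Library/LaunchDaemons:
sudo launchctl load -w /Library/LaunchDaemons/com.puppetlabs.puppet.plist
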
I will not describe here how to set up the Puppet Master and its agents, as this topic is largely covered by multiple resources on the Internet.

Prerequisite

ElasticSearch and the service wrapper zip binaries must be located under the files/elasticsearch mount point of the Puppet Master (the service wrapper adds the multi-platform service functionality to ElasticSearch, thanks to the Tanuki Software wrapper, which to my knowledge is also used (and embedded) in Apache Archiva).

/etc/puppet/fileserver.conf must contain a section resembling:

[files]
  path /etc/puppet/files
  allow *.your.domain

Under the path /etc/puppet/files, there should be a directory for the ElasticSearch module containing the following files (wrapper.zip being a zip archive built with the git archive command, as sketched after the tree below, and elasticsearch-VERSION.zip being the version of ElasticSearch to be distributed by puppet):

/etc/puppet/files
└── elasticsearch
    ├── elasticsearch-0.20.5.zip
    └── wrapper.zip
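
For the record, a way to produce such a wrapper.zip (the repository URL is an assumption matching the service wrapper I used; the archive must contain the top-level service directory, since the module below unzips it under bin and checks for bin/service):

git clone https://github.com/elasticsearch/elasticsearch-servicewrapper.git
cd elasticsearch-servicewrapper
# archive only the 'service' directory into the puppet master fileserver
git archive --format=zip --output=/etc/puppet/files/elasticsearch/wrapper.zip HEAD service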

Module parameters


This module is implemented as a single puppet class with the following parameters:
  • els_username which defaults to elasticsearch, is the Unix user name used to run the ElasticSearch node.
  • els_uid is the UID of els_username.
  • els_groupname which defaults to elasticsearch, is the Unix group name of els_username.
  • els_gid is the GID of els_groupname.
  • els_homedir is the directory under which ElasticSearch will be deployed.
  • els_env is the path where the env template will be created. The resulting file gathers environment variables and system settings. It is sourced by the service wrapper script (see below).
  • els_version is the version of ElasticSearch that has to be installed. There must be an existing elasticsearch-<els_version>.zip file under the /etc/puppet/files/elasticsearch directory (see pre-requisites above).
  • els_clustername is the ElasticSearch cluster name of the node (all the nodes with the same cluster name belong to the same cluster).
  • els_nodename is the ElasticSearch node name. 
  • els_masters is an array of ElasticSearch masters (see below for an explanation of why I did not use multicasting, which is the default for ElasticSearch).
  • els_node2node_port which defaults to 9300, is the port number used by ElasticSearch nodes to communicate with each other.
  • els_http_port which defaults to 9200, is the port number used to communicate with the cluster (e.g. the REST API, site plugins, ...).
  • els_is_master which defaults to false, indicates if the node is a master node. 
  • els_is_data which defaults to true, indicates if the node is a data node. 
  • els_nb_of_shards which defaults to 5, is the default number of shards.
  • els_nb_of_replicas which defaults to 1, is the default number of replicas for each shard.
  • els_heap_size which defaults to 1024, is the heap size of the JVM of the ElasticSearch server. Units are megabytes, and the usual letters (G, M) must not be used as they generate an error with the service wrapper script. 
  • els_plugins which defaults to [], is an array of the plugins which must be installed on each node. Each plugin is described by a space-separated string: the first part of the string is the name of the plugin as it is passed as a parameter to the bin/plugin ElasticSearch shell script for installation; the second part is the name of the directory which is created under the ElasticSearch plugins directory once the installation of the plugin is completed. This trick is used to check whether the plugin has already been installed. Example of such a string: mobz/elasticsearch-head head.
  • els_ensure_running which defaults to false, indicates if Puppet should verify that ElasticSearch is running as a service.  

This class is able to update ElasticSearch just by changing the value of the els_version parameter in the node file on the puppet master, as long as no more than a service restart is required (and obviously after having downloaded the corresponding binary zip file into the /etc/puppet/files/elasticsearch directory of the puppet master).

Note that plugins can neither be updated nor removed; they can only be installed. It should not be too difficult to make them more puppet-dynamic, for instance by adding an action sign before the plugin name (! to force reinstall, - to remove, ...) and modifying the install_plugin define (see below); a sketch follows.
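
A hypothetical sketch of such an extended define (untested, and it assumes that the bin/plugin script of your ElasticSearch version supports a -remove option):

define install_plugin {
   $p_array = split($title, ' ')
   $p_name  = $p_array[0]
   $p_dir   = $p_array[1]

   # a leading '-' on the plugin name would mean: remove the plugin
   if $p_name =~ /^-(.*)$/ {
      exec { "remove elasticsearch plugin ${1}":
         command => "$els_current/bin/plugin -remove ${1}",
         onlyif  => "test -d $els_current/plugins/$p_dir",
      }
   } else {
      exec { "install elasticsearch plugin $p_name":
         command => "$els_current/bin/plugin -install $p_name",
         creates => "$els_current/plugins/$p_dir",
      }
   }
}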

Usage example


node 'mini.your.domain' {
   class { 'elasticsearch' :
      els_uid => 1963,
      els_gid => 1963,
      els_heap_size => 1536,
      els_homedir => "/Users/elasticsearch",
      els_env => "/Users/elasticsearch/elasticsearch.env.sh",
      els_version => "0.20.5",
      els_nodename => "elasticsearch@$fqdn",
      els_clustername => "mycluster",
      els_masters => [ 'mahina.your.domain' ],
      els_nb_of_shards => 8,
      els_ensure_running => true,
      els_plugins => [
         'mobz/elasticsearch-head head',
         'elasticsearch/elasticsearch-mapper-attachments/1.6.0 mapper-attachments'
      ],
   }
}


The module's files

The module is made up of an init.pp file and three templates (one for the ElasticSearch configuration, one for installing it as a MacOS service, and one for the environment variables used by the service wrapper).


manifests/init.pp

This is the main file. It creates a tree that will look like:

homedir
├── downloads
│   ├── elasticsearch-0.20.5.zip
│   └── wrapper.zip
├── elasticsearch-0.20.5
│   ├── bin
│   │   ├── elasticsearch
│   │   ├── elasticsearch.bat
│   │   ├── elasticsearch.in.sh
│   │   ├── plugin
│   │   ├── plugin.bat
│   │   └── service
│   │       ├── elasticsearch
│   │       ├── ...
│   ├── ... 
│   ├── config
│   │   ├── elasticsearch.yml
│   ├── lib 
│   │   ├── elasticsearch-0.20.5.jar
│   │   ├── ... 
│   ├── plugins
│   │   ├── head
│   │   │   └── ... 
│   │   └── mapper-attachments
│   │       ├── ... 
├── elasticsearch.env.sh
├── elasticsearch_content
│   ├── data
│   ├── log
│   └── piddir 
└── elasticsearch_current -> /home/elasticsearch/elasticsearch-0.20.5


First of all, the group and the user under which ElasticSearch will run are created if they are not present on the target node. The same goes for the home directory.

Under the home directory, a downloads directory is created. This directory will be used to store the files downloaded from the puppet master fileserver.

An elasticsearch_content directory is also created. It will be used to store all the content of a running node (shards, log files, ...), independently of the version of ElasticSearch itself. Under this directory, a piddir directory is created; it will be used by the service wrapper to store its status independently of the version of ElasticSearch it is bound to.

Then the ElasticSearch zip file is downloaded and stored into the downloads directory. It is unzipped under the home directory using its native zip name: elasticsearch-<els_version> (e.g. elasticsearch-0.20.5).

Permissions of the unzipped ElasticSearch directory (and its sub-directories) are fixed thanks to Perl and find. Just a word about that: I think that Perl is a good match with puppet on Unix boxes, as it is portable for simple things, far more portable than sed, for instance, for the same kind of tasks.

A link is established between elasticsearch-<els_version> and elasticsearch_current.

Note that prefixing the content and current directories with elasticsearch allows this class to deploy ElasticSearch alongside other middleware under the same home directory, using the same deployment pattern (current/content).

Now that an ElasticSearch version is available, the specified plugins are installed using the ElasticSearch plugin script (see the install_plugin define).

The ElasticSearch configuration file (elasticsearch.yml) is then created using the Puppet templating mechanism.

Next, the service wrapper is downloaded from the Puppet master, unzipped at the right place (under the bin directory of the ElasticSearch installation), and the service file is patched (thanks to Perl) to include the environment file that will be created at the next step, and to set proper values for PIDDIR and ES_HOME. These two variables must not be left at their default values, in order for ElasticSearch to restart properly when a new version is puppet-installed.

The final step is dedicated to the service installation. Depending on the kind of target node, a plist file is installed (MacOS) or a link is set under /etc/init.d (Linux).

In both cases, the service is installed, thanks to Puppet.

class elasticsearch (
   $els_username = elasticsearch,
   $els_uid,
   $els_groupname = elasticsearch,
   $els_gid,
   $els_homedir,
   $els_env,
   $els_version,
   $els_clustername,
   $els_nodename,
   $els_masters,
   $els_node2node_port = 9300,
   $els_http_port = 9200,
   $els_is_master = false,
   $els_is_data = true,
   $els_nb_of_shards = 5,
   $els_nb_of_replicas = 1,
   $els_heap_size = 1024,
   $els_plugins = [],
   $els_ensure_running = false
) {
   $els_name="elasticsearch-${els_version}"
   $els_base="$els_homedir/$els_name"
   $els_current="$els_homedir/elasticsearch_current"
   $els_content="$els_homedir/elasticsearch_content"
   $els_downloads="$els_homedir/downloads"
   $els_datadir="$els_content/data"
   $els_workdir="$els_content/work"
   $els_logdir="$els_content/log"
   $els_piddir="$els_content/piddir"
   $els_wrapper_script="$els_current/bin/service/elasticsearch"

   if ($operatingsystem == "Darwin") {
      $els_notify="Service[org.tanukisoftware.wrapper.elasticsearch]"
   } else {
      $els_notify="Service[elasticsearch]"
   }

   File { 
      owner   => $els_username,
      group   => $els_groupname, 
      mode    => '0644',
   }

   Exec {
      user    => $els_username,
      group   => $els_groupname,
      cwd     => $els_homedir,
      path    => "/usr/bin/:/bin",
      timeout => 900,
   } 

   define install_plugin {
      $p_array=split($title,' ')
      $p_name=$p_array[0]
      $p_dir=$p_array[1]

      exec { "install elasticsearch plugin $p_name":
         command   => "$els_current/bin/plugin -install $p_name",
         logoutput => true,
         creates   => "$els_current/plugins/$p_dir",
      }
   }

   group { "$els_groupname" :
      ensure  => present,
      name    => $els_groupname,
      gid     => $els_gid,
   }
   
   user { "$els_username" :
      require => Group[$els_groupname],
      ensure  => present,
      name    => $els_username,
      uid     => $els_uid,
      gid     => $els_groupname,
      shell   => '/bin/bash',
      home    => $els_homedir,
      comment => 'ElasticSearch User',
   }

   file { "$els_homedir" :
      require => User[$els_username],
      ensure  => directory,
      mode    => '0755',
   }

   file { "$els_content" :
      require => File[$els_homedir],
      ensure  => directory,
      mode    => '0755',
   }
   
   file { "$els_piddir" :
      require => File[$els_content],
      ensure  => directory,
      mode    => '0755',
   }
   
   file { "$els_downloads" :
      require => File[$els_homedir],
      ensure  => directory,
      mode    => '0755',
   }
  
   file { "$els_downloads/$els_name.zip" :
      require => File["$els_downloads"],
      source  => "puppet:///files/elasticsearch/$els_name.zip",
   }
 
   exec { 'unzip elasticsearch' :
      require => File["$els_downloads/$els_name.zip"],
      command => "unzip $els_downloads/$els_name.zip",
      creates => "$els_base",
   }

   # Perms in zip file are too wide (777)
   exec { 'fix directories perms elasticsearch' :
      require => Exec['unzip elasticsearch'],
      command => "find $els_base -type d -exec chmod go-w {} \;",
      onlyif  => "perl -e 'exit(sprintf(\"%o\", (stat(\"$els_base\"))[2]&00077) ne \"77\")'",
   }

   file { "$els_current" :
      require => Exec['unzip elasticsearch'],
      ensure  => link,
      target  => "$els_base",
      notify  => $els_notify,
   }

   install_plugin { $els_plugins :
      require => File["$els_current"],
      notify  => $els_notify,
   }

   file { "$els_current/config/elasticsearch.yml" :
      require => File["$els_current"],
      content => template("elasticsearch/elasticsearch.yml"),
      notify  => $els_notify,
   }

   file { "$els_downloads/wrapper.zip" :
      require => File["$els_downloads"],
      source  => "puppet:///files/elasticsearch/wrapper.zip"
   }

   exec { 'install elasticsearch wrapper' :
      require => [ File["$els_current"], File["$els_downloads/wrapper.zip"] ],
      cwd     => "$els_current/bin",
      command => "unzip '$els_downloads/wrapper.zip'",
      creates => "$els_current/bin/service",
   }

   exec { 'source env in elasticsearch wrapper':
      require => [ Exec['install elasticsearch wrapper'], File["$els_env" ] ],
      command => "perl -pi.bak -e 'print \". $els_env # KILROY WAS HERE\n\" if $. == 2' $els_current/bin/service/elasticsearch",
      unless  => "grep -q '# KILROY WAS HERE$' $els_current/bin/service/elasticsearch",
      creates => "$els_current/bin/service/elasticsearch.bak",
   }

   exec { 'fix elasticsearch wrapper':
      require => Exec['source env in elasticsearch wrapper'],
      command => "perl -pi.fix -e 's|^PIDDIR=\".\"$|PIDDIR=$els_piddir|; s|^export ES_HOME=.*$|export ES_HOME=$els_current|' $els_current/bin/service/elasticsearch",
      creates => "$els_current/bin/service/elasticsearch.fix",
   }

   file { "$els_env" :
      require => [ File["$els_homedir"], User["$els_username"] ],
      content => template("elasticsearch/env"),
      notify  => $els_notify,
   }

   if ($operatingsystem == 'Darwin') {
      file { '/Library/LaunchDaemons/org.tanukisoftware.wrapper.elasticsearch.plist':
         require => Exec['fix elasticsearch wrapper'],
         content => template("elasticsearch/elasticsearch.plist"),
         owner   => root,
         group   => wheel,
         mode    => '0644',
      }

      service { 'org.tanukisoftware.wrapper.elasticsearch' :
         require => File['/Library/LaunchDaemons/org.tanukisoftware.wrapper.elasticsearch.plist'],
         enable  => true,
         ensure  => $els_ensure_running,
      }
   } else {
      file { '/etc/init.d/elasticsearch':
         require => Exec['fix elasticsearch wrapper'],
         ensure  => link,
         owner   => root,
         group   => root,
         target  => "$els_current/bin/service/elasticsearch",
      }

      service { 'elasticsearch' :
         require => File['/etc/init.d/elasticsearch'],
         name    => "elasticsearch",
         enable  => true,
         ensure  => $els_ensure_running,
      }
   }
}

templates/elasticsearch.yml

This is the configuration file used for all the nodes of the cluster, whether they are master or data nodes.

It is rather simple and probably far from what is required in a production environment.

Just a word about it: I am not using multicasting to discover master nodes (see the <% if !els_is_master %> section below) because of my home network peculiarities and what I want to do with ElasticSearch at the present time. I guess that this is not a good practice, and this is one more good reason to adapt this file to your needs.

cluster.name: "<%= els_clustername %>"
<% if @els_nodename %>
node.name: "<%= els_nodename %>"
<% end %>
node.master: <%= els_is_master %>
node.data: <%= els_is_data %>
index.number_of_shards: <%= els_nb_of_shards %>
index.number_of_replicas: <%= els_nb_of_replicas %>
path.data: <%= els_datadir %>
path.work: <%= els_workdir %>
path.logs: <%= els_logdir %>
transport.tcp.port: <%= els_node2node_port %>
http.port: <%= els_http_port %>
<% if !els_is_master %>
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [<% els_masters.each do |host| -%> <%= host -%>, <%end -%> ]
<% end %>
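
For the usage example given earlier (a non-master node with els_masters => [ 'mahina.your.domain' ]), the discovery section of this template renders to:

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ mahina.your.domain, ]

(YAML tolerates the trailing comma in a flow sequence.)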


templates/env

This file is sourced by the service wrapper (as shown above in the init.pp file). The first three export lines are used by the wrapper. The two following lines may be used by ElasticSearch clients (such as scripts made up of curl requests, or Java programs) to access the cluster.

The purpose of the last line is to give the indexing nodes enough resources to index a big dataset (in my case a corpus of 1.3 billion words). I guess that it has to be turned into a class parameter, and maybe put under node type conditions (master node, data-only node, ...).

export RUN_AS_USER=<%= els_username %>
export ES_HOME=<%= els_current %>
export ES_HEAP_SIZE=<%= els_heap_size %>
export ES_HTTP_PORT=<%= els_http_port %>
export ES_NODE2NODE_PORT=<%= els_node2node_port %>

# Number of files that can be opened simultaneously
ulimit -n 4096

templates/elasticsearch.plist

This file is specific to MacOS; it is used to install ElasticSearch as a service.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>Label</key>
        <string>org.tanukisoftware.wrapper.elasticsearch</string>
        <key>ProgramArguments</key>
        <array>
            <string><%= els_wrapper_script %></string>
            <string>launchdinternal</string>
        </array>
        <key>OnDemand</key>
        <true/>
        <key>RunAtLoad</key>
        <true/>
        <key>UserName</key>
        <string>elasticsearch</string>
    </dict>
</plist>


Resources

Internet resources put aside (among them this one), I particularly appreciated the content of these three books, each of them having its own interest.

Updates

  • 2013/03/31: update to the init.pp code, related to a problem with ElasticSearch version updates. When the version parameter was changed (0.20.5 to 0.20.6), the service was not restarted because the newly installed wrapper did not point to the actual pidfile (which was located in a sub-directory of the previous version of ElasticSearch). 
  • 2013/04/11: 
    • Added the elasticsearch.plist template file for MacOS, which I had initially forgotten.
    • Tested this module on Suse SLES 11 SP2, where Puppet is currently at version 2.6.17-0.3.1. I made two updates:
      • removed the comma at the end of the last parameter line of the class definition. It can be there in 2.7, but it generates a syntax error in 2.6.
      • changed the require of File["$els_current/config/elasticsearch.yml"]: it must depend on File["$els_current"]. I did not see this bug in 2.7.