Long exercise: Data handling and presentation
=============================================

.. highlight:: console

.. role:: bash(code)
   :language: bash
   :class: highlight

This page is intended as an exercise in data handling, processing and
presentation. Of course, as with everything else in this course, we'll only be
touching the surface! Before we begin, you should always keep in mind that the
best tool for data generation or handling may not be the best for presentation:
a good example of this is Mathematica, which is wonderful for maths work, but
leaves something to be desired when it comes to producing pretty plots. Be
open-minded! We also won't be dealing at all with the aesthetics of efficient
data presentation: that's left to your taste, intuition and experience (and to
Edward Tufte's superb book, *The Visual Display of Quantitative Information*).

The task is to combine several software packages to produce parameter scans for
sparticle masses in BSM models. We will use Softsusy to produce mass spectra,
and visualise these with pySLHA and self-written contour plots. Before we
begin, let's have a quick look at Makefiles.

Makefiles and :command:`make`
-----------------------------

Recompiling large code projects after each change can take a long time, but
often it is not necessary to rebuild all of it. Changes in :file:`.cc` files,
for example, only require recompilation of the corresponding :file:`.o` file
and re-linking. Changes in header files affect all code where they are
included. Makefiles provide a way of tracking these dependencies. Each entry
consists of a target, followed by the files it depends on. On the following
lines, you can optionally write *rules* (which must be preceded by :kbd:`Tab`
characters, not spaces! [#fn1]_), specifying how to make the target from the
dependencies. A rule can be any shell command, and you can have several rules
attached to a target, which will be executed in sequence.

Prepare two text files :file:`bar` and :file:`baz` and write this example
:file:`Makefile`, where :kbd:`[--tab--]` stands for a literal Tab character::

    foo : bar baz
    [--tab--]cat bar baz > foo

Running the command :command:`make` in the directory with the Makefile will now
run :bash:`cat bar baz > foo`, producing a :file:`foo` file. Re-running
:command:`make` will do nothing, since none of the dependencies of :file:`foo`
have changed. Edit :file:`bar` and re-run :command:`make`!

:command:`make` has several built-in rules for common situations. It already
knows how to build object files from :file:`.cc` files, so it's not necessary
to specify a rule here::

    foo.o : foo.cc foo.h bar.h
    bar.o : bar.cc bar.h

:command:`make` is most often used for software compilation, but it can be very
helpful in any situation where a sequence of interdependent actions has to be
performed in a certain order whenever a file changes.
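To make this a little more concrete, here is a slightly larger hand-written
Makefile sketch for a small C++ program. The file and program names are
invented for the example (they are not part of the exercise); as before,
:kbd:`[--tab--]` stands for a literal Tab character::

    CXX      = g++
    CXXFLAGS = -O2 -Wall

    # Link the two objects into an executable; the objects themselves are
    # built by make's built-in .cc -> .o rule, which uses $(CXX) and $(CXXFLAGS).
    myprog : foo.o bar.o
    [--tab--]$(CXX) $(CXXFLAGS) -o myprog foo.o bar.o

    foo.o : foo.cc foo.h bar.h
    bar.o : bar.cc bar.h

    # A "phony" target: not a real file, just a convenient name for a clean-up rule
    .PHONY : clean
    clean :
    [--tab--]rm -f myprog foo.o bar.o

Running :command:`make` rebuilds only what is out of date with respect to its
dependencies, and :bash:`make clean` removes the generated files again.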
.. rubric:: Footnotes

.. [#fn1] One for the top-ten list of bad design choices in software.

Let's move on to the actual project.

Installation of a source code package
-------------------------------------

Softsusy's source code is available from http://softsusy.hepforge.org/. Like
most Fortran, C or C++ codes, it is distributed as a tarball with a
standardised installation procedure using the *GNU autotools*. After unpacking
the tarball and entering the :file:`softsusy-nn.nn.nn` directory, the first
step is to run::

    $ ./configure --prefix=$HOME/local F77=gfortran

The :option:`--prefix` argument determines where the compiled code will be
installed. The default is :file:`/usr/local/`, which is usually not writeable
by regular users, so we need to supply our own location. The :envvar:`F77`
variable can be used to override the default Fortran compiler choice; similar
variables are :envvar:`CC` for the C compiler and :envvar:`CXX` for the C++
compiler. The main output of :command:`configure` is a set of :dfn:`Makefiles`,
which build the package with all the specified settings. The Makefiles produced
automatically by :command:`configure` are not intended to be human-readable,
but they work in exactly the same way as our simple example above.

Practically all ready-made HEP packages come as tarballs with included
Makefiles. As we've seen, some require an extra configuration step to prepare
the Makefiles, e.g. to provide an installation location or other setup
information. The usual sequence of commands is::

    $ tar xzvf somepackage-1.2.3.tar.gz
    $ cd somepackage-1.2.3
    $ ./configure --prefix=$HOME/local/
    $ make
    $ make install

Running Softsusy
----------------

Let's get back to our exercise. After installing Softsusy, we're ready for the
first test run. Softsusy takes an input parameter card with
supersymmetry-breaking parameters in the *SUSY Les Houches Accord* format, and
produces the MSSM mass spectrum as output. An example input card for the CMSSM
point *sps1a* is available at http://softsusy.hepforge.org/code/lesHouchesInput.
Save it as :file:`lesHouchesInput.slha`::

    # Example input in Les Houches format, and suitable for input to
    # SOFTSUSY (v1.8 or higher): SPS1a input
    Block MODSEL                 # Select model
        1    1                   # sugra
    Block SMINPUTS               # Standard Model inputs
        1    1.279340000e+02     # alpha^(-1) SM MSbar(MZ)
        2    1.166370000e-05     # G_Fermi
        3    1.172000000e-01     # alpha_s(MZ) SM MSbar
        4    9.118760000e+01     # MZ(pole)
        5    4.250000000e+00     # mb(mb) SM MSbar
        6    1.743000000e+02     # mtop(pole)
        7    1.777000000e+00     # mtau(pole)
    Block MINPAR                 # Input parameters
        1    1.000000000e+02     # m0
        2    2.500000000e+02     # m12
        3    1.000000000e+01     # tanb
        4    1.000000000e+00     # sign(mu)
        5   -1.000000000e+02     # A0
    Block SOFTSUSY               # Optional SOFTSUSY-specific parameters
        1    1.000000000e-03     # Numerical precision: suggested range 10^(-3...-6)
        2    0.000000000e+00     # Quark mixing parameter: see manual
        5    1.000000000e+00     # Include 2-loop scalar mass squared/trilinear RGEs

The interesting block is :kbd:`MINPAR`, which contains the five free high-scale
parameters of the CMSSM. To get the output spectrum, run::

    $ export PATH=$PATH:$HOME/local/bin   # (why?)
    $ softpoint.x leshouches < lesHouchesInput.slha > sps1a.slha

For us, the interesting part of the output is the list of sparticle masses in
:kbd:`BLOCK MASS`::

    # PDG code     mass             particle
            24     8.03907025e+01   # MW
            25     1.26926792e+02   # h0
            35     1.67333827e+03   # H0
            36     1.67326915e+03   # A0
            37     1.67527975e+03   # H+
       1000001     1.14434683e+03   # ~d_L
       1000002     1.10479594e+03   # ~u_L
    [...]

Try to extract only the lines with the masses from the SLHA file, using some of
the command-line tools we used previously. A summary can be found in the
section :ref:`simple-data-handling`.
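If you get stuck (and only then!), here is one possible sketch. It assumes the
Softsusy output file is called :file:`sps1a.slha` as above, that the block
header is written as ``Block MASS``, and that, as is usual in SLHA files, data
lines are indented while comment-only lines start with ``#``::

    $ sed -n '/^Block MASS/,/^Block/p' sps1a.slha | grep -v '^Block' | grep -v '^#'

The :command:`sed` range cuts out everything from the :kbd:`MASS` block header
up to the next block header; the two :command:`grep` calls then drop the header
and comment lines, leaving only the PDG code, the mass and the particle-name
comment. Piping the result through e.g. ``awk '{print $1, $2}'`` reduces it to
two clean numeric columns.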
Installation of a Python package
--------------------------------

We can display SUSY masses conveniently with pySLHA, from
http://www.insectnation.org/projects/pyslha. Like many Python modules, it can
be installed very easily from the Python Package Index,
http://pypi.python.org/, with the :command:`easy_install` tool::

    $ easy_install --prefix=$HOME/local pyslha

Again, we need :option:`--prefix` to choose an installation location. Most
likely, :command:`easy_install` will fail the first time you try it, with a
long error message about an unknown directory ending in
:file:`python2.x/site-packages`. This is the local directory where Python will
look for extra packages that you have installed. Since this is your first
additional package, you'll have to create the directory first, and tell Python
to look there::

    $ mkdir -p $HOME/local/lib/python2.6/site-packages
    $ export PYTHONPATH=$HOME/local/lib/python2.6/site-packages

Re-run the :command:`easy_install` command, which should go through now. In
:file:`$HOME/local/bin`, you will find a program called :command:`slhaplot`.

pySLHA test
-----------

We're now ready to plot the *sps1a* SUSY spectrum.

.. To work around a bug in vega's LaTeX installation, export this additional
   LaTeX setup directory first:
   export TEXMFHOME=/home/hudson/staff/dph1dg1/texmf

The command is now simply::

    $ slhaplot sps1a.slha

Have a look at the PDF file it produces.

Repeated runs
-------------

Now, let's go to the first part of the main task. Running one SLHA point is
easy enough, but what about 500 points? Try to create a set of scripts in Bash
(or Python), and maybe Makefiles, that make the task easy for you. Given a
directory full of input SLHA files, run them all through Softsusy and keep
track of the outputs in a systematic way.
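If you would like a starting point, here is one possible Python sketch for the
"run everything in a directory" part. The directory names :file:`input/` and
:file:`output/` are assumptions made purely for the example (choose your own
layout), and :command:`softpoint.x` is assumed to be on your :envvar:`PATH` as
set up above::

    import glob
    import os
    import subprocess

    indir, outdir = "input", "output"    # assumed layout: adapt to your own scheme
    if not os.path.isdir(outdir):
        os.makedirs(outdir)

    for incard in sorted(glob.glob(os.path.join(indir, "*.slha"))):
        outcard = os.path.join(outdir, os.path.basename(incard))
        # Only re-run a point if its input card is newer than the existing output
        # (a poor man's version of what make does with its dependencies).
        if os.path.exists(outcard) and os.path.getmtime(outcard) > os.path.getmtime(incard):
            continue
        fin = open(incard)
        fout = open(outcard, "w")
        # Equivalent to the shell command:  softpoint.x leshouches < incard > outcard
        subprocess.call(["softpoint.x", "leshouches"], stdin=fin, stdout=fout)
        fin.close()
        fout.close()

Generating the input cards themselves can be done in the same spirit: read the
*sps1a* card once as a template and write out one copy per scan point with the
:kbd:`MINPAR` values replaced. Equally well, the bookkeeping could be expressed
as a Makefile with one target per output file.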
Mass contour plots
------------------

You can already run Softsusy for many different parameter points. Now make a
scan over two of the parameters (:math:`m_0` and :math:`m_{1/2}`), and from the
results generate a contour plot for different sparticle masses, like the
gluino, neutralino, stau or stop. Then generate a single PDF document with
these contour plots. Ideally, keep your scripts flexible enough to make it easy
to plot other masses, scan over different parameters, etc.

There are many different options for creating contour plots. You could
certainly use :command:`gnuplot`, which we introduced earlier. NCL (the NCAR
Command Language) is an example of an interpreted language designed
specifically for scientific data processing and visualisation. For today, we
suggest that you use Python's *matplotlib*.
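In case you want a starting point for the plotting step, here is one possible
matplotlib sketch. It assumes, purely for illustration, that you have already
condensed your scan into a whitespace-separated text file :file:`gluino.dat`
with three columns (:math:`m_0`, :math:`m_{1/2}`, gluino mass), covering a
complete rectangular grid written out with :math:`m_0` as the outer loop::

    import numpy as np
    import pylab as pl

    # Assumed file layout: three columns m0, m12, mass; one line per scan point
    m0, m12, mass = np.loadtxt("gluino.dat", unpack=True)

    # Rebuild the regular scan grid from the flat columns
    # (assumes the scan looped over m0 outermost and m12 innermost)
    m0vals, m12vals = np.unique(m0), np.unique(m12)
    grid = mass.reshape(len(m0vals), len(m12vals)).T

    cs = pl.contour(m0vals, m12vals, grid, 10)
    pl.clabel(cs, inline=True, fmt="%.0f")   # label each contour with the mass value
    pl.xlabel(r"$m_0$ [GeV]")
    pl.ylabel(r"$m_{1/2}$ [GeV]")
    pl.title("Gluino mass [GeV]")
    pl.savefig("gluino_contours.pdf")

The same few lines work for any other sparticle once the corresponding column
has been extracted, which is a good reason to keep the extraction and the
plotting in separate, reusable scripts.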
For merging the final plots into a single PDF document you can also choose from
a variety of options. :command:`pdftk` is a very convenient tool for
manipulating PDF documents [#fn2]_, and is installed on most modern Linux
systems. In principle you could use :command:`convert` from the ImageMagick
package, but it will probably turn your vector graphics into bitmaps and
thereby inflate the file size by orders of magnitude (see the section on
:ref:`image-manipulation`). Finally, you could generate a :math:`\LaTeX`
document and include the plots with :kbd:`\\includegraphics`.

.. rubric:: Footnotes

.. [#fn2] From their website: "If PDF is electronic paper, then pdftk is an
   electronic staple-remover, hole-punch, binder, secret-decoder-ring, and
   X-Ray-glasses."

Summary of useful tools
=======================

.. _simple-data-handling:

Simple data handling
--------------------

Using shell tools to extract and store data as plain text can be a useful
approach, especially if the numbers you're looking for only exist in the arcane
output of some pheno code you've downloaded! For the record, the tools of most
use are :command:`grep`, :command:`sed`, :command:`awk`, :command:`cut`,
:command:`paste`, :command:`sort` and :command:`column`, as well as the other
standard shell idioms and tools like *regular expressions*. Using these, you
can extract numbers from almost arbitrarily structured text and place them in
some nice, simple format like columned plain text.

Python matplotlib
-----------------

.. highlight:: python

A simple interface for producing plots is the Python matplotlib package.
Matplotlib is made all the more powerful by its integration with the NumPy and
SciPy packages, a powerful set of numerical/scientific data-handling objects
written (for speed) in Fortran/C/C++ and accessed through Python. NumPy is a
*very* efficient numerical processing framework.

The best way to learn your way around NumPy and matplotlib (most easily
accessed in Python via the module called :kbd:`pylab`) is to go to the
corresponding websites (see the References) and follow some of the tutorials
and "cookbook" entries. However, here are a couple of examples that should give
you some idea of the ease of using these modules and the quality of their
output. Specifically, we're going to generate a colour map and contour plot of
a squashed, rotated 2D Gaussian. We'll start by importing the modules and
making a 2D grid of points from :math:`-1` to :math:`+1` with point spacings of
:math:`0.01` in each direction::

    from scipy import stats
    import numpy as np
    import pylab as pl

    x, y = np.ogrid[-1:1:0.01, -1:1:0.01]

Now we'll fetch a Gaussian distribution and sample from it in two dimensions
(note how the functions implicitly handle vectors without needing to iterate
over the elements), then use the matplotlib :kbd:`imshow` function to make a
pretty colour map of the 2D Gaussian::

    gauss = stats.norm(loc=0, scale=0.5)
    z = gauss.pdf(np.sqrt(x**2 + 4*y**2))
    pl.imshow(z, origin="lower", extent=[-1,1,-1,1])
    pl.show()

Now we'll rotate the axes. There is certainly a neater way than this (I've just
implemented the rotation matrix by hand), but it is simple::

    theta = (np.pi / 180) * 30
    xprime = np.cos(theta) * x - np.sin(theta) * y
    yprime = np.sin(theta) * x + np.cos(theta) * y
    zprime = gauss.pdf(np.sqrt(xprime**2 + 4*yprime**2))
    pl.imshow(zprime, origin="lower", extent=[-1,1,-1,1])
    pl.hold(True)
    pl.contour(zprime, origin="lower", extent=[-1,1,-1,1])

where the :samp:`pl.hold(True)` was used to make sure that the colour plot
didn't get erased when we drew the contours. Now we'll save the result in
various formats::

    pl.savefig("rotgauss")      # saves as PNG by default
    pl.savefig("rotgauss.png")  # as above
    pl.savefig("rotgauss.eps")  # save as encapsulated PostScript
    pl.savefig("rotgauss.svg")  # save as Scalable Vector Graphics

If you are making a matplotlib plot for publication, the most reliable colour
scheme is still the greyscale one. If you go for colour, keep in mind that only
two-tone palettes have a reliable psychological ordering. You can change the
palette with functions like::

    pl.gray()

An example output is shown below.

.. figure:: rotgauss.png

   The rotated 2D Gaussian as a colour plot with contours, generated with SciPy
   and matplotlib.

.. _image-manipulation:

A bit of image manipulation
---------------------------

We haven't got time to do a proper lecture on image formats and the like, which
are always a bit of a mystery to first-time users. Roughly speaking, there are
two distinct classes of image file: :dfn:`bitmap` files, which describe the
colour of individual pixels in the image, and :dfn:`vector` formats, which
store parametric descriptions of drawing objects. You would probably guess that
making bitmaps from vector formats is easy and that the reverse is rather
harder. For what it's worth, the :program:`Inkscape` program includes a very
impressive bitmap-to-vector input filter.

The big operational difference between vectors and bitmaps is that vector
formats can be arbitrarily scaled without losing smoothness, whereas bitmaps
are intrinsically pixelated. For graphs and diagrams in papers and theses,
vector graphics, usually in encapsulated PostScript or PDF format, are the best
choice. Only use bitmaps (like JPEG, PNG, GIF) if you have no option, such as
with a photograph or a bitmap image that you got from e.g. the Web. For data
graphics, you almost certainly want to use PNG rather than JPEG: the latter is
designed for photos and will produce nasty compression artefacts, such as
blotchy areas and fuzziness around lines, on plots with areas of uniform block
colour.

.. highlight:: console

Let's do a bit of format conversion on the command line (a short worked example
combining these steps follows the list):

- To convert a "normal" PostScript file, which assumes that it occupies a full
  page, to an :dfn:`encapsulated PostScript` (EPS) file that you can use in,
  say, a :math:`\LaTeX` document, use the :command:`ps2epsi` program. It writes
  files with a :file:`.epsi` extension; a line like::

      for i in *.epsi; do mv $i ${i%i}; done

  renames them to the standard :file:`.eps` file extension.

- From EPS, if you're writing a document (such as :math:`\LaTeX` Beamer slides
  for a conference talk) using :command:`pdflatex`, you'll want to have your
  vector graphics in PDF format. You can convert EPS files to PDF using the
  :command:`epstopdf` tool.

- A useful set of bitmap manipulation utilities is the *ImageMagick* set of
  command-line programs. They will also do some limited processing of vector
  formats like EPS and SVG. The programs include :command:`identify` for
  getting information about an image file, :command:`convert` for changing
  between bitmap formats and :command:`mogrify` for generally messing with an
  image (cropping, resizing, blurring, sharpening, etc.). They can be handy for
  many things, including sampling (E)PS and PDF files into a bitmap format,
  e.g. for putting on the Web.
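Putting the steps from the list above together, a typical conversion chain for
a single plot might look like this (:file:`plot.ps` is just a placeholder
name)::

    $ ps2epsi plot.ps plot.epsi                 # full-page PS -> EPS with a tight bounding box
    $ mv plot.epsi plot.eps                     # give it the standard extension
    $ epstopdf plot.eps                         # EPS -> plot.pdf, ready for pdflatex
    $ convert -density 150 plot.pdf plot.png    # sample the PDF into a bitmap for the Web

Note that only the last step throws away the vector information: keep the PDF
(or EPS) as the master copy and regenerate bitmaps from it as needed.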
Other data analysis packages
----------------------------

There are of course many other data analysis packages. Here are a few choice
ones you might want to look at:

R (http://www.r-project.org/)
    A very well-established statistics and data analysis package used, for
    example, in a lot of biometrics research (it's a Free implementation of a
    commercial stats language called S). It has a clever list/tuple-based
    interaction language and masses of statistical functions. I don't think the
    plot output is of publication quality, but as ever there's no harm in
    exporting to plain text and using a better presentation package. Even
    better, use the :kbd:`rpy` Python module and access all of R's
    functionality from within SciPy!

Octave (https://www.gnu.org/software/octave/)
    Matlab is widely used among engineers and other researchers but never seems
    to have taken off in HEP. It provides a lot of powerful array-processing
    functionality and good plotting. A Free implementation, called Octave, is
    available, but it uses gnuplot for its plotting, which hence looks a bit
    rubbish. Matplotlib implements much of Matlab's plotting functionality, so
    maybe Octave plus SciPy/matplotlib is a good combination.

Sage (http://www.sagemath.org/)
    A fairly new Python-based integration of lots of scientific tools. To quote
    their website: "Sage is a free open-source mathematics software system
    [...] It combines the power of many existing open-source packages into a
    common Python-based interface. Mission: *Creating a viable free open source
    alternative to Magma, Maple, Mathematica and Matlab*."

Drawings
--------

We haven't the time to talk about graphics other than plotting, though we
managed to say a bit about vector and bitmap graphics formats earlier. Here are
a few suggestions for how to produce other graphics for your talks, papers and
theses:

PostScript
    PS is actually a human-readable, Turing-complete language in its own right.
    It is stack-based, which is probably a rather unfamiliar style of
    programming to most people, but it's strangely satisfying. You can do
    surprisingly complicated things in it by hand, or generate *very* complex
    things programmatically if you're ingenious enough (the program will
    execute on the printer's CPU!). Bill Casselman's *Mathematical
    Illustrations* book and the various PostScript manuals (all available
    online) are excellent references for this.

Inkscape
    A relatively recent development in the Linux world, and a truly excellent
    interactive vector drawing package with a very intuitive interface and bags
    of functionality. Marvellous.

.. JaxoDraw and Feyndiagram
   For producing Feynman diagrams there are several possible approaches: you
   can take the graphics directly from FeynArts, write them directly in your
   TeX source using the feynmf or axodraw LaTeX packages... or you can use one
   of these programs. JaxoDraw is a nice Java GUI on top of axodraw, and
   Feyndiagram (http://www.feyndiagram.com/) is a C++ class-based way to write
   your diagrams.

Visual Python
    A great interactive interface for 3D data, including zoom and rotation
    functionality.

Python OpenGL/Cairo
    OpenGL is a 3D graphics standard and there are now some nice Python
    interfaces to it. Cairo is becoming a Free Software standard for
    lower-level 2D graphics, and *pycairo* provides the Python interface to it.

Graphviz
    A language for the graphical representation of node-edge networks. Or, more
    simply, if you tell it about various nodes connected by edges, it'll
    produce an optimally (for some definition of optimal!) laid-out
    node-and-line diagram. Very cool.

xfig/Jfig
    xfig is an old-but-good vector drawing package, largely superseded by the
    likes of Inkscape these days. It has a rather odd interface, which is
    actually a lot cleverer than it looks and can be used to produce very nice
    output. Jfig is a more recent implementation in Java, which can also be
    used as a Java graphics library.

Gimp
    The GNU Image Manipulation Program is an excellent Free "Photoshop-alike".
    It only manipulates 2D bitmap graphics, but is very good at it.

ImageMagick
    We've already mentioned this swiss-army-knife of bitmap manipulation.

pdftk
    Another swiss-army-knife: anything you'd ever want to do to PDF documents.
    Arbitrarily reshuffle pages, select from different sources to produce one
    output, rotate, crop, ...

Have fun experimenting!