emacs.d/lp/shell-babel.org

#+TITLE: shell-babel
#+AUTHOR: Derek Feichtinger
#+EMAIL: derek.feichtinger@psi.ch
#+OPTIONS: ':nil *:t -:t ::t <:t H:3 \n:nil ^:t arch:headline
#+OPTIONS: author:t c:nil creator:comment d:(not LOGBOOK) date:t e:t
#+OPTIONS: email:nil f:t inline:t num:t p:nil pri:nil stat:t tags:t
#+OPTIONS: tasks:t tex:t timestamp:t toc:t todo:t |:t
#+DESCRIPTION:
#+EXCLUDE_TAGS: noexport
#+KEYWORDS:
#+LANGUAGE: en
#+SELECT_TAGS: export

# Original start of this document
# #+DATE: <2013-08-31 Sat>
# #+CREATOR: Emacs 24.3.1 (Org mode 8.0.7)


# By default I do not want that source code blocks are evaluated on export. Usually
# I want to evaluate them interactively and retain the original results.
#+PROPERTY: header-args :eval never-export

# Definition of a document wide variable to be used in src blocks
#+PROPERTY: header-args :var docwide_var="docwide"


* Version information
  #+BEGIN_SRC emacs-lisp -n :exports both :eval yes
        (princ (concat
                (format "Emacs version: %s\n"
                        (emacs-version))
                (format "org version: %s\n"
                        (org-version))))
  #+END_SRC

  #+RESULTS:
  : Emacs version: GNU Emacs 26.2 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.22.30)
  :  of 2019-04-14
  : org version: 9.3.6

  #+BEGIN_SRC sh :results output :exports both :eval yes
  bash --version
  #+END_SRC

  #+RESULTS:
  : GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
  : Copyright (C) 2016 Free Software Foundation, Inc.
  : License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  :
  : This is free software; you are free to change and redistribute it.
  : There is NO WARRANTY, to the extent permitted by law.

  #+BEGIN_SRC sh :results output :exports both :eval yes
  dpkg -l dash | tail -n 1
  #+END_SRC

  #+RESULTS:
  : ii  dash           0.5.8-2.10   amd64        POSIX-compliant shell

* Variables for source blocks
  :PROPERTIES:
  :header-args+: :var section_var="section"
  :END:

  Note that you need to use a plus =+= in the definition of
  header-args in the document-wide or section properties if you want
  the new definition to be added to the old definition, i.e. use
  =header-args+=. If you use just =header-args= then the previous
  =:var= setting will be replaced by the new setting

  #+begin_src bash :results output :var block_var="block"
    echo $docwide_var
    echo $section_var
    echo $block_var
  #+end_src

  #+RESULTS:
  : docwide
  : section
  : block

* Tables as input for source blocks
** Using sh (or dash) as shell
   #+NAME: tbltest
   | col1 | col2 | col3 |
   |   11 |   12 |   13 |
   |   21 |   22 |   23 |
   |   31 |   32 |   33 |


   When using *sh* as language, I get the "old" behavior of obtaining all field values by just
   printing the variable.
   #+BEGIN_SRC sh :results value  :exports both :var tbl=tbltest :colnames yes
     echo $tbl
   #+END_SRC

   #+RESULTS:
   : 11 12 13 21 22 23 31 32 33

   Internally the variable expansion into the bash script is done by this org function:
   #+BEGIN_SRC elisp :results value :exports both
   (org-babel--variable-assignments:sh-generic 'tbl '((11 12 13) (21 22 23) (31 32 33)) nil nil)
   #+END_SRC

   #+RESULTS:
   : tbl='11	12	13
   : 21	22	23
   : 31	32	33'

*** TODO =org-babel-expand-src-block= expands sh blocks independent of shell type defined by src block

    Submitted as bug report to [[http://lists.gnu.org/archive/html/emacs-orgmode/2017-05/msg00082.html][mailing list]]

    When using =org-babel-expand-src-block= with a shell src block
    one always gets the same code expansion (in my case bash)
    independent of the shell that is used while the execution of the
    shell block uses the correct expansion.

    #+BEGIN_SRC sh :results value  :exports both :var tbl=tbltest :colnames yes
      echo $tbl
    #+END_SRC

    When expanding the sh source block above with =org-babel-expand-src-block= it is wrongly expanded to the
    bash expansion and not to the sh expansion that is used when the block is executed. So, instead of the
    sh expansion,

   : tbl='11	12	13
   : 21	22	23
   : 31	32	33'


    I see the following expansion in the opened buffer:

    #+BEGIN_EXAMPLE
       unset tbl
       declare -A tbl
       tbl['11']='12
       13'
       tbl['21']='22
       23'
       tbl['31']='32
       33'
       echo $tbl
    #+END_EXAMPLE

    Reason:

    The case distinction in =org-babel-variable-assignments:shell= is
    made based on the shell-file-name which is a standard emacs
    variable set by emacs in C code. This is pointing to "/bin/bash"
    for my installation.

    #+BEGIN_SRC elisp :exports source
       (defun org-babel-variable-assignments:shell (params)
        "Return list of shell statements assigning the block's variables."
        (let ((sep (cdr (assq :separator params)))
    	  (hline (when (string= "yes" (cdr (assq :hlines params)))
    		   (or (cdr (assq :hline-string params))
    		       "hline"))))
          (mapcar
           (lambda (pair)
    	 (if (string-suffix-p "bash" shell-file-name)
    	     (org-babel--variable-assignments:bash
                  (car pair) (cdr pair) sep hline)
               (org-babel--variable-assignments:sh-generic
    	    (car pair) (cdr pair) sep hline)))
           (org-babel--get-vars params))))
    #+END_SRC

    Looking at the calls stack for the case where we execute the source block and where we just expand it, we see
    the following call stack for execution

    #+BEGIN_EXAMPLE
        org-babel-variable-assignments:shell
        org-babel-execute:shell
        org-babel-execute:sh
        org-babel-execute-src-block
    #+END_EXAMPLE

    while in the case of just expanding the source block we have

    #+BEGIN_EXAMPLE
       org-babel-variable-assignments:sh
       org-babel-expand-src-block
    #+END_EXAMPLE

    Note that =org-babel-variable-assignments:sh= is an alias for
    =org-babel-variable-assignments:shell=.

    A bit of investigation shows that for all shell languages there
    are aliases defined that finally call
    =org-babel-execute:shell=. This is set up in the
    =org-babel-shell-initialize= function. And it is set up in a way
    that =shell-file-name= is overridden by the name of the
    particular shell, and this then leads to the correct case
    distinction using =shell-file-name= in
    =org-babel-variable-assignments:shell=.

    #+BEGIN_SRC elisp :exports source
       (defun org-babel-shell-initialize ()
        "Define execution functions associated to shell names.
       This function has to be called whenever `org-babel-shell-names'
       is modified outside the Customize interface."
        (interactive)
        (dolist (name org-babel-shell-names)
          (eval `(defun ,(intern (concat "org-babel-execute:" name))
    		 (body params)
    	       ,(format "Execute a block of %s commands with Babel." name)
    	       (let ((shell-file-name ,name))
    		 (org-babel-execute:shell body params))))
          (eval `(defalias ',(intern (concat "org-babel-variable-assignments:" name))
    	       'org-babel-variable-assignments:shell
    	       ,(format "Return list of %s statements assigning to the block's \
       variables."
    			name)))))
    #+END_SRC

    The same kind of overriding would have to be in place when
    =org-babel-expand-src-block= calls
    =org-babel-variable-assignments:shell= in the simple code expansion case. But that
    would be a bit hacky since the generic =org-babel-expand-src-block= function should
    not override variables needed in just one subclass of backends. It would be
    cleaner to have different functions =org-babel-variable-assignments:XXX= for the
    different shells.

** Using bash as shell
   When using *bash* as language, the expansion uses bash arrays. The
   current code (org 9.0.5) makes a case distinction between one-column
   tables and tables with multiple columns. This is implemented in
   =org-babel--variable-assignments:bash=.
*** tables with one column (vectors)
    A table with a single column is treated as a vector and translated to an *indexed bash
    array*.

    #+NAME: tblvector
    #+CAPTION: A vector table
    | a |
    | b |
    | c |
    | d |
    | e |

    #+BEGIN_SRC bash :results output  :exports both :var tbl=tblvector
      echo ${tbl[*]}
      echo ${tbl[0]}   ${tbl[2]}
    #+END_SRC

    #+RESULTS:
    : a b c d e
    : a c

    The internal expansion of such a vector table is done via
    =org-babel--variable-assignments:bash= and then
    =org-babel--variable-assignments:bash_array=

    #+BEGIN_SRC elisp :results value
      (org-babel--variable-assignments:bash 'tbl '((1) (2) (3) (4) (5)) nil nil)
    #+END_SRC

    #+RESULTS:
    : unset tbl
    : declare -a tbl=( '1' '2' '3' '4' '5' )

*** tables with multiple columns
    When using the multi column table from above, the expansion by org
    is done using an *associative bash array* where the first column becomes
    the index.

    #+BEGIN_SRC bash :results output  :exports both :var tbl=tbltest :colnames yes
      echo "trying a naive way of printing the table: " $tbl
      echo "using the bash syntax for printing all values: " ${tbl[*]}
      echo "and finally using a loop over the index"
      for idx in ${!tbl[*]}; do
          echo -n "    $idx  "
          while read line; do echo -n "$line  "; done  <<<${tbl[$idx]}
          echo
      done
    #+END_SRC

    #+RESULTS:
    : trying a naive way of printing the table:
    : using the bash syntax for printing all values:  22 23 12 13 32 33
    : and finally using a loop over the index
    :     21  22  23
    :     11  12  13
    :     31  32  33

    So, the first column ends up as the string indexes of the
    associative bash array. The current implementation has a major
    drawback: The *original order of the rows is not conserved* as
    can be seen above and in these examples. The elements belonging
    to different columns are separated by newlines (but the echo in
    the following code does not show it).


    #+BEGIN_SRC bash :results output  :exports both :var tbl=tbltest :colnames yes
      for idx in ${!tbl[*]}; do
	 echo $idx ${tbl[$idx]}
      done
    #+END_SRC

    #+RESULTS:
    : 21 22 23
    : 11 12 13
    : 31 32 33


    When using =:results value=, org uses the initial table's columns for the new table

    #+BEGIN_SRC bash :results value  :exports both :var tbl=tbltest :colnames yes
      for idx in ${!tbl[*]}; do
	 echo $idx ${tbl[$idx]}
      done
    #+END_SRC

    #+RESULTS:
    | col1 | col2 | col3 |
    |------+------+------|
    |   21 |   22 |   23 |
    |   11 |   12 |   13 |
    |   31 |   32 |   33 |


    One problem about the current implementation is that it is impossible to
    implement a generic solution allowing the use of the column names inside of
    the code when using =:colnames no=. Since the sequence of rows is not conserved,
    it is impossible to know which was the first row with the names
    #+BEGIN_SRC bash :results output  :exports both :var tbl=tbltest :colnames no
      for idx in ${!tbl[*]}; do
          echo -n "    $idx  "
          while read line; do echo -n "$line  "; done  <<<${tbl[$idx]}
          echo
      done
    #+END_SRC

    #+RESULTS:
    :     21  22  23
    :     11  12  13
    :     col1  col2  col3
    :     31  32  33

*** Working with descriptive column names

    #+NAME:tbltest2
    | name  | points | multi word comment |
    |-------+--------+--------------------|
    | Peter |     10 | bad luck           |
    | Paul  |     20 | middle ground      |
    | Mary  |     30 | the winner         |
    We can use use this nice little eval-based setup to work with
    descriptive column names. It takes just a minimal boilerplate.

    #+BEGIN_SRC bash :results output  :exports both :var tbl=tbltest2 :colnames yes
      colnames="name points comment"
      i=0; for cn in $colnames; do c[i]=$cn; i=$((i+1)); done
      for idx in ${!tbl[*]}; do
          eval "${c[0]}=$idx"
          i=1; while read line; do eval "${c[$i]}=\"$line\""; i=$((i+1)); done  <<<${tbl[$idx]}
          echo "name:$name points:$points comment:$comment"
      done
    #+END_SRC

    #+RESULTS:
    : name:Mary points:30 comment:the winner
    : name:Paul points:20 comment:middle ground
    : name:Peter points:10 comment:bad luck

*** slices
  One can use a slice indexing for only importing a subrange of a table
  #+BEGIN_SRC sh :results value :exports both :var slice=src-table2[3:10,0:1] :colnames yes
  echo $slice
  #+END_SRC

  #+RESULTS:
  : 11 55 10 50 15 75 14 70 5 25 6 30 7 35
*** implementation details of bash table variable assignment
    Let's have a look at how the expansion is implemented. The array
    is set through =org-babel--variable-assignments:bash= and then
    =org-babel--variable-assignments:bash_assoc=.

    #+BEGIN_SRC elisp :results value :exports both
      (org-babel--variable-assignments:bash 'tbl '((11 12 13) (21 22 23) (31 32 33)) nil nil)
    #+END_SRC

    #+RESULTS:
    : unset tbl
    : declare -A tbl
    : tbl['11']='12
    : 13'
    : tbl['21']='22
    : 23'
    : tbl['31']='32
    : 33'


    I think it would be nicer to treat the first column identical to
    the other columns and not make it the index of an associative
    array, even though this may be appealing for problems involving
    just two columns where the current implementation allows fast
    key-value lookups.

    A nicer implementation to me would be the use of a simple indexed array
    where all values of a row are put into the value part of an array field,
    the index number just reflecting the row number.
    This allows me to print all fields with an easy
    command (=${tbl[*]}=) similar to the older implementations. While this gives me all fields on a
    single output line (losing the table structure), I can also retrieve
    the whole table structure with the rows in the original order by using a loop construct.

    #+BEGIN_SRC bash :results value  :exports both
      unset tbl
      declare -a tbl
      tbl[0]='11 12 13'
      tbl[1]='21 22 23'
      tbl[2]='31 32 33'

      for idx in ${!tbl[*]}; do
	 echo ${tbl[$idx]}
      done
    #+END_SRC

    #+RESULTS:
    | 11 | 12 | 13 |
    | 21 | 22 | 23 |
    | 31 | 32 | 33 |

** more examples
   We first create a table from a lisp list of lists. Since my final result table
   should contain three columns, I already insert a header row with the names for
   the three columns.
   #+BEGIN_SRC emacs-lisp :results value :exports both
     (cons '(col1 col2 col3)
           (loop for i from 5 to 15 collect `(,i ,(* i 5))))
   #+END_SRC

   #+NAME: table1
   #+RESULTS:
   | col1 | col2 | col3 |
   |    5 |   25 |      |
   |    6 |   30 |      |
   |    7 |   35 |      |
   |    8 |   40 |      |
   |    9 |   45 |      |
   |   10 |   50 |      |
   |   11 |   55 |      |
   |   12 |   60 |      |
   |   13 |   65 |      |
   |   14 |   70 |      |
   |   15 |   75 |      |

   sidenote: the -n flag results in line numbers for the exported source code.

   #+NAME: src-table2
   #+BEGIN_SRC bash -n :results value :exports both :var tbl=table1 :colnames yes
     for idx in ${!tbl[*]}; do
         echo $idx ${tbl[$idx]} $((${tbl[$idx]}*2))
     done
   #+END_SRC

   #+RESULTS: src-table2
   | col1 | col2 | col3 |
   |------+------+------|
   |   13 |   65 |  130 |
   |   12 |   60 |  120 |
   |   11 |   55 |  110 |
   |   10 |   50 |  100 |
   |   15 |   75 |  150 |
   |   14 |   70 |  140 |
   |    5 |   25 |   50 |
   |    6 |   30 |   60 |
   |    7 |   35 |   70 |
   |    8 |   40 |   80 |
   |    9 |   45 |   90 |


   As remarked before, the order of the rows is regrettably lost with the current implementation of bash arrays. In the
   present case once could use a sort filter at the end, but this only works because we use some external knowledge
   about this particular table. For generic tables the order is lost.

* some useful source block options
** dir
   One can use the :dir option to have the shell code executed within
   a particular working directory.

   #+BEGIN_SRC sh :results value :dir /home :exports both
   pwd
   #+END_SRC

   #+RESULTS:
   : /home

   Since the directory can also be a TRAMP URL, =:dir= allows easy
   *execution of commands on remote servers*, which to me is the most
   powerful application of this option. Combine this option with
   the SSH configuration options *ControlMaster and ProxyCommand*
   and all remote hosts become one hop away, and you only need to
   authenticate once. This allows very nice documenting of remote
   work and writing template documents collecting information from
   remote servers.


   #+BEGIN_SRC sh :results output drawer :dir /ssh:root@dftest2.psi.ch:/etc :exports both
   hostname
   pwd
   #+END_SRC

   #+RESULTS:
   :RESULTS:
   dftest2
   /etc
   :END:

** line numbering for exported code: -n

   Using the flag =-n= results in the exported code lines being printed with line numbers.

   #+BEGIN_SRC bash -n :results value  :exports source :var tbl=tbltest :colnames yes
     unset tbl
     declare -a tbl
     tbl[0]='11 12 13'
     tbl[1]='21 22 23'
     tbl[2]='31 32 33'

     for idx in ${!tbl[*]}; do
        echo ${tbl[$idx]}
     done
   #+END_SRC

* noweb example - including code blocks in other code blocks
  Redefine the standard *noweb markers*, since =<<= and =>>= are valid shell code redirectors and this messes
  up the syntax highlighting for source blocks. This can be
  done by defining the variables =org-babel-noweb-wrap-start= and =org-babel-noweb-wrap-end=. I do this
  in the footer of this document in the emacs "Local Variables" section choosing a markup as in "=<<<bla>>>=".

  #+NAME: srcCodeA
  #+BEGIN_SRC bash
    echo "I am from A"
  #+END_SRC

  Now we include the code from the upper source block in the following block

  #+BEGIN_SRC bash :results output :exports both :noweb yes
    echo "This is B"

    <<<srcCodeA>>>

    echo "This is B again"

    cat <<EOF
    this way we do not mess with "here"-documents
    EOF

    echo "the end"
  #+END_SRC

  #+RESULTS:
  : This is B
  : I am from A
  : This is B again
  : this way we do not mess with "here"-documents
  : the end

* Changes in regard to earlier versions of this document
** org-babel-sh-command no longer used for selecting shell
   In earlier implementations of org one needed to select the
   particular shell that was run by setting the =org-babel-sh-command=
   to the shell executable, e.g. "/bin/bash". This was either done
   globally or in the usual local variable section of a document. The
   newer org versions (certainly org>9.x) allow specifying the shell
   type as one usually specifies any language of a source block,
   i.e. by writing a header like =#+BEGIN_SRC bash=.

* COMMENT babel settings

  Local Variables:
  org-babel-noweb-wrap-start: "<<<"
  org-babel-noweb-wrap-end: ">>>"
  org-confirm-babel-evaluate: nil
  End: