Compare commits

...

11 Commits

Author SHA1 Message Date
Felix Lohmeier 2cc2378085
Merge pull request #6 from opencultureconsulting/dependabot/pip/binder/jupyter-server-proxy-3.2.1
Bump jupyter-server-proxy from 1.5.3 to 3.2.1 in /binder
2022-01-28 17:23:34 +01:00
dependabot[bot] 9e6e42261b
Bump jupyter-server-proxy from 1.5.3 to 3.2.1 in /binder
Bumps [jupyter-server-proxy](https://github.com/jupyterhub/jupyter-server-proxy) from 1.5.3 to 3.2.1.
- [Release notes](https://github.com/jupyterhub/jupyter-server-proxy/releases)
- [Changelog](https://github.com/jupyterhub/jupyter-server-proxy/blob/main/CHANGELOG.md)
- [Commits](https://github.com/jupyterhub/jupyter-server-proxy/compare/v1.5.3...v3.2.1)

---
updated-dependencies:
- dependency-name: jupyter-server-proxy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-27 16:25:09 +00:00
Felix Lohmeier 4e32074d85 OpenRefine 3.5.0 2021-11-09 23:14:30 +01:00
Felix Lohmeier a9c494856b cleanup README 2021-06-17 13:00:33 +02:00
Felix Lohmeier ca19d7ef16 add jupyter notebook 2021-06-17 12:59:47 +02:00
Felix Lohmeier 93be203efe add binder config 2021-06-17 12:42:22 +02:00
Felix Lohmeier 2894b0194f fix codacy badge 2021-02-12 13:38:33 +01:00
Felix Lohmeier 4199fadc04 OpenRefine 3.4.1, openrefine-client 0.3.10 2021-01-04 17:37:49 +01:00
Felix Lohmeier 80fb37cb65 release v1.14 2020-08-08 13:45:28 +02:00
Felix Lohmeier 68dbc04c01 update openrefine-client to v0.3.9 2020-08-08 13:43:00 +02:00
Felix Lohmeier b259cf571c release v1.13: improved use of sudo in docker version, pinned version of openrefine-client, improved README 2019-08-06 21:21:59 +02:00
7 changed files with 166 additions and 116 deletions

README.md

@@ -1,6 +1,6 @@
## OpenRefine batch processing (openrefine-batch.sh)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/66bf001c38194f5bb722f65f5e15f0ec)](https://www.codacy.com/app/mail_74/openrefine-batch?utm_source=github.com&utm_medium=referral&utm_content=opencultureconsulting/openrefine-batch&utm_campaign=badger)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/ad8a97e42e634bbe87203ea48efb436e)](https://www.codacy.com/gh/opencultureconsulting/openrefine-batch/dashboard) [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/opencultureconsulting/openrefine-batch/master?urlpath=lab/tree/demo.ipynb)
Shell script to run OpenRefine in batch mode (import, transform, export). This bash script automatically...
@@ -17,9 +17,21 @@ If you prefer a containerized approach, see a [variation of this script for Dock
- **Step 1**: Do some experiments with your data (or parts of it) in the graphical user interface of OpenRefine. If you are fine with all transformation rules, [extract the json code](http://kb.refinepro.com/2012/06/google-refine-json-and-my-notepad-or.html) and save it as file (e.g. transform.json).
- **Step 2**: Put your data and the json file(s) in two different directories and execute the script. The script will automatically import all data files in OpenRefine projects, apply the transformation rules in the json files to each project and export all projects to files in the format specified (default: TSV - tab-separated values).
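The two steps can be sketched as follows; all directory and file names below are illustrative examples, not requirements of the script:

```shell
# example layout for openrefine-batch.sh (names are illustrative)
mkdir -p input config output
printf 'id\tname\n1\tfoo\n' > input/data.tsv    # source data to import
printf '[]' > config/transform.json             # JSON rules extracted from the OpenRefine GUI
# then run (not executed here):
# ./openrefine-batch.sh -a input/ -b config/ -c output/ -f tsv
```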
### Demo via binder
[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/opencultureconsulting/openrefine-batch/master?urlpath=lab/tree/demo.ipynb)
- free to use on-demand server with Jupyterlab and Bash Kernel
- no registration needed, will start within a few minutes
- [restricted](https://mybinder.readthedocs.io/en/latest/about/about.html#how-much-memory-am-i-given-when-using-binder) to 2 GB RAM and server will be deleted after 10 minutes of inactivity
### Install
Download the script and grant file permissions to execute: `wget https://github.com/felixlohmeier/openrefine-batch/raw/master/openrefine-batch.sh && chmod +x openrefine-batch.sh`
Download the script and grant file permissions to execute:
```
wget https://github.com/felixlohmeier/openrefine-batch/raw/master/openrefine-batch.sh
chmod +x openrefine-batch.sh
```
That's all. The script will automatically download copies of OpenRefine and the python client on first run and will tell you if something (python, java) is missing.
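A minimal sketch of such a dependency check, assuming the same `command -v` pattern the scripts use elsewhere (the actual messages printed by openrefine-batch.sh may differ):

```shell
# check that required tools are on PATH before starting (sketch; messages are illustrative)
missing=0
for cmd in java python3; do
  if [ -z "$(command -v "$cmd" 2> /dev/null)" ]; then
    echo 1>&2 "$cmd is missing, please install it first"
    missing=1
  fi
done
# a real script would 'exit 1' here when $missing is 1
```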
@@ -75,7 +87,7 @@ execute openrefine-batch.sh
### Help Screen
```
[23:10 felix ~/openrefine-batch]$ ./openrefine-batch.sh
[felix@tux openrefine-batch]$ ./openrefine-batch.sh
Usage: ./openrefine-batch.sh [-a INPUTDIR] [-b TRANSFORMDIR] [-c OUTPUTDIR] ...
== basic arguments ==
@@ -154,12 +166,12 @@ The script prints log messages from OpenRefine server and makes use of `ps` to s
```
[felix@tux openrefine-batch]$ ./openrefine-batch.sh -a examples/powerhouse-museum/input/ -b examples/powerhouse-museum/config/ -c examples/powerhouse-museum/output/ -f tsv -i processQuotes=false -i guessCellValueTypes=true -RX
Download OpenRefine...
openrefine-linux-3.2.tar.g 100%[=====================================>] 101,13M 4,13MB/s in 27s
openrefine-linux-3.5.0.tar.gz 100%[=========================================================================================================================================>] 125,73M 9,50MB/s in 13s
Install OpenRefine in subdirectory openrefine...
Total bytes read: 125419520 (120MiB, 145MiB/s)
Total bytes read: 154163200 (148MiB, 87MiB/s)
Download OpenRefine client...
openrefine-client_0-3-4_li 100%[=====================================>] 4,69M 2,78MB/s in 1,7s
openrefine-client_0-3-10_linux 100%[=========================================================================================================================================>] 4,25M 9,17MB/s in 0,5s
Input directory: /home/felix/git/openrefine-batch/examples/powerhouse-museum/input
Input files: phm-collection.tsv
@@ -180,106 +192,99 @@ restart after transform: false
=== 1. Launch OpenRefine ===
starting time: Mo 29. Jul 23:33:34 CEST 2019
starting time: Di 9. Nov 22:37:25 CET 2021
You have 15962M of free memory.
Using refine.ini for configuration
You have 15913M of free memory.
Your current configuration is set to use 2048M of memory.
OpenRefine can run better when given more memory. Read our FAQ on how to allocate more memory here:
https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory
https://github.com/OpenRefine/OpenRefine/wiki/FAQ-Allocate-More-Memory
/usr/bin/java -cp server/classes:server/target/lib/* -Drefine.headless=true -Xms2048M -Xmx2048M -Drefine.memory=2048M -Drefine.max_form_content_size=1048576 -Drefine.verbosity=info -Dpython.path=main/webapp/WEB-INF/lib/jython -Dpython.cachedir=/home/felix/.local/share/google/refine/cachedir -Drefine.data_dir=/home/felix/git/openrefine-batch/examples/powerhouse-museum/output -Drefine.webapp=main/webapp -Drefine.port=3333 -Drefine.interface=127.0.0.1 -Drefine.host=127.0.0.1 -Drefine.autosave=1440 com.google.refine.Refine
Starting OpenRefine at 'http://127.0.0.1:3333/'
23:33:34.277 [ refine_server] Starting Server bound to '127.0.0.1:3333' (0ms)
23:33:34.277 [ refine_server] refine.memory size: 2048M JVM Max heap: 2058354688 (0ms)
23:33:34.284 [ refine_server] Initializing context: '/' from '/home/felix/git/openrefine-batch/openrefine/webapp' (7ms)
log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/felix/git/openrefine-batch/openrefine/server/target/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/felix/git/openrefine-batch/openrefine/webapp/WEB-INF/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/felix/git/openrefine-batch/openrefine/webapp/WEB-INF/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/felix/git/openrefine-batch/openrefine/server/target/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
23:33:34.706 [ refine] Starting OpenRefine 3.2 [55c921b]... (422ms)
23:33:34.706 [ refine] initializing FileProjectManager with dir (0ms)
23:33:34.706 [ refine] /home/felix/git/openrefine-batch/examples/powerhouse-museum/output (0ms)
23:33:34.709 [ FileProjectManager] Failed to load workspace from any attempted alternatives. (3ms)
23:33:38.275 [ refine] Running in headless mode (3566ms)
22:37:28.211 [ refine] Starting OpenRefine 3.5.0 [d4209a2]... (0ms)
22:37:28.213 [ refine] initializing FileProjectManager with dir (2ms)
22:37:28.213 [ refine] /home/felix/git/openrefine-batch/examples/powerhouse-museum/output (0ms)
22:37:28.223 [ FileProjectManager] Failed to load workspace from any attempted alternatives. (10ms)
=== 2. Import all files ===
starting time: Mo 29. Jul 23:33:39 CEST 2019
starting time: Di 9. Nov 22:37:33 CET 2021
import phm-collection.tsv...
23:33:39.466 [ refine] POST /command/core/create-project-from-upload (1191ms)
23:33:44.326 [ refine] GET /command/core/get-models (4860ms)
23:33:44.409 [ refine] POST /command/core/get-rows (83ms)
id: 1675004209805
22:37:33.804 [ refine] GET /command/core/get-csrf-token (5581ms)
22:37:33.872 [ refine] POST /command/core/create-project-from-upload (68ms)
22:37:44.653 [ refine] GET /command/core/get-models (10781ms)
22:37:44.790 [ refine] POST /command/core/get-rows (137ms)
id: 2252508879578
rows: 75814
23:33:44.495 [ refine] GET /command/core/get-models (86ms)
STARTED ELAPSED %MEM %CPU RSS
23:33:33 00:10 5.9 207 976248
22:37:25 00:19 10.2 202 1670620
=== 3. Prepare transform & export ===
starting time: Mo 29. Jul 23:33:44 CEST 2019
starting time: Di 9. Nov 22:37:44 CET 2021
get project ids...
23:33:44.597 [ refine] GET /command/core/get-all-project-metadata (102ms)
1675004209805: phm-collection
22:37:45.112 [ refine] GET /command/core/get-csrf-token (322ms)
22:37:45.115 [ refine] GET /command/core/get-all-project-metadata (3ms)
2252508879578: phm-collection
=== 4. Transform phm-collection ===
starting time: Mo 29. Jul 23:33:44 CEST 2019
starting time: Di 9. Nov 22:37:45 CET 2021
transform phm-transform.json...
23:33:44.712 [ refine] GET /command/core/get-models (115ms)
23:33:44.715 [ refine] POST /command/core/apply-operations (3ms)
22:37:45.303 [ refine] GET /command/core/get-csrf-token (188ms)
22:37:45.308 [ refine] GET /command/core/get-models (5ms)
22:37:45.324 [ refine] POST /command/core/apply-operations (16ms)
File /home/felix/git/openrefine-batch/examples/powerhouse-museum/config/phm-transform.json has been successfully applied to project 2252508879578
STARTED ELAPSED %MEM %CPU RSS
23:33:33 00:20 6.8 164 1121200
22:37:25 00:34 11.9 175 1940600
=== 5. Export phm-collection ===
starting time: Mo 29. Jul 23:33:54 CEST 2019
starting time: Di 9. Nov 22:37:59 CET 2021
export to file phm-collection.tsv...
23:33:54.156 [ refine] GET /command/core/get-models (9441ms)
23:33:54.158 [ refine] GET /command/core/get-all-project-metadata (2ms)
23:33:54.161 [ refine] POST /command/core/export-rows/phm-collection.tsv (3ms)
22:37:59.944 [ refine] GET /command/core/get-csrf-token (14620ms)
22:37:59.947 [ refine] GET /command/core/get-models (3ms)
22:37:59.951 [ refine] GET /command/core/get-all-project-metadata (4ms)
22:37:59.954 [ refine] POST /command/core/export-rows/phm-collection.tsv (3ms)
Export to file /home/felix/git/openrefine-batch/examples/powerhouse-museum/output/phm-collection.tsv complete
STARTED ELAPSED %MEM %CPU RSS
23:33:33 01:08 7.1 53.1 1160936
22:37:25 00:38 12.4 181 2021388
output (number of lines / size in bytes):
75728 59431272 /home/felix/git/openrefine-batch/examples/powerhouse-museum/output/phm-collection.tsv
cleanup...
23:34:44.740 [ ProjectManager] Saving all modified projects ... (50579ms)
23:34:46.677 [ project_utilities] Saved project '1675004209805' (1937ms)
22:38:06.850 [ ProjectManager] Saving all modified projects ... (6896ms)
22:38:10.014 [ project_utilities] Saved project '2252508879578' (3164ms)
=== Statistics ===
starting time and run time of each step:
Start process Mo 29. Jul 23:33:34 CEST 2019 (00:00:00)
Launch OpenRefine Mo 29. Jul 23:33:34 CEST 2019 (00:00:05)
Import all files Mo 29. Jul 23:33:39 CEST 2019 (00:00:05)
Prepare transform & export Mo 29. Jul 23:33:44 CEST 2019 (00:00:00)
Transform phm-collection Mo 29. Jul 23:33:44 CEST 2019 (00:00:10)
Export phm-collection Mo 29. Jul 23:33:54 CEST 2019 (00:00:53)
End process Mo 29. Jul 23:34:47 CEST 2019 (00:00:00)
Start process Di 9. Nov 22:37:25 CET 2021 (00:00:00)
Launch OpenRefine Di 9. Nov 22:37:25 CET 2021 (00:00:08)
Import all files Di 9. Nov 22:37:33 CET 2021 (00:00:11)
Prepare transform & export Di 9. Nov 22:37:44 CET 2021 (00:00:01)
Transform phm-collection Di 9. Nov 22:37:45 CET 2021 (00:00:14)
Export phm-collection Di 9. Nov 22:37:59 CET 2021 (00:00:11)
End process Di 9. Nov 22:38:10 CET 2021 (00:00:00)
total run time: 00:01:13 (hh:mm:ss)
highest memory load: 1133 MB
```
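The "number of lines / size in bytes" figures in the log above are the kind of counts `wc` reports; whether the script calls `wc` internally is an assumption, but the format matches:

```shell
# reproduce the "lines / bytes" style of the log output with wc
printf 'a\tb\nc\td\n' > sample.tsv   # 2 lines, 8 bytes
wc -l -c sample.tsv                  # prints lines, then bytes, then the file name
```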
### Performance gain with extended cross function
The original cross function expects normalized data (one foreign key per cell in the base column). If a cell holds multiple key values, you need to split them into multiple rows before applying cross (and join the results afterwards). This can be quite "expensive" when working with bigger datasets.
There is a [fork available that extends the cross function](https://github.com/felixlohmeier/OpenRefine/wiki) to support an integrated split, which may provide a massive performance gain for this special use case.
Here is a code snippet to install this fork together with openrefine-batch.sh in a blank directory:
```
wget https://github.com/felixlohmeier/openrefine-batch/raw/master/openrefine-batch.sh && chmod +x openrefine-batch.sh
sed -i 's/.tar.gz/-with-pr1294.tar.gz/' openrefine-batch.sh
./openrefine-batch.sh
total run time: 00:00:45 (hh:mm:ss)
highest memory load: 1974 MB
```
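The `sed` call in the snippet above rewrites the OpenRefine download URL hard-coded in openrefine-batch.sh so that the fork's tarball is fetched instead. For example, applied to the URL the script declares:

```shell
# only the extension of the download URL changes; version and path stay the same
url="https://github.com/OpenRefine/OpenRefine/releases/download/3.5.0/openrefine-linux-3.5.0.tar.gz"
echo "$url" | sed 's/.tar.gz/-with-pr1294.tar.gz/'
# prints the same URL, now ending in -with-pr1294.tar.gz
```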
### Docker
@@ -288,8 +293,14 @@ A variation of the shell script orchestrates a [docker container for OpenRefine]
**Install**
1. Install [Docker](https://docs.docker.com/engine/installation/#on-linux) and **a)** [configure Docker to start on boot](https://docs.docker.com/engine/installation/linux/linux-postinstall/#configure-docker-to-start-on-boot) or **b)** start Docker on demand each time you use the script: `sudo systemctl start docker`
2. Download the script and grant file permissions to execute: `wget https://github.com/felixlohmeier/openrefine-batch/raw/master/openrefine-batch-docker.sh && chmod +x openrefine-batch-docker.sh`
1. Install [Docker](https://docs.docker.com/engine/installation/#on-linux)
* **a)** [configure Docker to start on boot](https://docs.docker.com/engine/installation/linux/linux-postinstall/#configure-docker-to-start-on-boot)
* or **b)** start Docker on demand each time you use the script: `sudo systemctl start docker`
2. Download the script and grant file permissions to execute:
```
wget https://github.com/felixlohmeier/openrefine-batch/raw/master/openrefine-batch-docker.sh
chmod +x openrefine-batch-docker.sh
```
**Usage**
@@ -298,15 +309,36 @@ mkdir input
cp INPUTFILES input/
mkdir config
cp CONFIGFILES config/
sudo ./openrefine-batch-docker.sh -a input/ -b config/ -c OUTPUT/
./openrefine-batch-docker.sh -a input/ -b config/ -c OUTPUT/
```
Why `sudo`? Non-root users can only access the Unix socket of the Docker daemon by using `sudo`. If you created a Docker group in [Post-installation steps for Linux](https://docs.docker.com/engine/installation/linux/linux-postinstall/) then you may call the script without `sudo`.
The script may ask you for sudo privileges. Why `sudo`? Non-root users can only access the Unix socket of the Docker daemon by using `sudo`. If you created a Docker group in [Post-installation steps for Linux](https://docs.docker.com/engine/installation/linux/linux-postinstall/) then you may call the script without `sudo`.
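To find out in advance whether `sudo` will be needed, you can probe the daemon yourself; this is a convenience sketch using the same `docker info` probe the script performs:

```shell
# probe the Docker daemon; non-root users without docker group membership hit the else branch
if docker info > /dev/null 2>&1; then
  echo "Docker daemon reachable without sudo"
else
  echo "no direct access, the script will fall back to sudo"
fi
```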
### Todo
**Example**
- [ ] howto for extracting input options from OpenRefine GUI with Firefox network monitor
- [ ] provide more example data from other OpenRefine tutorials
[Example Powerhouse Museum](examples/powerhouse-museum)
download example data
```
wget https://github.com/opencultureconsulting/openrefine-batch/archive/master.zip
unzip master.zip openrefine-batch-master/examples/*
mv openrefine-batch-master/examples .
rm -f master.zip
```
execute openrefine-batch-docker.sh
```
./openrefine-batch-docker.sh \
-a examples/powerhouse-museum/input/ \
-b examples/powerhouse-museum/config/ \
-c examples/powerhouse-museum/output/ \
-f tsv \
-i processQuotes=false \
-i guessCellValueTypes=true \
-RX
```
### Licensing

binder/apt.txt (new file)

@@ -0,0 +1 @@
openjdk-8-jre

binder/postBuild (new executable file)

@@ -0,0 +1,5 @@
#!/bin/bash
set -e
# Install bash_kernel https://github.com/takluyver/bash_kernel
python -m bash_kernel.install

binder/requirements.txt (new file)

@@ -0,0 +1,2 @@
jupyter-server-proxy==3.2.1
bash_kernel==0.7.2

demo.ipynb (new file)

@@ -0,0 +1 @@
{"metadata":{"language_info":{"name":"bash","codemirror_mode":"shell","mimetype":"text/x-sh","file_extension":".sh"},"kernelspec":{"name":"bash","display_name":"Bash","language":"bash"}},"nbformat_minor":5,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Example Powerhouse Museum\n\nOutput will be stored in examples/powerhouse-museum/output/phm-collection.tsv","metadata":{}},{"cell_type":"code","source":"./openrefine-batch.sh \\\n-a examples/powerhouse-museum/input/ \\\n-b examples/powerhouse-museum/config/ \\\n-c examples/powerhouse-museum/output/ \\\n-f tsv \\\n-i processQuotes=false \\\n-i guessCellValueTypes=true \\\n-RX","metadata":{"trusted":true},"execution_count":null,"outputs":[]}]}

openrefine-batch-docker.sh

@@ -1,23 +1,32 @@
#!/bin/bash
# openrefine-batch-docker.sh, Felix Lohmeier, v1.12, 2019-07-29
# openrefine-batch-docker.sh, Felix Lohmeier, v1.16, 2021-11-09
# https://github.com/felixlohmeier/openrefine-batch
# check system requirements
DOCKER="$(which docker 2> /dev/null)"
DOCKER="$(command -v docker 2> /dev/null)"
if [ -z "$DOCKER" ] ; then
echo 1>&2 "This action requires you to have 'docker' installed and present in your PATH. You can download it for free at http://www.docker.com/"
exit 1
fi
DOCKERINFO="$(docker info 2>/dev/null | grep 'Server Version')"
if [ -z "$DOCKERINFO" ] ; then
echo 1>&2 "This action requires you to start the docker daemon. Try 'sudo systemctl start docker' or 'sudo start docker'. If the docker daemon is already running then maybe some security privileges are missing to run docker commands. Try to run the script with 'sudo ./openrefine-batch-docker.sh ...'"
exit 1
if [ -z "$DOCKERINFO" ]
then
  echo "command 'docker info' failed, trying again with sudo..."
  DOCKERINFO="$(sudo docker info 2>/dev/null | grep 'Server Version')"
  if [ -z "$DOCKERINFO" ] ; then
    echo 1>&2 "This action requires you to start the docker daemon. Try 'sudo systemctl start docker' or 'sudo start docker'. If the docker daemon is already running then maybe some security privileges are missing to run docker commands."
    exit 1
  fi
  echo "OK"
  docker=(sudo docker)
else
  docker=(docker)
fi
# help screen
function usage () {
cat <<EOF
Usage: sudo ./openrefine-batch-docker.sh [-a INPUTDIR] [-b TRANSFORMDIR] [-c OUTPUTDIR] ...
Usage: ./openrefine-batch-docker.sh [-a INPUTDIR] [-b TRANSFORMDIR] [-c OUTPUTDIR] ...
== basic arguments ==
-a INPUTDIR path to directory with source files (leave empty to transform only ; multiple files may be imported into a single project by providing a zip or tar.gz archive, cf. https://github.com/OpenRefine/OpenRefine/wiki/Importers )
@@ -31,7 +40,7 @@ Usage: sudo ./openrefine-batch-docker.sh [-a INPUTDIR] [-b TRANSFORMDIR] [-c OUT
-i INPUTOPTIONS several options provided by openrefine-client, see below...
-m RAM maximum RAM for OpenRefine java heap space (default: 2048M)
-t TEMPLATING several options for templating export, see below...
-v VERSION OpenRefine version (3.2, 3.1, 3.0, 2.8, 2.7, ...; default: 3.2)
-v VERSION OpenRefine version (3.5.0, 3.4.1, 3.4, 3.3, 3.2, 3.1, 3.0, 2.8, 2.7, ...; default: 3.5.0)
-E do NOT export files
-R do NOT restart OpenRefine after each transformation (e.g. config file)
-X do NOT restart OpenRefine after each project (e.g. input file)
@@ -81,7 +90,7 @@ rm -f master.zip
example 1 (input, transform, export to tsv)
sudo ./openrefine-batch-docker.sh \
./openrefine-batch-docker.sh \
-a examples/powerhouse-museum/input/ \
-b examples/powerhouse-museum/config/ \
-c examples/powerhouse-museum/output/ \
@@ -92,14 +101,14 @@ sudo ./openrefine-batch-docker.sh \
example 2 (input, transform, templating export)
sudo ./openrefine-batch-docker.sh -a examples/powerhouse-museum/input/ -b examples/powerhouse-museum/config/ -c examples/powerhouse-museum/output/ -f tsv -i processQuotes=false -i guessCellValueTypes=true -RX -t template='{ "Record ID" : {{jsonize(cells["Record ID"].value)}}, "Object Title" : {{jsonize(cells["Object Title"].value)}}, "Registration Number" : {{jsonize(cells["Registration Number"].value)}}, "Description." : {{jsonize(cells["Description."].value)}}, "Marks" : {{jsonize(cells["Marks"].value)}}, "Production Date" : {{jsonize(cells["Production Date"].value)}}, "Provenance (Production)" : {{jsonize(cells["Provenance (Production)"].value)}}, "Provenance (History)" : {{jsonize(cells["Provenance (History)"].value)}}, "Categories" : {{jsonize(cells["Categories"].value)}}, "Persistent Link" : {{jsonize(cells["Persistent Link"].value)}}, "Height" : {{jsonize(cells["Height"].value)}}, "Width" : {{jsonize(cells["Width"].value)}}, "Depth" : {{jsonize(cells["Depth"].value)}}, "Diameter" : {{jsonize(cells["Diameter"].value)}}, "Weight" : {{jsonize(cells["Weight"].value)}}, "License info" : {{jsonize(cells["License info"].value)}} }' -t rowSeparator=',' -t prefix='{ "rows" : [ ' -t suffix='] }' -t splitToFiles=true
./openrefine-batch-docker.sh -a examples/powerhouse-museum/input/ -b examples/powerhouse-museum/config/ -c examples/powerhouse-museum/output/ -f tsv -i processQuotes=false -i guessCellValueTypes=true -RX -t template='{ "Record ID" : {{jsonize(cells["Record ID"].value)}}, "Object Title" : {{jsonize(cells["Object Title"].value)}}, "Registration Number" : {{jsonize(cells["Registration Number"].value)}}, "Description." : {{jsonize(cells["Description."].value)}}, "Marks" : {{jsonize(cells["Marks"].value)}}, "Production Date" : {{jsonize(cells["Production Date"].value)}}, "Provenance (Production)" : {{jsonize(cells["Provenance (Production)"].value)}}, "Provenance (History)" : {{jsonize(cells["Provenance (History)"].value)}}, "Categories" : {{jsonize(cells["Categories"].value)}}, "Persistent Link" : {{jsonize(cells["Persistent Link"].value)}}, "Height" : {{jsonize(cells["Height"].value)}}, "Width" : {{jsonize(cells["Width"].value)}}, "Depth" : {{jsonize(cells["Depth"].value)}}, "Diameter" : {{jsonize(cells["Diameter"].value)}}, "Weight" : {{jsonize(cells["Weight"].value)}}, "License info" : {{jsonize(cells["License info"].value)}} }' -t rowSeparator=',' -t prefix='{ "rows" : [ ' -t suffix='] }' -t splitToFiles=true
EOF
exit 1
}
# defaults
ram="2048M"
version="3.2"
version="3.5.0"
restartfile="true"
restarttransform="true"
export="true"
@@ -200,8 +209,8 @@ memoryload=()
cleanup()
{
echo "cleanup..."
docker stop -t=5000 ${uuid}
docker rm ${uuid}
${docker[*]} stop -t=5000 ${uuid}
${docker[*]} rm ${uuid}
rm -r -f "${outputdir:?}"/workspace*.json
# delete duplicates from copied projects
if [ -n "$crossprojects" ]; then
@@ -218,11 +227,11 @@ echo "=== $checkpoints. ${checkpointname[$((checkpoints + 1))]} ==="
echo ""
echo "starting time: $(date --date=@${checkpointdate[$((checkpoints + 1))]})"
echo ""
sudo docker run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
${docker[*]} run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
# wait until server is available
until sudo docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
until ${docker[*]} run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client:v0.3.10 --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
# show server logs
docker attach ${uuid} &
${docker[*]} attach ${uuid} &
echo ""
# import all files
@@ -237,7 +246,7 @@ if [ -n "$inputfiles" ]; then
for inputfile in "${inputfiles[@]}" ; do
echo "import ${inputfile}..."
# run client with input command
sudo docker run --rm --link ${uuid} -v ${inputdir}:/data:z felixlohmeier/openrefine-client -H ${uuid} -c $inputfile $inputformat ${inputoptions[@]}
${docker[*]} run --rm --link ${uuid} -v ${inputdir}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H ${uuid} -c $inputfile $inputformat ${inputoptions[@]}
# show allocated system resources
ps -o start,etime,%mem,%cpu,rss -C java --sort=start
memoryload+=($(ps --no-headers -o rss -C java))
@@ -245,11 +254,11 @@ if [ -n "$inputfiles" ]; then
# restart server to clear memory
if [ "$restartfile" = "true" ]; then
echo "save project and restart OpenRefine server..."
docker stop -t=5000 ${uuid}
docker rm ${uuid}
sudo docker run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
until sudo docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
docker attach ${uuid} &
${docker[*]} stop -t=5000 ${uuid}
${docker[*]} rm ${uuid}
${docker[*]} run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
until ${docker[*]} run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client:v0.3.10 --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
${docker[*]} attach ${uuid} &
echo ""
fi
done
@@ -267,7 +276,7 @@ if [ -n "$jsonfiles" ] || [ "$export" = "true" ]; then
# get project ids
echo "get project ids..."
sudo docker run --rm --link ${uuid} felixlohmeier/openrefine-client -H ${uuid} -l > "${outputdir}/projects.tmp"
${docker[*]} run --rm --link ${uuid} felixlohmeier/openrefine-client:v0.3.10 -H ${uuid} -l > "${outputdir}/projects.tmp"
projectids=($(cut -c 2-14 "${outputdir}/projects.tmp"))
projectnames=($(cut -c 17- "${outputdir}/projects.tmp"))
cat "${outputdir}/projects.tmp" && rm "${outputdir:?}/projects.tmp"
@@ -280,11 +289,11 @@ if [ -n "$jsonfiles" ] || [ "$export" = "true" ]; then
rsync -a --exclude='*.project/history' "${crossdir}"/*.project "${outputdir}"
# restart server to advertise copied projects
echo "restart OpenRefine server to advertise copied projects..."
docker stop -t=5000 ${uuid}
docker rm ${uuid}
sudo docker run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
until sudo docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
docker attach ${uuid} &
${docker[*]} stop -t=5000 ${uuid}
${docker[*]} rm ${uuid}
${docker[*]} run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
until ${docker[*]} run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client:v0.3.10 --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
${docker[*]} attach ${uuid} &
echo ""
fi
@@ -303,7 +312,7 @@ if [ -n "$jsonfiles" ] || [ "$export" = "true" ]; then
for jsonfile in "${jsonfiles[@]}" ; do
echo "transform ${jsonfile}..."
# run client with apply command
sudo docker run --rm --link ${uuid} -v ${configdir}:/data:z felixlohmeier/openrefine-client -H ${uuid} -f ${jsonfile} ${projectids[i]}
${docker[*]} run --rm --link ${uuid} -v ${configdir}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H ${uuid} -f ${jsonfile} ${projectids[i]}
# allocated system resources
ps -o start,etime,%mem,%cpu,rss -C java --sort=start
memoryload+=($(ps --no-headers -o rss -C java))
@@ -311,11 +320,11 @@ if [ -n "$jsonfiles" ] || [ "$export" = "true" ]; then
# restart server to clear memory
if [ "$restarttransform" = "true" ]; then
echo "save project and restart OpenRefine server..."
docker stop -t=5000 ${uuid}
docker rm ${uuid}
sudo docker run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
until sudo docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
docker attach ${uuid} &
${docker[*]} stop -t=5000 ${uuid}
${docker[*]} rm ${uuid}
${docker[*]} run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
until ${docker[*]} run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client:v0.3.10 --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
${docker[*]} attach ${uuid} &
fi
echo ""
done
@@ -334,7 +343,7 @@ if [ -n "$jsonfiles" ] || [ "$export" = "true" ]; then
filename=${projectnames[i]%.*}
echo "export to file ${filename}.${exportformat}..."
# run client with export command
sudo docker run --rm --link ${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine-client -H ${uuid} -E --output="${filename}.${exportformat}" "${templating[@]}" ${projectids[i]}
${docker[*]} run --rm --link ${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H ${uuid} -E --output="${filename}.${exportformat}" "${templating[@]}" ${projectids[i]}
# show allocated system resources
ps -o start,etime,%mem,%cpu,rss -C java --sort=start
memoryload+=($(ps --no-headers -o rss -C java))
@@ -344,11 +353,11 @@ if [ -n "$jsonfiles" ] || [ "$export" = "true" ]; then
# restart server to clear memory
if [ "$restartfile" = "true" ]; then
echo "restart OpenRefine server..."
docker stop -t=5000 ${uuid}
docker rm ${uuid}
sudo docker run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
until sudo docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
docker attach ${uuid} &
${docker[*]} stop -t=5000 ${uuid}
${docker[*]} rm ${uuid}
${docker[*]} run -d --name=${uuid} -v ${outputdir}:/data:z felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
until ${docker[*]} run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client:v0.3.10 --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
${docker[*]} attach ${uuid} &
fi
echo ""

openrefine-batch.sh

@@ -1,10 +1,10 @@
#!/bin/bash
# openrefine-batch.sh, Felix Lohmeier, v1.12, 2019-07-29
# openrefine-batch.sh, Felix Lohmeier, v1.16, 2021-11-09
# https://github.com/felixlohmeier/openrefine-batch
# declare download URLs for OpenRefine and OpenRefine client
openrefine_URL="https://github.com/OpenRefine/OpenRefine/releases/download/3.2/openrefine-linux-3.2.tar.gz"
client_URL="https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.4/openrefine-client_0-3-4_linux-64bit"
openrefine_URL="https://github.com/OpenRefine/OpenRefine/releases/download/3.5.0/openrefine-linux-3.5.0.tar.gz"
client_URL="https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.10/openrefine-client_0-3-10_linux"
# check system requirements
JAVA="$(which java 2> /dev/null)"
@@ -34,7 +34,7 @@ if [ ! -d "openrefine-client" ]; then
echo "Download OpenRefine client..."
mkdir -p openrefine-client
wget -q -P openrefine-client $wget_opt $client_URL
chmod +x openrefine-client/openrefine-client_0-3-4_linux-64bit
chmod +x openrefine-client/openrefine-client_0-3-10_linux
echo ""
fi
@@ -259,7 +259,7 @@ if [ -n "$inputfiles" ]; then
for inputfile in "${inputfiles[@]}" ; do
echo "import ${inputfile}..."
# run client with input command
openrefine-client/openrefine-client_0-3-4_linux-64bit -P ${port} -c ${inputdir}/${inputfile} $inputformat "${inputoptions[@]}"
openrefine-client/openrefine-client_0-3-10_linux -P ${port} -c ${inputdir}/${inputfile} $inputformat "${inputoptions[@]}"
# show allocated system resources
ps -o start,etime,%mem,%cpu,rss -p ${pid} --sort=start
memoryload+=($(ps --no-headers -o rss -p ${pid}))
@@ -290,7 +290,7 @@ if [ -n "$jsonfiles" ] || [ "$export" = "true" ]; then
# get project ids
echo "get project ids..."
openrefine-client/openrefine-client_0-3-4_linux-64bit -P ${port} -l > "${outputdir}/projects.tmp"
openrefine-client/openrefine-client_0-3-10_linux -P ${port} -l > "${outputdir}/projects.tmp"
projectids=($(cut -c 2-14 "${outputdir}/projects.tmp"))
projectnames=($(cut -c 17- "${outputdir}/projects.tmp"))
cat "${outputdir}/projects.tmp" && rm "${outputdir:?}/projects.tmp"
@@ -327,7 +327,7 @@ if [ -n "$jsonfiles" ] || [ "$export" = "true" ]; then
for jsonfile in "${jsonfiles[@]}" ; do
echo "transform ${jsonfile}..."
# run client with apply command
openrefine-client/openrefine-client_0-3-4_linux-64bit -P ${port} -f ${configdir}/${jsonfile} ${projectids[i]}
openrefine-client/openrefine-client_0-3-10_linux -P ${port} -f ${configdir}/${jsonfile} ${projectids[i]}
# allocated system resources
ps -o start,etime,%mem,%cpu,rss -p ${pid} --sort=start
memoryload+=($(ps --no-headers -o rss -p ${pid}))
@@ -359,7 +359,7 @@ if [ -n "$jsonfiles" ] || [ "$export" = "true" ]; then
filename=${projectnames[i]%.*}
echo "export to file ${filename}.${exportformat}..."
# run client with export command
openrefine-client/openrefine-client_0-3-4_linux-64bit -P ${port} -E --output="${outputdir}/${filename}.${exportformat}" "${templating[@]}" ${projectids[i]}
openrefine-client/openrefine-client_0-3-10_linux -P ${port} -E --output="${outputdir}/${filename}.${exportformat}" "${templating[@]}" ${projectids[i]}
# show allocated system resources
ps -o start,etime,%mem,%cpu,rss -p ${pid} --sort=start
memoryload+=($(ps --no-headers -o rss -p ${pid}))