release v0.6

Felix Lohmeier 2017-03-01 17:48:13 +01:00
parent 2f0d8fb080
commit 4ee785ecf3
3 changed files with 164 additions and 163 deletions

README.md

@ -1,6 +1,6 @@
## OpenRefine batch processing (openrefine-batch.sh)
Shell script to run OpenRefine in batch mode (import, transform, export). This bash script automatically...
1. imports all data from a given directory into OpenRefine
2. transforms the data by applying OpenRefine transformation rules from all json files in another given directory and
@ -10,45 +10,55 @@ It orchestrates a [docker container for OpenRefine](https://hub.docker.com/r/fel
### Typical Workflow
- **Step 1**: Do some experiments with your data (or parts of it) in the graphical user interface of OpenRefine. If you are satisfied with all transformation rules, [extract the json code](http://kb.refinepro.com/2012/06/google-refine-json-and-my-notepad-or.html) and save it as a file (e.g. transform.json).
- **Step 2**: Put your data and the json file(s) in two different directories and execute the script. The script will automatically import all data files into OpenRefine projects, apply the transformation rules in the json files to each project and export all projects to TSV files.
### Install
Linux:
1. Install [Docker](https://docs.docker.com/engine/installation/#on-linux) and **a)** [configure Docker to start on boot](https://docs.docker.com/engine/installation/linux/linux-postinstall/#configure-docker-to-start-on-boot) or **b)** start Docker on demand each time you use the script: `sudo systemctl start docker`
2. Download the script and grant file permissions to execute: `wget https://github.com/felixlohmeier/openrefine-batch/raw/master/openrefine-batch.sh && chmod +x openrefine-batch.sh`

Mac:
1. Install Docker
2. ...

Windows:
1. Install Docker
2. Install Cygwin with Bash
3. ...
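Regardless of platform, you can check the Docker setup before running the script with the standard smoke test:

```
docker --version
sudo docker run hello-world
```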
### Usage
```
mkdir -p input && cp INPUTFILES input/
mkdir -p config && cp CONFIGFILES config/
sudo ./openrefine-batch.sh input/ config/ OUTPUT/
```
Why `sudo`? Non-root users can only access the Unix socket of the Docker daemon by using `sudo`. If you created a Docker group as described in the [post-installation steps for Linux](https://docs.docker.com/engine/installation/linux/linux-postinstall/), you may call the script without `sudo`.
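For reference, the relevant post-installation steps from the linked Docker documentation boil down to creating the group and adding your user to it:

```
sudo groupadd docker
sudo usermod -aG docker $USER
# log out and log back in so the new group membership takes effect
```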
#### INPUTFILES
* any data that [OpenRefine supports](https://github.com/OpenRefine/OpenRefine/wiki/Importers). CSV, TSV and line-based files should work out of the box. XML, JSON, fixed-width, XLSX and ODS need one additional input parameter (see the Options section below)
* multiple slices of data may be transformed into a single file [by providing a zip or tar.gz archive](https://github.com/OpenRefine/OpenRefine/wiki/Importers); see the example below
* you may use hard links instead of cp: `ln INPUTFILE input/`
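For example, several slices could be packed into one archive before running the script (file names here are only placeholders):

```
# pack several hypothetical slices into a single tar.gz archive in the input directory
tar czf input/slices.tar.gz slice1.tsv slice2.tsv slice3.tsv
```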
#### CONFIGFILES
* JSON files with [OpenRefine transformation rules](http://kb.refinepro.com/2012/06/google-refine-json-and-my-notepad-or.html); a minimal example follows below
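For illustration only, a minimal config file might contain a single operation like the following (the operation and column names are placeholders; real rules should be extracted from the OpenRefine GUI as described in the Typical Workflow above):

```
mkdir -p config && cat > config/transform.json << 'EOF'
[
  {
    "op": "core/column-rename",
    "description": "Rename column Old Title to Title",
    "oldColumnName": "Old Title",
    "newColumnName": "Title"
  }
]
EOF
```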
#### OUTPUT/
* path to directory where results and temporary data should be stored
* Transformed data will be stored in this directory in TSV (tab-separated values) format. Show results: `ls OUTPUT/*.tsv`
* OpenRefine stores data in directories like "1234567890123.project". You may have a look at the results by starting OpenRefine with this workspace. Delete the directories if you do not need them: `rm -r -f OUTPUT/*.project`
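One way to inspect the workspace is to start the same OpenRefine docker image the script uses and mount the output directory as its workspace (image, flags and port 3333 follow the script's own invocation; adjust the path and version as needed):

```
sudo docker run --rm -p 3333:3333 -v $(pwd)/OUTPUT:/data felixlohmeier/openrefine:2.7rc1 -i 0.0.0.0 -d /data
# then open http://localhost:3333 in a browser
```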
#### Example
Clone or [download the GitHub repository](https://github.com/felixlohmeier/openrefine-batch/archive/master.zip) to get the example data.
```
sudo ./openrefine-batch.sh \
examples/powerhouse-museum/input/ \
examples/powerhouse-museum/config/ \
examples/powerhouse-museum/output/ \
examples/powerhouse-museum/cross/ \
2G 2.7rc1 restartfile-false restarttransform-false export-true \
tsv --processQuotes=false --guessCellValueTypes=true
```
#### Options
```
sudo ./openrefine-batch.sh $inputdir $configdir $outputdir $crossdir $ram $version $restartfile $restarttransform $export $inputformat $inputoptions
```
1. inputdir: path to directory with source files (multiple files may be imported into a single project [by providing a zip or tar.gz archive](https://github.com/OpenRefine/OpenRefine/wiki/Importers))
@ -56,8 +66,10 @@ clone or [download GitHub repository](https://github.com/felixlohmeier/openrefin
3. outputdir: path to directory for exported files (and OpenRefine workspace)
4. crossdir: path to directory with additional OpenRefine projects (will be copied to workspace before transformation step to support the [cross function](https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#crosscell-c-string-projectname-string-columnname))
5. ram: maximum RAM for OpenRefine java heap space (default: 4G)
6. version: OpenRefine version (2.7rc1, 2.6rc2, 2.6rc1, dev; default: 2.7rc1)
7. restartfile: restart the docker container after each project (e.g. input file) to clear memory (restartfile-true/restartfile-false; default: restartfile-true)
8. restarttransform: restart the docker container after each transformation (e.g. config file) to clear memory (restarttransform-true/restarttransform-false; default: restarttransform-false)
9. export: toggle export to TSV on/off (export-true/export-false; default: export-true)
10. inputformat: (csv, tsv, xml, json, line-based, fixed-width, xlsx, ods)
11. inputoptions: several options provided by [openrefine-client](https://hub.docker.com/r/felixlohmeier/openrefine-client/); see the example after this list
@ -67,6 +79,7 @@ inputoptions (mandatory for xml, json, fixed-width, xslx, ods):
* `--sheets=SHEETS` (xlsx, ods): please provide sheets separated by comma (e.g. 0,1), default: 0 (first sheet)
more inputoptions (optional, only together with inputformat):
* `--projectName=PROJECTNAME` (all formats)
* `--limit=LIMIT` (all formats), default: -1
* `--includeFileSources=INCLUDEFILESOURCES` (all formats), default: false
* `--trimStrings=TRIMSTRINGS` (xml, json), default: false
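As an illustration of the inputformat and inputoptions arguments, a run on a hypothetical Excel file that imports the first two sheets might look like this (directory names are placeholders; all preceding positional arguments have to be supplied):

```
sudo ./openrefine-batch.sh \
input/ config/ output/ cross/ \
4G 2.7rc1 restartfile-true restarttransform-false export-true \
xlsx --sheets=0,1
```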
@ -86,80 +99,13 @@ more inputoptions (optional, only together with inputformat):
The script uses `docker attach` to print log messages from the OpenRefine server and `ps` to show statistics for each step. Here is a sample log:
```
[03:27 felix ~/openrefine-batch (master *)]$ ./openrefine-batch.sh examples/powerhouse-museum/input/ examples/powerhouse-museum/config/ examples/powerhouse-museum/output/ examples/powerhouse-museum/cross/ 4G 2.7rc1 restart-true tsv --processQuotes=false --guessCellValueTypes=true
Input directory: /home/felix/openrefine-batch/examples/powerhouse-museum/input
Input files: phm-collection.tsv
Input format: --format=tsv
Input options: --processQuotes=false --guessCellValueTypes=true
Config directory: /home/felix/openrefine-batch/examples/powerhouse-museum/config
Transformation rules: phm-transform.json
Cross directory: /home/felix/openrefine-batch/examples/powerhouse-museum/cross
Cross projects:
OpenRefine heap space: 4G
OpenRefine version: 2.7rc1
Docker restart: restart-true
Docker container: 6b7eb36f-fc72-4040-b135-acee36948c13
Output directory: /home/felix/openrefine-batch/examples/powerhouse-museum/output
begin: Mo 27. Feb 03:28:45 CET 2017
start OpenRefine server...
[sudo] password for felix:
92499ecd252a8768ea5b57e0be0fb30fe6340eab67d28b1be158e0ad01f79419
import phm-collection.tsv...
New project: 2325849087106
Number of rows: 75814
STARTED ELAPSED %MEM %CPU RSS
03:28:55 00:29 10.0 122 812208
save project and restart OpenRefine server...
02:29:28.170 [ ProjectManager] Saving all modified projects ... (4594ms)
02:29:36.414 [ project_utilities] Saved project '2325849087106' (8244ms)
6b7eb36f-fc72-4040-b135-acee36948c13
6b7eb36f-fc72-4040-b135-acee36948c13
f28de26b99475c4db09dbfb9ab3d445aa8127dedd08b8e729cb6b4d65c96bf38
begin project 2325849087106 @ Mo 27. Feb 03:29:52 CET 2017
transform phm-transform.json...
02:29:54.372 [ refine] GET /command/core/get-models (2815ms)
02:29:57.525 [ project] Loaded project 2325849087106 from disk in 3 sec(s) (3153ms)
02:29:57.640 [ refine] POST /command/core/apply-operations (115ms)
STARTED ELAPSED %MEM %CPU RSS
03:29:38 01:07 19.6 128 1588152
save project and restart OpenRefine server...
02:30:46.280 [ ProjectManager] Saving all modified projects ... (48640ms)
02:30:53.404 [ project_utilities] Saved project '2325849087106' (7124ms)
6b7eb36f-fc72-4040-b135-acee36948c13
6b7eb36f-fc72-4040-b135-acee36948c13
186b0bda0ca542642ce1875d55f8341648e05248eb359541b80191832783f40b
export to file 2325849087106.tsv...
02:31:08.149 [ refine] GET /command/core/get-models (4039ms)
02:31:11.485 [ project] Loaded project 2325849087106 from disk in 3 sec(s) (3336ms)
02:31:11.756 [ refine] GET /command/core/get-all-project-metadata (271ms)
02:31:11.774 [ refine] POST /command/core/export-rows/phm-collection.tsv.tsv (18ms)
STARTED ELAPSED %MEM %CPU RSS
03:30:55 01:59 11.6 28.6 942900
restart OpenRefine server...
6b7eb36f-fc72-4040-b135-acee36948c13
6b7eb36f-fc72-4040-b135-acee36948c13
eb0f91675b5fbf21b4c17cceb6d93146876ea19316b7ab44af78a36f64ff1037
finished project 2325849087106 @ Mo 27. Feb 03:33:11 CET 2017
output (number of lines / size in bytes):
167017 60527726 /home/felix/openrefine-batch/examples/powerhouse-museum/output/2325849087106.tsv
cleanup...
6b7eb36f-fc72-4040-b135-acee36948c13
6b7eb36f-fc72-4040-b135-acee36948c13
finish: Mo 27. Feb 03:33:17 CET 2017
```
### Todo
- [ ] howto for installation on Mac and Windows
- [ ] howto for extracting input options from OpenRefine GUI with Firefox network monitor
- [ ] use getopts for parsing of arguments
- [ ] add option to delete openrefine projects in output directory
- [ ] provide more example data from other OpenRefine tutorials

examples/powerhouse-museum/README.md

@ -7,14 +7,20 @@ Seth van Hooland, Ruben Verborgh and Max De Wilde (August 5, 2013): Cleaning Dat
## Usage
```
sudo ./openrefine-batch.sh \
examples/powerhouse-museum/input/ \
examples/powerhouse-museum/config/ \
examples/powerhouse-museum/output/ \
examples/powerhouse-museum/cross/ \
2G 2.7rc1 restartfile-false restarttransform-false export-true \
tsv --processQuotes=false --guessCellValueTypes=true
```
## input/phm-collection.tsv
* The [Powerhouse Museum in Sydney](https://maas.museum/powerhouse-museum/) provides a freely available metadata export of its collection on its website. The collection metadata has been retrieved from the website freeyourmetadata.org that has redistributed the data: http://data.freeyourmetadata.org/powerhouse-museum/
## config/phm-tutorial.json
* All steps from the tutorial above, extracted from the history of the processed tutorial project, retrieved from the website freeyourmetadata.org: [phm-collection-cleaned.google-refine.tar.gz](http://data.freeyourmetadata.org/powerhouse-museum/phm-collection-cleaned.google-refine.tar.gz)

openrefine-batch.sh

@ -1,5 +1,5 @@
#!/bin/bash
# openrefine-batch.sh, Felix Lohmeier, v0.6, 01.03.2017
# https://github.com/felixlohmeier/openrefine-batch
# user input
@ -49,128 +49,177 @@ if [ -z "$6" ]
fi
if [ -z "$7" ]
then
  restartfile="restartfile-true"
else
  restartfile="$7"
fi
if [ -z "$8" ]
then
  restarttransform="restarttransform-false"
else
  restarttransform="$8"
fi
if [ -z "$9" ]
then
  export="export-true"
else
  export="$9"
fi
if [ -z "${10}" ]
then
  inputformat=""
else
  inputformat="--format=${10}"
fi
if [ -z "${11}" ]
then
  inputoptions=""
else
  inputoptions=( "${11}" "${12}" "${13}" "${14}" "${15}" "${16}" "${17}" "${18}" "${19}" "${20}" "${21}" "${22}" "${23}" "${24}" "${25}" )
fi
# variables
uuid=$(cat /proc/sys/kernel/random/uuid)
echo "Input directory: $inputdir"
echo "Input files: ${inputfiles[@]}"
echo "Input format: $inputformat"
echo "Input options: ${inputoptions[@]}"
echo "Config directory: $configdir"
echo "Transformation rules: ${jsonfiles[@]}"
echo "Cross directory: $crossdir"
echo "Cross projects: ${crossprojects[@]}"
echo "OpenRefine heap space: $ram"
echo "OpenRefine version: $version"
echo "OpenRefine workspace: $outputdir"
echo "Export TSV to workspace: $export"
echo "Docker container name: $uuid"
echo "restart after file: $restartfile"
echo "restart after transform: $restarttransform"
echo ""
# time
echo "begin: $(date)"
echo ""
# launch server
echo "start OpenRefine server..."
docker run -d --name=${uuid} -v ${outputdir}:/data felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
# wait until server is available
until docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
# show server logs
docker attach ${uuid} &
echo ""
# import all files
if [ -n "$inputfiles" ]; then
  echo "=== IMPORT ==="
  echo ""
  for inputfile in "${inputfiles[@]}" ; do
    echo "import ${inputfile}..."
    # run client with input command
    docker run --rm --link ${uuid} -v ${inputdir}:/data felixlohmeier/openrefine-client -H ${uuid} -c $inputfile $inputformat ${inputoptions[@]}
    # show statistics
    ps -o start,etime,%mem,%cpu,rss -C java --sort=start
    echo ""
    # restart server to clear memory
    if [ "$restartfile" = "restartfile-true" ]; then
      echo "save project and restart OpenRefine server..."
      docker stop -t=5000 ${uuid}
      docker rm ${uuid}
      docker run -d --name=${uuid} -v ${outputdir}:/data felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
      until docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
      docker attach ${uuid} &
      echo ""
    fi
  done
fi
echo "=== TRANSFORM / EXPORT ==="
echo ""
# get project ids
echo "get project ids..."
projects=($(docker run --rm --link ${uuid} felixlohmeier/openrefine-client -H ${uuid} -l | tee ${outputdir}/projects.tmp | cut -c 2-14))
cat ${outputdir}/projects.tmp && rm ${outputdir}/projects.tmp
echo ""
# provide additional OpenRefine projects for cross function
if [ -n "$crossprojects" ]; then
  echo "provide additional projects for cross function..."
  # copy given projects to workspace
  rsync -a --exclude='*.project/history' $crossdir/*.project $outputdir
  # restart server to advertise copied projects
  echo "restart OpenRefine server to advertise copied projects..."
  docker stop -t=5000 ${uuid}
  docker rm ${uuid}
  docker run -d --name=${uuid} -v ${outputdir}:/data felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
  until docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
  docker attach ${uuid} &
  echo ""
fi
# loop for all projects
for projectid in "${projects[@]}" ; do
  # time
  echo "--- begin project $projectid @ $(date) ---"
  echo ""
  if [ -n "$jsonfiles" ]; then
    # apply transformation rules
    for jsonfile in "${jsonfiles[@]}" ; do
      echo "transform ${jsonfile}..."
      # run client with apply command
      docker run --rm --link ${uuid} -v ${configdir}:/data felixlohmeier/openrefine-client -H ${uuid} -f ${jsonfile} ${projectid}
      # show statistics
      ps -o start,etime,%mem,%cpu,rss -C java --sort=start
      # restart server to clear memory
      if [ "$restarttransform" = "restarttransform-true" ]; then
        echo "save project and restart OpenRefine server..."
        docker stop -t=5000 ${uuid}
        docker rm ${uuid}
        docker run -d --name=${uuid} -v ${outputdir}:/data felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
        until docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
        docker attach ${uuid} &
      fi
      echo ""
    done
  fi
  # export project to workspace
  if [ "$export" = "export-true" ]; then
    echo "export to file ${projectid}.tsv..."
    # run client with export command
    docker run --rm --link ${uuid} -v ${outputdir}:/data felixlohmeier/openrefine-client -H ${uuid} -E --output=${projectid}.tsv ${projectid}
    # show statistics
    ps -o start,etime,%mem,%cpu,rss -C java --sort=start
    # restart server to clear memory
    if [ "$restartfile" = "restartfile-true" ]; then
      echo "restart OpenRefine server..."
      docker stop -t=5000 ${uuid}
      docker rm ${uuid}
      docker run -d --name=${uuid} -v ${outputdir}:/data felixlohmeier/openrefine:${version} -i 0.0.0.0 -m ${ram} -d /data
      until docker run --rm --link ${uuid} --entrypoint /usr/bin/curl felixlohmeier/openrefine-client --silent -N http://${uuid}:3333 | cat | grep -q -o "OpenRefine" ; do sleep 1; done
      docker attach ${uuid} &
    fi
    echo ""
  fi
  # time
  echo "--- finished project $projectid @ $(date) ---"
  echo ""
done
# list output files
if [ "$export" = "export-true" ]; then
  echo "output (number of lines / size in bytes):"
  wc -c -l ${outputdir}/*.tsv
  echo ""
fi
# cleanup
echo "cleanup..."
docker stop -t=5000 ${uuid}
docker rm ${uuid}
rm -r -f ${outputdir}/workspace*.json
echo ""