batch processing with python-client

There are several client libraries for OpenRefine that communicate with the OpenRefine API. I have prepared a Docker container based on the Python library by PaulMakepeace and extended its command-line interface with options to create new OpenRefine projects from files.

basic usage

  1. start server: docker run -d --name=openrefine felixlohmeier/openrefine

  2. start client (prints help screen): docker run --rm --link openrefine felixlohmeier/openrefine-client -H openrefine

example for customized run commands in interactive mode (e.g. for usage in terminals)

  1. start server in terminal A: docker run --rm --name=openrefine -p 80:3333 -v /home/felix/refine:/data:z felixlohmeier/openrefine -i 0.0.0.0 -m 4G -d /data
  • automatically remove docker container when it exits (--rm)
  • set name "openrefine" for docker container (--name=openrefine)
  • publish internal port 3333 to host port 80 (-p 80:3333)
  • mount host directory /home/felix/refine as working directory /data (-v /home/felix/refine:/data:z; the :z suffix relabels the directory for SELinux)
  • make OpenRefine listen on all network interfaces (-i 0.0.0.0)
  • increase Java heap size to 4 GB (-m 4G)
  • set OpenRefine workspace to /data (-d /data)
  • OpenRefine should then be available at http://localhost
  2. start client in terminal B: docker run --rm --link openrefine -v /home/felix/refine:/data:z felixlohmeier/openrefine-client -H openrefine -f history.json 1234567890123
  • automatically remove docker container when it exits (--rm)
  • connect to the docker container "openrefine" (--link openrefine, -H openrefine)
  • mount host directory /home/felix/refine as working directory (-v /home/felix/refine:/data:z)
  • apply history in file /home/felix/refine/history.json to project with id 1234567890123 (-f history.json 1234567890123)
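The last bullet applies /home/felix/refine/history.json to a project. Because that host directory is mounted at /data, the client sees the file as /data/history.json, so passing the relative name history.json should be enough, assuming the client container resolves relative paths against /data as described above. A minimal sketch of this step as a shell function; apply_history and the workingdir default are hypothetical names for illustration, not part of openrefine-client:

```shell
#!/bin/sh
# Hypothetical helper (not part of openrefine-client): apply an OpenRefine
# history JSON file to an existing project.
# Usage: apply_history FILE PROJECT_ID
# Assumption: FILE is relative to ${workingdir}, which is mounted at /data,
# the client container's working directory.
workingdir="${workingdir:-/home/felix/refine}"

apply_history() {
  # Fail early on the host instead of letting the client fail in the container.
  if [ ! -f "${workingdir}/$1" ]; then
    echo "history file not found: ${workingdir}/$1" >&2
    return 1
  fi
  # Same flags as the interactive example: throwaway container, linked to the
  # "openrefine" server container, working directory mounted at /data.
  docker run --rm --link openrefine -v "${workingdir}:/data:z" \
    felixlohmeier/openrefine-client -H openrefine -f "$1" "$2"
}
```

Usage: apply_history history.json 1234567890123. Checking the file on the host first gives a clearer error than a failure inside the throwaway client container.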

example for customized run commands in detached mode (e.g. for usage in shell scripts)

  1. define variables
  • workingdir=/home/felix/refine
  • inputfile=example.csv
  • jsonfile=test.json
  2. start server: docker run -d --name=openrefine -v ${workingdir}:/data:z felixlohmeier/openrefine -i 0.0.0.0 -m 4G -d /data

  3. create project (import file): docker run --rm --link openrefine -v ${workingdir}:/data:z felixlohmeier/openrefine-client -H openrefine -c ${inputfile}

  4. get project id (requires bash; ${project} expands to the first id listed, so this assumes the new project is the only one in the workspace): project=($(docker run --rm --link openrefine -v ${workingdir}:/data felixlohmeier/openrefine-client -H openrefine --list | cut -c 2-14))

  5. apply transformations from json file: docker run --rm --link openrefine -v ${workingdir}:/data felixlohmeier/openrefine-client -H openrefine -f ${jsonfile} ${project}

  6. export project to file: docker run --rm --link openrefine -v ${workingdir}:/data felixlohmeier/openrefine-client -H openrefine -E --output=${project}.tsv ${project}

  7. cleanup

  • docker stop -t=500 openrefine
  • docker rm openrefine
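The detached-mode steps can be collected into one reusable script. A minimal sketch in POSIX sh, under two assumptions not verified against the client's documentation: --list prints one project per line in the form " 1234567890123: name", and the client exits with a non-zero status while the server is still starting up. refine_client, first_project_id, wait_for_server and batch are hypothetical names. Instead of cut -c 2-14 the sketch extracts the id with sed, so it does not depend on exact column alignment; like the original step, it still takes the first listed project.

```shell
#!/bin/sh
# Sketch of the detached-mode workflow as shell functions (hypothetical names).
# Assumptions: `--list` prints lines like " 1234567890123: name", and the
# client exits non-zero when it cannot reach the server yet.
workingdir="${workingdir:-/home/felix/refine}"

# Run openrefine-client in a throwaway container linked to "openrefine",
# with ${workingdir} mounted at /data.
refine_client() {
  docker run --rm --link openrefine -v "${workingdir}:/data:z" \
    felixlohmeier/openrefine-client -H openrefine "$@"
}

# Print the first project id found in `--list` output read from stdin.
first_project_id() {
  sed -n 's/^ *\([0-9][0-9]*\):.*/\1/p' | head -n 1
}

# Poll the server with `--list` until it answers; give up after $1 tries.
wait_for_server() {
  tries="${1:-30}"
  until refine_client --list >/dev/null 2>&1; do
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || return 1
    sleep 2
  done
}

# Start the server, import, transform, export, then clean up.
# Usage: batch INPUTFILE JSONFILE (both relative to ${workingdir})
batch() {
  inputfile="$1"
  jsonfile="$2"
  docker run -d --name=openrefine -v "${workingdir}:/data:z" \
    felixlohmeier/openrefine -i 0.0.0.0 -m 4G -d /data
  wait_for_server || { echo "OpenRefine did not start" >&2; return 1; }
  refine_client -c "$inputfile"
  project="$(refine_client --list | first_project_id)"
  [ -n "$project" ] || { echo "no project found" >&2; return 1; }
  refine_client -f "$jsonfile" "$project"
  refine_client -E --output="${project}.tsv" "$project"
  docker stop -t=500 openrefine
  docker rm openrefine
}
```

Usage, with the variables from step 1: set workingdir=/home/felix/refine, then run batch example.csv test.json. The polling loop replaces a fixed sleep, because the server container returns immediately in detached mode while OpenRefine itself needs some time before it accepts requests.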