batch processing with python-client

Several client libraries communicate with the OpenRefine API. I have prepared a Docker container on top of the Python library by PaulMakepeace and extended its CLI with options to create new OpenRefine projects from files.

basic usage

  1. start server: docker run -d --name=openrefine-server felixlohmeier/openrefine

  2. run client with one of the following commands:

  • list projects: docker run --rm --link openrefine-server felixlohmeier/openrefine-client --list
  • create project from file: docker run --rm --link openrefine-server felixlohmeier/openrefine-client --create [FILE] [PROJECTID]
  • apply rules from json file: docker run --rm --link openrefine-server felixlohmeier/openrefine-client --apply [FILE.json] [PROJECTID]
  • export project to file: docker run --rm --link openrefine-server felixlohmeier/openrefine-client --export [PROJECTID] --output=FILE.tsv
  • check help screen for more options: docker run --rm --link openrefine-server felixlohmeier/openrefine-client --help
  3. cleanup: docker stop openrefine-server && docker rm openrefine-server
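
The client commands above all share the same `docker run` prefix, so a small shell function can cut down on typing. This is a minimal sketch, not part of the client itself: the function name `refine_client` and the `DOCKER` override variable (handy for dry runs with `DOCKER=echo`) are assumptions made for illustration.

```shell
# Hypothetical wrapper around the client container.
# Set DOCKER=echo to print the arguments instead of running docker.
refine_client() {
  "${DOCKER:-docker}" run --rm --link openrefine-server \
    felixlohmeier/openrefine-client "$@"
}
```

With the function defined in the current shell, `refine_client --list` or `refine_client --help` behave like the long commands listed above.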

example for customized run commands in interactive mode (e.g. for usage in terminals)

  1. start server in terminal A:

docker run --rm --name=openrefine-server -p 80:3333 -v /home/felix/refine:/data:z felixlohmeier/openrefine -i 0.0.0.0 -m 4G -d /data

  • automatically remove docker container when it exits
  • set name "openrefine-server" for docker container
  • publish internal port 3333 to host port 80
  • mount host directory /home/felix/refine as working directory
  • make openrefine available in the network
  • increase java heap size to 4 GB
  • set refine workspace to /data
  • OpenRefine should be available at http://localhost
  2. start client in terminal B (prints help screen):

docker run --rm --link openrefine-server -v /home/felix/refine:/data:z felixlohmeier/openrefine-client

  • automatically remove docker container when it exits
  • build up network connection with docker container "openrefine-server"
  • mount host directory /home/felix/refine as working directory
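  • without further arguments the client prints the help screen; append e.g. --apply history.json 1234567890123 to apply the history in file /home/felix/refine/history.json to project with id 1234567890123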
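
The server command above hard-codes the host directory, host port and heap size. The following sketch makes these three values arguments of a shell function. The function name `start_refine_server`, the argument order and the `DOCKER` override variable are assumptions for illustration; the docker and OpenRefine options are the ones used in the command above.

```shell
# Hypothetical helper: start the server in the foreground.
#   $1 = host directory to mount as /data (required)
#   $2 = host port (default 80)
#   $3 = java heap size (default 4G)
# Set DOCKER=echo to print the arguments instead of running docker.
start_refine_server() {
  "${DOCKER:-docker}" run --rm --name=openrefine-server \
    -p "${2:-80}:3333" \
    -v "${1:?host directory required}:/data:z" \
    felixlohmeier/openrefine -i 0.0.0.0 -m "${3:-4G}" -d /data
}
```

For example, `start_refine_server /home/felix/refine 8080 2G` would publish OpenRefine on host port 8080 with a 2 GB heap.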

example for customized run commands in detached mode (e.g. for usage in shell scripts)

  1. define variables (bring your own example data)

workingdir=/home/felix/refine
inputfile=example.csv
jsonfile=test.json

  2. start server

docker run -d --name=openrefine-server -v ${workingdir}:/data:z felixlohmeier/openrefine -i 0.0.0.0 -m 4G -d /data

  3. create project (import file)

docker run --rm --link openrefine-server -v ${workingdir}:/data:z felixlohmeier/openrefine-client --create $inputfile

  4. get project id

project=($(docker run --rm --link openrefine-server -v ${workingdir}:/data felixlohmeier/openrefine-client --list | cut -c 2-14))

  5. apply transformations from json file

docker run --rm --link openrefine-server -v ${workingdir}:/data felixlohmeier/openrefine-client --apply ${jsonfile} ${project}

  6. export project to file

docker run --rm --link openrefine-server -v ${workingdir}:/data felixlohmeier/openrefine-client --export --output=${project}.tsv ${project}

  7. cleanup

docker stop -t=500 openrefine-server && docker rm openrefine-server
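
The client steps of the detached example (create, get project id, apply, export) can be collected into shell functions that stop at the first failing step. This is only a sketch under assumptions: the function names and the `DOCKER` override variable are made up for illustration, and the ID extraction assumes that `--list` prints one project per line starting with a 13-digit project id (which is what `cut -c 2-14` in the steps above implies). Like the original steps, it uses the first project in the list.

```shell
# Hypothetical client wrapper; workingdir falls back to the current directory.
# Set DOCKER=echo to print the arguments instead of running docker.
client() {
  "${DOCKER:-docker}" run --rm --link openrefine-server \
    -v "${workingdir:-$PWD}:/data:z" felixlohmeier/openrefine-client "$@"
}

# Print the 13-digit id at the start of the first matching line on stdin.
# Assumption: --list output looks like " 1234567890123: name".
first_project_id() {
  sed -n 's/^[[:space:]]*\([0-9]\{13\}\).*/\1/p' | head -n 1
}

# Run create, apply and export for one input file.
#   $1 = input file, $2 = json file with transformations
# The server container must already be running.
run_batch() {
  client --create "$1" || return 1
  project=$(client --list | first_project_id)
  if [ -z "$project" ]; then
    echo "no project id found in --list output" >&2
    return 1
  fi
  client --apply "$2" "$project" || return 1
  client --export --output="${project}.tsv" "$project"
}
```

After starting the server as in step 2, `workingdir=/home/felix/refine run_batch example.csv test.json` would run the client steps in order; the cleanup command stays the same.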