batch processing with python-client
There are several client libraries for OpenRefine that communicate with the OpenRefine API. I have built a Docker image on top of the Python library by PaulMakepeace and extended its command-line interface with options to create new OpenRefine projects from files.
basic usage
- start server:
docker run -d --name=openrefine-server felixlohmeier/openrefine
- run client with one of the following commands:
- list projects:
docker run --rm --link openrefine-server felixlohmeier/openrefine-client --list
- create project from file:
docker run --rm --link openrefine-server felixlohmeier/openrefine-client --create [FILE]
- apply rules from json file:
docker run --rm --link openrefine-server felixlohmeier/openrefine-client --apply [FILE.json] [PROJECTID]
- export project to file:
docker run --rm --link openrefine-server felixlohmeier/openrefine-client --export [PROJECTID] --output=FILE.tsv
- check help screen for more options:
docker run --rm --link openrefine-server felixlohmeier/openrefine-client --help
- cleanup:
docker stop openrefine-server && docker rm openrefine-server
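All client commands above share the same docker run prefix. A minimal sketch, assuming bash: a hypothetical helper function refine_client (not part of the image) that forwards its arguments to the client container. The DOCKER variable is also an assumption of this sketch; it defaults to docker and can be set to echo to print the command instead of running it.

```shell
# Hypothetical helper (not shipped with the image): run the OpenRefine client
# container linked to the running "openrefine-server" container and pass all
# arguments through unchanged ("$@" keeps arguments with spaces intact).
# Set DOCKER=echo to print the command instead of executing it (dry run).
refine_client() {
  "${DOCKER:-docker}" run --rm --link openrefine-server \
    felixlohmeier/openrefine-client "$@"
}

# Usage, equivalent to the commands above:
# refine_client --list
# refine_client --export [PROJECTID] --output=FILE.tsv
# DOCKER=echo refine_client --help   # only prints the docker command
```

A function is used instead of an alias so that it also works in non-interactive shell scripts.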
example for customized run commands in interactive mode (e.g. for usage in terminals)
- start server in terminal A:
docker run --rm --name=openrefine-server -p 80:3333 -v /home/felix/refine:/data:z felixlohmeier/openrefine -i 0.0.0.0 -m 4G -d /data
- automatically remove docker container when it exits
- set name "openrefine-server" for docker container
- publish internal port 3333 to host port 80
- mount host directory /home/felix/refine as working directory
- make OpenRefine listen on all network interfaces (0.0.0.0) so it is reachable from the network
- increase Java heap size to 4 GB
- set refine workspace to /data
- OpenRefine should be available at http://localhost
- start client in terminal B (prints help screen):
docker run --rm --link openrefine-server -v /home/felix/refine:/data:z felixlohmeier/openrefine-client
- automatically remove docker container when it exits
- build up network connection with docker container "openrefine-server"
- mount host directory /home/felix/refine as working directory
- to apply the history in file /home/felix/refine/history.json to the project with id 1234567890123, append --apply history.json 1234567890123 to this command
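If the client is started before OpenRefine has finished starting up (for example from a script right after the server), it cannot connect. A minimal sketch of a generic retry loop in bash; the function name wait_for and the retry count and delay in the usage lines are hypothetical choices, and the curl example assumes curl is installed and port 80 is published as in terminal A above.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_for() {
  local tries=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= tries; i++)); do
    # Run the probe quietly; stop at the first success.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only sleep if another attempt follows.
    (( i < tries )) && sleep "$delay"
  done
  return 1
}

# Poll the published port before starting the client in terminal B:
# wait_for 30 2 curl -fs http://localhost/ && echo "OpenRefine is up"
```

Without a published port, the client itself could serve as the probe, e.g. wait_for 30 2 docker run --rm --link openrefine-server felixlohmeier/openrefine-client --list, assuming the client exits with a non-zero status when it cannot reach the server.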
example for customized run commands in detached mode (e.g. for usage in shell scripts)
- define variables (bring your own example data)
workingdir=/home/felix/refine inputfile=example.csv jsonfile=test.json
- start server
docker run -d --name=openrefine-server -v ${workingdir}:/data:z felixlohmeier/openrefine -i 0.0.0.0 -m 4G -d /data
- create project (import file)
docker run --rm --link openrefine-server -v ${workingdir}:/data:z felixlohmeier/openrefine-client --create $inputfile
- get project id
project=($(docker run --rm --link openrefine-server -v ${workingdir}:/data felixlohmeier/openrefine-client --list | cut -c 2-14))
- apply transformations from json file
docker run --rm --link openrefine-server -v ${workingdir}:/data felixlohmeier/openrefine-client --apply ${jsonfile} ${project}
- export project to file
docker run --rm --link openrefine-server -v ${workingdir}:/data felixlohmeier/openrefine-client --export --output=${project}.tsv ${project}
- cleanup (wait up to 500 seconds for OpenRefine to shut down before docker kills the container)
docker stop -t=500 openrefine-server && docker rm openrefine-server
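The "get project id" step with cut -c 2-14 only works if each project ID occupies exactly characters 2 to 14 of a --list line. A somewhat more tolerant sketch in bash and awk, assuming (as the cut command implies) that each --list line has the form "ID: project name", possibly with leading spaces; the function name list_project_ids is hypothetical.

```shell
# Hypothetical helper: read "--list" output on stdin and print one project ID
# per line. Assumes lines look like "<ID>: <name>" with optional padding
# around the ID; lines whose text before the first ":" is not a number are skipped.
list_project_ids() {
  awk -F':' '$1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
    gsub(/[ \t]/, "", $1)  # strip the padding around the ID
    print $1
  }'
}

# Replacement for the "get project id" step (first listed project):
# project=$(docker run --rm --link openrefine-server -v ${workingdir}:/data felixlohmeier/openrefine-client --list | list_project_ids | head -n 1)
```

Like the original cut step, taking the first ID is only reliable when the workspace contains just the project that was created; with several projects, pick the ID by its name instead.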