openrefine-client/README.md

85 lines
3.9 KiB
Markdown

# OpenRefine Python Client with extended command line interface
The [OpenRefine Python Client Library from PaulMakepeace](https://github.com/PaulMakepeace/refine-client-py) provides an interface to communicating with an [OpenRefine](http://openrefine.org) server. This fork extends the command line interface (CLI) and supports communication between docker containers.
## Download
One-file-executables:
* Linux: [openrefine-client_0-3-4_linux-64bit](https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.4/openrefine-client_0-3-4_linux-64bit) (4,7 MB)
* Windows: [openrefine-client_0-3-4_windows.exe](https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.4/openrefine-client_0-3-4_windows.exe) (4,9 MB)
* Mac: [openrefine-client_0-3-4_mac](https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.4/openrefine-client_0-3-4_mac) (4,4 MB)
For native Python installation on Windows, Mac or Linux see [Installation](#installation) below.
## Peek
A short video loop that demonstrates the basic features (list, create, apply, export)
![video loop that demonstrates basic features](openrefine-client-peek.gif)
## Usage
Command line interface:
- list all projects: `--list`
- create project from file: `--create [FILE]`
- apply [rules from json file](http://kb.refinepro.com/2012/06/google-refine-json-and-my-notepad-or.html): `--apply [FILE.json] [PROJECTID/PROJECTNAME]`
- export project to file: `--export [PROJECTID/PROJECTNAME] --output=FILE.tsv`
- templating export: `--export "My Address Book" --template='{ "friend" : {{jsonize(cells["friend"].value)}}, "address" : {{jsonize(cells["address"].value)}} }' --prefix='{ "address" : [' --rowSeparator ',' --suffix '] }' --filterQuery="^mary$"`
- show project metadata: `--info [PROJECTID/PROJECTNAME]`
- delete project: `--delete [PROJECTID/PROJECTNAME]`
- check `--help` for further options...
If you are familiar with python you may try all functions interactively (`python -i refine.py`) or use this library in your own python scripts. Some Examples:
* show version of OpenRefine server: `refine.RefineServer().get_version()`
* show total rows of project 2151545447855: `refine.RefineProject(refine.RefineServer(),'2151545447855').do_json('get-rows')['total']`
* compute clusters of project 2151545447855 and column key: `refine.RefineProject(refine.RefineServer(),'2151545447855').compute_clusters('key')`
## Configuration
By default the OpenRefine server URL is [http://127.0.0.1:3333](http://127.0.0.1:3333)
The environment variables `OPENREFINE_HOST` and `OPENREFINE_PORT` enable overriding the host & port as well as the command line options `-H` and `-P`.
## Installation
Install dependencies, which currently is `urllib2_file`:
```
sudo pip install -r requirements.txt
```
Ensure you have a Refine server running somewhere and, if necessary, set the environment vars as above.
Run tests, build, and install:
```
python setup.py test # to do a subset, e.g., --test-suite tests.test_facet
python setup.py build
python setup.py install
```
There is a Makefile that will do this too, and more.
## Credits
[Paul Makepeace](http://paulm.com), author
David Huynh, [initial cut](<http://markmail.org/message/jsxzlcu3gn6drtb7)
[Artfinder](http://www.artfinder.com), inspiration
[Felix Lohmeier](https://felixlohmeier.de), extended the CLI features
Some data used in the test suite has been used from publicly available sources,
- louisiana-elected-officials.csv: from http://www.sos.louisiana.gov/tabid/136/Default.aspx
- us_economic_assistance.csv: ["The Green Book"](http://www.data.gov/raw/1554)
- eli-lilly.csv: [ProPublica's "Docs for Dollars](http://projects.propublica.org/docdollars) leading to a [Lilly Faculty PDF](http://www.lillyfacultyregistry.com/documents/EliLillyFacultyRegistryQ22010.pdf) processed by [David Huynh's ScraperWiki script](http://scraperwiki.com/scrapers/eli-lilly-dollars-for-docs-scraper/edit/)