===================================
OpenRefine Python Client Library
===================================

The OpenRefine Python Client Library provides an interface for
communicating with an `OpenRefine <http://openrefine.org/>`_ server.

Usage
=====
Command line interface:

- list all projects: ``python refine.py --list``
- create project from file: ``python refine.py --create [FILE]``
- apply `rules from JSON file <http://kb.refinepro.com/2012/06/google-refine-json-and-my-notepad-or.html>`_: ``python refine.py --apply [FILE.json] [PROJECTID/PROJECTNAME]``
- export project to file: ``python refine.py --export [PROJECTID/PROJECTNAME] --output=FILE.tsv``
- show project metadata: ``python refine.py --info [PROJECTID/PROJECTNAME]``
- delete project: ``python refine.py --delete [PROJECTID/PROJECTNAME]``
- check ``python refine.py --help`` for further options

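The commands above can be chained from a script. The following is a minimal sketch, not part of the library: the helper names (``refine_command``, ``batch_commands``, ``run_batch``) are made up for illustration, and it assumes ``refine.py`` is in the current directory and ``python`` is on the path. The project name or ID is passed in explicitly, because the list above does not say how ``--create`` names a new project.

.. code-block:: python

    import subprocess


    def refine_command(*args):
        """Build an argv list that runs refine.py with the given options."""
        return ["python", "refine.py"] + list(args)


    def batch_commands(source_file, rules_file, project, output_file):
        """Return the create, apply, and export commands, in that order."""
        return [
            refine_command("--create", source_file),
            refine_command("--apply", rules_file, project),
            refine_command("--export", project, "--output=" + output_file),
        ]


    def run_batch(source_file, rules_file, project, output_file):
        """Run each step in turn; check_call raises if a command fails."""
        for command in batch_commands(source_file, rules_file, project, output_file):
            subprocess.check_call(command)

Building the argument lists separately from running them makes it easy to print the commands for inspection before anything touches the server.
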
If you are familiar with Python, you can try all functions interactively (``python -i refine.py``) or use this library in your own Python scripts. Some examples:

* show version of OpenRefine server: ``refine.RefineServer().get_version()``
* show total rows of project 2151545447855: ``refine.RefineProject(refine.RefineServer(),'2151545447855').do_json('get-rows')['total']``
* compute clusters of project 2151545447855 and column key: ``refine.RefineProject(refine.RefineServer(),'2151545447855').compute_clusters('key')``

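In a standalone script, the same calls might be combined as in the sketch below. It uses only the calls shown in the examples above; ``project_summary`` and ``main`` are illustrative names, and the import path ``google.refine`` is an assumption that may need adjusting for your installation. ``main`` needs a running OpenRefine server and an existing project with that ID.

.. code-block:: python

    def project_summary(project, column):
        """Return the row count and the clusters for one column of a project.

        ``project`` is expected to behave like ``refine.RefineProject``; only
        ``do_json`` and ``compute_clusters`` are called on it.
        """
        total_rows = project.do_json('get-rows')['total']
        clusters = project.compute_clusters(column)
        return {'total_rows': total_rows, 'clusters': clusters}


    def main():
        # Assumption: the library is importable under this path.
        from google.refine import refine

        server = refine.RefineServer()
        print(server.get_version())
        project = refine.RefineProject(server, '2151545447855')
        print(project_summary(project, 'key'))

Call ``main()`` once the server is up; ``project_summary`` on its own can be exercised with any object that provides the two methods.
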
Features
=============
Currently, the following API is supported:

- project creation/import, deletion, export
- facet computation

  - text
  - text filter
  - numeric
  - blank
  - starred & flagged
  - ... extensible class

- 'engine': managing multiple facets and their computation results
- sorting & reordering
- clustering
- transforms
- transposes
- single and mass edits
- annotation (star/flag)
- column

  - move
  - add
  - split
  - rename
  - reorder
  - remove

- reconciliation

  - reconciliation judgment facet
  - guessing column type
  - querying reconciliation services preferences
  - perform reconciliation

Configuration
=============
By default the OpenRefine server URL is http://127.0.0.1:3333.
The environment variables ``OPENREFINE_HOST`` and ``OPENREFINE_PORT``
override the host and port.

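As an illustration of how the two variables combine with the defaults, here is a small sketch. It is not the library's own code; ``server_url`` is a hypothetical helper, and only the variable names and default values come from the paragraph above.

.. code-block:: python

    import os


    def server_url(environ=None):
        """Build the server URL from the environment, falling back to defaults."""
        environ = os.environ if environ is None else environ
        host = environ.get('OPENREFINE_HOST', '127.0.0.1')
        port = environ.get('OPENREFINE_PORT', '3333')
        return 'http://{0}:{1}'.format(host, port)

Passing a plain dictionary instead of ``os.environ`` makes the fallback behaviour easy to check without changing the real environment.
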
Running the full test suite requires a live Refine server. No existing projects
are affected.

Installation
============
(Someone with more familiarity with Python's byzantine collection of installation
frameworks is very welcome to improve/"best practice" all this.)

#. Install the dependencies, currently just ``urllib2_file``:

   ``sudo pip install -r requirements.txt``

   (If you don't have ``pip``, visit `pip-installer.org <http://www.pip-installer.org/en/latest/installing.html#install-or-upgrade-pip>`_.)

#. Ensure you have a Refine server running somewhere and, if necessary, set
   the environment variables as described above.

#. Run tests, build, and install:

   ``python setup.py test # to do a subset, e.g., --test-suite tests.test_facet``

   ``python setup.py build``

   ``python setup.py install``

There is a Makefile that will do this too, and more.

TODO
====
The API so far has been filled out by building a test suite to carry out the
actions in `David Huynh's Refine tutorial <http://davidhuynh.net/spaces/nicar2011/tutorial.pdf>`_
which, while certainly showing off a wide range of Refine features, doesn't
cover the entire suite. Notable exceptions currently include:

- reconciliation support is useful but not complete
- undo/redo
- Freebase
- join columns
- columns from URL

Contribute
============
Pull requests with passing tests are welcome! Source is at https://github.com/PaulMakepeace/refine-client-py

Useful Tools
------------
One aspect of development is watching HTTP transactions. To that end, I found
`Fiddler <http://www.fiddler2.com/>`_ on Windows and
`HTTPScoop <http://www.tuffcode.com/>`_ invaluable. The latter won't URL-decode
or nicely format JSON, but the `Online JavaScript Beautifier <http://jsbeautifier.org/>`_
will.

Executables may be built with `pyinstaller <http://www.pyinstaller.org>`_.

History
=======
OpenRefine used to be called Google Refine, and this library used to be called
the Google Refine Python Client Library.

Credits
=======
Paul Makepeace, author, <paulm@paulm.com>

David Huynh, `initial cut <http://markmail.org/message/jsxzlcu3gn6drtb7>`_

`Artfinder <http://www.artfinder.com/>`_, inspiration

Some data used in the test suite comes from publicly available sources:

- louisiana-elected-officials.csv: from
  http://www.sos.louisiana.gov/tabid/136/Default.aspx

- us_economic_assistance.csv: `"The Green Book" <http://www.data.gov/raw/1554>`_

- eli-lilly.csv: `ProPublica's "Docs for Dollars" <http://projects.propublica.org/docdollars/>`_ leading to a `Lilly Faculty PDF <http://www.lillyfacultyregistry.com/documents/EliLillyFacultyRegistryQ22010.pdf>`_ processed by `David Huynh's ScraperWiki script <http://scraperwiki.com/scrapers/eli-lilly-dollars-for-docs-scraper/edit/>`_