This fork extends the command line interface (CLI) and is distributed as a convenient single-file executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.

===================================
OpenRefine Python Client Library
===================================

The OpenRefine Python Client Library provides an interface for
communicating with an `OpenRefine <http://openrefine.org/>`_ server.

If you are looking for a ready-to-use command line interface to OpenRefine for batch processing, you might be interested in the following Bash shell script:
`felixlohmeier/openrefine-batch <https://github.com/felixlohmeier/openrefine-batch>`_

If you are familiar with Python and want to go into more depth, read on!

Features
=============

Command line interface:

- list projects: ``refine.py --list``
- create a project from a file: ``refine.py --create [FILE]``
- apply `rules from a JSON file <http://kb.refinepro.com/2012/06/google-refine-json-and-my-notepad-or.html>`_: ``refine.py --apply [FILE.json] [PROJECTID]``
- export a project to a file: ``refine.py --export [PROJECTID] --output=FILE.tsv``
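
To illustrate how these options combine, here is a minimal sketch that parses them with Python's standard ``argparse`` module. It is not taken from ``refine.py``, whose option handling may differ; ``parse_cli`` and the ``project_id`` positional argument are hypothetical names chosen for this example.

```python
import argparse


def parse_cli(argv):
    """Hypothetical parser mirroring the documented refine.py options."""
    parser = argparse.ArgumentParser(prog="refine.py")
    # Only one action (list/create/apply/export) is allowed per call.
    action = parser.add_mutually_exclusive_group()
    action.add_argument("--list", action="store_true",
                        help="list projects on the server")
    action.add_argument("--create", metavar="FILE",
                        help="create a project from FILE")
    action.add_argument("--apply", metavar="FILE.json",
                        help="apply operation rules from a JSON file")
    action.add_argument("--export", action="store_true",
                        help="export the project given by PROJECTID")
    # Destination file for --export; stays None if omitted.
    parser.add_argument("--output", metavar="FILE",
                        help="file to write the export to")
    # Project id used by --apply and --export; None if not given.
    parser.add_argument("project_id", nargs="?", metavar="PROJECTID")
    return parser.parse_args(argv)
```

With this sketch, ``parse_cli(["--export", "1234", "--output=out.tsv"])`` yields ``export=True``, ``project_id="1234"`` and ``output="out.tsv"``, while the mutually exclusive group rejects combinations such as ``--list --export``.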

Currently, the following API is supported:

- project creation/import, deletion, export
- facet computation

  - text
  - text filter
  - numeric
  - blank
  - starred & flagged
  - ... extensible class

- 'engine': managing multiple facets and their computation results
- sorting & reordering
- clustering
- transforms
- transposes
- single and mass edits
- annotation (star/flag)
- column

  - move
  - add
  - split
  - rename
  - reorder
  - remove

- reconciliation

  - reconciliation judgment facet
  - guessing column types
  - querying reconciliation service preferences
  - performing reconciliation
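
The engine combines the active facets into a single JSON ``engine`` parameter that accompanies facet-aware requests. The sketch below builds that structure by hand, assuming the field names OpenRefine uses for its list (text) and text-filter facets. The helper functions are hypothetical and are not this library's facet classes, and the column names are made up.

```python
import json


def text_facet(column, selected=(), expression="value"):
    """Hypothetical helper: a list ("text") facet config for one column."""
    return {
        "type": "list",
        "name": column,
        "columnName": column,
        "expression": expression,
        "omitBlank": False,
        "omitError": False,
        # Each selected choice is wrapped as {"v": {"v": value, "l": label}}.
        "selection": [{"v": {"v": v, "l": v}} for v in selected],
        "selectBlank": False,
        "selectError": False,
        "invert": False,
    }


def text_filter(column, query, case_sensitive=False):
    """Hypothetical helper: a substring text-filter facet config."""
    return {
        "type": "text",
        "name": column,
        "columnName": column,
        "mode": "text",
        "caseSensitive": case_sensitive,
        "query": query,
        "invert": False,
    }


def engine_json(*facets, mode="row-based"):
    """Serialise several facet configs into one engine parameter string."""
    return json.dumps({"facets": list(facets), "mode": mode})


engine = engine_json(text_facet("Party", ["Democrat"]),
                     text_filter("Office Title", "judge"))
```

Under these assumptions, sending ``engine`` with a request would restrict it to rows whose ``Party`` is ``Democrat`` and whose ``Office Title`` contains ``judge``.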

Configuration
=============

By default the OpenRefine server URL is http://127.0.0.1:3333. The environment
variables ``OPENREFINE_HOST`` and ``OPENREFINE_PORT`` override the host and
port.
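
The override amounts to reading two environment variables with fallbacks. A minimal sketch, assuming the defaults above; ``refine_server_url`` is a hypothetical helper, not part of the library, and the library's own lookup may differ in detail.

```python
import os


def refine_server_url(environ=None):
    """Hypothetical helper: build the server URL from the environment.

    Falls back to 127.0.0.1 and 3333 when OPENREFINE_HOST or
    OPENREFINE_PORT is unset.
    """
    env = os.environ if environ is None else environ
    host = env.get("OPENREFINE_HOST", "127.0.0.1")
    port = env.get("OPENREFINE_PORT", "3333")
    return "http://%s:%s" % (host, port)
```

Passing an empty mapping returns ``http://127.0.0.1:3333``; passing ``{"OPENREFINE_PORT": "3334"}`` changes only the port.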

In order to run all tests, a live Refine server is needed. No existing projects
are affected.
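
Because the tests need a live server, one option is to probe the configured host and port first and skip the live tests when nothing is listening. This is a sketch using only the standard library; it is not how this project's test suite is written, and ``server_available`` and ``LiveServerTests`` are hypothetical names.

```python
import os
import socket
import unittest

# Same defaults and environment overrides as described above.
HOST = os.environ.get("OPENREFINE_HOST", "127.0.0.1")
PORT = int(os.environ.get("OPENREFINE_PORT", "3333"))


def server_available(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


@unittest.skipUnless(server_available(HOST, PORT),
                     "no OpenRefine server at %s:%d" % (HOST, PORT))
class LiveServerTests(unittest.TestCase):
    def test_server_reachable(self):
        # Only runs when the probe in the decorator succeeded.
        self.assertTrue(server_available(HOST, PORT))
```

The probe only checks that a TCP listener exists, not that it is OpenRefine, so it is a convenience for skipping rather than a health check.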

Installation
============

(Someone more familiar with Python's byzantine collection of installation
frameworks is very welcome to improve all this and bring it in line with best practice.)

#. Ensure you have an OpenRefine server running somewhere and, if necessary, set
   the environment variables described above.

#. Run tests, build, and install:

   ``python setup.py test # to do a subset, e.g., --test-suite tests.test_facet``

   ``python setup.py build``

   ``python setup.py install``

There is a Makefile that will do this too, and more.

TODO
====

The API so far has been filled out by building a test suite that carries out the
actions in `David Huynh's Refine tutorial <http://davidhuynh.net/spaces/nicar2011/tutorial.pdf>`_, which, while showing off a
wide range of Refine features, doesn't cover the entire suite. Notable exceptions
currently include:

- reconciliation support is useful but not complete
- undo/redo
- Freebase
- join columns
- columns from URL

Contribute
============

Pull requests with passing tests welcome! Source is at https://github.com/PaulMakepeace/refine-client-py

Useful Tools
------------

One aspect of development is watching HTTP transactions. For that, I found
`Fiddler <http://www.fiddler2.com/>`_ on Windows and `HTTPScoop
<http://www.tuffcode.com/>`_ invaluable. The latter won't URL-decode or nicely
format JSON, but the `Online JavaScript Beautifier <http://jsbeautifier.org/>`_
will.
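
When a capture tool shows only a raw form body, a few lines of Python 3 can do the URL-decoding and JSON formatting locally. A sketch, assuming the body is ``application/x-www-form-urlencoded`` and that some values (such as an ``engine`` parameter) are JSON strings; ``pretty_form_body`` and the sample body are made up for illustration.

```python
import json
from urllib.parse import parse_qs


def pretty_form_body(body):
    """Hypothetical helper: URL-decode a form body, pretty-print JSON values."""
    decoded = {}
    for key, values in parse_qs(body).items():
        value = values[0]  # keep the first value of repeated keys
        try:
            decoded[key] = json.loads(value)
        except ValueError:
            decoded[key] = value  # not JSON: keep the plain string
    return json.dumps(decoded, indent=2, sort_keys=True)


sample = ("engine=%7B%22facets%22%3A%5B%5D%2C%22mode%22%3A%22row-based%22%7D"
          "&project=1234")
print(pretty_form_body(sample))
```

Values that are not valid JSON (for example ``row-based`` on its own) are kept as plain strings, while numeric strings such as a project id come back as numbers.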

History
=======

OpenRefine used to be called Google Refine, and this library used to be called
the Google Refine Python Client Library.

Credits
=======

Paul Makepeace, author, <paulm@paulm.com>

David Huynh, `initial cut <http://markmail.org/message/jsxzlcu3gn6drtb7>`_

`Artfinder <http://www.artfinder.com/>`_, inspiration

Some data used in the test suite comes from publicly available sources:

- louisiana-elected-officials.csv: from
  http://www.sos.louisiana.gov/tabid/136/Default.aspx

- us_economic_assistance.csv: `"The Green Book" <http://www.data.gov/raw/1554>`_

- eli-lilly.csv: `ProPublica's "Docs for Dollars" <http://projects.propublica.org/docdollars/>`_ leading to a `Lilly Faculty PDF <http://www.lillyfacultyregistry.com/documents/EliLillyFacultyRegistryQ22010.pdf>`_ processed by `David Huynh's ScraperWiki script <http://scraperwiki.com/scrapers/eli-lilly-dollars-for-docs-scraper/edit/>`_