mirror of
https://github.com/opencultureconsulting/openrefine-client.git
synced 2025-03-30 00:00:46 +01:00
Compare commits
11 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 02cf1192c4 | | |
| | e4d52818fc | | |
| | 16541c522e | | |
| | 965c4e97fd | | |
| | 0563b54fc6 | | |
| | fa3e352879 | | |
| | 1dd0cafd4e | | |
| | a368147bdf | | |
| | f66c88ee35 | | |
| | 2735db3f3f | | |
| | 0df6050deb | | |
3
.gitignore
vendored
@@ -4,7 +4,6 @@ dist
 .*
 openrefine_client.egg-info
 refine.spec
-openrefine-2.*
-openrefine-3.*
+openrefine-*
 openrefine-client_*
 tests-cli.log
44
README.md
@@ -1,12 +1,12 @@
-# OpenRefine Python Client with extended command line interface
+# OpenRefine Python Client with extended command line interface (⌨️ for 💎)

-[](https://www.codacy.com/app/felixlohmeier/openrefine-client?utm_source=github.com&utm_medium=referral&utm_content=opencultureconsulting/openrefine-client&utm_campaign=Badge_Grade) [](https://hub.docker.com/r/felixlohmeier/openrefine-client/) [](https://pypi.org/project/openrefine-client/) [](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master)
+[](https://www.codacy.com/gh/opencultureconsulting/openrefine-client/dashboard) [](https://hub.docker.com/r/felixlohmeier/openrefine-client/) [](https://pypi.org/project/openrefine-client/) [](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master)

 The [OpenRefine Python Client from PaulMakepeace](https://github.com/PaulMakepeace/refine-client-py) provides a library for communicating with an [OpenRefine](http://openrefine.org) server.
 This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, macOS).
 It is also available via Docker Hub, PyPI and Binder.

-works with OpenRefine 2.7, 2.8, 3.0, 3.1, 3.2, 3.3, 3.4, 3.4.1
+works with OpenRefine 2.7, 2.8, 3.0, 3.1, 3.2, 3.3, 3.4, 3.4.1, 3.5.0

 ## Download
@@ -248,7 +248,7 @@ openrefine-client --create combined.zip --format csv --projectName myproject --i
 ### See also

 - Linux Bash script to run OpenRefine in batch mode (import, transform, export): [openrefine-batch](https://github.com/opencultureconsulting/openrefine-batch)
-- [Jupyter notebook demonstrating usage in Linux Bash](https://nbviewer.jupyter.org/github/felixlohmeier/openrefineder/blob/master/openrefine-client-bash.ipynb)
+- [Jupyter notebook demonstrating usage in Linux Bash](https://nbviewer.jupyter.org/github/felixlohmeier/openrefineder/blob/master/notebooks/openrefine-client-bash.ipynb)
 - Use case [HOS-MetadataTransformations](https://github.com/subhh/HOS-MetadataTransformations): Automated workflow for harvesting, transforming and indexing of metadata using metha, OpenRefine and Solr. Part of the Hamburg Open Science "Schaufenster" software stack.
 - Use case [Data processing of ILS data to facilitate a new discovery layer for the German Literature Archive (DLA)](https://doi.org/10.5281/zenodo.2678113): Custom data processing pipeline based on Pandas (a Python library) and OpenRefine.
@@ -297,7 +297,7 @@ Run openrefine-client linked to a dockerized OpenRefine ([felixlohmeier/openrefi
 2. Run server (will be available at http://localhost:3333)

 ```sh
-docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.4.1
+docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.5.0
 ```

 3. Run client with some [basic commands](#basic-commands): 1. download example files, 2. create project from file, 3. list projects, 4. show metadata, 5. export to terminal, 6. apply transformation rules (deduplication), 7. export again to terminal, 8. export to xls file and 9. delete project
@@ -337,7 +337,7 @@ Customize OpenRefine server:
 - Example for [allocating more memory](https://github.com/OpenRefine/OpenRefine/wiki/FAQ#out-of-memory-errors---feels-slow---could-not-reserve-enough-space-for-object-heap) to OpenRefine with additional option `-m 4G`

 ```sh
-docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.4.1 -i 0.0.0.0 -d /data -m 4G
+docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.5.0 -i 0.0.0.0 -d /data -m 4G
 ```

 - The OpenRefine version is defined by the docker tag.
@@ -624,8 +624,8 @@ See also:
 - free to use on-demand server with Jupyter notebook, OpenRefine and Bash
 - no registration needed, will start within a few minutes
 - [restricted](https://mybinder.readthedocs.io/en/latest/faq.html#how-much-memory-am-i-given-when-using-binder) to 2 GB RAM and server will be deleted after 10 minutes of inactivity
-- [bash_kernel demo notebook](https://nbviewer.jupyter.org/github/felixlohmeier/openrefineder/blob/master/openrefine-client-bash.ipynb) for using the openrefine-client in a Linux Bash environment [](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master?urlpath=/tree/openrefine-client-bash.ipynb)
-- [python2 demo notebook](https://nbviewer.jupyter.org/github/felixlohmeier/openrefineder/blob/master/openrefine-client-python.ipynb) for using the openrefine-client in a Python 2 environment [](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master?urlpath=/tree/openrefine-client-python.ipynb)
+- [bash_kernel demo notebook](https://nbviewer.jupyter.org/github/felixlohmeier/openrefineder/blob/master/notebooks/openrefine-client-bash.ipynb) for using the openrefine-client in a Linux Bash environment [](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master?urlpath=/tree/notebooks/openrefine-client-bash.ipynb)
+- [python2 demo notebook](https://nbviewer.jupyter.org/github/felixlohmeier/openrefineder/blob/master/notebooks/openrefine-client-python.ipynb) for using the openrefine-client in a Python 2 environment [](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master?urlpath=/tree/notebooks/openrefine-client-python.ipynb)

 ## Development
@@ -651,42 +651,42 @@ The Python client library includes several unit tests.

 There is also a script that uses docker images to run the unit tests with different versions of OpenRefine.

-- run tests on all OpenRefine versions (from 2.0 up to 3.4.1)
+- run tests on all OpenRefine versions (from 2.0 up to 3.5.0)

 ```sh
 ./tests.sh -a
 ```

-- run tests on tag 3.4.1
+- run tests on tag 3.5.0

 ```sh
-./tests.sh -t 3.4.1
+./tests.sh -t 3.5.0
 ```

-- run tests on tag 3.4.1 interactively (pause before and after tests)
+- run tests on tag 3.5.0 interactively (pause before and after tests)

 ```sh
-./tests.sh -t 3.4.1 -i
+./tests.sh -t 3.5.0 -i
 ```

-- run tests on tags 3.4.1 and 2.7
+- run tests on tags 3.5.0 and 2.7

 ```sh
-./tests.sh -t 3.4.1 -t 2.7
+./tests.sh -t 3.5.0 -t 2.7
 ```

 For Linux there are also functional tests for all command line options.

-- run all functional tests on OpenRefine 3.4
+- run all functional tests on OpenRefine 3.5.0

 ```sh
-./tests-cli.sh 3.4.1
+./tests-cli.sh 3.5.0
 ```

-- run all functional tests on OpenRefine 3.4 with one-file-executable
+- run all functional tests on OpenRefine 3.5.0 with one-file-executable

 ```sh
-./tests-cli.sh 3.4.1 openrefine-client_0-3-7_linux
+./tests-cli.sh 3.5.0 openrefine-client_0-3-7_linux
 ```

 ### Distributing
@@ -696,7 +696,7 @@ Note to myself: When releasing a new version...
 1. Run functional tests

 ```sh
-for v in 2.7 2.8 3.0 3.1 3.2 3.3 3.4 3.4.1; do
+for v in 2.7 2.8 3.0 3.1 3.2 3.3 3.4 3.4.1 3.5.0; do
 ./tests-cli.sh $v
 done
 ```
@@ -728,7 +728,7 @@ Note to myself: When releasing a new version...
 4. Run functional tests with Linux executable

 ```sh
-for v in 2.7 2.8 3.0 3.1 3.2 3.3 3.4 3.4.1; do
+for v in 2.7 2.8 3.0 3.1 3.2 3.3 3.4 3.4.1 3.5.0; do
 ./tests-cli.sh $v openrefine-client_0-3-7_linux
 done
 ```
@@ -752,7 +752,7 @@ Note to myself: When releasing a new version...
 8. Bump openrefine-client version in related projects

 - openrefine-batch: [openrefine-batch.sh](https://github.com/opencultureconsulting/openrefine-batch/blob/master/openrefine-batch.sh#L7) and [openrefine-batch-docker.sh](https://github.com/opencultureconsulting/openrefine-batch/blob/master/openrefine-batch-docker.sh)
-- openrefineder: [postBuild](https://github.com/felixlohmeier/openrefineder/blob/master/postBuild)
+- openrefineder: [postBuild](https://github.com/felixlohmeier/openrefineder/blob/master/binder/postBuild)

 ## Credits
@@ -97,7 +97,8 @@ class RefineServer(object):
         try:
             response = urllib2.urlopen(req)
         except urllib2.HTTPError as e:
-            raise Exception('HTTP %d "%s" for %s\n\t%s' % (e.code, e.msg, e.geturl(), data))
+            raise Exception('HTTP %d "%s" for %s\n\t%s' %
+                            (e.code, e.msg, e.geturl(), data))
         except urllib2.URLError as e:
             raise urllib2.URLError(
                 '%s for %s. No Refine server reachable/running; ENV set?' %
@@ -113,6 +114,10 @@ class RefineServer(object):
         """Open a Refine URL, optionally POST data, and return parsed JSON."""
         response = json.loads(self.urlopen(*args, **kwargs).read())
         if 'code' in response and response['code'] not in ('ok', 'pending'):
+            if 'Missing or invalid csrf_token parameter' == response['message']:
+                self.get_csrf_token()
+                response = json.loads(self.urlopen(*args, **kwargs).read())
+                return response
             error_message = ('server ' + response['code'] + ': ' +
                              response.get('message', response.get('stack', response)))
             raise Exception(error_message)
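The hunk above retries a request once when the server answers with a missing or invalid CSRF token. A minimal, self-contained sketch of that retry pattern follows. `FakeServer`, `Client`, `refresh_token` and `request_json` are hypothetical stand-ins invented for illustration; they are not names from the client library, and no real HTTP request is made.

```python
import json

CSRF_ERROR = 'Missing or invalid csrf_token parameter'


class FakeServer(object):
    """Hypothetical stand-in for a server that rejects stale tokens."""

    def __init__(self):
        self.valid_token = 'fresh'

    def handle(self, token):
        # Return a JSON string, as an HTTP response body would.
        if token != self.valid_token:
            return json.dumps({'code': 'error', 'message': CSRF_ERROR})
        return json.dumps({'code': 'ok'})


class Client(object):
    """Hypothetical client that starts out with a stale token."""

    def __init__(self, server):
        self.server = server
        self.token = 'stale'

    def refresh_token(self):
        # Assumption: fetching a new token is a single call.
        self.token = self.server.valid_token

    def request_json(self):
        response = json.loads(self.server.handle(self.token))
        if 'code' in response and response['code'] not in ('ok', 'pending'):
            if response.get('message') == CSRF_ERROR:
                # Refresh the token and retry exactly once.
                self.refresh_token()
                response = json.loads(self.server.handle(self.token))
                if response.get('code') in ('ok', 'pending'):
                    return response
            raise Exception('server %s: %s' % (
                response['code'], response.get('message', response)))
        return response


client = Client(FakeServer())
result = client.request_json()
print(result['code'])
```

As a design choice, the sketch checks the retried response again before returning it, so a second failure still raises an exception instead of being handed back to the caller.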
@@ -413,7 +418,10 @@ class RefineProject:
         for i, column in enumerate(column_model['columns']):
             name = column['name']
             self.column_order[name] = i
-            column_index[name] = column['cellIndex']
+            try:
+                column_index[name] = column['cellIndex']
+            except KeyError:
+                column_index[name] = i
         self.key_column = column_model['keyColumnName']
         self.has_records = response['recordModel'].get('hasRecords', False)
         self.rows_response_factory = RowsResponseFactory(column_index)
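The hunk above falls back to the column's position in the list when the column model carries no `cellIndex` key. The behaviour can be sketched in isolation as below; `build_column_index` and the sample column lists are hypothetical and exist only for this illustration.

```python
def build_column_index(columns):
    """Map column name to cell index, falling back to list position."""
    column_index = {}
    for i, column in enumerate(columns):
        name = column['name']
        try:
            column_index[name] = column['cellIndex']
        except KeyError:
            # No 'cellIndex' key in this column: use the position instead.
            column_index[name] = i
    return column_index


# Hypothetical column models, with and without 'cellIndex'.
with_index = [{'name': 'email', 'cellIndex': 2},
              {'name': 'name', 'cellIndex': 0}]
without_index = [{'name': 'email'}, {'name': 'name'}]

print(build_column_index(with_index))
print(build_column_index(without_index))
```

The fallback assumes that, when `cellIndex` is absent, cells appear in the same order as the columns; the source does not state this explicitly.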
12
tests.sh
@@ -17,8 +17,8 @@
 # along with this program. If not, see <http://www.gnu.org/licenses/>

 # defaults:
-all=(3.4.1 3.4 3.3 3.2-java12 3.2-java11 3.2-java10 3.2-java9 3.2 3.1-java9 3.1 3.0-java9 3.0 2.8-java9 2.8 2.8-java7 2.7 2.7-java7 2.5-java7 2.5-java6 2.1-java6 2.0-java6)
-main=(3.4.1 3.4 3.3 3.2 3.1 3.0 2.8 2.7 2.5-java6 2.1-java6 2.0-java6)
+all=(3.5.0 3.4.1 3.4 3.3 3.2-java12 3.2-java11 3.2-java10 3.2-java9 3.2 3.1-java9 3.1 3.0-java9 3.0 2.8-java9 2.8 2.8-java7 2.7 2.7-java7 2.5-java7 2.5-java6 2.1-java6 2.0-java6)
+main=(3.5.0 3.4.1 3.4 3.3 3.2 3.1 3.0 2.8 2.7 2.5-java6 2.1-java6 2.0-java6)
 interactively=false
 port="3333"
@@ -31,10 +31,10 @@ Script for running tests with different OpenRefine and Java versions.
 It uses docker images from https://hub.docker.com/r/felixlohmeier/openrefine.

 Examples:
-./tests.sh -a # run tests on all OpenRefine versions (from 2.0 up to 3.4.1)
-./tests.sh -t 3.4.1 # run tests on tag 3.4.1
-./tests.sh -t 3.4.1 -i # run tests on tag 3.4.1 interactively (pause before and after tests)
-./tests.sh -t 3.4.1 -t 2.7 # run tests on tags 3.4.1 and 2.7
+./tests.sh -a # run tests on all OpenRefine versions (from 2.0 up to 3.5.0)
+./tests.sh -t 3.5.0 # run tests on tag 3.5.0
+./tests.sh -t 3.5.0 -i # run tests on tag 3.5.0 interactively (pause before and after tests)
+./tests.sh -t 3.5.0 -t 2.7 # run tests on tags 3.5.0 and 2.7

 Advanced:
 ./tests.sh -j # run tests on all OpenRefine versions and each with all supported Java versions (requires a lot of docker images to be downloaded!)
@@ -32,7 +32,8 @@ DATA

 # ================================== ACTION ================================== #

-${cmd} --create "tmp/${t}/${t}.csv" --processQuotes "false"
+# OpenRefine 4.x fails without manually set headerLines
+${cmd} --create "tmp/${t}/${t}.csv" --processQuotes "false" --headerLines 1
 ${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

 # =================================== TEST =================================== #
@@ -38,8 +38,13 @@ DATA
 cat << "DATA" > "tmp/${t}/${t}.transform"
 [
   {
-    "op": "core/column-removal",
-    "columnName": "record"
+    "op": "core/column-reorder",
+    "columnNames": [
+      "record - a",
+      "record - b",
+      "record - c"
+    ],
+    "description": "Reorder columns"
   },
   {
     "op": "core/row-removal",
@@ -38,12 +38,13 @@ DATA
 cat << "DATA" > "tmp/${t}/${t}.transform"
 [
   {
-    "op": "core/column-removal",
-    "columnName": "root"
-  },
-  {
-    "op": "core/column-removal",
-    "columnName": "root - record"
+    "op": "core/column-reorder",
+    "columnNames": [
+      "root - record - icon",
+      "root - record - code",
+      "root - record - meaning"
+    ],
+    "description": "Reorder columns"
   },
   {
     "op": "core/row-removal",
@@ -38,12 +38,13 @@ DATA
 cat << "DATA" > "tmp/${t}/${t}.transform"
 [
   {
-    "op": "core/column-removal",
-    "columnName": "root"
-  },
-  {
-    "op": "core/column-removal",
-    "columnName": "root - record"
+    "op": "core/column-reorder",
+    "columnNames": [
+      "root - record - a",
+      "root - record - b",
+      "root - record - c"
+    ],
+    "description": "Reorder columns"
   },
   {
     "op": "core/row-removal",
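The three transform hunks above replace `core/column-removal` steps with a single `core/column-reorder` step that lists the leaf columns, presumably relying on the reorder operation to drop any column that is not listed. As a rough sketch, the same operation entry could be generated in Python; `reorder_operation` is a hypothetical helper, and only the keys `op`, `columnNames` and `description` are taken from the hunks.

```python
import json


def reorder_operation(column_names):
    """Build a one-element operation list shaped like the test transforms."""
    return [{
        'op': 'core/column-reorder',
        'columnNames': list(column_names),
        'description': 'Reorder columns',
    }]


ops = reorder_operation([
    'root - record - a',
    'root - record - b',
    'root - record - c',
])

# Serialize to JSON text, e.g. for writing to a .transform file.
print(json.dumps(ops, indent=2))
```

Listing the columns once keeps the transform independent of how many intermediate parent columns an import produces, which may be why the tests moved away from removing `root` and `root - record` individually.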