added code highlighting and improved pip install command

Felix Lohmeier 2019-08-20 06:35:33 +02:00
parent 75e9a763d1
commit 2e6507bdf2
1 changed files with 85 additions and 80 deletions

165
README.md

@ -32,19 +32,19 @@ To use the client:
- Windows: Open PowerShell and enter the following command - Windows: Open PowerShell and enter the following command
``` ```sh
cd ~\Downloads cd ~\Downloads
``` ```
- macOS: Open Terminal (Finder > Applications > Utilities > Terminal) and enter the following command - macOS: Open Terminal (Finder > Applications > Utilities > Terminal) and enter the following command
``` ```sh
cd ~/Downloads cd ~/Downloads
``` ```
- Linux: Open a terminal app (Terminal, Konsole, xterm, ...) and enter the following command - Linux: Open a terminal app (Terminal, Konsole, xterm, ...) and enter the following command
``` ```sh
cd ~/Downloads cd ~/Downloads
``` ```
@ -54,13 +54,13 @@ To use the client:
- macOS: - macOS:
``` ```sh
chmod +x openrefine-client_0-3-7_macos chmod +x openrefine-client_0-3-7_macos
``` ```
- Linux: - Linux:
``` ```sh
chmod +x openrefine-client_0-3-7_linux chmod +x openrefine-client_0-3-7_linux
``` ```
@ -68,19 +68,19 @@ To use the client:
- Windows: - Windows:
``` ```sh
.\openrefine-client_0-3-7_windows.exe .\openrefine-client_0-3-7_windows.exe
``` ```
- macOS: - macOS:
``` ```sh
./openrefine-client_0-3-7_macos ./openrefine-client_0-3-7_macos
``` ```
- Linux: - Linux:
``` ```sh
./openrefine-client_0-3-7_linux ./openrefine-client_0-3-7_linux
``` ```
@ -99,7 +99,7 @@ Download example data (`--download`) and create project from file (`--create`):
- Windows: - Windows:
``` ```sh
.\openrefine-client_0-3-7_windows.exe --download "https://git.io/fj5hF" --output=duplicates.csv .\openrefine-client_0-3-7_windows.exe --download "https://git.io/fj5hF" --output=duplicates.csv
.\openrefine-client_0-3-7_windows.exe --download "https://git.io/fj5ju" --output=duplicates-deletion.json .\openrefine-client_0-3-7_windows.exe --download "https://git.io/fj5ju" --output=duplicates-deletion.json
.\openrefine-client_0-3-7_windows.exe --create duplicates.csv .\openrefine-client_0-3-7_windows.exe --create duplicates.csv
@ -107,7 +107,7 @@ Download example data (`--download`) and create project from file (`--create`):
- macOS: - macOS:
``` ```sh
./openrefine-client_0-3-7_macos --download "https://git.io/fj5hF" --output=duplicates.csv ./openrefine-client_0-3-7_macos --download "https://git.io/fj5hF" --output=duplicates.csv
./openrefine-client_0-3-7_macos --download "https://git.io/fj5ju" --output=duplicates-deletion.json ./openrefine-client_0-3-7_macos --download "https://git.io/fj5ju" --output=duplicates-deletion.json
./openrefine-client_0-3-7_macos --create duplicates.csv ./openrefine-client_0-3-7_macos --create duplicates.csv
@ -115,7 +115,7 @@ Download example data (`--download`) and create project from file (`--create`):
- Linux: - Linux:
``` ```sh
./openrefine-client_0-3-7_linux --download "https://git.io/fj5hF" --output=duplicates.csv ./openrefine-client_0-3-7_linux --download "https://git.io/fj5hF" --output=duplicates.csv
./openrefine-client_0-3-7_linux --download "https://git.io/fj5ju" --output=duplicates-deletion.json ./openrefine-client_0-3-7_linux --download "https://git.io/fj5ju" --output=duplicates-deletion.json
./openrefine-client_0-3-7_linux --create duplicates.csv ./openrefine-client_0-3-7_linux --create duplicates.csv
@ -161,7 +161,7 @@ It even provides an additional feature for splitting results into multiple files
To try out the functionality, create another project from the example file above. To try out the functionality, create another project from the example file above.
``` ```sh
--create duplicates.csv --projectName=advanced --create duplicates.csv --projectName=advanced
``` ```
@ -173,7 +173,7 @@ The following example code will export...
macOS/Linux Terminal (multi-line input with `\` ): macOS/Linux Terminal (multi-line input with `\` ):
``` ```sh
"advanced" \ "advanced" \
--prefix='{ "events" : [ --prefix='{ "events" : [
' \ ' \
@ -188,7 +188,7 @@ macOS/Linux Terminal (multi-line input with `\` ):
Windows PowerShell (multi-line input with `` ` ``; quotes need to be doubled): Windows PowerShell (multi-line input with `` ` ``; quotes need to be doubled):
``` ```sh
"advanced" ` "advanced" `
--prefix='{ ""events"" : [ --prefix='{ ""events"" : [
' ` ' `
@ -204,14 +204,14 @@ Windows PowerShell (multi-line input with `` ` ``; quotes need to be doubled):
Add the following options to the last command (recall with `↑`) to store the results in multiple files. Add the following options to the last command (recall with `↑`) to store the results in multiple files.
Each file will contain the prefix, a processed row, and the suffix. Each file will contain the prefix, a processed row, and the suffix.
``` ```sh
--output=advanced.json --splitToFiles=true --output=advanced.json --splitToFiles=true
``` ```
Filenames are suffixed with the row number by default (e.g. `advanced_1.json`, `advanced_2.json` etc.). Filenames are suffixed with the row number by default (e.g. `advanced_1.json`, `advanced_2.json` etc.).
There is another option to use the value in the first column instead: There is another option to use the value in the first column instead:
``` ```sh
--output=advanced.json --splitToFiles=true --suffixById=true --output=advanced.json --splitToFiles=true --suffixById=true
``` ```
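The naming scheme described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the client's actual implementation: the helper name `split_filenames` is hypothetical, and the identifier values are made-up sample data.

```python
import os


def split_filenames(output, suffixes):
    """Return one filename per suffix, e.g. advanced.json -> advanced_1.json.

    Hypothetical helper for illustration; not part of openrefine-client.
    """
    base, ext = os.path.splitext(output)
    return ['{}_{}{}'.format(base, suffix, ext) for suffix in suffixes]


# Default: files are suffixed with the row number.
print(split_filenames('advanced.json', [1, 2, 3]))

# With --suffixById=true: files are suffixed with the value of the first column
# (sample values; they should be unique identifiers).
print(split_filenames('advanced.json', ['id-a', 'id-b']))
```

Because the suffix becomes part of the filename, duplicate values in the first column would map to the same file, which is why unique identifiers are recommended for `--suffixById=true`.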
@ -229,7 +229,7 @@ When using this option, the first column should contain unique identifiers.
[felixlohmeier/openrefine-client](https://hub.docker.com/r/felixlohmeier/openrefine-client/) [![Docker](https://img.shields.io/microbadger/image-size/felixlohmeier/openrefine-client?label=docker)](https://hub.docker.com/r/felixlohmeier/openrefine-client/) [felixlohmeier/openrefine-client](https://hub.docker.com/r/felixlohmeier/openrefine-client/) [![Docker](https://img.shields.io/microbadger/image-size/felixlohmeier/openrefine-client?label=docker)](https://hub.docker.com/r/felixlohmeier/openrefine-client/)
``` ```sh
docker pull felixlohmeier/openrefine-client:v0.3.7 docker pull felixlohmeier/openrefine-client:v0.3.7
``` ```
@ -237,7 +237,7 @@ docker pull felixlohmeier/openrefine-client:v0.3.7
Run client and mount current directory as workspace: Run client and mount current directory as workspace:
``` ```sh
docker run --rm --network=host -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 docker run --rm --network=host -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7
``` ```
@ -245,13 +245,13 @@ The docker option `--network=host` allows you to connect to a local or remote Op
- list projects on default URL (http://localhost:3333) - list projects on default URL (http://localhost:3333)
``` ```sh
docker run --rm --network=host -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 --list docker run --rm --network=host -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 --list
``` ```
- list projects on a remote server (http://example.com) - list projects on a remote server (http://example.com)
``` ```sh
docker run --rm --network=host -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 -H example.com -P 80 --list docker run --rm --network=host -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 -H example.com -P 80 --list
``` ```
@ -263,19 +263,19 @@ Run openrefine-client linked to a dockerized OpenRefine ([felixlohmeier/openrefi
1. Create docker network 1. Create docker network
``` ```sh
docker network create openrefine docker network create openrefine
``` ```
2. Run server (will be available at http://localhost:3333) 2. Run server (will be available at http://localhost:3333)
``` ```sh
docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.2 docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.2
``` ```
3. Run client with some [basic commands](#basic-commands): 1. download example files, 2. create project from file, 3. list projects, 4. show metadata, 5. export to terminal, 6. apply transformation rules (deduplication), 7. export again to terminal, 8. export to xls file and 9. delete project 3. Run client with some [basic commands](#basic-commands): 1. download example files, 2. create project from file, 3. list projects, 4. show metadata, 5. export to terminal, 6. apply transformation rules (deduplication), 7. export again to terminal, 8. export to xls file and 9. delete project
``` ```sh
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 --download "https://git.io/fj5hF" --output=duplicates.csv docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 --download "https://git.io/fj5hF" --output=duplicates.csv
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 --download "https://git.io/fj5ju" --output=duplicates-deletion.json docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 --download "https://git.io/fj5ju" --output=duplicates-deletion.json
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 -H openrefine-server --create duplicates.csv docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.7 -H openrefine-server --create duplicates.csv
@ -290,14 +290,14 @@ Run openrefine-client linked to a dockerized OpenRefine ([felixlohmeier/openrefi
4. Stop and delete server: 4. Stop and delete server:
``` ```sh
docker stop openrefine-server docker stop openrefine-server
docker rm openrefine-server docker rm openrefine-server
``` ```
5. Delete docker network: 5. Delete docker network:
``` ```sh
docker network rm openrefine docker network rm openrefine
``` ```
@ -309,7 +309,7 @@ Customize OpenRefine server:
- Example for [allocating more memory](https://github.com/OpenRefine/OpenRefine/wiki/FAQ#out-of-memory-errors---feels-slow---could-not-reserve-enough-space-for-object-heap) to OpenRefine with additional option `-m 4G` - Example for [allocating more memory](https://github.com/OpenRefine/OpenRefine/wiki/FAQ#out-of-memory-errors---feels-slow---could-not-reserve-enough-space-for-object-heap) to OpenRefine with additional option `-m 4G`
``` ```sh
docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.2 -i 0.0.0.0 -d /data -m 4G docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.2 -i 0.0.0.0 -d /data -m 4G
``` ```
@ -317,13 +317,13 @@ Customize OpenRefine server:
Check the [DockerHub repository](https://hub.docker.com/r/felixlohmeier/openrefine) for available tags. Check the [DockerHub repository](https://hub.docker.com/r/felixlohmeier/openrefine) for available tags.
Example for OpenRefine `2.8` with same options as above: Example for OpenRefine `2.8` with same options as above:
``` ```sh
docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:2.8 -i 0.0.0.0 -d /data -m 4G docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:2.8 -i 0.0.0.0 -d /data -m 4G
``` ```
- If you want OpenRefine to read and write persistent data in a host directory (i.e. to store projects), you can mount the container path `/data`. Example for host directory `/home/felix/refine`: - If you want OpenRefine to read and write persistent data in a host directory (i.e. to store projects), you can mount the container path `/data`. Example for host directory `/home/felix/refine`:
``` ```sh
docker run -d -p 3333:3333 -v /home/felix/refine:/data:z --network=openrefine --name=openrefine-server felixlohmeier/openrefine:2.8 -i 0.0.0.0 -d /data -m 4G docker run -d -p 3333:3333 -v /home/felix/refine:/data:z --network=openrefine --name=openrefine-server felixlohmeier/openrefine:2.8 -i 0.0.0.0 -d /data -m 4G
``` ```
@ -336,8 +336,8 @@ See also:
[openrefine-client](https://pypi.org/project/openrefine-client/) [![PyPI](https://img.shields.io/pypi/v/openrefine-client)](https://pypi.org/project/openrefine-client/) (requires Python 2.x) [openrefine-client](https://pypi.org/project/openrefine-client/) [![PyPI](https://img.shields.io/pypi/v/openrefine-client)](https://pypi.org/project/openrefine-client/) (requires Python 2.x)
``` ```sh
pip install openrefine-client python2 -m pip install openrefine-client --user
``` ```
This will install the package `openrefine-client` containing modules in `google.refine`. This will install the package `openrefine-client` containing modules in `google.refine`.
@ -346,7 +346,7 @@ A command line script `openrefine-client` will also be installed.
### Option 1: command line script ### Option 1: command line script
``` ```sh
openrefine-client --help openrefine-client --help
``` ```
@ -356,20 +356,20 @@ Usage: same commands as explained above (see [Basic Commands](#basic-commands) a
Import module cli: Import module cli:
``` ```python
from google.refine import cli from google.refine import cli
``` ```
Change URL (if necessary): Change URL (if necessary):
``` ```python
cli.refine.REFINE_HOST = 'localhost' cli.refine.REFINE_HOST = 'localhost'
cli.refine.REFINE_PORT = '3333' cli.refine.REFINE_PORT = '3333'
``` ```
Help screen: Help screen:
``` ```python
help(cli) help(cli)
``` ```
@ -377,59 +377,62 @@ Commands:
* download (e.g. example data): * download (e.g. example data):
``` ```python
cli.download('https://git.io/fj5hF','duplicates.csv') cli.download('https://git.io/fj5hF','duplicates.csv')
cli.download('https://git.io/fj5ju','duplicates-deletion.json') cli.download('https://git.io/fj5ju','duplicates-deletion.json')
``` ```
* list projects: * list projects:
``` ```python
cli.ls() cli.ls()
``` ```
* create project: * create project:
``` ```python
p1 = cli.create('duplicates.csv') p1 = cli.create('duplicates.csv')
``` ```
* show metadata: * show metadata:
``` ```python
cli.info(p1.project_id) cli.info(p1.project_id)
``` ```
* apply rules from file to project: * apply rules from file to project:
``` ```python
cli.apply(p1.project_id, 'duplicates-deletion.json') cli.apply(p1.project_id, 'duplicates-deletion.json')
``` ```
* export project to terminal: * export project to terminal:
``` ```python
cli.export(p1.project_id) cli.export(p1.project_id)
``` ```
* export project to file in xls format: * export project to file in xls format:
``` ```python
cli.export(p1.project_id, 'deduped.xls') cli.export(p1.project_id, 'deduped.xls')
``` ```
* export templating (see [Advanced Templating](#advanced-templating) above): * export templating (see [Advanced Templating](#advanced-templating) above):
``` ```python
cli.templating(p1.project_id, prefix='''{ "events" : [ cli.templating(
''', template=''' { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }''', rowSeparator=''', p1.project_id,
''', suffix=''' prefix='''{ "events" : [
''',template=''' { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }''',
rowSeparator=''',
''',suffix='''
] }''') ] }''')
``` ```
* delete project: * delete project:
``` ```python
cli.delete(p1.project_id) cli.delete(p1.project_id)
``` ```
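To see how the `prefix`, `template`, `rowSeparator` and `suffix` arguments of the templating export fit together, the following sketch assembles the same structure locally, without a server. It is an approximation under stated assumptions: the function `render` is hypothetical, `json.dumps` stands in for OpenRefine's `jsonize()`, and the rows are made-up sample data rather than the contents of `duplicates.csv`.

```python
import json


def render(rows, prefix, row_separator, suffix):
    """Assemble prefix + rendered rows joined by row_separator + suffix.

    Hypothetical stand-in for what the server does with the template.
    """
    rendered = [
        # json.dumps approximates jsonize() for plain strings (assumption)
        ' { "name" : %s, "purchase" : %s }'
        % (json.dumps(row['name']), json.dumps(row['purchase']))
        for row in rows
    ]
    return prefix + row_separator.join(rendered) + suffix


# Made-up sample rows for illustration only
rows = [
    {'name': 'Alice', 'purchase': 'Lamp'},
    {'name': 'Bob', 'purchase': 'Chair'},
]

output = render(rows, prefix='{ "events" : [\n', row_separator=',\n', suffix='\n] }')
print(output)

# The assembled text parses as JSON
print(json.loads(output)['events'][0]['name'])
```

The row separator is placed only between rows, so the prefix and suffix have to supply the opening and closing brackets themselves.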
@ -441,7 +444,7 @@ Some functions in the python client library are not yet compatible with OpenRefi
Import module refine: Import module refine:
``` ```python
from google.refine import refine from google.refine import refine
``` ```
@ -449,39 +452,39 @@ Server Commands:
* set up connection: * set up connection:
``` ```python
server1 = refine.Refine('http://localhost:3333') server1 = refine.Refine('http://localhost:3333')
``` ```
- show version: - show version:
``` ```python
server1.server.get_version() server1.server.get_version()
server1.server.version server1.server.version
``` ```
- list projects: - list projects:
``` ```python
server1.list_projects() server1.list_projects()
``` ```
- pretty print the returned dict with json.dumps: - pretty print the returned dict with json.dumps:
``` ```python
import json import json
print(json.dumps(server1.list_projects(), indent=1)) print(json.dumps(server1.list_projects(), indent=1))
``` ```
- create project (**function was edited in this fork**): - create project:
``` ```python
server1.new_project(project_file='duplicates.csv') server1.new_project(project_file='duplicates.csv')
``` ```
* create and open the returned project in one step: * create and open the returned project in one step:
``` ```python
project1 = server1.new_project(project_file='duplicates.csv') project1 = server1.new_project(project_file='duplicates.csv')
``` ```
@ -489,31 +492,31 @@ Project commands:
* open project: * open project:
``` ```python
project1 = server1.open_project('1234567890123') project1 = server1.open_project('1234567890123')
``` ```
* print full URL to project: * print full URL to project:
``` ```python
project1.project_url() project1.project_url()
``` ```
* list columns: * list columns:
``` ```python
project1.columns project1.columns
``` ```
* compute text facet on first column (**fails with OpenRefine >=3.2**): * compute text facet on first column (**fails with OpenRefine >=3.2**):
``` ```python
project1.compute_facets(facet.TextFacet(project1.columns[0])) project1.compute_facets(facet.TextFacet(project1.columns[0]))
``` ```
* print returned object * print returned object
``` ```python
facets = project1.compute_facets(facet.TextFacet(project1.columns[0])).facets[0] facets = project1.compute_facets(facet.TextFacet(project1.columns[0])).facets[0]
for k in sorted(facets.choices, key=lambda k: facets.choices[k].count, reverse=True): for k in sorted(facets.choices, key=lambda k: facets.choices[k].count, reverse=True):
print(facets.choices[k].count, k) print(facets.choices[k].count, k)
@ -521,60 +524,62 @@ Project commands:
* compute clusters on first column: * compute clusters on first column:
``` ```python
project1.compute_clusters(project1.columns[0]) project1.compute_clusters(project1.columns[0])
``` ```
* apply rules from file to project: * apply rules from file to project:
``` ```python
project1.apply_operations('duplicates-deletion.json') project1.apply_operations('duplicates-deletion.json')
``` ```
* export project: * export project:
``` ```python
project1.export(export_format='tsv') project1.export(export_format='tsv')
``` ```
* print the returned fileobject: * print the returned fileobject:
``` ```python
print(project1.export(export_format='tsv').read()) print(project1.export(export_format='tsv').read())
``` ```
* save the returned fileobject to file: * save the returned fileobject to file:
``` ```python
with open('export.tsv', 'wb') as f: with open('export.tsv', 'wb') as f:
f.write(project1.export(export_format='tsv').read()) f.write(project1.export(export_format='tsv').read())
``` ```
* templating export (**function was added in this fork**, see [Advanced Templating](#advanced-templating) above): * templating export (**function was added in this fork**, see [Advanced Templating](#advanced-templating) above):
``` ```python
data = project1.export_templating(prefix='''{ "events" : [ data = project1.export_templating(
''', template=''' { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }''', rowSeparator=''', prefix='''{ "events" : [
''', suffix=''' ''',template=''' { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }''',
rowSeparator=''',
''',suffix='''
] }''') ] }''')
print(data.read()) print(data.read())
``` ```
* print help screen with available commands (many more!): * print help screen with available commands (many more!):
``` ```python
help(project1) help(project1)
``` ```
* example for custom commands: * example for custom commands:
``` ```python
project1.do_json('get-rows')['total'] project1.do_json('get-rows')['total']
``` ```
* delete project: * delete project:
``` ```python
project1.delete() project1.delete()
``` ```
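The facet example in the project commands above sorts the choices by their count before printing them. The sorting idiom can be tried without a running server. In this sketch the `Choice` namedtuple and the sample values are hypothetical stand-ins for the objects returned by `compute_facets`.

```python
from collections import namedtuple

# Hypothetical stand-in for a facet choice object with a .count attribute
Choice = namedtuple('Choice', ['count'])

# Made-up sample data: facet value -> choice
choices = {'foo': Choice(2), 'bar': Choice(5), 'baz': Choice(1)}


def by_count(choices):
    """Return the facet values ordered by descending count."""
    return sorted(choices, key=lambda k: choices[k].count, reverse=True)


for k in by_count(choices):
    print(choices[k].count, k)
```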
@ -606,13 +611,13 @@ The Python client library includes several unit tests.
- run all tests - run all tests
``` ```sh
python setup.py test python setup.py test
``` ```
- run subset test_facet - run subset test_facet
``` ```sh
python setup.py test --test-suite tests.test_facet python setup.py test --test-suite tests.test_facet
``` ```
@ -620,25 +625,25 @@ There is also a script that uses docker images to run the unit tests with differ
- run tests on all OpenRefine versions (from 2.0 up to 3.2) - run tests on all OpenRefine versions (from 2.0 up to 3.2)
``` ```sh
./tests.sh -a ./tests.sh -a
``` ```
- run tests on tag 3.2 - run tests on tag 3.2
``` ```sh
./tests.sh -t 3.2 ./tests.sh -t 3.2
``` ```
- run tests on tag 3.2 interactively (pause before and after tests) - run tests on tag 3.2 interactively (pause before and after tests)
``` ```sh
./tests.sh -t 3.2 -i ./tests.sh -t 3.2 -i
``` ```
- run tests on tags 3.2 and 2.7 - run tests on tags 3.2 and 2.7
``` ```sh
./tests.sh -t 3.2 -t 2.7 ./tests.sh -t 3.2 -t 2.7
``` ```
@ -648,7 +653,7 @@ Note to myself: When releasing a new version...
1. Run tests 1. Run tests
``` ```sh
./tests.sh -a ./tests.sh -a
``` ```
@ -667,7 +672,7 @@ Note to myself: When releasing a new version...
- One-file-executables will be available in `dist/`. - One-file-executables will be available in `dist/`.
``` ```sh
git clone https://github.com/opencultureconsulting/openrefine-client.git git clone https://github.com/opencultureconsulting/openrefine-client.git
cd openrefine-client cd openrefine-client
python -m pip install . --user python -m pip install . --user
@ -681,7 +686,7 @@ Note to myself: When releasing a new version...
5. Build package and upload to PyPI 5. Build package and upload to PyPI
``` ```sh
python3 setup.py sdist bdist_wheel python3 setup.py sdist bdist_wheel
python3 -m twine upload dist/* python3 -m twine upload dist/*
``` ```