# orcli (💎+🤖)

Bash script to control OpenRefine via [its HTTP API](https://docs.openrefine.org/technical-reference/openrefine-api).

## Features

* works with latest OpenRefine version (currently 3.6)
* run batch processes (import, transform, export)
  * orcli takes care of starting and stopping OpenRefine with temporary workspaces
  * allows execution of arbitrary bash scripts
  * interactive mode for playing around and debugging
  * your existing OpenRefine data will not be touched
* import CSV, ~~TSV, line-based TXT, fixed-width TXT, JSON or XML~~ (and specify input options)
  * supports stdin, multiple files and URLs
* transform data by providing an [undo/redo](https://docs.openrefine.org/manual/running#history-undoredo) JSON file
  * orcli calls specific endpoints for each operation to provide improved error handling and logging
  * supports stdin, multiple files and URLs
* export to TSV, ~~CSV, HTML, XLS, XLSX, ODS~~
* ~~[templating export](https://docs.openrefine.org/manual/exporting#templating-exporter) to additional formats like JSON or XML~~

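
The temporary-workspace idea can be illustrated with a small sketch. This is not orcli's actual implementation; `with_temp_workspace` is a hypothetical helper that only shows the general pattern of creating a throwaway directory, handing it to a command, and removing it afterwards so that existing data stays untouched.

```sh
# Hypothetical helper, not taken from orcli: run a command with a throwaway
# workspace directory appended as its last argument, then remove the directory.
with_temp_workspace() {
  local workspace status
  workspace="$(mktemp -d)" || return 1
  "$@" "${workspace}"
  status=$?
  rm -rf "${workspace}"
  return "${status}"
}

# Demo with a harmless command; a real run would start OpenRefine here and
# point its data directory at the workspace (assumption, not shown).
with_temp_workspace test -d && echo "workspace was available"
```

The helper returns the exit status of the wrapped command, so a failing step is still visible to the caller after the cleanup.
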
## Requirements

* GNU/Linux with Bash 4+
* [jq](https://stedolan.github.io/jq)
* [curl](https://curl.se)
* [OpenRefine](https://openrefine.org) 😉

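
A quick way to check these requirements before installing is sketched below. The function names `missing_tools` and `bash_is_recent` are made up for this example and are not part of orcli.

```sh
# Hypothetical helper: print each given command that is not found on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || printf '%s\n' "${tool}"
  done
}

# In Bash, $BASH_VERSINFO without an index expands to the major version.
# Other shells leave it unset, so the check falls back to 0 and fails.
bash_is_recent() {
  [ "${BASH_VERSINFO:-0}" -ge 4 ]
}

bash_is_recent || echo "Bash 4+ is required" >&2
for tool in $(missing_tools jq curl); do
  echo "missing requirement: ${tool}" >&2
done
```

If nothing is printed, Bash, jq and curl are available; OpenRefine itself still has to be downloaded separately.
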
## Install

1. Navigate to the OpenRefine program directory

2. Download the bash script there and make it executable

   ```sh
   wget https://github.com/opencultureconsulting/orcli/raw/main/orcli
   chmod +x orcli
   ```

Optional:

* Create a symlink in a directory on your `$PATH` (e.g. `~/.local/bin`)

  ```sh
  ln -s "${PWD}/orcli" ~/.local/bin/
  ```

* Install Bash tab completion

  * temporarily (current shell session only)

    ```sh
    source <(orcli completions)
    ```

  * permanently

    ```sh
    mkdir -p ~/.bashrc.d
    orcli completions > ~/.bashrc.d/orcli
    ```

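
The permanent variant only takes effect if your `~/.bashrc` loads the files in `~/.bashrc.d`. Some distributions ship this by default and others do not, so check your own `~/.bashrc` first. If the directory is not loaded yet, a minimal sketch of what could be added to `~/.bashrc` looks like this (`source_rc_dir` is a hypothetical name):

```sh
# Hypothetical helper for ~/.bashrc: source every readable regular file
# in the given directory; a missing directory is silently ignored.
source_rc_dir() {
  local dir="$1" rc
  [ -d "${dir}" ] || return 0
  for rc in "${dir}"/*; do
    if [ -f "${rc}" ] && [ -r "${rc}" ]; then
      . "${rc}"
    fi
  done
  return 0
}

source_rc_dir "${HOME}/.bashrc.d"
```

Open a new terminal afterwards so that the completion script is picked up.
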
## Getting Started

1. Launch an interactive playground

   ```sh
   ./orcli run --interactive
   ```

2. Create the OpenRefine project `duplicates` from a comma-separated values (CSV) file

   ```sh
   orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
   ```

3. Show the OpenRefine project's metadata

   ```sh
   orcli info "duplicates"
   ```

4. Remove duplicates (coming soon)

5. Export data from the OpenRefine project to the tab-separated values (TSV) file `duplicates.tsv`

   ```sh
   orcli export tsv "duplicates" --output "duplicates.tsv"
   ```

6. Write your session history to the file `example.sh` (the `sed` call deletes the last line, which is the `history` command itself)

   ```sh
   history -a "example.sh"
   sed -i '$ d' example.sh
   ```

7. Exit the playground

   ```sh
   exit
   ```

8. Run the batch process

   ```sh
   ./orcli run example.sh
   ```

9. Clean up the example files

   ```sh
   rm duplicates.tsv
   rm example.sh
   ```

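
Assuming only the commands from steps 2, 3 and 5 were entered in the playground, `example.sh` should contain the three lines shown below. As a sketch, the same file can also be written by hand without an interactive session. The final `bash -n` call only checks the syntax; it does not start OpenRefine or run any orcli command.

```sh
# Write the batch script directly instead of recording it with `history -a`.
cat > example.sh <<'EOF'
orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
orcli info "duplicates"
orcli export tsv "duplicates" --output "duplicates.tsv"
EOF

# Syntax check only: nothing in example.sh is executed.
bash -n example.sh && echo "example.sh looks syntactically valid"
```

The resulting file can then be passed to `./orcli run example.sh` as in step 8.
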
## Usage

* Use the integrated help screens to see the available options and examples for each command.

  ```sh
  orcli --help
  ```

* If OpenRefine is running on a server, set the environment variable `OPENREFINE_URL`.

  ```sh
  OPENREFINE_URL="http://localhost:3333" orcli list
  ```

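
Before pointing orcli at a server, it can help to confirm that OpenRefine answers at that address. The sketch below exports the variable for the whole shell session and queries the `get-version` command of the OpenRefine HTTP API with curl. `openrefine_endpoint` is a hypothetical helper, and the fallback `http://localhost:3333` is an assumption taken from the example above, not a documented orcli default.

```sh
# Hypothetical helper: build a URL below the OpenRefine base address.
# Assumption: fall back to http://localhost:3333 if OPENREFINE_URL is unset.
openrefine_endpoint() {
  local base="${OPENREFINE_URL:-http://localhost:3333}"
  printf '%s/%s\n' "${base%/}" "${1#/}"
}

# Set the variable once per session instead of prefixing every orcli call.
export OPENREFINE_URL="http://localhost:3333"

# Ask the server for its version; print a hint if nothing answers in time.
curl -fsS --max-time 5 "$(openrefine_endpoint command/core/get-version)" \
  || echo "OpenRefine not reachable at ${OPENREFINE_URL}" >&2
```

Stripping a trailing slash from the base address and a leading slash from the path avoids double slashes in the resulting URL.
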
## Development

orcli uses [bashly](https://github.com/DannyBen/bashly/) to generate the one-file script from the files in the `src` directory.

1. Install bashly (requires Ruby)

   ```sh
   gem install bashly
   ```

2. Edit the code in the [src](src) directory

3. Generate the script

   ```sh
   bashly generate --upgrade
   ```