# orcli (💎+🤖)
Bash script to control OpenRefine via [its HTTP API](https://docs.openrefine.org/technical-reference/openrefine-api).
## Features
* works with latest OpenRefine version (currently 3.6)
* run batch processes (import, transform, export)
  * orcli takes care of starting and stopping OpenRefine with temporary workspaces
  * allows execution of arbitrary Bash scripts
  * interactive mode for playing around and debugging
  * your existing OpenRefine data will not be touched
* import CSV, ~~TSV, line-based TXT, fixed-width TXT, JSON or XML~~ (and specify input options)
  * supports stdin, multiple files and URLs
* transform data by providing an [undo/redo](https://docs.openrefine.org/manual/running#history-undoredo) JSON file
  * orcli calls specific endpoints for each operation to provide improved error handling and logging
  * supports stdin, multiple files and URLs
* export to TSV, ~~CSV, HTML, XLS, XLSX, ODS~~
* ~~[templating export](https://docs.openrefine.org/manual/exporting#templating-exporter) to additional formats like JSON or XML~~
## Requirements
* GNU/Linux with Bash 4+
* [jq](https://stedolan.github.io/jq)
* [curl](https://curl.se)
* [OpenRefine](https://openrefine.org) 😉
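
The requirements above can be checked before installing. The following is a minimal sketch, not part of orcli; the helper names `missing_tools` and `bash_major_version` are hypothetical, and the check only covers Bash, jq and curl (it does not look for OpenRefine itself).

```sh
# Print each given command name that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Print the major version of the installed Bash (prints nothing if Bash is missing).
bash_major_version() {
  bash -c 'printf "%s\n" "${BASH_VERSINFO[0]}"' 2>/dev/null || true
}

# Report problems without aborting, so the snippet is safe to paste into a shell.
missing=$(missing_tools bash jq curl)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose: one output line per missing tool.
  printf 'missing: %s\n' $missing >&2
fi

major=$(bash_major_version)
if [ "${major:-0}" -lt 4 ]; then
  echo "Bash 4+ is required" >&2
fi
```

If nothing is printed, the listed tools were found and the installed Bash reports a major version of 4 or higher.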
## Install
1. Navigate to the OpenRefine program directory
2. Download the Bash script there and make it executable
```sh
wget https://github.com/opencultureconsulting/orcli/raw/main/orcli
chmod +x orcli
```
Optional:
* Create a symlink in a directory that is in your `$PATH` (e.g. `~/.local/bin`)
  ```sh
  ln -s "${PWD}/orcli" ~/.local/bin/
  ```
* Install Bash tab completion
  * temporarily (current shell session only)
    ```sh
    source <(orcli completions)
    ```
  * permanently (assumes your `~/.bashrc` sources the files in `~/.bashrc.d`)
    ```sh
    mkdir -p ~/.bashrc.d
    orcli completions > ~/.bashrc.d/orcli
    ```
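
Writing the completions to `~/.bashrc.d/orcli` only has an effect if `~/.bashrc` reads the files in that directory. Some distributions ship a loop for this and others do not. If yours does not, a minimal sketch of such a loop could look like this (the function name `source_dir` is hypothetical and not part of orcli):

```sh
# Source every regular, readable file in the given directory.
# A missing or empty directory is silently skipped.
source_dir() {
  for rc in "$1"/*; do
    if [ -f "$rc" ] && [ -r "$rc" ]; then
      . "$rc"
    fi
  done
  unset rc
}

source_dir "${HOME}/.bashrc.d"
```

Appended to `~/.bashrc`, this loads the orcli completions in every new interactive shell.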
## Getting Started
1. Launch an interactive playground
```sh
./orcli run --interactive
```
2. Create OpenRefine project `duplicates` from comma-separated-values (CSV) file
```sh
orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
```
3. Show OpenRefine project's metadata
```sh
orcli info "duplicates"
```
4. Remove duplicates (coming soon)
5. Export data from OpenRefine project to tab-separated-values (TSV) file `duplicates.tsv`
```sh
orcli export tsv "duplicates" --output "duplicates.tsv"
```
6. Write out your session history to file `example.sh` (and delete the last line to remove the history command)
```sh
history -a "example.sh"
sed -i '$ d' example.sh
```
7. Exit playground
```sh
exit
```
8. Run batch process
```sh
./orcli run example.sh
```
9. Clean up example files
```sh
rm duplicates.tsv
rm example.sh
```
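
Step 6 works because `history -a` appends the session's commands to the file, including the `history -a` call itself, and `sed -i '$ d'` then deletes that last line. The effect can be tried on a scratch file without touching OpenRefine. The file contents below are an assumption about what the session from steps 2 to 5 would record, and `sed -i` is used in its GNU form.

```sh
# Scratch file standing in for example.sh (contents are illustrative).
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
orcli info "duplicates"
orcli export tsv "duplicates" --output "duplicates.tsv"
history -a "example.sh"
EOF

# '$' addresses the last line, 'd' deletes it, -i edits the file in place (GNU sed).
sed -i '$ d' "$scratch"

# Show what is left: the three orcli commands, without the history call.
cat "$scratch"
```

The remaining lines are what `./orcli run example.sh` would execute as a batch process.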
## Usage
* Use integrated help screens for available options and examples for each command.
```sh
orcli --help
```
* If OpenRefine is already running on a server, set the environment variable `OPENREFINE_URL` to its address.
```sh
OPENREFINE_URL="http://localhost:3333" orcli list
```
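
The inline form above sets the variable for a single command. For a whole session it can be exported once instead. The sketch below also builds the address of OpenRefine's `get-version` command, which can serve as a quick reachability check before calling orcli. The fallback `http://localhost:3333` and the helper names `version_url` and `openrefine_reachable` are assumptions for illustration and are not taken from orcli.

```sh
# Export once so every later orcli call in this shell uses the same server.
export OPENREFINE_URL="${OPENREFINE_URL:-http://localhost:3333}"

# Print the URL of OpenRefine's get-version command for the configured server.
# A trailing slash in OPENREFINE_URL is dropped to avoid a double slash.
version_url() {
  printf '%s/command/core/get-version\n' "${OPENREFINE_URL%/}"
}

# Succeed if the server answers within 5 seconds (requires curl).
openrefine_reachable() {
  curl --silent --fail --max-time 5 --output /dev/null "$(version_url)"
}
```

A call such as `openrefine_reachable && orcli list` then only lists projects when the server responds.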
## Development
orcli uses [bashly](https://github.com/DannyBen/bashly/) to generate the one-file script from the files in the `src` directory.
1. Install bashly (requires Ruby)
```sh
gem install bashly
```
2. Edit code in [src](src) directory
3. Generate script
```sh
bashly generate --upgrade
```