# orcli (💎+🤖)
Bash script to control OpenRefine via [its HTTP API](https://docs.openrefine.org/technical-reference/openrefine-api).
![Demo](demo.gif)
## Features
* works with latest OpenRefine version (currently 3.8)
* run batch processes (import, transform, export)
* orcli takes care of starting and stopping OpenRefine with temporary workspaces
* allows execution of arbitrary bash scripts
* interactive mode for playing around and debugging
* your existing OpenRefine data will not be touched
* import CSV, TSV, JSON, JSONL, ~~line-based TXT, fixed-width TXT or XML~~
  * supports stdin, multiple files and URLs
* transform data by providing an [undo/redo](https://docs.openrefine.org/manual/running#history-undoredo) JSON file
  * orcli calls specific endpoints for each operation to provide improved error handling and logging
  * supports stdin, multiple files and URLs
* export to CSV, TSV, JSONL, ~~HTML, XLS, XLSX, ODS~~
  * [templating export](https://docs.openrefine.org/manual/exporting#templating-exporter) to additional formats like JSON or XML
## Requirements
* GNU/Linux with Bash 4+
* [jq](https://stedolan.github.io/jq)
* [curl](https://curl.se)
* [OpenRefine](https://openrefine.org) 😉
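
The requirements above can be checked before installing. The following is a minimal sketch, not part of orcli: the helper name `missing_commands` is made up for illustration, and the check does not cover OpenRefine itself.

```sh
# Print the name of every given command that is not found on PATH.
# (missing_commands is a hypothetical helper, not part of orcli.)
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# In Bash, $BASH_VERSINFO expands to the major version number;
# other shells leave it unset, so the fallback 0 triggers the warning.
if [ "${BASH_VERSINFO:-0}" -lt 4 ]; then
  echo "Bash 4+ is required" >&2
fi

missing="$(missing_commands jq curl)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names end up on one line.
  echo "Please install:" $missing >&2
fi
```

If nothing is printed, Bash, jq and curl are available.
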
## Install
1. Navigate to the OpenRefine program directory
2. Download the Bash script there and make it executable
```sh
wget https://github.com/opencultureconsulting/orcli/raw/main/orcli
chmod +x orcli
```
Optional:
* Create a symlink in your $PATH (e.g. to ~/.local/bin)
  ```sh
  ln -s "${PWD}/orcli" ~/.local/bin/
  ```
* Install Bash tab completion
  * temporarily
    ```sh
    source <(orcli completions)
    ```
  * permanently
    ```sh
    mkdir -p ~/.bashrc.d
    orcli completions > ~/.bashrc.d/orcli
    ```
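
The permanent variant only takes effect if your `~/.bashrc` sources the files in `~/.bashrc.d`. Some distributions ship a `~/.bashrc` that already does this, others do not. If yours does not, a snippet along these lines could be added to `~/.bashrc`; it is a sketch, and the function name `source_dir` is an assumption chosen for illustration.

```sh
# Source every regular file in the given directory.
# (source_dir is a hypothetical helper name, not something orcli provides.)
source_dir() {
  for rc in "$1"/*; do
    [ -f "$rc" ] && . "$rc"
  done
  unset rc
}

# Load user snippets, including the orcli completions, if the directory exists.
if [ -d "${HOME}/.bashrc.d" ]; then
  source_dir "${HOME}/.bashrc.d"
fi
```

Open a new terminal afterwards so the completions are picked up.
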
## Getting Started
1. Launch an interactive playground
```sh
./orcli run --interactive
```
2. Create OpenRefine project `duplicates` from comma-separated-values (CSV) file
```sh
orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
```
3. Remove duplicates by applying an undo/redo JSON file
```sh
orcli transform "duplicates" "https://git.io/fj5ju"
```
4. Export data from OpenRefine project to tab-separated-values (TSV) file `duplicates.tsv`
```sh
orcli export tsv "duplicates" --output "duplicates.tsv"
```
5. Write out your session history to file `example.sh` (and delete the last line to remove the history command)
```sh
history -a "example.sh"
sed -i '$ d' example.sh
```
6. Exit playground
```sh
exit
```
7. Run whole process again
```sh
./orcli run example.sh
```
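
After step 5, `example.sh` should contain roughly the three orcli commands from steps 2 to 4. The sketch below replays the history clean-up on a scratch file instead of a real session. It assumes that only those commands were entered in the playground and that GNU sed is available (for `-i`); the scratch file is created with `mktemp` so an existing `example.sh` is not touched.

```sh
# Scratch file standing in for example.sh.
demo="$(mktemp)"

# What `history -a` would append after steps 2 to 5: the three orcli
# commands plus the history command itself as the last line.
cat > "$demo" <<'EOF'
orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
orcli transform "duplicates" "https://git.io/fj5ju"
orcli export tsv "duplicates" --output "duplicates.tsv"
history -a "example.sh"
EOF

# `$` addresses the last line and `d` deletes it.
sed -i '$ d' "$demo"

# Show the cleaned-up script: only the three orcli commands remain.
cat "$demo"
```
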
## Usage
* Use [help screens](help/README.md) for available options and examples for each command.
```sh
orcli --help
```
* If OpenRefine is running on a different host or port, set the environment variable `OPENREFINE_URL`.
```sh
OPENREFINE_URL="http://localhost:3333" orcli list
```
* If OpenRefine does not have enough memory to process the data, it becomes slow and may even crash. Check the message after the run command finishes to see how much memory was used and adjust the memory allocated to OpenRefine accordingly with the `--memory` flag (default: 2048M).
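
A note on the `OPENREFINE_URL` example above: writing the variable in front of a command sets it for that one command only, while `export` keeps it for the rest of the shell session. The sketch below shows the difference with `sh -c` as a stand-in for any command that reads the variable (orcli itself is not called); the variable `after_prefix` exists only for this demonstration.

```sh
# Start the demonstration from a clean state.
unset OPENREFINE_URL

# Prefix form: the variable exists only for this one command.
OPENREFINE_URL="http://localhost:3333" sh -c 'echo "during: $OPENREFINE_URL"'
after_prefix="${OPENREFINE_URL:-<not set>}"
echo "after prefix form: $after_prefix"

# Exported form: every later command started from this shell sees it.
export OPENREFINE_URL="http://localhost:3333"
sh -c 'echo "after export: $OPENREFINE_URL"'
```

The exported form is convenient when several orcli commands in a row should talk to the same instance.
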
## Development
orcli uses [bashly](https://github.com/DannyBen/bashly/) to generate the single-file script from the files in the `src` directory.
1. Install bashly (requires Ruby)
```sh
gem install bashly
```
2. Edit code in [src](src) directory
3. Generate script
```sh
bashly generate --upgrade
```
4. Run tests
```sh
./orcli test
```
5. Generate help files
```sh
./help.sh
```
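
Steps 3 to 5 can be chained so that a failing step stops the sequence. This is only a sketch: `run_steps` is a hypothetical helper, not part of orcli or bashly, and the commands are assumed to be run from the repository root.

```sh
# Run each command line in order and stop at the first one that fails.
# (run_steps is a hypothetical helper name.)
run_steps() {
  for step in "$@"; do
    echo "==> $step"
    sh -c "$step" || { echo "failed: $step" >&2; return 1; }
  done
}

# Only attempt the real steps when bashly is installed.
if command -v bashly >/dev/null 2>&1; then
  run_steps "bashly generate --upgrade" "./orcli test" "./help.sh"
fi
```
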