OpenRefine command-line interface written in Bash. Supports batch processing (import, transform, export).
felixlohmeier 5cf03927b6 csv/tsv option columnNames and projectTags 2023-01-09 10:38:53 +00:00

orcli (💎+🤖)

Bash script to control OpenRefine via its HTTP API.

Features

  • works with latest OpenRefine version (currently 3.6)
  • run batch processes (import, transform, export)
    • orcli takes care of starting and stopping OpenRefine with temporary workspaces
    • allows execution of arbitrary bash scripts
    • interactive mode for playing around and debugging
    • your existing OpenRefine data will not be touched
  • import CSV, TSV, line-based TXT, fixed-width TXT, JSON or XML (and specify input options)
    • supports stdin, multiple files and URLs
  • transform data by providing an undo/redo JSON file
    • orcli calls specific endpoints for each operation to provide improved error handling and logging
    • supports stdin, multiple files and URLs
  • export to TSV, CSV, HTML, XLS, XLSX, ODS
  • templating export to additional formats like JSON or XML
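
For orientation, an undo/redo JSON file is a JSON array of OpenRefine operations, such as the one you can copy from the "Extract..." dialog in OpenRefine's Undo/Redo tab. The sketch below writes a minimal one from the shell. The file name remove-column.json and the column name notes are made up for illustration, and the single column-removal operation is only an assumption about what a history might contain.

```shell
# Write a one-operation undo/redo file.
# File name and column name are hypothetical; adapt them to your project.
cat > remove-column.json <<'EOF'
[
  {
    "op": "core/column-removal",
    "columnName": "notes",
    "description": "Remove column notes"
  }
]
EOF

# The file could then be applied to an existing project, for example:
#   orcli transform "duplicates" "remove-column.json"
```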

Requirements

Install

  1. Navigate to the OpenRefine program directory

  2. Download the Bash script into that directory and make it executable

wget https://github.com/opencultureconsulting/orcli/raw/main/orcli
chmod +x orcli

Optional:

  • Create a symlink in a directory that is on your $PATH (e.g. ~/.local/bin)

    ln -s "${PWD}/orcli" ~/.local/bin/
    
  • Install Bash tab completion

    • temporarily (current shell only)

      source <(orcli completions)
      
    • permanently

      mkdir -p ~/.bashrc.d
      orcli completions > ~/.bashrc.d/orcli
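
A file in ~/.bashrc.d only takes effect if your ~/.bashrc loads that directory, which not every distribution does by default. The following is a minimal sketch of such a loader, meant to be added to ~/.bashrc. The function name source_bashrc_d is hypothetical and not part of orcli.

```shell
# Load every regular, readable file from a directory into the current shell.
# Defaults to ~/.bashrc.d; a missing directory is silently ignored.
source_bashrc_d() {
  local dir="${1:-$HOME/.bashrc.d}"
  local file
  if [ ! -d "$dir" ]; then
    return 0
  fi
  for file in "$dir"/*; do
    if [ -f "$file" ] && [ -r "$file" ]; then
      . "$file"
    fi
  done
  return 0
}

source_bashrc_d
```

After opening a new shell, tab completion for orcli should then be available without running the temporary command again.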
      

Getting Started

  1. Launch an interactive playground

     ./orcli run --interactive

  2. Create the OpenRefine project "duplicates" from a comma-separated values (CSV) file

     orcli import csv "https://git.io/fj5hF" --projectName "duplicates"

  3. Remove duplicates by applying an undo/redo JSON file

     orcli transform "duplicates" "https://git.io/fj5ju"

  4. Export data from the OpenRefine project to the tab-separated values (TSV) file duplicates.tsv

     orcli export tsv "duplicates" --output "duplicates.tsv"

  5. Write your session history to the file example.sh (and delete the last line to remove the history command itself)

     history -a "example.sh"
     sed -i '$ d' example.sh

  6. Exit the playground

     exit

  7. Run the whole process again

     ./orcli run example.sh
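
Assuming the playground session contained only the three orcli commands above, example.sh should end up looking roughly like the file written below. This is a sketch of the expected content, not captured output, so your own history file may differ.

```shell
# Recreate the expected content of example.sh by hand
# (assumes no other commands were typed during the interactive session).
cat > example.sh <<'EOF'
orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
orcli transform "duplicates" "https://git.io/fj5ju"
orcli export tsv "duplicates" --output "duplicates.tsv"
EOF

# Show the script before handing it to ./orcli run
cat example.sh
```

Keeping the script this plain makes it easy to edit by hand later, for example to swap the input URL or the export format.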

Usage

  • Use the integrated help screens to see the available options and examples for each command.

    orcli --help
    
  • If OpenRefine is running on a different host or port, set the environment variable OPENREFINE_URL.

    OPENREFINE_URL="http://localhost:3333" orcli list
    
  • If OpenRefine does not have enough memory to process the data, it becomes slow and may even crash. Check the message after the run command finishes to see how much memory was used and adjust the memory allocated to OpenRefine accordingly with the --memory flag (default: 2048M).
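
The two settings above can be sketched as follows. The reachability check uses OpenRefine's get-version command and needs curl. The variable names url and memory, the value 4096M and the file example.sh are illustrative assumptions, and the memory value format is inferred from the stated default of 2048M.

```shell
# Case 1: use an OpenRefine instance that is already running elsewhere.
url="http://localhost:3333"   # replace with your own host and port

# Only call orcli if it is on PATH and OpenRefine answers at the given URL.
if command -v orcli >/dev/null 2>&1 &&
   curl -fsS --max-time 5 "${url}/command/core/get-version" >/dev/null 2>&1; then
  OPENREFINE_URL="${url}" orcli list
else
  echo "orcli not on PATH or OpenRefine not reachable at ${url}" >&2
fi

# Case 2: let orcli start OpenRefine itself, with more memory than the default.
memory="4096M"   # assumed value; adjust it based on the usage reported after a run
if [ -x ./orcli ] && [ -f example.sh ]; then
  ./orcli run --memory "${memory}" example.sh
else
  echo "skipped: ./orcli or example.sh not found in ${PWD}"
fi
```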

Development

orcli uses bashly to generate the single-file script from the files in the src directory.

  1. Install bashly (requires Ruby)

     gem install bashly

  2. Edit code in the src directory

  3. Generate the script

     bashly generate --upgrade

  4. Run tests

     ./orcli test
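
The generate and test commands are usually repeated after every edit, so they can be wrapped in a small helper. This is a minimal sketch that assumes it is called from the repository root. The function name rebuild_and_test is hypothetical and not part of orcli or bashly.

```shell
# Regenerate the orcli script from src and run the test suite.
# Returns non-zero if bashly is missing or if either step fails.
rebuild_and_test() {
  if ! command -v bashly >/dev/null 2>&1; then
    echo "bashly not found; install it with: gem install bashly" >&2
    return 1
  fi
  bashly generate --upgrade && ./orcli test
}
```

Calling rebuild_and_test after each change to src keeps the generated script and the tests in step.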