OpenRefine command-line interface written in Bash. Supports batch processing (import, transform, export).
Felix Lohmeier 8885fe89fb
Merge pull request #71 from opencultureconsulting:felixlohmeier/tutorial-42
getting started tutorial
2022-10-25 12:45:20 +02:00

orcli (💎+🤖)

Bash script to control OpenRefine via its HTTP API.

Features

  • works with latest OpenRefine version (currently 3.6)
  • run batch processes (import, transform, export)
    • orcli takes care of starting and stopping OpenRefine with temporary workspaces
    • allows execution of arbitrary bash scripts
    • interactive mode for playing around and debugging
    • your existing OpenRefine data will not be touched
  • import CSV, TSV, line-based TXT, fixed-width TXT, JSON or XML (and specify input options)
    • supports stdin, multiple files and URLs
  • transform data by providing an undo/redo JSON file
    • orcli calls specific endpoints for each operation to provide improved error handling and logging
    • supports stdin, multiple files and URLs
  • export to TSV, CSV, HTML, XLS, XLSX, ODS
  • templating export to additional formats like JSON or XML
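The undo/redo JSON file mentioned in the feature list is the operation history that OpenRefine lets you extract from its Undo / Redo tab. As a rough sketch, such a file might look like the one written below. The column name `name` and the file name `trim.json` are hypothetical, and the commented-out `orcli transform` call is an assumption about the argument order that should be checked against `orcli transform --help`.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal OpenRefine operation history to a temporary file.
# The column name "name" is hypothetical; adapt it to your own data.
workdir="$(mktemp -d)"
history_file="${workdir}/trim.json"

cat > "${history_file}" <<'EOF'
[
  {
    "op": "core/text-transform",
    "engineConfig": { "facets": [], "mode": "row-based" },
    "columnName": "name",
    "expression": "value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Trim whitespace in column name"
  }
]
EOF

echo "wrote ${history_file}"

# Assumed invocation (verify with `orcli transform --help`):
#   orcli transform "duplicates" "${history_file}"
```

Keeping each operation as a separate array entry matches how OpenRefine exports its history, so files copied from the Undo / Redo tab should have the same overall shape.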

Requirements

Install

  1. Navigate to the OpenRefine program directory

  2. Download the Bash script there and make it executable

     wget https://github.com/opencultureconsulting/orcli/raw/main/orcli
     chmod +x orcli

Optional:

  • Create a symlink in your $PATH (e.g. to ~/.local/bin)

    ln -s "${PWD}/orcli" ~/.local/bin/
    
  • Install Bash tab completion

    • temporarily (current shell session only)

      source <(orcli completions)
      
    • permanently

      mkdir -p ~/.bashrc.d
      orcli completions > ~/.bashrc.d/orcli
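The permanent variant only takes effect if your `~/.bashrc` sources the files in `~/.bashrc.d`, which some distributions set up by default and others do not. If yours does not, a minimal sketch of such a loop could be added to `~/.bashrc`; the function name `source_bashrc_d` is hypothetical.

```shell
# Sketch for ~/.bashrc: source every readable regular file in ~/.bashrc.d.
# source_bashrc_d is a hypothetical helper name, not part of orcli.
source_bashrc_d() {
  local dir="${1:-${HOME}/.bashrc.d}"
  local rc
  # Nothing to do if the directory does not exist
  [ -d "${dir}" ] || return 0
  for rc in "${dir}"/*; do
    if [ -f "${rc}" ] && [ -r "${rc}" ]; then
      # shellcheck disable=SC1090
      . "${rc}"
    fi
  done
  return 0
}

source_bashrc_d
```

Open a new terminal afterwards so that the completion file is picked up.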
      

Getting Started

  1. Launch an interactive playground

     ./orcli run --interactive

  2. Create OpenRefine project duplicates from comma-separated-values (CSV) file

     orcli import csv "https://git.io/fj5hF" --projectName "duplicates"

  3. Show OpenRefine project's metadata

     orcli info "duplicates"

  4. Remove duplicates (coming soon)

  5. Export data from OpenRefine project to tab-separated-values (TSV) file duplicates.tsv

     orcli export tsv "duplicates" --output "duplicates.tsv"

  6. Write out your session history to file example.sh (and delete the last line to remove the history command)

     history -a "example.sh"
     sed -i '$ d' example.sh

  7. Exit playground

     exit

  8. Run batch process

     ./orcli run example.sh

  9. Cleanup example files

     rm duplicates.tsv
     rm example.sh
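To see why the last line is deleted after writing the history, the sketch below simulates the file that `history -a "example.sh"` would produce in the playground and then applies the same `sed` command. It works on a temporary directory and does not call orcli; the file contents simply repeat the commands from the steps above, and `sed -i` is used in its GNU form.

```shell
#!/usr/bin/env bash
# Sketch: simulate the history file written inside the playground.
workdir="$(mktemp -d)"

cat > "${workdir}/example.sh" <<'EOF'
orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
orcli info "duplicates"
orcli export tsv "duplicates" --output "duplicates.tsv"
history -a "example.sh"
EOF

# '$ d' addresses the last line ($) and deletes it (d), which removes
# the history command itself from the recorded session (GNU sed syntax)
sed -i '$ d' "${workdir}/example.sh"

cat "${workdir}/example.sh"
```

The remaining three lines are what `./orcli run example.sh` would then execute as a batch process.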

Usage

  • Use the integrated help screens to see the available options and examples for each command.

    orcli --help
    
  • If your OpenRefine instance is already running on a server, point orcli to it with the environment variable OPENREFINE_URL.

    OPENREFINE_URL="http://localhost:3333" orcli list
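Prefixing a command with the variable, as in the example above, sets it for that single command only. For a longer session it may be more convenient to export it once and check that the server answers before running anything else. The sketch below assumes that `curl` is installed and uses OpenRefine's `get-version` command as a harmless probe; `openrefine_reachable` is a hypothetical helper, and the fallback URL is simply the one from the example above.

```shell
#!/usr/bin/env bash
# Sketch: export OPENREFINE_URL once and probe the server before using orcli.
# openrefine_reachable is a hypothetical helper, not part of orcli.
export OPENREFINE_URL="${OPENREFINE_URL:-http://localhost:3333}"

openrefine_reachable() {
  local url="${1:-${OPENREFINE_URL}}"
  # get-version is a read-only OpenRefine command;
  # --fail makes curl return non-zero on HTTP errors
  curl --fail --silent --max-time 5 \
    "${url}/command/core/get-version" > /dev/null 2>&1
}

if openrefine_reachable; then
  echo "OpenRefine is reachable at ${OPENREFINE_URL}"
else
  echo "OpenRefine is not reachable at ${OPENREFINE_URL}"
fi
```

Once the variable is exported, subsequent calls such as `orcli list` in the same shell should pick it up without the prefix.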
    

Development

orcli uses bashly to generate the single-file script from the files in the src directory.

  1. Install bashly (requires ruby)

     gem install bashly

  2. Edit code in src directory

  3. Generate script

     bashly generate --upgrade
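After regenerating, a quick syntax check can catch a broken script before it is committed. This is only a suggestion and not part of the project's documented workflow; `check_syntax` is a hypothetical helper built on Bash's own `-n` option, which parses a file without executing it.

```shell
#!/usr/bin/env bash
# Sketch: syntax-check a generated script without running it.
# check_syntax is a hypothetical helper, not part of orcli or bashly.
check_syntax() {
  local script="$1"
  # bash -n reads and parses the file but executes no commands
  if bash -n "${script}" 2> /dev/null; then
    echo "syntax ok: ${script}"
  else
    echo "syntax error: ${script}" >&2
    return 1
  fi
}

# Only check the generated script if it exists in the current directory
if [ -f ./orcli ]; then
  check_syntax ./orcli || true
fi
```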