Merge pull request #72 from opencultureconsulting:felixlohmeier/tutorial-42

shortened tutorial and added simple stats
Felix Lohmeier 2022-11-01 21:50:27 +01:00 committed by GitHub
commit 600a06e7bd
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 13 additions and 20 deletions


@@ -4,7 +4,7 @@ tasks:
 before: gem install --silent bashly
 init: |
   wget -q -O openrefine.tar.gz "https://oss.sonatype.org/service/local/artifact/maven/content?r=releases&g=org.openrefine&a=openrefine&v=3.6.2&c=linux&p=tar.gz"
-  tar --exclude 'licenses' --exclude 'LICENSE.txt' --exclude 'README.md' -xzf openrefine.tar.gz --strip 1
+  tar --exclude 'licenses' --exclude 'LICENSE.txt' --exclude 'licenses.xml' --exclude 'README.md' -xzf openrefine.tar.gz --strip 1
   rm openrefine.tar.gz
 command: |
   sudo ln -s "${PWD}/orcli" /usr/local/bin/


@@ -73,46 +73,33 @@ Optional:
 orcli import csv "https://git.io/fj5hF" --projectName "duplicates"
 ```
-3. Show OpenRefine project's metadata
-```sh
-orcli info "duplicates"
-```
-4. Remove duplicates (coming soon)
-5. Export data from OpenRefine project to tab-separated-values (TSV) file `duplicates.tsv`
+3. Remove duplicates (coming soon)
+4. Export data from OpenRefine project to tab-separated-values (TSV) file `duplicates.tsv`
 ```sh
 orcli export tsv "duplicates" --output "duplicates.tsv"
 ```
-6. Write out your session history to file `example.sh` (and delete the last line to remove the history command)
+5. Write out your session history to file `example.sh` (and delete the last line to remove the history command)
 ```sh
 history -a "example.sh"
 sed -i '$ d' example.sh
 ```
-7. Exit playground
+6. Exit playground
 ```sh
 exit
 ```
-8. Run batch process
+7. Run whole process again
 ```sh
 ./orcli run example.sh
 ```
-9. Cleanup example files
-```sh
-rm duplicates.tsv
-rm example.sh
-```
 ## Usage
 * Use integrated help screens for available options and examples for each command.
@@ -127,6 +114,8 @@ Optional:
 OPENREFINE_URL="http://localhost:3333" orcli list
 ```
+* If OpenRefine does not have enough memory to process the data, it becomes slow and may even crash. Check the message after the run command finishes to see how much memory was used and adjust the memory allocated to OpenRefine accordingly with the `--memory` flag (default: 2048M).
 ## Development
 orcli uses [bashly](https://github.com/DannyBen/bashly/) for generating the one-file script from files in the `src` directory

orcli

@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# This script was generated by bashly 0.8.9 (https://bashly.dannyb.co)
+# This script was generated by bashly 0.8.10 (https://bashly.dannyb.co)
 # Modifying it manually is not recommended
 # :wrapper.bash3_bouncer
@@ -930,6 +930,8 @@ orcli_run_command() {
 awk 1 "${files[$i]}"
 )
 done
+# print stats
+log "used $(($(ps --no-headers -o rss -p "$OPENREFINE_PID") / 1024)) MB RAM and $(ps --no-headers -o cputime -p "$OPENREFINE_PID") CPU time"
 fi
 }


@@ -88,4 +88,6 @@ else
 awk 1 "${files[$i]}"
 )
 done
+# print stats
+log "used $(($(ps --no-headers -o rss -p "$OPENREFINE_PID") / 1024)) MB RAM and $(ps --no-headers -o cputime -p "$OPENREFINE_PID") CPU time"
 fi
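
The stats line added in this commit can be tried outside of orcli. The following is only a minimal sketch, assuming a Linux system with procps `ps` (the `--no-headers` option may not exist in other `ps` implementations). The current shell's PID stands in for `$OPENREFINE_PID`, and plain `echo` replaces orcli's `log` helper; the variable names are illustrative and do not appear in the commit.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in: inspect this shell instead of the OpenRefine process.
pid=$$

# ps reports the resident set size (rss) in KiB; integer division by 1024
# converts it to MiB, rounding down.
rss_kb=$(ps --no-headers -o rss -p "$pid")
ram_mb=$(( rss_kb / 1024 ))

# Cumulative CPU time of the process, formatted by ps as [DD-]hh:mm:ss.
cpu_time=$(ps --no-headers -o cputime -p "$pid")

stats="used ${ram_mb} MB RAM and ${cpu_time} CPU time"
echo "$stats"
```

Because the division is integer arithmetic, a process using less than 1024 KiB is reported as 0 MB. The figure is a snapshot taken when `ps` runs, not the peak memory use over the whole run.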