Mirror of https://github.com/opencultureconsulting/openrefine-client.git (synced 2025-04-06 00:01:03 +02:00)
Compare commits: v0.2.1-r21...master (138 commits)
SHA1 |
---|
02cf1192c4 |
e4d52818fc |
16541c522e |
965c4e97fd |
0563b54fc6 |
fa3e352879 |
1dd0cafd4e |
a368147bdf |
f66c88ee35 |
2735db3f3f |
0df6050deb |
65b0500d5e |
a0274f6166 |
9eeebce47c |
21bf1f2adc |
8a726d9a30 |
bc89a98776 |
82da3f7b4e |
356315bc9e |
41b90c38b3 |
1c812d1253 |
f0d76b6acd |
ecf253ca44 |
c47ce10eba |
c80bef3fb1 |
7b657eb76e |
d22022e273 |
6778c5cf73 |
97bada2254 |
bd13ffeb50 |
240d0368f5 |
33430d5fe2 |
3c16169767 |
4ed6925b25 |
505a62afc2 |
18a4d68b5c |
41c1e618bb |
cb797fac51 |
f4f38d02fb |
2e6507bdf2 |
75e9a763d1 |
375ac42be0 |
7ad79af3ca |
abbef338ff |
5730150b8c |
caa2ebfde8 |
062e6960a8 |
d9efd5c61b |
bfd00b55aa |
7d66993982 |
be439c986b |
777d73997c |
e18b4d04be |
aa5b3a4203 |
ad95432fc0 |
6de4399012 |
70782d8465 |
f0e6fbcd75 |
5819b15cf3 |
b6f20f2e93 |
4a455282f5 |
ece61c1096 |
b8675fc894 |
1d0c4f828a |
d82c7b28fb |
16560cb884 |
0c26aad39a |
a0123e9511 |
04db513453 |
e03b3633e5 |
aa844bde99 |
221b83e805 |
fce77d8d78 |
491795a30d |
0c0a0dfc1c |
fd8f34be39 |
c896248c8c |
f7b33684b3 |
e31d565194 |
3e8f209a50 |
913e5eda56 |
53fc89c147 |
9e97aff653 |
8253b8f794 |
1605ac2cab |
ed9e1e2afb |
6c65f15363 |
6262d703d3 |
28b4c7466b |
2061e804c3 |
058552aab6 |
947c7510a6 |
31f06b35c4 |
7a0f405007 |
b6ce0cf24c |
f0643b46a0 |
f70fed2966 |
37004aadff |
4e03a1452f |
56e5ee96f5 |
8f2ef1d3e0 |
1d0fb82f07 |
44fd7c4611 |
bf91e918df |
6f8badae6a |
74ec004c8b |
b45cda5ad1 |
ad6072b1bb |
115937d447 |
bd73be52ea |
67b586a734 |
b83245cba9 |
8716b15d4c |
78a7a75515 |
92b2316077 |
221d7da379 |
1b99ff1d62 |
26fe214eaf |
35963dad38 |
684dcdf8df |
101a226a4f |
2d94ac4e36 |
9bd8102b0a |
1a4f00b3cd |
ca25a305e0 |
717a03a838 |
a1ea660ffa |
b92aa0efd1 |
1a5f7c482d |
08dd425f28 |
4104e58fd5 |
c9fbc66e8e |
bbd4d84c96 |
eade7dafe0 |
5b701821a9 |
6f8db835ab |
bc0a8e7c7b |
e9ef9a6d56 |
7
.gitignore
vendored
@ -2,5 +2,8 @@ build
|
||||
dist
|
||||
*.pyc
|
||||
.*
|
||||
refine_client.egg-info
|
||||
README.html
|
||||
openrefine_client.egg-info
|
||||
refine.spec
|
||||
openrefine-*
|
||||
openrefine-client_*
|
||||
tests-cli.log
|
||||
|
@ -1,4 +1,4 @@
|
||||
include README.rst
|
||||
include README.md
|
||||
include COPYING.txt
|
||||
recursive-include tests/data *.csv
|
||||
recursive-include tests *.py
|
||||
|
2
Makefile
@ -25,7 +25,7 @@ install:
|
||||
clean:
|
||||
find . -name '*.pyc' | xargs rm -f
|
||||
# XXX is there some way of having setup.py clean up its junk?
|
||||
rm -rf README.html build dist refine_client.egg-info distribute-*
|
||||
rm -rf README.{html,txt} build dist refine_client.egg-info distribute-*
|
||||
|
||||
upload: clean
|
||||
python setup.py sdist upload
|
||||
|
773
README.md
Normal file
@ -0,0 +1,773 @@
|
||||
# OpenRefine Python Client with extended command line interface (⌨️ for 💎)
|
||||
|
||||
[Codacy](https://www.codacy.com/gh/opencultureconsulting/openrefine-client/dashboard) [Docker Hub](https://hub.docker.com/r/felixlohmeier/openrefine-client/) [PyPI](https://pypi.org/project/openrefine-client/) [Binder](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master)
|
||||
|
||||
The [OpenRefine Python Client from PaulMakepeace](https://github.com/PaulMakepeace/refine-client-py) provides a library for communicating with an [OpenRefine](http://openrefine.org) server.
|
||||
This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, macOS).
|
||||
It is also available via Docker Hub, PyPI and Binder.
|
||||
|
||||
Works with OpenRefine 2.7, 2.8, 3.0, 3.1, 3.2, 3.3, 3.4, 3.4.1 and 3.5.0.
|
||||
|
||||
## Download
|
||||
|
||||
One-file-executables:
|
||||
|
||||
- Windows: [openrefine-client_0-3-10_windows.exe](https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.10/openrefine-client_0-3-10_windows.exe) (~5 MB)
|
||||
- macOS: [openrefine-client_0-3-10_macos](https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.10/openrefine-client_0-3-10_macos) (~5 MB)
|
||||
- Linux: [openrefine-client_0-3-10_linux](https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.10/openrefine-client_0-3-10_linux) (~5 MB)
|
||||
|
||||
For [Docker](#docker) containers, native [Python](#python) installation and free [Binder](#binder) on-demand server see the corresponding chapters below.
|
||||
|
||||
## Peek
|
||||
|
||||
A short video loop that demonstrates the basic features (list, create, apply, export):
|
||||
|
||||

|
||||
|
||||
## Usage
|
||||
|
||||
Ensure you have [OpenRefine](http://openrefine.org) running (i.e. available at http://localhost:3333 or [another URL](#change-url)).
|
||||
|
||||
To use the client:
|
||||
|
||||
1. Open a terminal pointing to the folder where you have [downloaded](#download) the one-file-executable (e.g. Downloads in your home directory).
|
||||
|
||||
- Windows: Open PowerShell and enter the following command
|
||||
|
||||
```sh
|
||||
cd ~\Downloads
|
||||
```
|
||||
|
||||
- macOS: Open Terminal (Finder > Applications > Utilities > Terminal) and enter the following command
|
||||
|
||||
```sh
|
||||
cd ~/Downloads
|
||||
```
|
||||
|
||||
- Linux: Open a terminal app (Terminal, Konsole, xterm, ...) and enter the following command
|
||||
|
||||
```sh
|
||||
cd ~/Downloads
|
||||
```
|
||||
|
||||
2. Make the file executable.
|
||||
|
||||
- Windows: not necessary
|
||||
|
||||
- macOS:
|
||||
|
||||
```sh
|
||||
chmod +x openrefine-client_0-3-10_macos
|
||||
```
|
||||
|
||||
- Linux:
|
||||
|
||||
```sh
|
||||
chmod +x openrefine-client_0-3-10_linux
|
||||
```
|
||||
|
||||
3. Execute the file.
|
||||
|
||||
- Windows:
|
||||
|
||||
```sh
|
||||
.\openrefine-client_0-3-10_windows.exe
|
||||
```
|
||||
|
||||
- macOS:
|
||||
|
||||
```sh
|
||||
./openrefine-client_0-3-10_macos
|
||||
```
|
||||
|
||||
- Linux:
|
||||
|
||||
```sh
|
||||
./openrefine-client_0-3-10_linux
|
||||
```
|
||||
|
||||
Using tab completion and command history is highly recommended:
|
||||
|
||||
- autocomplete filenames: enter a few characters and press `↹`
|
||||
- recall previous command: press `↑`
|
||||
|
||||
### Basic commands
|
||||
|
||||
Execute the client by entering its filename followed by the desired command.
|
||||
|
||||
The following example will download two small files ([duplicates.csv](https://raw.githubusercontent.com/opencultureconsulting/openrefine-client/master/tests/data/duplicates.csv) and [duplicates-deletion.json](https://raw.githubusercontent.com/opencultureconsulting/openrefine-client/master/tests/data/duplicates-deletion.json)) into the current directory and will create a new OpenRefine project from file duplicates.csv.
|
||||
|
||||
Download example data (`--download`) and create project from file (`--create`):
|
||||
|
||||
- Windows:
|
||||
|
||||
```sh
|
||||
.\openrefine-client_0-3-10_windows.exe --download "https://git.io/fj5hF" --output=duplicates.csv
|
||||
.\openrefine-client_0-3-10_windows.exe --download "https://git.io/fj5ju" --output=duplicates-deletion.json
|
||||
.\openrefine-client_0-3-10_windows.exe --create duplicates.csv
|
||||
```
|
||||
|
||||
- macOS:
|
||||
|
||||
```sh
|
||||
./openrefine-client_0-3-10_macos --download "https://git.io/fj5hF" --output=duplicates.csv
|
||||
./openrefine-client_0-3-10_macos --download "https://git.io/fj5ju" --output=duplicates-deletion.json
|
||||
./openrefine-client_0-3-10_macos --create duplicates.csv
|
||||
```
|
||||
|
||||
- Linux:
|
||||
|
||||
```sh
|
||||
./openrefine-client_0-3-10_linux --download "https://git.io/fj5hF" --output=duplicates.csv
|
||||
./openrefine-client_0-3-10_linux --download "https://git.io/fj5ju" --output=duplicates-deletion.json
|
||||
./openrefine-client_0-3-10_linux --create duplicates.csv
|
||||
```
|
||||
|
||||
Other commands:
|
||||
|
||||
- list all projects: `--list`
|
||||
- show project metadata: `--info "duplicates"`
|
||||
- export project to terminal: `--export "duplicates"`
|
||||
- apply [rules from json file](http://kb.refinepro.com/2012/06/google-refine-json-and-my-notepad-or.html): `--apply duplicates-deletion.json "duplicates"`
|
||||
- export project to file: `--export --output=deduped.xls "duplicates"`
|
||||
- delete project: `--delete "duplicates"`
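The commands above can also be chained from a script. The following is a minimal Python 3 sketch, not part of openrefine-client: the helper names `basic_workflow` and `run_all` are hypothetical, and the executable name is assumed to be the Linux one-file-executable in the current directory. By default it only prints the commands (dry run).

```python
import shlex
import subprocess


def basic_workflow(executable):
    """Return the basic commands from this section as argument lists."""
    project = "duplicates"
    return [
        [executable, "--download", "https://git.io/fj5hF", "--output=duplicates.csv"],
        [executable, "--download", "https://git.io/fj5ju", "--output=duplicates-deletion.json"],
        [executable, "--create", "duplicates.csv"],
        [executable, "--list"],
        [executable, "--info", project],
        [executable, "--export", project],
        [executable, "--apply", "duplicates-deletion.json", project],
        [executable, "--export", "--output=deduped.xls", project],
        [executable, "--delete", project],
    ]


def run_all(commands):
    """Run the commands in order; check=True raises on a non-zero exit status."""
    for command in commands:
        subprocess.run(command, check=True)


# Dry run: only print the commands. Call run_all(...) to execute them.
for command in basic_workflow("./openrefine-client_0-3-10_linux"):
    print(" ".join(shlex.quote(part) for part in command))
```

To execute the sequence against a running OpenRefine, call `run_all(basic_workflow("./openrefine-client_0-3-10_linux"))`.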
|
||||
|
||||
### Getting help
|
||||
|
||||
Check `--help` for further options.
|
||||
|
||||
Please file an [issue](https://github.com/opencultureconsulting/openrefine-client/issues) if you miss a feature in the command line interface or if you have found a bug.
|
||||
And you are welcome to ask any questions!
|
||||
|
||||
### Change URL
|
||||
|
||||
By default the client connects to the usual URL of OpenRefine [http://localhost:3333](http://localhost:3333).
|
||||
If your OpenRefine server is running somewhere else then you may set hostname and port with additional command line options (e.g. http://example.com):
|
||||
|
||||
- set host: `-H example.com`
|
||||
- set port: `-P 80`
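If the server address is only known as a full URL, the two options can be derived from it. A minimal Python 3 sketch, assuming a hypothetical helper `url_to_options` (not part of openrefine-client) and assuming that an `http` URL without an explicit port means port 80:

```python
from urllib.parse import urlsplit


def url_to_options(url):
    """Translate a server URL into -H/-P options (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("no hostname found in %r" % url)
    port = parts.port
    if port is None:
        # Assumption: plain http URLs without a port are served on port 80.
        if parts.scheme != "http":
            raise ValueError("please state the port explicitly in %r" % url)
        port = 80
    return ["-H", parts.hostname, "-P", str(port)]


print(url_to_options("http://example.com"))
print(url_to_options("http://localhost:3333"))
```

The returned list can be placed after the executable name, followed by a command such as `--list`.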
|
||||
|
||||
### Templating
|
||||
|
||||
OpenRefine's [Templating](https://github.com/OpenRefine/OpenRefine/wiki/Export-As-YAML) exporter can write data in any text format (e.g. to construct JSON or XML).
|
||||
The graphical user interface offers four input fields:
|
||||
|
||||
1. prefix
|
||||
2. row template
|
||||
- supports [GREL](https://github.com/OpenRefine/OpenRefine/wiki/General-Refine-Expression-Language) inside two curly brackets, e.g. `{{jsonize(cells["name"].value)}}`
|
||||
3. row separator
|
||||
4. suffix
|
||||
|
||||
This templating functionality is available via the openrefine-client command line interface.
|
||||
It even provides an additional feature for splitting results into multiple files.
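How the four fields fit together can be sketched in a few lines of Python 3. This is an illustration only, assuming the export consists of the prefix, the rendered rows joined by the row separator, and the suffix; the function names and the sample rows are hypothetical, and `json.dumps` stands in for GREL's `jsonize()`.

```python
import json


def assemble(prefix, rendered_rows, row_separator, suffix):
    """Combine the four templating fields into one export document."""
    return prefix + row_separator.join(rendered_rows) + suffix


def render_row(row):
    # Stand-in for the row template; json.dumps plays the role of jsonize().
    return '  { "name" : %s, "purchase" : %s }' % (
        json.dumps(row["name"]), json.dumps(row["purchase"]))


# Hypothetical sample rows, not taken from duplicates.csv.
rows = [
    {"name": "Alice Example", "purchase": "Book"},
    {"name": "Bob Example", "purchase": "Lamp"},
]

document = assemble(
    prefix='{ "events" : [\n',
    rendered_rows=[render_row(r) for r in rows],
    row_separator=",\n",
    suffix="\n] }",
)
print(document)
```

Because the separator is only placed between rows, the result is valid JSON without a trailing comma.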
|
||||
|
||||
To try out the functionality create another project from the example file above.
|
||||
|
||||
```sh
|
||||
--create duplicates.csv --projectName=advanced
|
||||
```
|
||||
|
||||
The following example code will export...
|
||||
|
||||
- the columns "name" and "purchase" in JSON format
|
||||
- from the project "advanced"
|
||||
- for rows matching the regex text filter `^F$` in column "gender"
|
||||
|
||||
macOS/Linux Terminal (multi-line input with `\` ):
|
||||
|
||||
```sh
|
||||
"advanced" \
|
||||
--prefix='{ "events" : [
|
||||
' \
|
||||
--template=' { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }' \
|
||||
--rowSeparator=',
|
||||
' \
|
||||
--suffix='
|
||||
] }' \
|
||||
--filterQuery='^F$' \
|
||||
--filterColumn='gender'
|
||||
```
|
||||
|
||||
Windows PowerShell (multi-line input with `` ` ``; quotes need to be doubled):
|
||||
|
||||
```sh
|
||||
"advanced" `
|
||||
--prefix='{ ""events"" : [
|
||||
' `
|
||||
--template=' { ""name"" : {{jsonize(cells[""name""].value)}}, ""purchase"" : {{jsonize(cells[""purchase""].value)}} }' `
|
||||
--rowSeparator=',
|
||||
' `
|
||||
--suffix='
|
||||
] }' `
|
||||
--filterQuery='^F$' `
|
||||
--filterColumn='gender'
|
||||
```
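The only difference between the two variants, apart from the line continuation character, is the doubling of the double quotes. A small Python 3 sketch of that conversion, using a hypothetical helper name:

```python
def double_quotes_for_powershell(value):
    """Double every double quote, as the PowerShell example above does by hand."""
    return value.replace('"', '""')


template = ' { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }'
print(double_quotes_for_powershell(template))
```

The printed string corresponds to the `--template` value in the PowerShell example.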
|
||||
|
||||
Add the following options to the last command (recall with `↑`) to store the results in multiple files.
|
||||
Each file will contain the prefix, a processed row, and the suffix.
|
||||
|
||||
```sh
|
||||
--output=advanced.json --splitToFiles=true
|
||||
```
|
||||
|
||||
Filenames are suffixed with the row number by default (e.g. `advanced_1.json`, `advanced_2.json` etc.).
|
||||
There is another option to use the value in the first column instead:
|
||||
|
||||
```sh
|
||||
--output=advanced.json --splitToFiles=true --suffixById=true
|
||||
```
|
||||
|
||||
Because our project "advanced" contains duplicates in the first column "email" this command will overwrite files (e.g. `advanced_melanie.white@example2.edu.json`).
|
||||
When using this option, the first column should contain unique identifiers.
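To check in advance whether `--suffixById` would overwrite files, the naming scheme described above can be reproduced in Python 3. This is a hypothetical re-implementation for illustration, not the client's own code; apart from the address quoted above, the identifiers in the sample list are made up.

```python
import os


def split_filenames(output, row_ids=None, row_count=0):
    """Return one filename per row, following the naming described above.

    Without row_ids, files are numbered from 1; with row_ids, the value of
    the first column is used as the suffix.
    """
    stem, ext = os.path.splitext(output)
    suffixes = row_ids if row_ids is not None else range(1, row_count + 1)
    return ["%s_%s%s" % (stem, suffix, ext) for suffix in suffixes]


def find_collisions(filenames):
    """Return filenames that occur more than once, i.e. would be overwritten."""
    seen, collisions = set(), set()
    for name in filenames:
        if name in seen:
            collisions.add(name)
        seen.add(name)
    return sorted(collisions)


print(split_filenames("advanced.json", row_count=2))
# Hypothetical first-column values containing one duplicate.
emails = ["melanie.white@example2.edu", "jane@example.org", "melanie.white@example2.edu"]
print(find_collisions(split_filenames("advanced.json", row_ids=emails)))
```

An empty list from `find_collisions` means every row would end up in its own file.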
|
||||
|
||||
### Append data to an existing project
|
||||
|
||||
OpenRefine does not support appending rows to an existing project.
|
||||
As long as the [feature request](https://github.com/OpenRefine/OpenRefine/issues/715) is not yet implemented, you can use the openrefine-client to script a workaround:
|
||||
|
||||
1. export existing project as csv
|
||||
2. put old and new data into a zip archive
|
||||
3. create new project by importing the zip archive
|
||||
|
||||
Here is an example that replaces the existing project:
|
||||
|
||||
```
|
||||
openrefine-client --export myproject --output old.csv
|
||||
openrefine-client --delete myproject
|
||||
zip combined.zip old.csv new.csv
|
||||
openrefine-client --create combined.zip --format csv --projectName myproject
|
||||
```
|
||||
|
||||
Note that the project id will change.
|
||||
If you want to distinguish between old and new data, you can use the additional flag includeFileSources:
|
||||
|
||||
```
|
||||
openrefine-client --create combined.zip --format csv --projectName myproject --includeFileSources true
|
||||
```
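On systems without a `zip` command (e.g. a default Windows installation), step 2 of the workaround can be done with Python's standard library. A minimal Python 3 sketch with a hypothetical helper name; the demonstration writes throwaway files into a temporary directory instead of touching real data.

```python
import os
import tempfile
import zipfile


def combine_into_zip(archive_path, csv_paths):
    """Write the given CSV files into one zip archive (step 2 of the workaround)."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in csv_paths:
            # Store only the file name so the archive has no directory structure.
            archive.write(path, arcname=os.path.basename(path))
    return archive_path


# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    old_csv = os.path.join(workdir, "old.csv")
    new_csv = os.path.join(workdir, "new.csv")
    for path, content in ((old_csv, "email,name\na@example.org,A\n"),
                          (new_csv, "email,name\nb@example.org,B\n")):
        with open(path, "w") as handle:
            handle.write(content)
    combined = combine_into_zip(os.path.join(workdir, "combined.zip"), [old_csv, new_csv])
    with zipfile.ZipFile(combined) as archive:
        names = archive.namelist()
print(names)
```

In a real run, `combine_into_zip("combined.zip", ["old.csv", "new.csv"])` would replace the `zip` call in the example above.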
|
||||
|
||||
### See also
|
||||
|
||||
- Linux Bash script to run OpenRefine in batch mode (import, transform, export): [openrefine-batch](https://github.com/opencultureconsulting/openrefine-batch)
|
||||
- [Jupyter notebook demonstrating usage in Linux Bash](https://nbviewer.jupyter.org/github/felixlohmeier/openrefineder/blob/master/notebooks/openrefine-client-bash.ipynb)
|
||||
- Use case [HOS-MetadataTransformations](https://github.com/subhh/HOS-MetadataTransformations): Automated workflow for harvesting, transforming and indexing of metadata using metha, OpenRefine and Solr. Part of the Hamburg Open Science "Schaufenster" software stack.
|
||||
- Use case [Data processing of ILS data to facilitate a new discovery layer for the German Literature Archive (DLA)](https://doi.org/10.5281/zenodo.2678113): Custom data processing pipeline based on Pandas (a Python library) and OpenRefine.
|
||||
|
||||
## Docker
|
||||
|
||||
[felixlohmeier/openrefine-client](https://hub.docker.com/r/felixlohmeier/openrefine-client/)
|
||||
|
||||
```sh
|
||||
docker pull felixlohmeier/openrefine-client:v0.3.10
|
||||
```
|
||||
|
||||
### Option 1: Dockerized client
|
||||
|
||||
Run client and mount current directory as workspace:
|
||||
|
||||
```sh
|
||||
docker run --rm --network=host -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10
|
||||
```
|
||||
|
||||
The docker option `--network=host` allows you to connect to a local or remote OpenRefine via the host network:
|
||||
|
||||
- list projects on default URL (http://localhost:3333)
|
||||
|
||||
```sh
|
||||
docker run --rm --network=host -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 --list
|
||||
```
|
||||
|
||||
- list projects on a remote server (http://example.com)
|
||||
|
||||
```sh
|
||||
docker run --rm --network=host -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H example.com -P 80 --list
|
||||
```
|
||||
|
||||
Usage: same commands as explained above (see [Basic Commands](#basic-commands) and [Templating](#templating))
|
||||
|
||||
### Option 2: Dockerized client and dockerized OpenRefine
|
||||
|
||||
Run openrefine-client linked to a dockerized OpenRefine ([felixlohmeier/openrefine](https://hub.docker.com/r/felixlohmeier/openrefine/)):
|
||||
|
||||
1. Create docker network
|
||||
|
||||
```sh
|
||||
docker network create openrefine
|
||||
```
|
||||
|
||||
2. Run server (will be available at http://localhost:3333)
|
||||
|
||||
```sh
|
||||
docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.5.0
|
||||
```
|
||||
|
||||
3. Run client with some [basic commands](#basic-commands): 1. download example files, 2. create project from file, 3. list projects, 4. show metadata, 5. export to terminal, 6. apply transformation rules (deduplication), 7. export again to terminal, 8. export to xls file and 9. delete project
|
||||
|
||||
```sh
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 --download "https://git.io/fj5hF" --output=duplicates.csv
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 --download "https://git.io/fj5ju" --output=duplicates-deletion.json
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H openrefine-server --create duplicates.csv
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H openrefine-server --list
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H openrefine-server --info "duplicates"
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H openrefine-server --export "duplicates"
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H openrefine-server --apply duplicates-deletion.json "duplicates"
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H openrefine-server --export "duplicates"
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H openrefine-server --export --output=deduped.xls "duplicates"
|
||||
docker run --rm --network=openrefine -v ${PWD}:/data:z felixlohmeier/openrefine-client:v0.3.10 -H openrefine-server --delete "duplicates"
|
||||
```
|
||||
|
||||
4. Stop and delete server:
|
||||
|
||||
```sh
|
||||
docker stop openrefine-server
|
||||
docker rm openrefine-server
|
||||
```
|
||||
|
||||
5. Delete docker network:
|
||||
|
||||
```sh
|
||||
docker network rm openrefine
|
||||
```
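The client calls in step 3 all share the same `docker run` prefix. If you script them, the repetition can be factored out. A minimal Python 3 sketch with a hypothetical helper name; it only builds and prints the argument lists and does not start any containers.

```python
import os
import shlex

IMAGE = "felixlohmeier/openrefine-client:v0.3.10"


def docker_client_command(client_args, network="openrefine", workdir=None):
    """Build the docker run argument list shared by all client calls in step 3."""
    # The shell examples mount ${PWD}; here the current directory is the default.
    workdir = os.getcwd() if workdir is None else workdir
    return [
        "docker", "run", "--rm",
        "--network=" + network,
        "-v", workdir + ":/data:z",
        IMAGE,
    ] + list(client_args)


for args in (["-H", "openrefine-server", "--list"],
             ["-H", "openrefine-server", "--info", "duplicates"]):
    command = docker_client_command(args, workdir="/tmp/work")
    print(" ".join(shlex.quote(part) for part in command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` once the network and the server from steps 1 and 2 exist.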
|
||||
|
||||
Customize OpenRefine server:
|
||||
|
||||
- If you want to add an OpenRefine startup option you need to repeat the default commands (cf. [Dockerfile](https://hub.docker.com/r/felixlohmeier/openrefine/dockerfile))
|
||||
- `-i 0.0.0.0` sets OpenRefine to be accessible from outside the container, i.e. from host
|
||||
- `-d /data` sets OpenRefine workspace
|
||||
|
||||
- Example for [allocating more memory](https://github.com/OpenRefine/OpenRefine/wiki/FAQ#out-of-memory-errors---feels-slow---could-not-reserve-enough-space-for-object-heap) to OpenRefine with additional option `-m 4G`
|
||||
|
||||
```sh
|
||||
docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:3.5.0 -i 0.0.0.0 -d /data -m 4G
|
||||
```
|
||||
|
||||
- The OpenRefine version is defined by the docker tag.
|
||||
Check the [DockerHub repository](https://hub.docker.com/r/felixlohmeier/openrefine) for available tags.
|
||||
Example for OpenRefine `2.8` with same options as above:
|
||||
|
||||
```sh
|
||||
docker run -d -p 3333:3333 --network=openrefine --name=openrefine-server felixlohmeier/openrefine:2.8 -i 0.0.0.0 -d /data -m 4G
|
||||
```
|
||||
|
||||
- If you want OpenRefine to read and write persistent data in host directory (i.e. store projects) you can mount the container path `/data`. Example for host directory `/home/felix/refine`:
|
||||
|
||||
```sh
|
||||
docker run -d -p 3333:3333 -v /home/felix/refine:/data:z --network=openrefine --name=openrefine-server felixlohmeier/openrefine:2.8 -i 0.0.0.0 -d /data -m 4G
|
||||
```
|
||||
|
||||
See also:
|
||||
|
||||
- [GitHub Repository](https://github.com/opencultureconsulting/openrefine-docker) for docker container `felixlohmeier/openrefine`
|
||||
- Linux Bash script to run OpenRefine in batch mode (import, transform, export) with docker containers: [openrefine-batch-docker.sh](https://github.com/opencultureconsulting/openrefine-batch/#docker)
|
||||
|
||||
## Python
|
||||
|
||||
[openrefine-client](https://pypi.org/project/openrefine-client/) (requires Python 2.x)
|
||||
|
||||
```sh
|
||||
python2 -m pip install openrefine-client --user
|
||||
```
|
||||
|
||||
This will install the package `openrefine-client` containing modules in `google.refine`.
|
||||
|
||||
A command line script `openrefine-client` will also be installed.
|
||||
|
||||
### Option 1: command line script
|
||||
|
||||
```sh
|
||||
openrefine-client --help
|
||||
```
|
||||
|
||||
Usage: same commands as explained above (see [Basic Commands](#basic-commands) and [Templating](#templating))
|
||||
|
||||
### Option 2: using cli functions in Python 2.x environment
|
||||
|
||||
Import module cli:
|
||||
|
||||
```python
|
||||
from google.refine import cli
|
||||
```
|
||||
|
||||
Change URL (if necessary):
|
||||
|
||||
```python
|
||||
cli.refine.REFINE_HOST = 'localhost'
|
||||
cli.refine.REFINE_PORT = '3333'
|
||||
```
|
||||
|
||||
Help screen:
|
||||
|
||||
```python
|
||||
help(cli)
|
||||
```
|
||||
|
||||
Commands:
|
||||
|
||||
* download (e.g. example data):
|
||||
|
||||
```python
|
||||
cli.download('https://git.io/fj5hF','duplicates.csv')
|
||||
cli.download('https://git.io/fj5ju','duplicates-deletion.json')
|
||||
```
|
||||
|
||||
* list projects:
|
||||
|
||||
```python
|
||||
cli.ls()
|
||||
```
|
||||
|
||||
* create project:
|
||||
|
||||
```python
|
||||
p1 = cli.create('duplicates.csv')
|
||||
```
|
||||
|
||||
* show metadata:
|
||||
|
||||
```python
|
||||
cli.info(p1.project_id)
|
||||
```
|
||||
|
||||
* apply rules from file to project:
|
||||
|
||||
```python
|
||||
cli.apply(p1.project_id, 'duplicates-deletion.json')
|
||||
```
|
||||
|
||||
* export project to terminal:
|
||||
|
||||
```python
|
||||
cli.export(p1.project_id)
|
||||
```
|
||||
|
||||
* export project to file in xls format:
|
||||
|
||||
```python
|
||||
cli.export(p1.project_id, 'deduped.xls')
|
||||
```
|
||||
|
||||
* export templating (see [Templating](#templating) above):
|
||||
|
||||
```python
|
||||
cli.templating(
|
||||
p1.project_id,
|
||||
prefix='''{ "events" : [
|
||||
''',template=''' { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }''',
|
||||
rowSeparator=''',
|
||||
''',suffix='''
|
||||
] }''')
|
||||
```
|
||||
|
||||
* delete project:
|
||||
|
||||
```python
|
||||
cli.delete(p1.project_id)
|
||||
```
|
||||
|
||||
### Option 3: the upstream way
|
||||
|
||||
This fork can be used in the same way as the upstream [Python client library](https://github.com/PaulMakepeace/refine-client-py/).
|
||||
|
||||
Some functions in the python client library are not yet compatible with OpenRefine >=3.0 (cf. [issue #19 in refine-client-py](https://github.com/paulmakepeace/refine-client-py/issues/19)).
|
||||
|
||||
Import module refine:
|
||||
|
||||
```python
|
||||
from google.refine import refine
|
||||
```
|
||||
|
||||
Server Commands:
|
||||
|
||||
* set up connection:
|
||||
|
||||
```python
|
||||
server1 = refine.Refine('http://localhost:3333')
|
||||
```
|
||||
|
||||
- show version:
|
||||
|
||||
```python
|
||||
server1.server.get_version()
|
||||
server1.server.version
|
||||
```
|
||||
|
||||
- list projects:
|
||||
|
||||
```python
|
||||
server1.list_projects()
|
||||
```
|
||||
|
||||
- pretty print the returned dict with json.dumps:
|
||||
|
||||
```python
|
||||
import json
|
||||
print(json.dumps(server1.list_projects(), indent=1))
|
||||
```
|
||||
|
||||
- create project:
|
||||
|
||||
```python
|
||||
server1.new_project(project_file='duplicates.csv')
|
||||
```
|
||||
|
||||
* create and open the returned project in one step:
|
||||
|
||||
```python
|
||||
project1 = server1.new_project(project_file='duplicates.csv')
|
||||
```
|
||||
|
||||
Project commands:
|
||||
|
||||
* open project:
|
||||
|
||||
```python
|
||||
project1 = server1.open_project('1234567890123')
|
||||
```
|
||||
|
||||
* print full URL to project:
|
||||
|
||||
```python
|
||||
project1.project_url()
|
||||
```
|
||||
|
||||
* list columns:
|
||||
|
||||
```python
|
||||
project1.columns
|
||||
```
|
||||
|
||||
* compute text facet on first column (**fails with OpenRefine >=3.2**):
|
||||
|
||||
```python
|
||||
from google.refine import facet  # TextFacet lives in the facet module
project1.compute_facets(facet.TextFacet(project1.columns[0]))
|
||||
```
|
||||
|
||||
* print returned object
|
||||
|
||||
```python
|
||||
facets = project1.compute_facets(facet.TextFacet(project1.columns[0])).facets[0]
|
||||
for k in sorted(facets.choices, key=lambda k: facets.choices[k].count, reverse=True):
|
||||
    print(facets.choices[k].count, k)
|
||||
```
|
||||
|
||||
* compute clusters on first column:
|
||||
|
||||
```python
|
||||
project1.compute_clusters(project1.columns[0])
|
||||
```
|
||||
|
||||
* apply rules from file to project:
|
||||
|
||||
```python
|
||||
project1.apply_operations('duplicates-deletion.json')
|
||||
```
|
||||
|
||||
* export project:
|
||||
|
||||
```python
|
||||
project1.export(export_format='tsv')
|
||||
```
|
||||
|
||||
* print the returned fileobject:
|
||||
|
||||
```python
|
||||
print(project1.export(export_format='tsv').read())
|
||||
```
|
||||
|
||||
* save the returned fileobject to file:
|
||||
|
||||
```python
|
||||
with open('export.tsv', 'wb') as f:
|
||||
    f.write(project1.export(export_format='tsv').read())
|
||||
```
|
||||
|
||||
* templating export (**function was added in this fork**, see [Templating](#templating) above):
|
||||
|
||||
```python
|
||||
data = project1.export_templating(
|
||||
prefix='''{ "events" : [
|
||||
''',template=''' { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }''',
|
||||
rowSeparator=''',
|
||||
''',suffix='''
|
||||
] }''')
|
||||
print(data.read())
|
||||
```
|
||||
|
||||
* print help screen with available commands (many more!):
|
||||
|
||||
```python
|
||||
help(project1)
|
||||
```
|
||||
|
||||
* example for custom commands:
|
||||
|
||||
```python
|
||||
project1.do_json('get-rows')['total']
|
||||
```
|
||||
|
||||
* delete project:
|
||||
|
||||
```python
|
||||
project1.delete()
|
||||
```
|
||||
|
||||
See also:
|
||||
|
||||
- Jupyter notebook by Trevor Muñoz (2013-08-18): [Programmatic Use of Open Refine to Facet and Cluster Names of 'Dishes' from NYPL's What's on the menu?](https://nbviewer.jupyter.org/gist/trevormunoz/6265360)
|
||||
- Jupyter notebook by Tony Hirst (2019-01-09) [Notebook demonstrating how to control OpenRefine via a Python client.](https://nbviewer.jupyter.org/github/ouseful-PR/openrefineder/blob/4cef25a4ca6077536c5f49cafb531499fbcad96e/notebooks/OpenRefine%20Demos.ipynb)
|
||||
- Unittests [test_refine.py](tests/test_refine.py) and [test_tutorial.py](tests/test_tutorial.py) (both importing [refinetest.py](tests/refinetest.py))
|
||||
- [OpenRefine API](https://github.com/OpenRefine/OpenRefine/wiki/OpenRefine-API) in official OpenRefine wiki
|
||||
|
||||
## Binder
|
||||
|
||||
[Launch on Binder](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master)
|
||||
|
||||
- free to use on-demand server with Jupyter notebook, OpenRefine and Bash
|
||||
- no registration needed, will start within a few minutes
|
||||
- [restricted](https://mybinder.readthedocs.io/en/latest/faq.html#how-much-memory-am-i-given-when-using-binder) to 2 GB RAM and server will be deleted after 10 minutes of inactivity
|
||||
- [bash_kernel demo notebook](https://nbviewer.jupyter.org/github/felixlohmeier/openrefineder/blob/master/notebooks/openrefine-client-bash.ipynb) for using the openrefine-client in a Linux Bash environment ([Binder](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master?urlpath=/tree/notebooks/openrefine-client-bash.ipynb))
|
||||
- [python2 demo notebook](https://nbviewer.jupyter.org/github/felixlohmeier/openrefineder/blob/master/notebooks/openrefine-client-python.ipynb) for using the openrefine-client in a Python 2 environment ([Binder](https://mybinder.org/v2/gh/felixlohmeier/openrefineder/master?urlpath=/tree/notebooks/openrefine-client-python.ipynb))
|
||||
|
||||
## Development
|
||||
|
||||
If you would like to contribute to the Python client library please consider a pull request to the upstream repository [refine-client-py](https://github.com/PaulMakepeace/refine-client-py/).
|
||||
|
||||
### Tests
|
||||
|
||||
Ensure you have OpenRefine running (i.e. available at http://localhost:3333). If necessary set the environment variables `OPENREFINE_HOST` and `OPENREFINE_PORT` to change the URL.
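For your own helper scripts, the same two variables can be read as follows. This is a minimal Python 3 sketch, not the library's actual code; the function name is hypothetical and the fallback values are an assumption that mirrors the default URL mentioned above.

```python
import os


def refine_url(environ=None):
    """Build the server URL from OPENREFINE_HOST and OPENREFINE_PORT."""
    environ = os.environ if environ is None else environ
    # Assumed fallbacks, matching the default URL http://localhost:3333.
    host = environ.get("OPENREFINE_HOST", "localhost")
    port = environ.get("OPENREFINE_PORT", "3333")
    return "http://%s:%s" % (host, port)


print(refine_url({}))
print(refine_url({"OPENREFINE_HOST": "example.com", "OPENREFINE_PORT": "80"}))
```

Calling `refine_url()` without arguments reads the real process environment.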
|
||||
|
||||
The Python client library includes several unit tests.
|
||||
|
||||
- run all tests
|
||||
|
||||
```sh
|
||||
python setup.py test
|
||||
```
|
||||
|
||||
- run subset test_facet
|
||||
|
||||
```sh
|
||||
python setup.py test --test-suite tests.test_facet
|
||||
```
|
||||
|
||||
There is also a script that uses docker images to run the unit tests with different versions of OpenRefine.
|
||||
|
||||
- run tests on all OpenRefine versions (from 2.0 up to 3.5.0)
|
||||
|
||||
```sh
|
||||
./tests.sh -a
|
||||
```
|
||||
|
||||
- run tests on tag 3.5.0
|
||||
|
||||
```sh
|
||||
./tests.sh -t 3.5.0
|
||||
```
|
||||
|
||||
- run tests on tag 3.5.0 interactively (pause before and after tests)
|
||||
|
||||
```sh
|
||||
./tests.sh -t 3.5.0 -i
|
||||
```
|
||||
|
||||
- run tests on tags 3.5.0 and 2.7
|
||||
|
||||
```sh
|
||||
./tests.sh -t 3.5.0 -t 2.7
|
||||
```
|
||||
|
||||
For Linux there are also functional tests for all command line options.
|
||||
|
||||
- run all functional tests on OpenRefine 3.5.0
|
||||
|
||||
```sh
|
||||
./tests-cli.sh 3.5.0
|
||||
```
|
||||
|
||||
- run all functional tests on OpenRefine 3.5.0 with one-file-executable
|
||||
|
||||
```sh
|
||||
./tests-cli.sh 3.5.0 openrefine-client_0-3-7_linux
|
||||
```
|
||||
|
||||
### Distributing
|
||||
|
||||
Note to myself: When releasing a new version...
|
||||
|
||||
1. Run functional tests
|
||||
|
||||
```sh
|
||||
for v in 2.7 2.8 3.0 3.1 3.2 3.3 3.4 3.4.1 3.5.0; do
|
||||
./tests-cli.sh $v
|
||||
done
|
||||
```
|
||||
|
||||
2. Make final changes in Git
|
||||
|
||||
- update versions (e.g. 0.3.7 and 0-3-7) in [README.md](https://github.com/opencultureconsulting/openrefine-client/blob/master/README.md#download)
|
||||
- update version in [setup.py](https://github.com/opencultureconsulting/openrefine-client/blob/master/setup.py)
|
||||
- check if [Dockerfile](https://github.com/opencultureconsulting/openrefine-client/blob/master/docker/Dockerfile) needs to be changed
|
||||
|
||||
3. Build executables with PyInstaller
|
||||
|
||||
- Run PyInstaller in Python 2 environments on native Windows, macOS and Linux. Should be "the oldest version of the OS you need to support"! Current release is built with:
|
||||
|
||||
- Ubuntu 16.04 LTS (64-bit)
|
||||
- macOS Sierra 10.12 (64-bit)
|
||||
- Windows 7 (32-bit)
|
||||
|
||||
- One-file-executables will be available in `dist/`.
|
||||
|
||||
```sh
|
||||
git clone https://github.com/opencultureconsulting/openrefine-client.git
|
||||
cd openrefine-client
|
||||
python2 -m pip install pyinstaller --user
|
||||
python2 -m pip install urllib2_file --user
|
||||
python2 -m PyInstaller --onefile refine.py --hidden-import google.refine.__main__
|
||||
```
|
||||
|
||||
4. Run functional tests with Linux executable
|
||||
|
||||
```sh
|
||||
for v in 2.7 2.8 3.0 3.1 3.2 3.3 3.4 3.4.1 3.5.0; do
|
||||
./tests-cli.sh $v openrefine-client_0-3-7_linux
|
||||
done
|
||||
```
|
||||
|
||||
5. Create release in GitHub
|
||||
|
||||
- draft [release notes](https://github.com/opencultureconsulting/openrefine-client/releases) and attach one-file-executables
|
||||
|
||||
6. Build package and upload to PyPI
|
||||
|
||||
```sh
|
||||
python3 setup.py sdist bdist_wheel
|
||||
python3 -m twine upload dist/*
|
||||
```
|
||||
|
||||
7. Update Docker container
|
||||
|
||||
- add new autobuild for release version
|
||||
- trigger latest build
|
||||
|
||||
8. Bump openrefine-client version in related projects
|
||||
|
||||
- openrefine-batch: [openrefine-batch.sh](https://github.com/opencultureconsulting/openrefine-batch/blob/master/openrefine-batch.sh#L7) and [openrefine-batch-docker.sh](https://github.com/opencultureconsulting/openrefine-batch/blob/master/openrefine-batch-docker.sh)
|
||||
- openrefineder: [postBuild](https://github.com/felixlohmeier/openrefineder/blob/master/binder/postBuild)
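
Step 2 mentions two spellings of the version (0.3.7 and 0-3-7), and step 4 uses the hyphenated one in the executable name. A minimal sketch of deriving one from the other, assuming the naming pattern seen in `openrefine-client_0-3-7_linux`; `executable_name` is a hypothetical helper and not part of this repository, and only the `linux` platform suffix appears in this section.

```python
def executable_name(version, platform):
    """Build a name like 'openrefine-client_0-3-7_linux'.

    Hypothetical helper: the pattern is inferred from the file name used
    in the functional test examples, not taken from the build scripts.
    """
    hyphenated = version.replace('.', '-')
    return 'openrefine-client_{0}_{1}'.format(hyphenated, platform)
```

Deriving the name from a single version string avoids the two spellings drifting apart between README, tests and release assets.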
|
||||
|
||||
## Credits
|
||||
|
||||
[Paul Makepeace](http://paulm.com), author
|
||||
|
||||
David Huynh, [initial cut](http://markmail.org/message/jsxzlcu3gn6drtb7)
|
||||
|
||||
[Artfinder](http://www.artfinder.com), inspiration
|
||||
|
||||
[Felix Lohmeier](https://felixlohmeier.de), extended the CLI features
|
||||
|
||||
Some data used in the test suite has been taken from publicly available sources:
|
||||
|
||||
- louisiana-elected-officials.csv: from http://www.sos.louisiana.gov/tabid/136/Default.aspx
|
||||
|
||||
- us_economic_assistance.csv: ["The Green Book"](http://www.data.gov/raw/1554)
|
||||
|
||||
- eli-lilly.csv: [ProPublica's "Docs for Dollars"](http://projects.propublica.org/docdollars) leading to a [Lilly Faculty PDF](http://www.lillyfacultyregistry.com/documents/EliLillyFacultyRegistryQ22010.pdf) processed by [David Huynh's ScraperWiki script](http://scraperwiki.com/scrapers/eli-lilly-dollars-for-docs-scraper/edit/)
|
121
README.rst
@ -1,121 +0,0 @@
|
||||
===================================
|
||||
Google Refine Python Client Library
|
||||
===================================
|
||||
|
||||
The Google Refine Python Client Library provides an interface to
|
||||
communicating with a Google Refine server.
|
||||
|
||||
Currently, the following API is supported:
|
||||
|
||||
- project creation/import, deletion, export
|
||||
- facet computation
|
||||
|
||||
- text
|
||||
- text filter
|
||||
- numeric
|
||||
- blank
|
||||
- starred & flagged
|
||||
- ... extensible class
|
||||
|
||||
- 'engine': managing multiple facets and their computation results
|
||||
- sorting & reordering
|
||||
- clustering
|
||||
- transforms
|
||||
- transposes
|
||||
- single and mass edits
|
||||
- annotation (star/flag)
|
||||
- column
|
||||
|
||||
- move
|
||||
- add
|
||||
- split
|
||||
- rename
|
||||
- reorder
|
||||
- remove
|
||||
|
||||
- reconciliation
|
||||
|
||||
- reconciliation judgment facet
|
||||
- guessing column type
|
||||
- querying reconciliation services preferences
|
||||
- perform reconciliation
|
||||
|
||||
Configuration
|
||||
=============
|
||||
|
||||
By default the Google Refine server URL is http://127.0.0.1:3333
|
||||
The environment variables ``GOOGLE_REFINE_HOST`` and ``GOOGLE_REFINE_PORT``
|
||||
enable overriding the host & port.
|
||||
|
||||
In order to run all tests, a live Refine server is needed. No existing projects
|
||||
are affected.
|
||||
|
||||
Installation
|
||||
============
|
||||
|
||||
(Someone with more familiarity with python's byzantine collection of installation
|
||||
frameworks is very welcome to improve/"best practice" all this.)
|
||||
|
||||
#. Install dependencies, which currently is ``urllib2_file``:
|
||||
|
||||
``sudo pip install -r requirements.txt``
|
||||
|
||||
#. Ensure you have a Refine server running somewhere and, if necessary, set
|
||||
the envvars as above.
|
||||
|
||||
#. Run tests, build, and install:
|
||||
|
||||
``python setup.py test # to do a subset, e.g., --test-suite tests.test_facet``
|
||||
|
||||
``python setup.py build``
|
||||
|
||||
``python setup.py install``
|
||||
|
||||
There is a Makefile that will do this too, and more.
|
||||
|
||||
TODO
|
||||
====
|
||||
|
||||
The API so far has been filled out from building a test suite to carry out the
|
||||
actions in `David Huynh's Refine tutorial <http://davidhuynh.net/spaces/nicar2011/tutorial.pdf>`_ which while certainly showing off a
|
||||
wide range of Refine features doesn't cover the entire suite. Notable exceptions
|
||||
currently include:
|
||||
|
||||
- reconciliation support is useful but not complete
|
||||
- undo/redo
|
||||
- Freebase
|
||||
- join columns
|
||||
- columns from URL
|
||||
|
||||
Contribute
|
||||
============
|
||||
|
||||
Patches welcome! Source is at https://github.com/PaulMakepeace/refine-client-py
|
||||
|
||||
Useful Tools
|
||||
------------
|
||||
|
||||
One aspect of development is watching HTTP transactions. To that end, I found
|
||||
`Fiddler <http://www.fiddler2.com/>`_ on Windows and `HTTPScoop
|
||||
<http://www.tuffcode.com/>`_ invaluable. The latter won't URL-decode nor nicely
|
||||
format JSON but the `Online JavaScript Beautifier <http://jsbeautifier.org/>`_
|
||||
will.
|
||||
|
||||
Credits
|
||||
=======
|
||||
|
||||
Paul Makepeace, author, <paulm@paulm.com>
|
||||
|
||||
David Huynh, `initial cut <http://groups.google.com/group/google-refine/msg/ee29cf8d660e66a9>`_
|
||||
|
||||
`Artfinder <http://www.artfinder.com/>`_, inspiration
|
||||
|
||||
Some data used in the test suite has been used from publicly available sources,
|
||||
|
||||
- louisiana-elected-officials.csv: from
|
||||
http://www.sos.louisiana.gov/tabid/136/Default.aspx
|
||||
|
||||
- us_economic_assistance.csv: `"The Green Book" <http://www.data.gov/raw/1554>`_
|
||||
|
||||
- eli-lilly.csv: `ProPublica's "Docs for Dollars" <http://projects.propublica.org/docdollars/>`_ leading to a `Lilly Faculty PDF <http://www.lillyfacultyregistry.com/documents/EliLillyFacultyRegistryQ22010.pdf>`_ processed by `David Huynh's ScraperWiki script <http://scraperwiki.com/scrapers/eli-lilly-dollars-for-docs-scraper/edit/>`_
|
||||
|
28
docker/Dockerfile
Normal file
@ -0,0 +1,28 @@
|
||||
FROM alpine:3.11
|
||||
LABEL maintainer="felixlohmeier@opencultureconsulting.com"
|
||||
# The OpenRefine Python Client Library from PaulMakepeace provides an interface to communicating with an OpenRefine server. This fork extends the command line interface (CLI) and supports communication between docker containers.
|
||||
# Source: https://github.com/opencultureconsulting/openrefine-client
|
||||
|
||||
# Install python and pip
|
||||
# ... and curl for https://github.com/opencultureconsulting/openrefine-batch
|
||||
RUN apk add --no-cache \
|
||||
python \
|
||||
py-pip \
|
||||
curl
|
||||
|
||||
# Install dependency urllib2_file
|
||||
RUN pip install urllib2_file==0.2.1
|
||||
|
||||
# Copy python scripts
|
||||
WORKDIR /app
|
||||
COPY google google
|
||||
COPY refine.py .
|
||||
|
||||
# Change docker WORKDIR (shall be mounted by user)
|
||||
WORKDIR /data
|
||||
|
||||
# Execute refine.py
|
||||
ENTRYPOINT ["/app/refine.py"]
|
||||
|
||||
# Default command: print help
|
||||
CMD ["-h"]
|
279
google/refine/__main__.py
Normal file
@ -0,0 +1,279 @@
|
||||
#! /usr/bin/env python
|
||||
"""
|
||||
Script to provide a command line interface to a Refine server.
|
||||
"""
|
||||
|
||||
# Copyright (c) 2011 Paul Makepeace, Real Programmers. All rights reserved.
|
||||
|
||||
# This program is free software: you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License as published by
|
||||
# the Free Software Foundation, either version 3 of the License, or
|
||||
# (at your option) any later version.
|
||||
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU General Public License for more details.
|
||||
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program. If not, see <http://www.gnu.org/licenses/>
|
||||
|
||||
|
||||
import optparse
|
||||
|
||||
from google.refine import refine
|
||||
from google.refine import cli
|
||||
|
||||
|
||||
class myParser(optparse.OptionParser):
|
||||
|
||||
def format_epilog(self, formatter):
|
||||
return self.epilog
|
||||
|
||||
|
||||
PARSER = \
|
||||
myParser(description=('Script to provide a command line interface to an '
|
||||
'OpenRefine server.'),
|
||||
usage='usage: %prog [--help | OPTIONS]',
|
||||
epilog="""
|
||||
Example data:
|
||||
--download "https://git.io/fj5hF" --output=duplicates.csv
|
||||
--download "https://git.io/fj5ju" --output=duplicates-deletion.json
|
||||
|
||||
Basic commands:
|
||||
--list # list all projects
|
||||
--list -H 127.0.0.1 -P 80 # specify hostname and port
|
||||
--create duplicates.csv # create new project from file
|
||||
--info "duplicates" # show project metadata
|
||||
--apply duplicates-deletion.json "duplicates" # apply rules in file to project
|
||||
--export "duplicates" # export project to terminal in tsv format
|
||||
--export --output=deduped.xls "duplicates" # export project to file in xls format
|
||||
--delete "duplicates" # delete project
|
||||
|
||||
Some more examples:
|
||||
--info 1234567890123 # specify project by id
|
||||
--create example.tsv --encoding=UTF-8
|
||||
--create example.xml --recordPath=collection --recordPath=record
|
||||
--create example.json --recordPath=_ --recordPath=_
|
||||
--create example.xlsx --sheets=0
|
||||
--create example.ods --sheets=0
|
||||
|
||||
Example for Templating Export:
|
||||
Cf. https://github.com/opencultureconsulting/openrefine-client#advanced-templating
|
||||
""")
|
||||
|
||||
group1 = optparse.OptionGroup(PARSER, 'Connection options')
|
||||
group1.add_option('-H', '--host', dest='host',
|
||||
metavar='127.0.0.1',
|
||||
help='OpenRefine hostname (default: 127.0.0.1)')
|
||||
group1.add_option('-P', '--port', dest='port',
|
||||
metavar='3333',
|
||||
help='OpenRefine port (default: 3333)')
|
||||
PARSER.add_option_group(group1)
|
||||
|
||||
group2 = optparse.OptionGroup(PARSER, 'Commands')
|
||||
group2.add_option('-c', '--create', dest='create',
|
||||
metavar='[FILE]',
|
||||
help='Create project from file. The filename ending (e.g. .csv) defines the input format (csv,tsv,xml,json,txt,xls,xlsx,ods)')
|
||||
group2.add_option('-l', '--list', dest='list',
|
||||
action='store_true',
|
||||
help='List projects')
|
||||
group2.add_option('--download', dest='download',
|
||||
metavar='[URL]',
|
||||
help='Download file from URL (e.g. example data). Combine with --output to specify a filename.')
|
||||
PARSER.add_option_group(group2)
|
||||
|
||||
group3 = optparse.OptionGroup(
|
||||
PARSER, 'Commands with argument [PROJECTID/PROJECTNAME]')
|
||||
group3.add_option('-d', '--delete', dest='delete',
|
||||
action='store_true',
|
||||
help='Delete project')
|
||||
group3.add_option('-f', '--apply', dest='apply',
|
||||
metavar='[FILE]',
|
||||
help='Apply JSON rules to OpenRefine project')
|
||||
group3.add_option('-E', '--export', dest='export',
|
||||
action='store_true',
|
||||
help='Export project in tsv format to stdout.')
|
||||
group3.add_option('-o', '--output', dest='output',
|
||||
metavar='[FILE]',
|
||||
help='Export project to file. The filename ending (e.g. .tsv) defines the output format (csv,tsv,xls,xlsx,html)')
|
||||
group3.add_option('--template', dest='template',
|
||||
metavar='[STRING]',
|
||||
help='Export project with templating. Provide (big) text string that you enter in the *row template* textfield in the export/templating menu in the browser app)')
|
||||
group3.add_option('--info', dest='info',
|
||||
action='store_true',
|
||||
help='show project metadata')
|
||||
PARSER.add_option_group(group3)
|
||||
|
||||
group4 = optparse.OptionGroup(PARSER, 'General options')
|
||||
group4.add_option('--format', dest='file_format',
|
||||
help='Override file detection (import: csv,tsv,xml,json,line-based,fixed-width,xls,xlsx,ods; export: csv,tsv,html,xls,xlsx,ods)')
|
||||
PARSER.add_option_group(group4)
|
||||
|
||||
group5 = optparse.OptionGroup(PARSER, 'Create options')
|
||||
group5.add_option('--columnWidths', dest='columnWidths',
|
||||
action='append',
|
||||
type='int',
|
||||
help='(txt/fixed-width), please provide widths in multiple arguments, e.g. --columnWidths=7 --columnWidths=5')
|
||||
group5.add_option('--encoding', dest='encoding',
|
||||
help='(csv,tsv,txt), please provide short encoding name (e.g. UTF-8)')
|
||||
group5.add_option('--guessCellValueTypes', dest='guessCellValueTypes',
|
||||
metavar='true/false', choices=('true', 'false'),
|
||||
help='(xml,csv,tsv,txt,json, default: false)')
|
||||
group5.add_option('--headerLines', dest='headerLines',
|
||||
type="int",
|
||||
help='(csv,tsv,txt/fixed-width,xls,xlsx,ods), default: 1, default txt/fixed-width: 0')
|
||||
group5.add_option('--ignoreLines', dest='ignoreLines',
|
||||
type="int",
|
||||
help='(csv,tsv,txt,xls,xlsx,ods), default: -1')
|
||||
group5.add_option('--includeFileSources', dest='includeFileSources',
|
||||
metavar='true/false', choices=('true', 'false'),
|
||||
help='(all formats), default: false')
|
||||
group5.add_option('--limit', dest='limit',
|
||||
type="int",
|
||||
help='(all formats), default: -1')
|
||||
group5.add_option('--linesPerRow', dest='linesPerRow',
|
||||
type="int",
|
||||
help='(txt/line-based), default: 1')
|
||||
group5.add_option('--processQuotes', dest='processQuotes',
|
||||
metavar='true/false', choices=('true', 'false'),
|
||||
help='(csv,tsv), default: true')
|
||||
group5.add_option('--projectName', dest='projectName',
|
||||
help='(all formats), default: filename')
|
||||
group5.add_option('--projectTags', dest='projectTags',
|
||||
action='append',
|
||||
help='(all formats), please provide tags in multiple arguments, e.g. --projectTags=beta --projectTags=client1')
|
||||
group5.add_option('--recordPath', dest='recordPath',
|
||||
action='append',
|
||||
help='(xml,json), please provide path in multiple arguments, e.g. /collection/record/ should be entered: --recordPath=collection --recordPath=record, default xml: root element, default json: _ _')
|
||||
group5.add_option('--separator', dest='separator',
|
||||
help='(csv,tsv), default csv: , default tsv: \\t')
|
||||
group5.add_option('--sheets', dest='sheets',
|
||||
action='append',
|
||||
type="int",
|
||||
help='(xls,xlsx,ods), please provide sheets in multiple arguments, e.g. --sheets=0 --sheets=1, default: 0 (first sheet)')
|
||||
group5.add_option('--skipDataLines', dest='skipDataLines',
|
||||
type="int",
|
||||
help='(csv,tsv,txt,xls,xlsx,ods), default: 0, default line-based: -1')
|
||||
group5.add_option('--storeBlankCellsAsNulls', dest='storeBlankCellsAsNulls',
|
||||
metavar='true/false', choices=('true', 'false'),
|
||||
help='(csv,tsv,txt,xls,xlsx,ods), default: true')
|
||||
group5.add_option('--storeBlankRows', dest='storeBlankRows',
|
||||
metavar='true/false', choices=('true', 'false'),
|
||||
help='(csv,tsv,txt,xls,xlsx,ods), default: true')
|
||||
group5.add_option('--storeEmptyStrings', dest='storeEmptyStrings',
|
||||
metavar='true/false', choices=('true', 'false'),
|
||||
help='(xml,json), default: true')
|
||||
group5.add_option('--trimStrings', dest='trimStrings',
|
||||
metavar='true/false', choices=('true', 'false'),
|
||||
help='(xml,json), default: false')
|
||||
PARSER.add_option_group(group5)
|
||||
|
||||
group6 = optparse.OptionGroup(PARSER, 'Templating options')
|
||||
group6.add_option('--mode', dest='mode',
|
||||
metavar='row-based/record-based',
|
||||
choices=('row-based', 'record-based'),
|
||||
help='engine mode (default: row-based)')
|
||||
group6.add_option('--prefix', dest='prefix',
|
||||
help='text string that you enter in the *prefix* textfield in the browser app')
|
||||
group6.add_option('--rowSeparator', dest='rowSeparator',
|
||||
help='text string that you enter in the *row separator* textfield in the browser app')
|
||||
group6.add_option('--suffix', dest='suffix',
|
||||
help='text string that you enter in the *suffix* textfield in the browser app')
|
||||
group6.add_option('--filterQuery', dest='filterQuery',
|
||||
metavar='REGEX',
|
||||
help='Simple RegEx text filter on filterColumn, e.g. ^12015$')
|
||||
group6.add_option('--filterColumn', dest='filterColumn',
|
||||
metavar='COLUMNNAME',
|
||||
help='column name for filterQuery (default: name of first column)')
|
||||
group6.add_option('--facets', dest='facets',
|
||||
help='facets config in json format (may be extracted with browser dev tools in browser app)')
|
||||
group6.add_option('--splitToFiles', dest='splitToFiles',
|
||||
metavar='true/false', choices=('true', 'false'),
|
||||
help='will split each row/record into a single file; it specifies a presumably unique character series for splitting; --prefix and --suffix will be applied to all files; filename-prefix can be specified with --output (default: %Y%m%d)')
|
||||
group6.add_option('--suffixById', dest='suffixById',
|
||||
metavar='true/false', choices=('true', 'false'),
|
||||
help='enhancement option for --splitToFiles; will generate filename-suffix from values in key column')
|
||||
PARSER.add_option_group(group6)
|
||||
|
||||
|
||||
def main():
|
||||
"""Command line interface."""
|
||||
|
||||
options, args = PARSER.parse_args()
|
||||
|
||||
# set environment
|
||||
if options.host:
|
||||
refine.REFINE_HOST = options.host
|
||||
if options.port:
|
||||
refine.REFINE_PORT = options.port
|
||||
|
||||
# get project_id
|
||||
if args and not str.isdigit(args[0]):
|
||||
projects = refine.Refine(refine.RefineServer()).list_projects().items()
|
||||
idlist = []
|
||||
for project_id, project_info in projects:
|
||||
if args[0].decode('UTF-8') == project_info['name']:
|
||||
idlist.append(str(project_id))
|
||||
if len(idlist) > 1:
|
||||
print('Error: Found %s projects with name %s.\n'
|
||||
'Please specify project by id.' % (len(idlist), args[0]))
|
||||
for i in idlist:
|
||||
print('')
|
||||
cli.info(i)
|
||||
return
|
||||
else:
|
||||
try:
|
||||
project_id = idlist[0]
|
||||
except IndexError:
|
||||
print('Error: No project found with name %s.\n'
|
||||
'Try command --list' % args[0])
|
||||
return
|
||||
elif args:
|
||||
project_id = args[0]
|
||||
|
||||
# commands without args
|
||||
if options.list:
|
||||
cli.ls()
|
||||
elif options.download:
|
||||
cli.download(options.download, output_file=options.output)
|
||||
elif options.create:
|
||||
group5_dict = {group5_arg.dest: getattr(options, group5_arg.dest)
|
||||
for group5_arg in group5.option_list}
|
||||
kwargs = {k: v for k, v in group5_dict.items()
|
||||
if v is not None and v not in ['true', 'false']}
|
||||
kwargs.update({k: True for k, v in group5_dict.items()
|
||||
if v == 'true'})
|
||||
kwargs.update({k: False for k, v in group5_dict.items()
|
||||
if v == 'false'})
|
||||
if options.file_format:
|
||||
kwargs.update({'project_format': options.file_format})
|
||||
cli.create(options.create, **kwargs)
|
||||
# commands with args
|
||||
elif args and options.info:
|
||||
cli.info(project_id)
|
||||
elif args and options.delete:
|
||||
cli.delete(project_id)
|
||||
elif args and options.apply:
|
||||
cli.apply(project_id, options.apply)
|
||||
elif args and options.template:
|
||||
group6_dict = {group6_arg.dest: getattr(options, group6_arg.dest)
|
||||
for group6_arg in group6.option_list}
|
||||
kwargs = {k: v for k, v in group6_dict.items()
|
||||
if v is not None and v not in ['true', 'false']}
|
||||
kwargs.update({k: True for k, v in group6_dict.items()
|
||||
if v == 'true'})
|
||||
kwargs.update({k: False for k, v in group6_dict.items()
|
||||
if v == 'false'})
|
||||
cli.templating(project_id, options.template,
|
||||
output_file=options.output, **kwargs)
|
||||
elif args and (options.export or options.output):
|
||||
cli.export(project_id, output_file=options.output,
|
||||
export_format=options.file_format)
|
||||
else:
|
||||
PARSER.print_usage()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# execute only if run as a script
|
||||
main()
|
335
google/refine/cli.py
Normal file
@ -0,0 +1,335 @@
|
||||
#! /usr/bin/env python
|
||||
"""
|
||||
Functions used by the command line interface (CLI)
|
||||
"""
|
||||
|
||||
# Copyright (c) 2011 Paul Makepeace, Real Programmers. All rights reserved.
|
||||
|
||||
# This program is free software: you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License as published by
|
||||
# the Free Software Foundation, either version 3 of the License, or
|
||||
# (at your option) any later version.
|
||||
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU General Public License for more details.
|
||||
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program. If not, see <http://www.gnu.org/licenses/>
|
||||
|
||||
|
||||
import json
|
||||
import os
|
||||
import ssl
|
||||
import sys
|
||||
import time
|
||||
import urllib
|
||||
from xml.etree import ElementTree
|
||||
|
||||
from google.refine import refine
|
||||
|
||||
|
||||
def apply(project_id, history_file):
    """Apply OpenRefine history from json file to project."""
    project = refine.RefineProject(project_id)
    response = project.apply_operations(history_file)
    if response != 'ok':
        raise Exception('Failed to apply %s to %s: %s' %
                        (history_file, project_id, response))
    else:
        print('File %s has been successfully applied to project %s' %
              (history_file, project_id))
|
||||
|
||||
def create(project_file,
           project_format=None,
           columnWidths=None,
           encoding=None,
           guessCellValueTypes=False,
           headerLines=None,
           ignoreLines=None,
           includeFileSources=False,
           limit=None,
           linesPerRow=None,
           processQuotes=True,
           projectName=None,
           projectTags=None,
           recordPath=None,
           separator=None,
           sheets=None,
           skipDataLines=None,
           storeBlankCellsAsNulls=True,
           storeBlankRows=True,
           storeEmptyStrings=True,
           trimStrings=False
           ):
    """Create a new project from file."""
    # guess format from file extension
    if not project_format:
        project_format = os.path.splitext(project_file)[1][1:].lower()
        if project_format == 'txt':
            try:
                columnWidths[0]
                project_format = 'fixed-width'
            except TypeError:
                project_format = 'line-based'
    # defaults for each file type
    if project_format == 'xml':
        project_format = 'text/xml'
        if not recordPath:
            recordPath = [ElementTree.parse(project_file).getroot().tag]
    elif project_format == 'csv':
        project_format = 'text/line-based/*sv'
    elif project_format == 'tsv':
        project_format = 'text/line-based/*sv'
        if not separator:
            separator = '\t'
    elif project_format == 'line-based':
        project_format = 'text/line-based'
        if not skipDataLines:
            skipDataLines = -1
    elif project_format == 'fixed-width':
        project_format = 'text/line-based/fixed-width'
        if not headerLines:
            headerLines = 0
    elif project_format == 'json':
        project_format = 'text/json'
        if not recordPath:
            recordPath = ['_', '_']
    elif project_format == 'xls':
        project_format = 'binary/text/xml/xls/xlsx'
        if not sheets:
            sheets = [0]
            # TODO: new format for sheets option introduced in OpenRefine 2.8
    elif project_format == 'xlsx':
        project_format = 'binary/text/xml/xls/xlsx'
        if not sheets:
            sheets = [0]
            # TODO: new format for sheets option introduced in OpenRefine 2.8
    elif project_format == 'ods':
        project_format = 'text/xml/ods'
        if not sheets:
            sheets = [0]
            # TODO: new format for sheets option introduced in OpenRefine 2.8
    # execute
    kwargs = {k: v for k, v in vars().items() if v is not None}
    project = refine.Refine(refine.RefineServer()).new_project(
        guess_cell_value_types=guessCellValueTypes,
        ignore_lines=ignoreLines,
        header_lines=headerLines,
        skip_data_lines=skipDataLines,
        store_blank_rows=storeBlankRows,
        process_quotes=processQuotes,
        project_name=projectName,
        store_blank_cells_as_nulls=storeBlankCellsAsNulls,
        include_file_sources=includeFileSources,
        **kwargs)
    rows = project.do_json('get-rows')['total']
    if rows > 0:
        print('{0}: {1}'.format('id', project.project_id))
        print('{0}: {1}'.format('rows', rows))
        return project
    else:
        raise Exception(
            'Project contains 0 rows. Please check --help for mandatory '
            'arguments for xml, json, xlsx and ods')
|
||||
|
||||
|
||||
def delete(project_id):
    """Delete project."""
    project = refine.RefineProject(project_id)
    response = project.delete()
    # anything other than an explicit True is treated as a failure
    if response is not True:
        raise Exception('Failed to delete %s: %s' %
                        (project_id, response))
    else:
        print('Project %s has been successfully deleted' % project_id)
|
||||
|
||||
|
||||
def download(url, output_file=None):
    """Integrated download function for your convenience."""
    if not output_file:
        output_file = os.path.basename(url)
    if os.path.exists(output_file):
        print('Error: File %s already exists.\n'
              'Delete existing file or try command --output '
              'to specify a different filename.' % output_file)
        return
    # Workaround for SSL verification problems in one-file-executables
    context = ssl._create_unverified_context()
    urllib.urlretrieve(url, output_file, context=context)
    print('Download to file %s complete' % output_file)
|
||||
|
||||
|
||||
def export(project_id, encoding=None, output_file=None, export_format=None):
    """Dump a project to stdout or file."""
    project = refine.RefineProject(project_id)
    if not output_file:
        if not export_format:
            export_format = 'tsv'
        if export_format in ['csv', 'tsv', 'txt']:
            encoding = 'UTF-8'
        sys.stdout.write(project.export(
            export_format=export_format, encoding=encoding).read())
    else:
        ext = os.path.splitext(output_file)[1][1:]
        if ext and not export_format:
            export_format = ext.lower()
        if not export_format:
            export_format = 'tsv'
        if export_format in ['csv', 'tsv', 'txt']:
            encoding = 'UTF-8'
        with open(output_file, 'wb') as f:
            f.write(project.export(
                export_format=export_format, encoding=encoding).read())
        print('Export to file %s complete' % output_file)
|
||||
|
||||
|
||||
def info(project_id):
    """Show project metadata"""
    projects = refine.Refine(refine.RefineServer()).list_projects()
    if project_id in projects.keys():
        print('{0:>20}: {1}'.format('id', project_id))
        print('{0:>20}: {1}'.format('url', 'http://' +
                                    refine.REFINE_HOST + ':' +
                                    refine.REFINE_PORT +
                                    '/project?project=' + project_id))
        for k, v in projects[project_id].items():
            if v:
                print(u'{0:>20}: {1}'.format(k, v))
        project_model = refine.RefineProject(project_id).get_models()
        columns = [c['name'] for c in project_model['columnModel']['columns']]
        for (i, v) in enumerate(columns, start=1):
            print(u'{0:>20}: {1}'.format(u'column ' + str(i).zfill(3), v).encode('utf-8'))
    else:
        print('Error: No project found with id %s.\n'
              'Check existing projects with command --list' % (project_id))
|
||||
|
||||
|
||||
def ls():
    """Query the server and list projects sorted by mtime."""
    projects = refine.Refine(refine.RefineServer()).list_projects().items()

    def date_to_epoch(json_dt):
        """Convert a JSON date time into seconds-since-epoch."""
        return time.mktime(time.strptime(json_dt, '%Y-%m-%dT%H:%M:%SZ'))
    projects.sort(key=lambda v: date_to_epoch(v[1]['modified']), reverse=True)
    if projects:
        for project_id, project_info in projects:
            print(u'{0:>14}: {1}'.format(project_id, project_info['name']).encode('utf-8'))
    else:
        print('Error: No projects found')
|
||||
|
||||
|
||||
def templating(project_id,
               template,
               encoding='UTF-8',
               output_file=None,
               mode=None,
               prefix='',
               rowSeparator='\n',
               suffix='',
               filterQuery=None,
               filterColumn=None,
               facets=None,
               splitToFiles=False,
               suffixById=None
               ):
    """Dump a project to stdout or file with templating."""
    project = refine.RefineProject(project_id)

    # basic config
    templateconfig = {'prefix': prefix,
                      'suffix': suffix,
                      'template': template,
                      'rowSeparator': rowSeparator,
                      'encoding': encoding}

    # construct the engine config
    if mode == 'record-based':
        engine = {'facets': [], 'mode': 'record-based'}
    else:
        engine = {'facets': [], 'mode': 'row-based'}
    if facets:
        engine['facets'].append(json.loads(facets))
    if filterQuery:
        if not filterColumn:
            filterColumn = project.get_models()['columnModel']['keyColumnName']
        textFilter = {'type': 'text',
                      'name': filterColumn,
                      'columnName': filterColumn,
                      'mode': 'regex',
                      'caseSensitive': False,
                      'query': filterQuery}
        engine['facets'].append(textFilter)
    templateconfig.update({'engine': json.dumps(engine)})

    if not splitToFiles:
        # normal output
        if not output_file:
            sys.stdout.write(project.export_templating(
                **templateconfig).read())
        else:
            with open(output_file, 'wb') as f:
                f.write(project.export_templating(**templateconfig).read())
            print('Export to file %s complete' % output_file)
    else:
        # splitToFiles functionality
        prefix = templateconfig['prefix']
        suffix = templateconfig['suffix']
        split = '===|||THISISTHEBEGINNINGOFANEWRECORD|||==='
        if not output_file:
            output_file = time.strftime('%Y%m%d')
        # derive base and ext for the default name as well as a given one,
        # so both are always defined before they are used below
        base = os.path.splitext(output_file)[0]
        ext = os.path.splitext(output_file)[1][1:]
        if not ext:
            ext = 'txt'
        # generate config for subfeature suffixById
        if suffixById:
            ids_template = ('{{forNonBlank(' +
                            'with(row.columnNames[0],cn,cells[cn].value),' +
                            'v,v,"")}}')
            ids_templateconfig = {'engine': json.dumps(engine),
                                  'template': ids_template,
                                  'rowSeparator': '\n',
                                  'encoding': encoding}
            ids = [line.rstrip('\n') for line in project.export_templating(
                **ids_templateconfig) if line.rstrip('\n')]
        # generate common config
        if mode == 'record-based':
            # record-based: split-character into template
            # if key column is not blank (=record)
            template = ('{{forNonBlank(' +
                        'with(row.columnNames[0],cn,cells[cn].value),' +
                        'v,"' + split + '", "")}}' +
                        templateconfig['template'])
            templateconfig.update({'prefix': '',
                                   'suffix': '',
                                   'template': template,
                                   'rowSeparator': ''})
        else:
            # row-based: split-character into template
            template = split + templateconfig['template']
            templateconfig.update({'prefix': '',
                                   'suffix': '',
                                   'template': template,
                                   'rowSeparator': ''})
        # execute
        records = project.export_templating(
            **templateconfig).read().split(split)
        del records[0]  # skip first blank entry
        if suffixById:
            for index, record in enumerate(records):
                output_file = base + '_' + ids[index] + '.' + ext
                with open(output_file, 'wb') as f:
                    f.writelines([prefix, record, suffix])
            print('Export to files complete. Last file: %s' % output_file)
        else:
            zeros = len(str(len(records)))
            for index, record in enumerate(records):
                output_file = base + '_' + \
                    str(index + 1).zfill(zeros) + '.' + ext
                with open(output_file, 'wb') as f:
                    f.writelines([prefix, record, suffix])
            print('Export to files complete. Last file: %s' % output_file)
|
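The split-to-files branch above prepends a sentinel string to every record, exports everything in one request, and then cuts the result apart on the client side. The following is a minimal sketch of only that splitting and file-naming step. The helper names `split_export` and `numbered_filenames` are hypothetical (they do not exist in the client), and the sketch assumes an output filename is always supplied.

```python
import os

# Sentinel taken from the diff above; it must never occur in real data.
SPLIT = '===|||THISISTHEBEGINNINGOFANEWRECORD|||==='


def split_export(exported, split=SPLIT):
    """Cut one exported string into records on the sentinel (hypothetical helper)."""
    records = exported.split(split)
    # The sentinel leads every record, so the first element is the empty
    # text before the first sentinel and is dropped.
    return records[1:]


def numbered_filenames(output_file, count):
    """Build zero-padded names such as out_01.json (hypothetical helper)."""
    base, ext = os.path.splitext(output_file)
    ext = ext[1:] or 'txt'          # fall back to .txt when no extension is given
    zeros = len(str(count))         # pad to the width of the largest index
    return ['%s_%s.%s' % (base, str(i + 1).zfill(zeros), ext)
            for i in range(count)]


records = split_export(SPLIT + 'first' + SPLIT + 'second')
names = numbered_filenames('out.json', len(records))
```

Padding to the width of the record count keeps the generated files in export order when they are sorted by name.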
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 """
-Google Refine Facets, Engine, and Facet Responses.
+OpenRefine Facets, Engine, and Facet Responses.
 """

 # Copyright (c) 2011 Paul Makepeace, Real Programmers. All rights reserved.
||||
@ -28,6 +28,7 @@ def to_camel(attr):
|
||||
return (attr[0].lower() +
|
||||
re.sub(r'_(.)', lambda x: x.group(1).upper(), attr[1:]))
|
||||
|
||||
|
||||
def from_camel(attr):
|
||||
"""convert thisAttrName to this_attr_name."""
|
||||
# Don't add an underscore for capitalized first letter
|
||||
@ -35,8 +36,8 @@ def from_camel(attr):
|
||||
|
||||
|
||||
class Facet(object):
|
||||
def __init__(self, column, type, **options):
|
||||
self.type = type
|
||||
def __init__(self, column, facet_type, **options):
|
||||
self.type = facet_type
|
||||
self.name = column
|
||||
self.column_name = column
|
||||
for k, v in options.items():
|
||||
@ -50,17 +51,17 @@ class Facet(object):
|
||||
class TextFilterFacet(Facet):
|
||||
def __init__(self, column, query, **options):
|
||||
super(TextFilterFacet, self).__init__(
|
||||
column, query=query, case_sensitive=False, type='text',
|
||||
column, query=query, case_sensitive=False, facet_type='text',
|
||||
mode='text', **options)
|
||||
|
||||
|
||||
class TextFacet(Facet):
|
||||
def __init__(self, column, selection=None, expression='value',
|
||||
omit_blank=False, omit_error=False, select_blank=False,
|
||||
select_error=False, invert=False, **options):
|
||||
omit_blank=False, omit_error=False, select_blank=False,
|
||||
select_error=False, invert=False, **options):
|
||||
super(TextFacet, self).__init__(
|
||||
column,
|
||||
type='list',
|
||||
facet_type='list',
|
||||
omit_blank=omit_blank,
|
||||
omit_error=omit_error,
|
||||
select_blank=select_blank,
|
||||
@ -99,37 +100,39 @@ class BoolFacet(TextFacet):
|
||||
raise ValueError('selection must be True or False.')
|
||||
if expression is None:
|
||||
raise ValueError('Missing expression')
|
||||
super(BoolFacet, self).__init__(column,
|
||||
expression=expression, selection=selection)
|
||||
super(BoolFacet, self).__init__(
|
||||
column, expression=expression, selection=selection)
|
||||
|
||||
|
||||
class StarredFacet(BoolFacet):
|
||||
def __init__(self, selection=None):
|
||||
super(StarredFacet, self).__init__('',
|
||||
expression='row.starred', selection=selection)
|
||||
super(StarredFacet, self).__init__(
|
||||
'', expression='row.starred', selection=selection)
|
||||
|
||||
|
||||
class FlaggedFacet(BoolFacet):
|
||||
def __init__(self, selection=None):
|
||||
super(FlaggedFacet, self).__init__('',
|
||||
expression='row.flagged', selection=selection)
|
||||
super(FlaggedFacet, self).__init__(
|
||||
'', expression='row.flagged', selection=selection)
|
||||
|
||||
|
||||
class BlankFacet(BoolFacet):
|
||||
def __init__(self, column, selection=None):
|
||||
super(BlankFacet, self).__init__(column,
|
||||
expression='isBlank(value)', selection=selection)
|
||||
super(BlankFacet, self).__init__(
|
||||
column, expression='isBlank(value)', selection=selection)
|
||||
|
||||
|
||||
class ReconJudgmentFacet(TextFacet):
|
||||
def __init__(self, column, **options):
|
||||
super(ReconJudgmentFacet, self).__init__(column,
|
||||
super(ReconJudgmentFacet, self).__init__(
|
||||
column,
|
||||
expression=('forNonBlank(cell.recon.judgment, v, v, '
|
||||
'if(isNonBlank(value), "(unreconciled)", "(blank)"))'),
|
||||
**options)
|
||||
|
||||
|
||||
# Capitalize 'From' to get around python's reserved word.
|
||||
#noinspection PyPep8Naming
|
||||
class NumericFacet(Facet):
|
||||
def __init__(self, column, From=None, to=None, expression='value',
|
||||
select_blank=True, select_error=True, select_non_numeric=True,
|
||||
@ -139,7 +142,7 @@ class NumericFacet(Facet):
|
||||
From=From,
|
||||
to=to,
|
||||
expression=expression,
|
||||
type='range',
|
||||
facet_type='range',
|
||||
select_blank=select_blank,
|
||||
select_error=select_error,
|
||||
select_non_numeric=select_non_numeric,
|
||||
@ -155,10 +158,12 @@ class NumericFacet(Facet):
|
||||
class FacetResponse(object):
|
||||
"""Class for unpacking an individual facet response."""
|
||||
def __init__(self, facet):
|
||||
self.name = None
|
||||
for k, v in facet.items():
|
||||
if isinstance(k, bool) or isinstance(k, basestring):
|
||||
setattr(self, from_camel(k), v)
|
||||
self.choices = {}
|
||||
|
||||
class FacetChoice(object):
|
||||
def __init__(self, c):
|
||||
self.count = c['c']
|
||||
@ -188,11 +193,14 @@ class FacetsResponse(object):
|
||||
def __init__(self, engine, facets):
|
||||
class FacetResponseContainer(object):
|
||||
facets = None
|
||||
|
||||
def __init__(self, facet_responses):
|
||||
self.facets = [FacetResponse(fr) for fr in facet_responses]
|
||||
|
||||
def __iter__(self):
|
||||
for facet in self.facets:
|
||||
yield facet
|
||||
|
||||
def __getitem__(self, index):
|
||||
if not isinstance(index, int):
|
||||
index = engine.facet_index_by_id[id(index)]
|
||||
@ -205,10 +213,10 @@ class FacetsResponse(object):
|
||||
|
||||
class Engine(object):
|
||||
"""An Engine keeps track of Facets, and responses to facet computation."""
|
||||
facets = []
|
||||
facet_index_by_id = {} # dict of facets by Facet object id
|
||||
|
||||
def __init__(self, *facets, **kwargs):
|
||||
self.facets = []
|
||||
self.facet_index_by_id = {} # dict of facets by Facet object id
|
||||
self.set_facets(*facets)
|
||||
self.mode = kwargs.get('mode', 'row-based')
|
||||
|
||||
|
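The last hunk above moves `facets` and `facet_index_by_id` from class-level attributes into `__init__`. One plausible motivation, which the diff itself does not state, is that a mutable class attribute is shared by all instances. The sketch below illustrates that pitfall with two hypothetical classes, `SharedEngine` and `IsolatedEngine`, which are not part of the library.

```python
class SharedEngine(object):
    """Hypothetical class: the list lives on the class, not the instance."""
    facets = []

    def add(self, facet):
        # Mutates the single class-level list that every instance sees.
        self.facets.append(facet)


class IsolatedEngine(object):
    """Hypothetical class: each instance gets its own list in __init__."""

    def __init__(self):
        self.facets = []

    def add(self, facet):
        self.facets.append(facet)


a, b = SharedEngine(), SharedEngine()
a.add('text')
shared_leak = list(b.facets)    # b sees the facet that was added through a

c, d = IsolatedEngine(), IsolatedEngine()
c.add('text')
isolated = list(d.facets)       # d is unaffected by c
```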
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 """
-Google Refine history: parsing responses.
+OpenRefine history: parsing responses.
 """

 # Copyright (c) 2011 Paul Makepeace, Real Programmers. All rights reserved.
||||
@ -18,15 +18,13 @@ Google Refine history: parsing responses.
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program. If not, see <http://www.gnu.org/licenses/>
|
||||
|
||||
import json
|
||||
import re
|
||||
|
||||
|
||||
class HistoryEntry(object):
|
||||
# N.B. e.g. **response['historyEntry'] won't work as keys are unicode :-/
|
||||
def __init__(self, id=None, time=None, description=None, **kwargs):
|
||||
if id is None:
|
||||
#noinspection PyUnusedLocal
|
||||
def __init__(self, history_entry_id=None, time=None, description=None, **kwargs):
|
||||
if history_entry_id is None:
|
||||
raise ValueError('History entry id must be set')
|
||||
self.id = id
|
||||
self.id = history_entry_id
|
||||
self.description = description
|
||||
self.time = time
|
||||
|
@@ -33,8 +33,8 @@ import urlparse
 from google.refine import facet
 from google.refine import history

-REFINE_HOST = os.environ.get('GOOGLE_REFINE_HOST', '127.0.0.1')
-REFINE_PORT = os.environ.get('GOOGLE_REFINE_PORT', '3333')
+REFINE_HOST = os.environ.get('OPENREFINE_HOST', os.environ.get('GOOGLE_REFINE_HOST', '127.0.0.1'))
+REFINE_PORT = os.environ.get('OPENREFINE_PORT', os.environ.get('GOOGLE_REFINE_PORT', '3333'))


 class RefineServer(object):
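The hunk above makes the new `OPENREFINE_*` variables take precedence while still honouring the legacy `GOOGLE_REFINE_*` names. The same lookup order can be sketched as a small helper. `first_set` is a hypothetical name, and the `environ` parameter exists only so the sketch can be exercised without touching the real environment.

```python
import os


def first_set(names, default, environ=None):
    """Return the value of the first variable in names that is set.

    Hypothetical helper; equivalent in effect to nesting os.environ.get().
    """
    environ = os.environ if environ is None else environ
    for name in names:
        if name in environ:
            return environ[name]
    return default


legacy_only = {'GOOGLE_REFINE_HOST': 'legacy.example'}
both = {'OPENREFINE_HOST': 'new.example', 'GOOGLE_REFINE_HOST': 'legacy.example'}
names = ['OPENREFINE_HOST', 'GOOGLE_REFINE_HOST']

host_legacy = first_set(names, '127.0.0.1', legacy_only)
host_both = first_set(names, '127.0.0.1', both)
host_default = first_set(names, '127.0.0.1', {})
```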
||||
@ -50,9 +50,21 @@ class RefineServer(object):
|
||||
|
||||
def __init__(self, server=None):
|
||||
if server is None:
|
||||
server=self.url()
|
||||
server = self.url()
|
||||
self.server = server[:-1] if server.endswith('/') else server
|
||||
self.__version = None # see version @property below
|
||||
self.token = None # CSRF token introduced in OpenRefine 3.3
|
||||
self.get_csrf_token()
|
||||
|
||||
def get_csrf_token(self):
|
||||
"""Return csrf token."""
|
||||
try:
|
||||
url = self.server + '/command/core/get-csrf-token'
|
||||
response = json.loads(urllib2.urlopen(url).read())
|
||||
self.token = response['token']
|
||||
return self.token
|
||||
except:
|
||||
pass # fail silently to not disturb usage of OpenRefine <3.3
|
||||
|
||||
def urlopen(self, command, data=None, params=None, project_id=None):
|
||||
"""Open a Refine URL and with optional query params and POST data.
|
||||
@ -73,18 +85,24 @@ class RefineServer(object):
|
||||
data['project'] = project_id
|
||||
else:
|
||||
params['project'] = project_id
|
||||
# be lazy and send the token for each API call (even when not needed)
|
||||
if self.token:
|
||||
params['csrf_token'] = self.token
|
||||
if params:
|
||||
url += '?' + urllib.urlencode(params)
|
||||
req = urllib2.Request(url)
|
||||
if data:
|
||||
req.add_data(data) # data = urllib.urlencode(data)
|
||||
req.add_data(data) # data = urllib.urlencode(data)
|
||||
#req.add_header('Accept-Encoding', 'gzip')
|
||||
try:
|
||||
response = urllib2.urlopen(req)
|
||||
except urllib2.URLError as (url_error,):
|
||||
except urllib2.HTTPError as e:
|
||||
raise Exception('HTTP %d "%s" for %s\n\t%s' %
|
||||
(e.code, e.msg, e.geturl(), data))
|
||||
except urllib2.URLError as e:
|
||||
raise urllib2.URLError(
|
||||
'%s for %s. No Refine server reachable/running; ENV set?' %
|
||||
(url_error, self.server))
|
||||
(e.reason, self.server))
|
||||
if response.info().get('Content-Encoding', None) == 'gzip':
|
||||
# Need a seekable filestream for gzip
|
||||
gzip_fp = gzip.GzipFile(fileobj=StringIO.StringIO(response.read()))
|
||||
@ -96,9 +114,13 @@ class RefineServer(object):
|
||||
"""Open a Refine URL, optionally POST data, and return parsed JSON."""
|
||||
response = json.loads(self.urlopen(*args, **kwargs).read())
|
||||
if 'code' in response and response['code'] not in ('ok', 'pending'):
|
||||
raise Exception(
|
||||
response['code'] + ': ' +
|
||||
response.get('message', response.get('stack', response)))
|
||||
if 'Missing or invalid csrf_token parameter' == response['message']:
|
||||
self.get_csrf_token()
|
||||
response = json.loads(self.urlopen(*args, **kwargs).read())
|
||||
return response
|
||||
error_message = ('server ' + response['code'] + ': ' +
|
||||
response.get('message', response.get('stack', response)))
|
||||
raise Exception(error_message)
|
||||
return response
|
||||
|
||||
def get_version(self):
|
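The `do_json` hunk above refreshes the CSRF token and repeats the request once when the server answers with "Missing or invalid csrf_token parameter". The control flow can be sketched without a running server. `FakeServer` and the `session` dictionary are hypothetical stand-ins. Unlike the code in the diff, this sketch also checks the response code of the retried call and reads `message` with `.get()`; both are deliberate variations.

```python
CSRF_ERROR = 'Missing or invalid csrf_token parameter'


class FakeServer(object):
    """Hypothetical stand-in for a server that accepts exactly one token."""

    def __init__(self, valid_token):
        self.valid_token = valid_token

    def get_csrf_token(self):
        return self.valid_token

    def call(self, command, token):
        if token != self.valid_token:
            return {'code': 'error', 'message': CSRF_ERROR}
        return {'code': 'ok', 'command': command}


def do_json(server, command, session):
    """Call the server; on a CSRF error refresh the token and retry once."""
    response = server.call(command, session.get('token'))
    if response.get('code') == 'error' and response.get('message') == CSRF_ERROR:
        session['token'] = server.get_csrf_token()
        response = server.call(command, session['token'])
    # Any remaining non-success code is raised, including a failed retry.
    if response.get('code', 'ok') not in ('ok', 'pending'):
        raise Exception('server %s: %s' % (
            response['code'], response.get('message', response)))
    return response


session = {'token': 'stale-token'}
result = do_json(FakeServer('fresh-token'), 'get-models', session)
```

Retrying only once keeps a permanently rejected token from causing an endless loop.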
||||
@ -114,6 +136,7 @@ class RefineServer(object):
|
||||
self.__version = self.get_version()['version']
|
||||
return self.__version
|
||||
|
||||
|
||||
class Refine:
|
||||
"""Class representing a connection to a Refine server."""
|
||||
def __init__(self, server):
|
||||
@ -144,35 +167,115 @@ class Refine:
|
||||
"""Open a Refine project."""
|
||||
return RefineProject(self.server, project_id)
|
||||
|
||||
def new_project(self, project_file=None, project_url=None,
|
||||
project_name=None,
|
||||
split_into_columns=True,
|
||||
separator='',
|
||||
ignore_initial_non_blank_lines=0,
|
||||
header_lines=1, # use 0 if your data has no header
|
||||
skip_initial_data_rows=0,
|
||||
limit=None, # no more than this number of rows
|
||||
guess_value_type=True, # numbers, dates, etc.
|
||||
ignore_quotes=False):
|
||||
# These aren't used yet but are included for reference
|
||||
new_project_defaults = {
|
||||
'text/line-based/*sv': {
|
||||
'encoding': '',
|
||||
'separator': ',',
|
||||
'ignore_lines': -1,
|
||||
'header_lines': 1,
|
||||
'skip_data_lines': 0,
|
||||
'limit': -1,
|
||||
'store_blank_rows': True,
|
||||
'guess_cell_value_types': True,
|
||||
'process_quotes': True,
|
||||
'store_blank_cells_as_nulls': True,
|
||||
'include_file_sources': False},
|
||||
'text/line-based': {
|
||||
'encoding': '',
|
||||
'lines_per_row': 1,
|
||||
'ignore_lines': -1,
|
||||
'limit': -1,
|
||||
'skip_data_lines': -1,
|
||||
'store_blank_rows': True,
|
||||
'store_blank_cells_as_nulls': True,
|
||||
'include_file_sources': False},
|
||||
'text/line-based/fixed-width': {
|
||||
'encoding': '',
|
||||
'column_widths': [20],
|
||||
'ignore_lines': -1,
|
||||
'header_lines': 0,
|
||||
'skip_data_lines': 0,
|
||||
'limit': -1,
|
||||
'guess_cell_value_types': False,
|
||||
'store_blank_rows': True,
|
||||
'store_blank_cells_as_nulls': True,
|
||||
'include_file_sources': False},
|
||||
'text/line-based/pc-axis': {
|
||||
'encoding': '',
|
||||
'limit': -1,
|
||||
'skip_data_lines': -1,
|
||||
'include_file_sources': False},
|
||||
'text/rdf+n3': {'encoding': ''},
|
||||
'text/xml/ods': {
|
||||
'sheets': [],
|
||||
'ignore_lines': -1,
|
||||
'header_lines': 1,
|
||||
'skip_data_lines': 0,
|
||||
'limit': -1,
|
||||
'store_blank_rows': True,
|
||||
'store_blank_cells_as_nulls': True,
|
||||
'include_file_sources': False},
|
||||
'binary/xls': {
|
||||
'xml_based': False,
|
||||
'sheets': [],
|
||||
'ignore_lines': -1,
|
||||
'header_lines': 1,
|
||||
'skip_data_lines': 0,
|
||||
'limit': -1,
|
||||
'store_blank_rows': True,
|
||||
'store_blank_cells_as_nulls': True,
|
||||
'include_file_sources': False}
|
||||
}
|
||||
|
||||
if ((project_file and project_url) or
|
||||
(not project_file and not project_url)):
|
||||
def new_project(self, project_file=None, project_url=None, project_name=None, project_format='text/line-based/*sv',
|
||||
encoding='',
|
||||
separator=',',
|
||||
ignore_lines=-1,
|
||||
header_lines=1,
|
||||
skip_data_lines=0,
|
||||
limit=-1,
|
||||
store_blank_rows=True,
|
||||
guess_cell_value_types=False,
|
||||
process_quotes=True,
|
||||
store_blank_cells_as_nulls=True,
|
||||
include_file_sources=False,
|
||||
**opts):
|
||||
|
||||
if (project_file and project_url) or (not project_file and not project_url):
|
||||
raise ValueError('One (only) of project_file and project_url must be set')
|
||||
|
||||
def s(opt):
|
||||
if isinstance(opt, bool):
|
||||
return 'on' if opt else ''
|
||||
return 'true' if opt else 'false'
|
||||
if opt is None:
|
||||
return ''
|
||||
return str(opt)
|
||||
options = {
|
||||
'split-into-columns': s(split_into_columns),
|
||||
'separator': s(separator),
|
||||
'ignore': s(ignore_initial_non_blank_lines),
|
||||
'header-lines': s(header_lines),
|
||||
'skip': s(skip_initial_data_rows), 'limit': s(limit),
|
||||
'guess-value-type': s(guess_value_type),
|
||||
'ignore-quotes': s(ignore_quotes),
|
||||
|
||||
# the new APIs requires a json in the 'option' POST or GET argument
|
||||
# POST is broken at the moment, so we send it in the URL
|
||||
new_style_options = dict(opts, **{
|
||||
'encoding': s(encoding),
|
||||
'separator': s(separator)
|
||||
})
|
||||
params = {
|
||||
'options': json.dumps(new_style_options),
|
||||
}
|
||||
|
||||
# old style options
|
||||
options = {
|
||||
'format': project_format,
|
||||
'ignore-lines': s(ignore_lines),
|
||||
'header-lines': s(header_lines),
|
||||
'skip-data-lines': s(skip_data_lines),
|
||||
'limit': s(limit),
|
||||
'guess-value-type': s(guess_cell_value_types),
|
||||
'process-quotes': s(process_quotes),
|
||||
'store-blank-rows': s(store_blank_rows),
|
||||
'store-blank-cells-as-nulls': s(store_blank_cells_as_nulls),
|
||||
'include-file-sources': s(include_file_sources)
|
||||
}
|
||||
|
||||
if project_url is not None:
|
||||
options['url'] = project_url
|
||||
elif project_file is not None:
|
||||
@ -185,7 +288,9 @@ class Refine:
|
||||
project_name = (project_file or 'New project').rsplit('.', 1)[0]
|
||||
project_name = os.path.basename(project_name)
|
||||
options['project-name'] = project_name
|
||||
response = self.server.urlopen('create-project-from-upload', options)
|
||||
response = self.server.urlopen(
|
||||
'create-project-from-upload', options, params
|
||||
)
|
||||
# expecting a redirect to the new project containing the id in the url
|
||||
url_params = urlparse.parse_qs(
|
||||
urlparse.urlparse(response.geturl()).query)
|
||||
@ -211,6 +316,7 @@ def RowsResponseFactory(column_index):
|
||||
self.index = row_response['i']
|
||||
self.row = [c['v'] if c else None
|
||||
for c in row_response['cells']]
|
||||
|
||||
def __getitem__(self, column):
|
||||
# Trailing nulls seem to be stripped from row data
|
||||
try:
|
||||
@ -220,11 +326,14 @@ def RowsResponseFactory(column_index):
|
||||
|
||||
def __init__(self, rows_response):
|
||||
self.rows_response = rows_response
|
||||
|
||||
def __iter__(self):
|
||||
for row_response in self.rows_response:
|
||||
yield self.RefineRow(row_response)
|
||||
|
||||
def __getitem__(self, index):
|
||||
return self.RefineRow(self.rows_response[index])
|
||||
|
||||
def __len__(self):
|
||||
return len(self.rows_response)
|
||||
|
@@ -240,7 +349,7 @@ def RowsResponseFactory(column_index):


 class RefineProject:
-    """A Google Refine project."""
+    """An OpenRefine project."""

     def __init__(self, server, project_id=None):
         if not isinstance(server, RefineServer):
||||
@ -309,7 +418,10 @@ class RefineProject:
|
||||
for i, column in enumerate(column_model['columns']):
|
||||
name = column['name']
|
||||
self.column_order[name] = i
|
||||
column_index[name] = column['cellIndex']
|
||||
try:
|
||||
column_index[name] = column['cellIndex']
|
||||
except KeyError:
|
||||
column_index[name] = i
|
||||
self.key_column = column_model['keyColumnName']
|
||||
self.has_records = response['recordModel'].get('hasRecords', False)
|
||||
self.rows_response_factory = RowsResponseFactory(column_index)
|
||||
@ -331,18 +443,38 @@ class RefineProject:
|
||||
return
|
||||
|
||||
def apply_operations(self, file_path, wait=True):
|
||||
json = open(file_path).read()
|
||||
response_json = self.do_json('apply-operations', {'operations': json})
|
||||
json_data = open(file_path).read()
|
||||
response_json = self.do_json('apply-operations', {'operations': json_data})
|
||||
if response_json['code'] == 'pending' and wait:
|
||||
self.wait_until_idle()
|
||||
return 'ok'
|
||||
return response_json['code'] # can be 'ok' or 'pending'
|
||||
return response_json['code'] # can be 'ok' or 'pending'
|
||||
|
||||
def export(self, export_format='tsv'):
|
||||
def export(self, encoding=None, export_format='tsv'):
|
||||
"""Return a fileobject of a project's data."""
|
||||
url = ('export-rows/' + urllib.quote(self.project_name()) + '.' +
|
||||
export_format)
|
||||
return self.do_raw(url, data={'format': export_format})
|
||||
url = ('export-rows/' +
|
||||
urllib.quote(self.project_name().encode('utf8')) +
|
||||
'.' + export_format)
|
||||
data = {'format': export_format}
|
||||
if encoding:
|
||||
data['encoding'] = encoding
|
||||
return self.do_raw(url, data)
|
||||
|
||||
def export_templating(self, encoding=None, engine='', prefix='',
|
||||
template='', rowSeparator='\n', suffix=''):
|
||||
"""Return a fileobject of a project's data in templating mode."""
|
||||
url = ('export-rows/' +
|
||||
urllib.quote(self.project_name().encode('utf8')) +
|
||||
'.' + 'txt')
|
||||
data = {'format': 'template',
|
||||
'template': template,
|
||||
'engine': engine,
|
||||
'prefix': prefix,
|
||||
'suffix': suffix,
|
||||
'separator': rowSeparator}
|
||||
if encoding:
|
||||
data['encoding'] = encoding
|
||||
return self.do_raw(url, data)
|
||||
|
||||
def export_rows(self, **kwargs):
|
||||
"""Return an iterable of parsed rows of a project's data."""
|
||||
@ -426,6 +558,7 @@ class RefineProject:
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
def compute_clusters(self, column, clusterer_type='binning',
|
||||
function=None, params=None):
|
||||
"""Returns a list of clusters of {'value': ..., 'count': ...}."""
|
@@ -443,7 +576,7 @@ class RefineProject:
     def annotate_one_row(self, row, annotation, state=True):
         if annotation not in ('starred', 'flagged'):
             raise ValueError('annotation must be one of starred or flagged')
-        state = 'true' if state == True else 'false'
+        state = 'true' if state is True else 'false'
         return self.do_json('annotate-one-row', {'row': row.index,
                                                  annotation: state})

||||
@ -457,18 +590,19 @@ class RefineProject:
|
||||
column_insert_index=None, on_error='set-to-blank'):
|
||||
if column_insert_index is None:
|
||||
column_insert_index = self.column_order[column] + 1
|
||||
response = self.do_json('add-column', {'baseColumnName': column,
|
||||
'newColumnName': new_column, 'expression': expression,
|
||||
'columnInsertIndex': column_insert_index, 'onError': on_error})
|
||||
response = self.do_json('add-column', {
|
||||
'baseColumnName': column, 'newColumnName': new_column,
|
||||
'expression': expression, 'columnInsertIndex': column_insert_index,
|
||||
'onError': on_error})
|
||||
self.get_models()
|
||||
return response
|
||||
|
||||
def split_column(self, column, separator=',', mode='separator',
|
||||
regex=False, guess_cell_type=True,
|
||||
remove_original_column=True):
|
||||
response = self.do_json('split-column', {'columnName': column,
|
||||
'separator': separator, 'mode': mode, 'regex': regex,
|
||||
'guessCellType': guess_cell_type,
|
||||
response = self.do_json('split-column', {
|
||||
'columnName': column, 'separator': separator, 'mode': mode,
|
||||
'regex': regex, 'guessCellType': guess_cell_type,
|
||||
'removeOriginalColumn': remove_original_column})
|
||||
self.get_models()
|
||||
return response
|
||||
@ -505,9 +639,11 @@ class RefineProject:
|
||||
self.get_models()
|
||||
return response
|
||||
|
||||
def transpose_columns_into_rows(self, start_column, column_count,
|
||||
combined_column_name, separator=':', prepend_column_name=True,
|
||||
ignore_blank_cells=True):
|
||||
def transpose_columns_into_rows(
|
||||
self, start_column, column_count,
|
||||
combined_column_name, separator=':', prepend_column_name=True,
|
||||
ignore_blank_cells=True):
|
||||
|
||||
response = self.do_json('transpose-columns-into-rows', {
|
||||
'startColumnName': start_column, 'columnCount': column_count,
|
||||
'combinedColumnName': combined_column_name,
|
||||
@ -550,7 +686,8 @@ class RefineProject:
|
||||
return recon_service
|
||||
return None
|
||||
|
||||
def reconcile(self, column, service, type=None, config=None):
|
||||
def reconcile(self, column, service, reconciliation_type=None,
|
||||
reconciliation_config=None):
|
||||
"""Perform a reconciliation asynchronously.
|
||||
|
||||
config: {
|
||||
@ -570,21 +707,21 @@ class RefineProject:
|
||||
for reconciliation to complete.
|
||||
"""
|
||||
# Create a reconciliation config by looking up recon service info
|
||||
if config is None:
|
||||
if reconciliation_config is None:
|
||||
service = self.get_reconciliation_service_by_name_or_url(service)
|
||||
if type is None:
|
||||
if reconciliation_type is None:
|
||||
raise ValueError('Must have at least one of config or type')
|
||||
config = {
|
||||
reconciliation_config = {
|
||||
'mode': 'standard-service',
|
||||
'service': service['url'],
|
||||
'identifierSpace': service['identifierSpace'],
|
||||
'schemaSpace': service['schemaSpace'],
|
||||
'type': {
|
||||
'id': type['id'],
|
||||
'name': type['name'],
|
||||
'id': reconciliation_type['id'],
|
||||
'name': reconciliation_type['name'],
|
||||
},
|
||||
'autoMatch': True,
|
||||
'columnDetails': [],
|
||||
}
|
||||
return self.do_json('reconcile', {
|
||||
'columnName': column, 'config': json.dumps(config)})
|
||||
'columnName': column, 'config': json.dumps(reconciliation_config)})
|
||||
|
BIN
openrefine-client-peek.gif
Normal file
Binary file not shown.
Size: 1.7 MiB
88
refine.py
@@ -1,13 +1,6 @@
 #!/usr/bin/env python
 """
 Script to provide a command line interface to a Refine server.
-
-Examples,
-
-refine --list # show list of Refine projects, ID: name
-refine --export 1234... > project.tsv
-refine --export --output=project.xls 1234...
-refine --apply trim.json 1234...
 """

 # Copyright (c) 2011 Paul Makepeace, Real Programmers. All rights reserved.
||||
@ -25,79 +18,18 @@ refine --apply trim.json 1234...
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program. If not, see <http://www.gnu.org/licenses/>
|
||||
|
||||
|
||||
import optparse
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
|
||||
from google.refine import refine
|
||||
from google.refine import __main__, cli, refine
|
||||
|
||||
|
||||
PARSER = optparse.OptionParser(
|
||||
usage='usage: %prog [--help | OPTIONS] [project ID/URL]')
|
||||
PARSER.add_option('-H', '--host', dest='host',
|
||||
help='Google Refine hostname')
|
||||
PARSER.add_option('-P', '--port', dest='port',
|
||||
help='Google Refine port')
|
||||
PARSER.add_option('-o', '--output', dest='output',
|
||||
help='Output filename')
|
||||
# Options that are more like commands
|
||||
PARSER.add_option('-l', '--list', dest='list', action='store_true',
|
||||
help='List projects')
|
||||
PARSER.add_option('-E', '--export', dest='export', action='store_true',
|
||||
help='Export project')
|
||||
PARSER.add_option('-f', '--apply', dest='apply',
|
||||
help='Apply a JSON commands file to a project')
|
||||
|
||||
def list_projects():
|
||||
"""Query the Refine server and list projects by ID: name."""
|
||||
projects = refine.Refine(refine.RefineServer()).list_projects().items()
|
||||
def date_to_epoch(json_dt):
|
||||
"Convert a JSON date time into seconds-since-epoch."
|
||||
return time.mktime(time.strptime(json_dt, '%Y-%m-%dT%H:%M:%SZ'))
|
||||
projects.sort(key=lambda v: date_to_epoch(v[1]['modified']), reverse=True)
|
||||
for project_id, project_info in projects:
|
||||
print('{0:>14}: {1}'.format(project_id, project_info['name']))
|
||||
|
||||
def export_project(project, options):
|
||||
"""Dump a project to stdout or options.output file."""
|
||||
export_format = 'tsv'
|
||||
if options.output:
|
||||
ext = os.path.splitext(options.output)[1][1:] # 'xls'
|
||||
if ext:
|
||||
export_format = ext.lower()
|
||||
output = open(options.output, 'wb')
|
||||
else:
|
||||
output = sys.stdout
|
||||
output.writelines(project.export(export_format=export_format))
|
||||
output.close()
|
||||
|
||||
def main():
|
||||
"Main."
|
||||
options, args = PARSER.parse_args()
|
||||
|
||||
if options.host:
|
||||
refine.REFINE_HOST = options.host
|
||||
if options.port:
|
||||
refine.REFINE_PORT = options.port
|
||||
|
||||
if not options.list and len(args) != 1:
|
||||
PARSER.print_usage()
|
||||
if options.list:
|
||||
list_projects()
|
||||
if args:
|
||||
project = refine.RefineProject(args[0])
|
||||
if options.apply:
|
||||
response = project.apply_operations(options.apply)
|
||||
if response != 'ok':
|
||||
print >>sys.stderr, 'Failed to apply %s: %s' % (options.apply,
|
||||
response)
|
||||
if options.export:
|
||||
export_project(project, options)
|
||||
|
||||
return project
|
||||
# workaround for pyinstaller
|
||||
if getattr(sys, 'frozen', False) and hasattr(sys, '_MEIPASS'):
|
||||
reload(sys)
|
||||
sys.setdefaultencoding('utf-8')
|
||||
if sys.platform == "win32":
|
||||
import codecs
|
||||
codecs.register(lambda name: codecs.lookup(
|
||||
'utf-8') if name == 'cp65001' else None)
|
||||
|
||||
if __name__ == '__main__':
|
||||
# return project so that it's available interactively, python -i refine.py
|
||||
project = main()
|
||||
__main__.main()
|
||||
|
@@ -1 +1 @@
-https://github.com/seisen/urllib2_file/tarball/master
+urllib2_file>=0.2.1
46
setup.py
@@ -20,28 +20,40 @@ import os
 from setuptools import setup
 from setuptools import find_packages

-def read(fname):
-    return open(os.path.join(os.path.dirname(__file__), fname)).read()
-
-setup(name='refine-client',
-      version='0.2.1',
-      description=('The Google Refine Python Client Library provides an '
-                   'interface to communicating with a Google Refine server.'),
-      long_description=read('README.rst'),
-      author='Paul Makepeace',
-      author_email='paulm@paulm.com',
-      url='https://github.com/PaulMakepeace/refine-client-py',
+def read(filename):
+    return open(os.path.join(os.path.dirname(__file__), filename)).read()
+
+setup(name='openrefine-client',
+      version='0.3.10',
+      description=('The OpenRefine Python Client Library provides an '
+                   'interface to communicating with an OpenRefine server. '
+                   'This fork extends the command line interface (CLI).'),
+      long_description=read('README.md'),
+      long_description_content_type='text/markdown',
+      author='Felix Lohmeier',
+      author_email='felix.lohmeier@opencultureconsulting.com',
+      url='https://github.com/opencultureconsulting/openrefine-client',
       packages=find_packages(exclude=['tests']),
       install_requires=['urllib2_file'],
+      python_requires='>=2.7, !=3.*',
+      entry_points={
+          'console_scripts': [ 'openrefine-client = google.refine.__main__:main' ]
+      },
       platforms=['Any'],
+      keywords='openrefine client batch processing docker etl code4lib',
       classifiers = [
-          'Development Status :: 3 - Alpha',
-          'Intended Audience :: Developers',
-          'License :: OSI Approved :: GNU General Public License (GPL)',
-          'Operating System :: OS Independent',
-          'Programming Language :: Python',
-          'Topic :: Software Development :: Libraries :: Python Modules',
-          'Topic :: Text Processing',
+          'Development Status :: 4 - Beta',
+          'Intended Audience :: Developers',
+          'Intended Audience :: System Administrators',
+          'License :: OSI Approved :: GNU General Public License (GPL)',
+          'License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)',
+          'Operating System :: OS Independent',
+          'Programming Language :: Python',
+          'Programming Language :: Python :: 2',
+          'Programming Language :: Python :: 2.7',
+          'Topic :: Software Development :: Libraries :: Python Modules',
+          'Topic :: Text Processing',
       ],
       test_suite='tests',
 )
+
123
tests-cli.sh
Executable file
@ -0,0 +1,123 @@
|
||||
#!/bin/bash
|
||||
# Script for running functional tests against the CLI
|
||||
|
||||
# Copyright (c) 2011 Paul Makepeace, Real Programmers. All rights reserved.
|
||||
|
||||
# This program is free software: you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License as published by
|
||||
# the Free Software Foundation, either version 3 of the License, or
|
||||
# (at your option) any later version.
|
||||
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU General Public License for more details.
|
||||
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program. If not, see <http://www.gnu.org/licenses/>
|
||||
|
||||
# ================================== CONFIG ================================== #
|
||||
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
|
||||
port=3334
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
version="${1}"
|
||||
else
|
||||
version="3.2"
|
||||
fi
|
||||
refine="openrefine-${version}/refine"
|
||||
|
||||
if [[ ${2} ]]; then
|
||||
client="$(readlink -e "${2}")"
|
||||
else
|
||||
client="python2 $(readlink -e refine.py)"
|
||||
fi
|
||||
cmd="${client} -H localhost -P ${port}"
|
||||
|
||||
if [[ ${3} ]]; then
|
||||
filename="${3%%.*}"
|
||||
else
|
||||
filename=""
|
||||
fi
|
||||
|
||||
# =============================== REQUIREMENTS =============================== #
|
||||
|
||||
# check existence of java and cURL
|
||||
if [[ -z "$(command -v java 2> /dev/null)" ]] ; then
|
||||
echo 1>&2 "ERROR: OpenRefine requires a Java Runtime Environment (JRE)" \
|
||||
"https://openjdk.java.net/install/"
|
||||
exit 1
|
||||
fi
|
||||
if [[ -z "$(command -v curl 2> /dev/null)" ]] ; then
|
||||
echo 1>&2 "ERROR: This shell script requires cURL" \
|
||||
"https://curl.haxx.se/download.html"
|
||||
exit 1
|
||||
fi
|
||||
# download OpenRefine
|
||||
if [[ -z "$(readlink -e "${refine}")" ]]; then
|
||||
echo "Download OpenRefine ${version}..."
|
||||
mkdir -p "$(dirname "${refine}")"
|
||||
curl -L --output openrefine.tar.gz \
|
||||
"https://github.com/OpenRefine/OpenRefine/releases/download/${version}/openrefine-linux-${version}.tar.gz"
|
||||
echo "Install OpenRefine ${version} in subdirectory $(dirname "${refine}")..."
|
||||
tar -xzf openrefine.tar.gz -C "$(dirname "${refine}")" --strip 1 --totals
|
||||
rm -f openrefine.tar.gz
|
||||
# do not try to open OpenRefine in browser
|
||||
sed -i '$ a JAVA_OPTIONS=-Drefine.headless=true' \
|
||||
"$(dirname "${refine}")"/refine.ini
|
||||
# set autosave period from 5 minutes to 25 hours
|
||||
sed -i 's/#REFINE_AUTOSAVE_PERIOD=60/REFINE_AUTOSAVE_PERIOD=1500/' \
|
||||
"$(dirname "${refine}")"/refine.ini
|
||||
echo
|
||||
fi
|
||||
|
||||
# ================================== SETUP =================================== #
|
||||
|
||||
dir="$(readlink -f "tests/tmp")"
|
||||
mkdir -p "${dir}"
|
||||
rm -f tests-cli.log
|
||||
|
||||
echo "start OpenRefine ${version}..."
|
||||
${refine} -v warn -p ${port} -d "${dir}" &>> tests-cli.log &
|
||||
pid_server=${!}
|
||||
timeout 30s bash -c "until curl -s 'http://localhost:${port}' \
|
||||
| cat | grep -q -o 'OpenRefine' ; do sleep 1; done" \
|
||||
|| { echo 1>&2 "ERROR: starting OpenRefine server failed!"; kill "${pid_server}" 2>/dev/null; exit 1; }
|
||||
echo
|
||||
|
||||
# ================================== TESTS =================================== #
|
||||
|
||||
echo "running tests, please wait..."
|
||||
tests=()
|
||||
results=()
|
||||
for t in tests/*${filename}*.sh; do
|
||||
tests+=("${t}")
|
||||
echo "======================= ${t} =======================" &>> tests-cli.log
|
||||
bash "${t}" "${cmd}" "${version}" &>> tests-cli.log
|
||||
results+=(${?})
|
||||
done
|
||||
echo
|
||||
|
||||
# ================================= TEARDOWN ================================= #
|
||||
|
||||
echo "cleanup..."
|
||||
{ kill -9 "${pid_server}" && wait "${pid_server}"; } 2>/dev/null
|
||||
rm -rf "${dir}"
|
||||
echo
|
||||
|
||||
# ================================= SUMMARY ================================== #
|
||||
|
||||
printf "%s\t%s\n" "code" "test"
|
||||
printf "%s\t%s\n" "----" "----------------"
|
||||
for i in "${!tests[@]}"; do
|
||||
printf "%s\t%s\n" "${results[$i]}" "${tests[$i]}"
|
||||
done
|
||||
echo
|
||||
if [[ " ${results[*]} " =~ [1-9] ]]; then
|
||||
echo "failed tests! check tests-cli.log for debugging"; echo
|
||||
else
|
||||
echo "all tests passed!"; echo
|
||||
fi
|
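Note on the summary above: the check `[[ " ${results[*]} " =~ [1-9] ]]` reports a failure whenever any exit code contains a non-zero digit. That includes exit code 200, which tests/create-csv-projectTags.sh returns when the feature is unavailable in OpenRefine 2.x. A minimal sketch of a summary that counts such results separately, assuming 200 is meant as a "skipped" marker (that reading, and the helper name `summarize`, are assumptions and not part of the script):

```shell
#!/bin/bash
# summarize is a hypothetical helper: it takes exit codes as arguments
# and prints how many tests passed, were skipped, or failed.
# ASSUMPTION: exit code 200 means "skipped".
summarize() {
  local passed=0 skipped=0 failed=0 code
  for code in "$@"; do
    case "${code}" in
      0)   passed=$((passed + 1)) ;;
      200) skipped=$((skipped + 1)) ;;
      *)   failed=$((failed + 1)) ;;
    esac
  done
  echo "passed=${passed} skipped=${skipped} failed=${failed}"
}

# Sample data; in tests-cli.sh the results array is filled by the test loop.
results=(0 200 1)
summarize "${results[@]}"
```

In tests-cli.sh the call would be `summarize "${results[@]}"` after the test loop; whether a skipped test should fail the whole run is a design decision left to the maintainers.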
130
tests.sh
Executable file
@ -0,0 +1,130 @@
|
||||
#!/bin/bash
|
||||
# Script for running tests with different OpenRefine and Java versions based on Docker images.
|
||||
|
||||
# Copyright (c) 2011 Paul Makepeace, Real Programmers. All rights reserved.
|
||||
|
||||
# This program is free software: you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License as published by
|
||||
# the Free Software Foundation, either version 3 of the License, or
|
||||
# (at your option) any later version.
|
||||
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU General Public License for more details.
|
||||
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program. If not, see <http://www.gnu.org/licenses/>
|
||||
|
||||
# defaults:
|
||||
all=(3.5.0 3.4.1 3.4 3.3 3.2-java12 3.2-java11 3.2-java10 3.2-java9 3.2 3.1-java9 3.1 3.0-java9 3.0 2.8-java9 2.8 2.8-java7 2.7 2.7-java7 2.5-java7 2.5-java6 2.1-java6 2.0-java6)
|
||||
main=(3.5.0 3.4.1 3.4 3.3 3.2 3.1 3.0 2.8 2.7 2.5-java6 2.1-java6 2.0-java6)
|
||||
interactively=false
|
||||
port="3333"
|
||||
|
||||
# help screen
|
||||
function usage () {
|
||||
cat <<EOF
|
||||
Usage: ./tests.sh [-t TAG] [-i] [-p PORT] [-a] [-j] [-h]
|
||||
|
||||
Script for running tests with different OpenRefine and Java versions.
|
||||
It uses docker images from https://hub.docker.com/r/felixlohmeier/openrefine.
|
||||
|
||||
Examples:
|
||||
./tests.sh -a # run tests on all OpenRefine versions (from 2.0 up to 3.5.0)
|
||||
./tests.sh -t 3.5.0 # run tests on tag 3.5.0
|
||||
./tests.sh -t 3.5.0 -i # run tests on tag 3.5.0 interactively (pause before and after tests)
|
||||
./tests.sh -t 3.5.0 -t 2.7 # run tests on tags 3.5.0 and 2.7
|
||||
|
||||
Advanced:
|
||||
./tests.sh -j # run tests on all OpenRefine versions and each with all supported Java versions (requires a lot of docker images to be downloaded!)
|
||||
./tests.sh -t 3.1 -i -p 3334 # run tests on tag 3.1 interactively on port 3334
|
||||
|
||||
Running tests interactively (-i) allows you to examine OpenRefine GUI at http://localhost:3333.
|
||||
Execute the script concurrently in another terminal on another port (-p 3334) to compare changes in the OpenRefine GUI at http://localhost:3333 and http://localhost:3334.
|
||||
|
||||
Available tags (java 8 if java not mentioned in tag):
|
||||
EOF
|
||||
for t in ${all[*]} ; do
|
||||
echo "$t"
|
||||
done
|
||||
exit 1
|
||||
}
|
||||
|
||||
# check input
|
||||
NUMARGS=$#
|
||||
if [ "$NUMARGS" -eq 0 ]; then
|
||||
usage
|
||||
fi
|
||||
|
||||
# check system requirements
|
||||
DOCKER="$(command -v docker 2> /dev/null)"
|
||||
if [ -z "$DOCKER" ] ; then
|
||||
echo 1>&2 "This action requires you to have 'docker' installed and present in your PATH. You can download it for free at http://www.docker.com/"
|
||||
exit 1
|
||||
fi
|
||||
DOCKERINFO="$(docker info 2>/dev/null | grep 'Server Version')"
|
||||
if [ -z "$DOCKERINFO" ]
|
||||
then
|
||||
echo "command 'docker info' failed, trying again with sudo..."
|
||||
DOCKERINFO="$(sudo docker info 2>/dev/null | grep 'Server Version')"
|
||||
echo "OK"
|
||||
docker=(sudo docker)
|
||||
if [ -z "$DOCKERINFO" ] ; then
|
||||
echo 1>&2 "This action requires you to start the docker daemon. Try 'sudo systemctl start docker' or 'sudo start docker'. If the docker daemon is already running then maybe some security privileges are missing to run docker commands."
|
||||
exit 1
|
||||
fi
|
||||
else
|
||||
docker=(docker)
|
||||
fi
|
||||
CURLINFO="$(command -v curl 2>/dev/null)"
|
||||
if [ -z "$CURLINFO" ] ; then
|
||||
echo 1>&2 "This action requires you to have 'curl' installed and present in your PATH."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# get user input
|
||||
options="t:p:iajh"
|
||||
while getopts $options opt; do
|
||||
case $opt in
|
||||
t ) tags+=("${OPTARG}");;
|
||||
p ) port="${OPTARG}";export OPENREFINE_PORT="$port";;
|
||||
i ) interactively=true;;
|
||||
a ) tags=("${main[*]}");;
|
||||
j ) tags=("${all[*]}");;
|
||||
h ) usage ;;
|
||||
\? ) echo 1>&2 "Unknown option: -$OPTARG"; usage; exit 1;;
|
||||
: ) echo 1>&2 "Missing option argument for -$OPTARG"; usage; exit 1;;
|
||||
* ) echo 1>&2 "Unimplemented option: -$OPTARG"; usage; exit 1;;
|
||||
esac
|
||||
done
|
||||
shift $((OPTIND - 1))
|
||||
|
||||
# print config
|
||||
echo "Tags: ${tags[*]}"
|
||||
echo "Port: $port"
|
||||
echo ""
|
||||
|
||||
# safe cleanup handler
|
||||
cleanup()
|
||||
{
|
||||
echo "cleanup..."
|
||||
${docker[*]} stop "$t"
|
||||
}
|
||||
trap "cleanup;exit" SIGHUP SIGINT SIGQUIT SIGTERM
|
||||
|
||||
# run setup.py tests for each docker tag
|
||||
for t in ${tags[*]} ; do
|
||||
echo "=== Tests for $t ==="
|
||||
echo ""
|
||||
echo "Begin: $(date)"
|
||||
${docker[*]} run -d -p "$port":3333 --rm --name "$t" felixlohmeier/openrefine:"$t"
|
||||
until curl --silent -N http://localhost:"$port" | cat | grep -q -o "Refine" ; do sleep 1; done
|
||||
echo "Refine running at http://localhost:${port}"
|
||||
if [ $interactively = true ]; then read -r -p "Press [Enter] key to start tests..."; fi
|
||||
python2 setup.py test
|
||||
if [ $interactively = true ]; then read -r -p "Press [Enter] key to stop OpenRefine..."; fi
|
||||
${docker[*]} stop "$t"
|
||||
echo "End: $(date)"
|
||||
echo ""
|
||||
done
|
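Note on the option parsing in tests.sh: a repeated `-t` appends to the `tags` array, which is why `./tests.sh -t 3.5.0 -t 2.7` runs both tags. A minimal sketch of that accumulation in isolation (the function `parse_tags` is hypothetical and only handles `-t`; tests.sh parses its options inline):

```shell
#!/bin/bash
# parse_tags is a hypothetical helper, not part of tests.sh.
# It collects every -t argument and prints one tag per line.
parse_tags() {
  local OPTIND=1 opt      # reset getopts state for this call
  local -a tags=()
  while getopts "t:" opt; do
    case "${opt}" in
      t) tags+=("${OPTARG}") ;;   # append, do not overwrite
      *) return 1 ;;              # unknown option or missing argument
    esac
  done
  printf '%s\n' "${tags[@]}"
}

parse_tags -t 3.5.0 -t 2.7
```

Appending with `+=` keeps earlier tags; a plain assignment would keep only the last `-t` value.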
57
tests/apply-utf8.sh
Normal file
@ -0,0 +1,57 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.transform"
|
||||
[
|
||||
{
|
||||
"op": "core/column-addition",
|
||||
"engineConfig": {
|
||||
"mode": "row-based"
|
||||
},
|
||||
"newColumnName": "apply",
|
||||
"columnInsertIndex": 2,
|
||||
"baseColumnName": "b",
|
||||
"expression": "grel:value.replace('2','⛲')",
|
||||
"onError": "set-to-blank"
|
||||
}
|
||||
]
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b apply c
|
||||
1 2 ⛲ 3
|
||||
0 0 0 0
|
||||
$ \ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv"
|
||||
${cmd} --apply "tmp/${t}/${t}.transform" "${t}"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
57
tests/apply.sh
Normal file
@ -0,0 +1,57 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.transform"
|
||||
[
|
||||
{
|
||||
"op": "core/column-addition",
|
||||
"engineConfig": {
|
||||
"mode": "row-based"
|
||||
},
|
||||
"newColumnName": "apply",
|
||||
"columnInsertIndex": 2,
|
||||
"baseColumnName": "b",
|
||||
"expression": "grel:value.replace('2','TEST')",
|
||||
"onError": "set-to-blank"
|
||||
}
|
||||
]
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b apply c
|
||||
1 2 TEST 3
|
||||
0 0 0 0
|
||||
$ \ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv"
|
||||
${cmd} --apply "tmp/${t}/${t}.transform" "${t}"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
41
tests/create-csv-encoding.sh
Normal file
@ -0,0 +1,41 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}-utf8.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
ä,é,ß
|
||||
$,\,'
|
||||
DATA
|
||||
iconv -f UTF-8 -t ISO-8859-1 "tmp/${t}/${t}-utf8.csv" > "tmp/${t}/${t}.csv"
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c
|
||||
1 2 3
|
||||
ä é ß
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --encoding "ISO-8859-1"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
40
tests/create-csv-guessCellValueTypes.sh
Normal file
@ -0,0 +1,40 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
01,02,03
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c
|
||||
1 2 3
|
||||
1 2 3
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --guessCellValueTypes "true"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
41
tests/create-csv-headerLines.sh
Normal file
@ -0,0 +1,41 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
Column 1 Column 2 Column 3
|
||||
a b c
|
||||
1 2 3
|
||||
0 0 0
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --headerLines "0"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
39
tests/create-csv-ignoreLines.sh
Normal file
@ -0,0 +1,39 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
1 2 3
|
||||
0 0 0
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --ignoreLines "1"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
39
tests/create-csv-limit.sh
Normal file
@ -0,0 +1,39 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c
|
||||
1 2 3
|
||||
0 0 0
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --limit "2"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
41
tests/create-csv-processQuotes.sh
Normal file
@ -0,0 +1,41 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,"2,0",3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c Column 4
|
||||
1 2 0 3
|
||||
0 0 0
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
# OpenRefine 4.x fails without manually set headerLines
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --processQuotes "false" --headerLines 1
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
45
tests/create-csv-projectTags.sh
Normal file
@ -0,0 +1,45 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
if [[ ${2} ]]; then
|
||||
version="${2}"
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
if [[ "${version:0:1}" = "2" ]]; then
|
||||
echo "projectTags were introduced in OpenRefine 3.0"
|
||||
exit 200
|
||||
else
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
tags: [u'beta', u'client1']
|
||||
DATA
|
||||
fi
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --projectTags "beta" --projectTags "client1"
|
||||
${cmd} --info "${t}" | grep ' tags: ' > "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
40
tests/create-csv-separator.sh
Normal file
@ -0,0 +1,40 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a;b;c
|
||||
1;2;3
|
||||
0;0;0
|
||||
$;\;'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c
|
||||
1 2 3
|
||||
0 0 0
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --separator ";"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
39
tests/create-csv-skipDataLines.sh
Normal file
@ -0,0 +1,39 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c
|
||||
0 0 0
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --skipDataLines "1"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
58
tests/create-csv-storeBlankCellsAsNulls.sh
Normal file
@ -0,0 +1,58 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.transform"
|
||||
[
|
||||
{
|
||||
"op": "core/text-transform",
|
||||
"engineConfig": {
|
||||
"facets": [],
|
||||
"mode": "row-based"
|
||||
},
|
||||
"columnName": "b",
|
||||
"expression": "grel:isNull(value)",
|
||||
"onError": "keep-original",
|
||||
"repeat": false,
|
||||
"repeatCount": 10
|
||||
}
|
||||
]
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c
|
||||
1 false 3
|
||||
0 false 0
|
||||
$ false '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --storeBlankCellsAsNulls "false"
|
||||
${cmd} --apply "tmp/${t}/${t}.transform" "${t}"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
39
tests/create-csv-storeBlankRows.sh
Normal file
@ -0,0 +1,39 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
,,
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c
|
||||
1 2 3
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --storeBlankRows "false"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
40
tests/create-csv-utf8.sh
Normal file
@ -0,0 +1,40 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
⌨,code,meaning
|
||||
⛲,1F347,FOUNTAIN
|
||||
⛳,1F349,FLAG IN HOLE
|
||||
⛵,1F352,SAILBOAT
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
⌨ code meaning
|
||||
⛲ 1F347 FOUNTAIN
|
||||
⛳ 1F349 FLAG IN HOLE
|
||||
⛵ 1F352 SAILBOAT
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --projectName "${t} biểu tượng cảm xúc ⛲"
|
||||
${cmd} --export "${t} biểu tượng cảm xúc ⛲" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
40
tests/create-csv.sh
Normal file
@ -0,0 +1,40 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c
|
||||
1 2 3
|
||||
0 0 0
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
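Note on the test layout: the scripts in tests/ share one pattern: write input data with a quoted heredoc, write the expected output, run the client, then compare with `diff -u`. A minimal sketch of that pattern without an OpenRefine server, using `tr` as a stand-in for the create/export round trip (the stand-in, the `mktemp -d` directory and the `demo` file names are assumptions for illustration; the expected output is assumed to be tab-separated):

```shell
#!/bin/bash
# Throwaway working directory (assumption; the real tests use tests/tmp).
dir="$(mktemp -d)"

# DATA: the quoted delimiter "DATA" keeps $ and \ literal.
cat << "DATA" > "${dir}/demo.csv"
a,b,c
1,2,3
$,\,'
DATA

# ASSERTION: expected output, one tab-separated row per input row.
printf '%s\t%s\t%s\n' a b c 1 2 3 '$' '\' "'" > "${dir}/demo.assert"

# ACTION: stand-in for "${cmd} --create ..." and "${cmd} --export ...".
tr ',' '\t' < "${dir}/demo.csv" > "${dir}/demo.output"

# TEST: diff prints nothing and returns 0 when both files match.
diff -u "${dir}/demo.assert" "${dir}/demo.output"
status=$?

rm -rf "${dir}"
echo "exit code: ${status}"
```

Because each test script ends with `diff`, its exit status becomes the status of the script, which is what tests-cli.sh records in its `results` array.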
55
tests/create-json-recordPath.sh
Normal file
@@ -0,0 +1,55 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.json"
{
"rows":[
{
"a":1,
"b":2,
"c":3
},
{
"a":0,
"b":0,
"c":0
},
{
"a":"$",
"b":"\\",
"c":"\""
}
]
}
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
_ - a _ - b _ - c
1 2 3
0 0 0
$ \ """"
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.json" --recordPath "_" --recordPath "rows" --recordPath "_"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
52
tests/create-json-storeEmptyStrings.sh
Normal file
@@ -0,0 +1,52 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.json"
[
{
"a": 1,
"b": 2,
"c": 3
},
{
"a": "",
"b": "",
"c": ""
},
{
"a": "$",
"b": "\\",
"c": "\""
}
]
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
_ - a _ - b _ - c
1 2 3
$ \ """"
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.json" --storeEmptyStrings "false"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
62
tests/create-json-trimStrings.sh
Normal file
@@ -0,0 +1,62 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi
if [[ ${2} ]]; then
  version="${2}"
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.json"
[
{
"a": 1,
"b": 2,
"c": 3
},
{
"a": "0",
"b": " 0",
"c": "0 "
},
{
"a": "$",
"b": "\\",
"c": "\""
}
]
DATA

# ================================= ASSERTION ================================ #

if [[ "${version:0:1}" = "2" || "${version}" = "3.0" || "${version}" = "3.1" || "${version}" = "3.2" || "${version}" = "3.3" ]]; then
  echo "trimStrings option does not work in OpenRefine <=3.3"
  echo "https://github.com/OpenRefine/OpenRefine/issues/2409"
  exit 200
else
  cat << "DATA" > "tmp/${t}/${t}.assert"
_ - a _ - b _ - c
1 2 3
0 0 0
$ \ """"
DATA
fi

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.json" --trimStrings "true"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
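The version gate in tests/create-json-trimStrings.sh lists every affected OpenRefine release explicitly. A minimal sketch of a more general comparison follows, assuming GNU `sort -V` (version sort) is available on the test machine. The helper name `version_le` is hypothetical and not part of the test suite.

```shell
#!/bin/bash

# Hypothetical helper (not in the repository): succeeds when version $1 is
# less than or equal to version $2. Assumes GNU coreutils `sort -V`.
version_le() {
  # The smaller of the two versions sorts first; compare it with $1.
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" == "$1" ]]
}

# Example: skip a test for every release up to and including 3.3.
version="3.2"
if version_le "${version}" "3.3"; then
  echo "skipping: trimStrings does not work in OpenRefine <=3.3"
fi
```

Compared with the explicit list, this form would not need editing when an intermediate release number is missing from the list, at the cost of depending on a non-POSIX `sort` option.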
53
tests/create-json-utf8.sh
Normal file
@@ -0,0 +1,53 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.json"
[
{
"⌨": "⛲",
"code": "1F347",
"meaning": "FOUNTAIN"
},
{
"⌨": "⛳",
"code": "1F349",
"meaning": "FLAG IN HOLE"
},
{
"⌨": "⛵",
"code": "1F352",
"meaning": "SAILBOAT"
}
]
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
_ - ⌨ _ - code _ - meaning
⛲ 1F347 FOUNTAIN
⛳ 1F349 FLAG IN HOLE
⛵ 1F352 SAILBOAT
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.json"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
53
tests/create-json.sh
Normal file
@@ -0,0 +1,53 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.json"
[
{
"a": 1,
"b": 2,
"c": 3
},
{
"a": 0,
"b": 0,
"c": 0
},
{
"a": "$",
"b": "\\",
"c": "\""
}
]
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
_ - a _ - b _ - c
1 2 3
0 0 0
$ \ """"
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.json"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
44
tests/create-ods-sheets-utf8.sh
Normal file
@@ -0,0 +1,44 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi
if [[ ${2} ]]; then
  version="${2}"
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cp "data/example.ods" "tmp/${t}/${t}.ods"

# ================================= ASSERTION ================================ #

if [[ "${version}" = "2.7" ]]; then
  cat << "DATA" > "tmp/${t}/${t}.assert"
⌨ code meaning Column Column 5 Column 6 Column 7 Column 8
⛲ 1F347 FOUNTAIN
⛳ 1F349 FLAG IN HOLE
⛵ 1F352 SAILBOAT
DATA
else
  #TODO
  echo "https://github.com/opencultureconsulting/openrefine-client/issues/4"
  exit 200
fi

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.ods" --sheets 1
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
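Several of these scripts leave with `exit 200` when a test does not apply to the OpenRefine version under test. How tests-cli.sh treats that status is not shown here, so the following is only a sketch of one way a runner could tell "skipped" apart from "failed". The function name `classify_status` is hypothetical.

```shell
#!/bin/bash

# Hypothetical runner helper; the real tests-cli.sh may work differently.
# Maps a test script's exit status to a result label:
#   0 = pass, 200 = skipped (version gate), anything else = fail.
classify_status() {
  case "$1" in
    0)   echo "pass" ;;
    200) echo "skip" ;;
    *)   echo "fail" ;;
  esac
}

# Usage: run a command in a subshell, then classify its exit status.
( exit 200 )
classify_status "$?"
```

Using a dedicated status keeps a version-gated test from being counted as a failure, while `diff -u` returning 1 on a mismatch would still be reported as "fail".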
48
tests/create-ods.sh
Normal file
@@ -0,0 +1,48 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi
if [[ ${2} ]]; then
  version="${2}"
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cp "data/example.ods" "tmp/${t}/${t}.ods"
#a b c
#1 2 3
#0 0 0
#$ \ '

# ================================= ASSERTION ================================ #

if [[ "${version}" = "2.7" ]]; then
  cat << "DATA" > "tmp/${t}/${t}.assert"
a b c Column Column 5 Column 6 Column 7 Column 8
1.0 2.0 3.0
0.0 0.0 0.0
$ \ '
DATA
else
  #TODO
  echo "https://github.com/opencultureconsulting/openrefine-client/issues/4"
  exit 200
fi

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.ods"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
40
tests/create-tsv-utf8.sh
Normal file
@@ -0,0 +1,40 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.tsv"
⌨ code meaning
⛲ 1F347 FOUNTAIN
⛳ 1F349 FLAG IN HOLE
⛵ 1F352 SAILBOAT
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
⌨,code,meaning
⛲,1F347,FOUNTAIN
⛳,1F349,FLAG IN HOLE
⛵,1F352,SAILBOAT
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.tsv"
${cmd} --export "${t}" --output "tmp/${t}/${t}.csv"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.csv"
40
tests/create-tsv.sh
Normal file
@@ -0,0 +1,40 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.tsv"
a b c
1 2 3
0 0 0
$ \ '
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.tsv"
${cmd} --export "${t}" --output "tmp/${t}/${t}.csv"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.csv"
80
tests/create-txt-fixed-width-headerLines.sh
Normal file
@@ -0,0 +1,80 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.txt"
1 2 3
mon tue wed
$2 $300 $1
DATA

cat << "DATA" > "tmp/${t}/${t}.transform"
[
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "1",
"expression": "grel:value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "2",
"expression": "grel:value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "3",
"expression": "grel:value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10
}
]
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
1 2 3
mon tue wed
$2 $300 $1
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.txt" --columnWidths "6" --columnWidths "6" --columnWidths "6" --headerLines "1"
${cmd} --apply "tmp/${t}/${t}.transform" "${t}"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
81
tests/create-txt-fixed-width-utf8.sh
Normal file
@@ -0,0 +1,81 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.txt"
⛲ 1F347 FOUNTAIN
⛳ 1F349 FLAG IN HOLE
⛵ 1F352 SAILBOAT
DATA

cat << "DATA" > "tmp/${t}/${t}.transform"
[
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "Column 1",
"expression": "grel:value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "Column 2",
"expression": "grel:value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "Column 3",
"expression": "grel:value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10
}
]
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
Column 1 Column 2 Column 3
⛲ 1F347 FOUNTAIN
⛳ 1F349 FLAG IN HOLE
⛵ 1F352 SAILBOAT
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.txt" --columnWidths "6" --columnWidths "6" --columnWidths "60"
${cmd} --apply "tmp/${t}/${t}.transform" "${t}"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
81
tests/create-txt-fixed-width.sh
Normal file
@@ -0,0 +1,81 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.txt"
1 2 3
mon tue wed
$2 $300 $1
DATA

cat << "DATA" > "tmp/${t}/${t}.transform"
[
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "Column 1",
"expression": "grel:value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "Column 2",
"expression": "grel:value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10
},
{
"op": "core/text-transform",
"engineConfig": {
"facets": [],
"mode": "row-based"
},
"columnName": "Column 3",
"expression": "grel:value.trim()",
"onError": "keep-original",
"repeat": false,
"repeatCount": 10
}
]
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
Column 1 Column 2 Column 3
1 2 3
mon tue wed
$2 $300 $1
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.txt" --columnWidths "6" --columnWidths "6" --columnWidths "6"
${cmd} --apply "tmp/${t}/${t}.transform" "${t}"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
39
tests/create-txt-linesPerRow.sh
Normal file
@@ -0,0 +1,39 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.txt"
mon tue wed
$2 $300 $1
thu fri sat
$70 $20 $50
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
Column 1 Column 2
mon tue wed $2 $300 $1
thu fri sat $70 $20 $50
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.txt" --linesPerRow "2"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
39
tests/create-txt.sh
Normal file
@@ -0,0 +1,39 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.txt"
1 2 3
mon tue wed
$2 $300 $1
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
Column 1
1 2 3
mon tue wed
$2 $300 $1
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.txt"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
44
tests/create-xls-sheets-utf8.sh
Normal file
@@ -0,0 +1,44 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi
if [[ ${2} ]]; then
  version="${2}"
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cp "data/example.xls" "tmp/${t}/${t}.xls"

# ================================= ASSERTION ================================ #

if [[ "${version}" = "2.7" ]]; then
  cat << "DATA" > "tmp/${t}/${t}.assert"
⌨ code meaning
⛲ 1F347 FOUNTAIN
⛳ 1F349 FLAG IN HOLE
⛵ 1F352 SAILBOAT
DATA
else
  #TODO
  echo "https://github.com/opencultureconsulting/openrefine-client/issues/4"
  exit 200
fi

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.xls" --sheets 1
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
48
tests/create-xls.sh
Normal file
@@ -0,0 +1,48 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi
if [[ ${2} ]]; then
  version="${2}"
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cp "data/example.xls" "tmp/${t}/${t}.xls"
#a b c
#1 2 3
#0 0 0
#$ \ '

# ================================= ASSERTION ================================ #

if [[ "${version}" = "2.7" ]]; then
  cat << "DATA" > "tmp/${t}/${t}.assert"
a b c
1.0 2.0 3.0
0.0 0.0 0.0
$ \ '
DATA
else
  #TODO
  echo "https://github.com/opencultureconsulting/openrefine-client/issues/4"
  exit 200
fi

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.xls"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
44
tests/create-xlsx-sheets-utf8.sh
Normal file
@@ -0,0 +1,44 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi
if [[ ${2} ]]; then
  version="${2}"
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cp "data/example.xlsx" "tmp/${t}/${t}.xlsx"

# ================================= ASSERTION ================================ #

if [[ "${version}" = "2.7" ]]; then
  cat << "DATA" > "tmp/${t}/${t}.assert"
⌨ code meaning
⛲ 1F347 FOUNTAIN
⛳ 1F349 FLAG IN HOLE
⛵ 1F352 SAILBOAT
DATA
else
  #TODO
  echo "https://github.com/opencultureconsulting/openrefine-client/issues/4"
  exit 200
fi

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.xlsx" --sheets 1
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
48
tests/create-xlsx.sh
Normal file
@@ -0,0 +1,48 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi
if [[ ${2} ]]; then
  version="${2}"
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cp "data/example.xlsx" "tmp/${t}/${t}.xlsx"
#a b c
#1 2 3
#0 0 0
#$ \ '

# ================================= ASSERTION ================================ #

if [[ "${version}" = "2.7" ]]; then
  cat << "DATA" > "tmp/${t}/${t}.assert"
a b c
1.0 2.0 3.0
0.0 0.0 0.0
$ \ '
DATA
else
  #TODO
  echo "https://github.com/opencultureconsulting/openrefine-client/issues/4"
  exit 200
fi

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.xlsx"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
96
tests/create-xml-recordPath.sh
Normal file
@@ -0,0 +1,96 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.xml"
<?xml version="1.0" encoding="UTF-8"?>
<root>
<record>
<a>1</a>
<b>2</b>
<c>3</c>
</record>
<record>
<a>0</a>
<b>0</b>
<c>0</c>
</record>
<record>
<a>$</a>
<b>\</b>
<c>'</c>
</record>
</root>
DATA

cat << "DATA" > "tmp/${t}/${t}.transform"
[
{
"op": "core/column-reorder",
"columnNames": [
"record - a",
"record - b",
"record - c"
],
"description": "Reorder columns"
},
{
"op": "core/row-removal",
"engineConfig": {
"facets": [
{
"type": "list",
"name": "Blank Rows",
"expression": "(filter(row.columnNames,cn,isNonBlank(cells[cn].value)).length()==0).toString()",
"columnName": "",
"invert": false,
"omitBlank": false,
"omitError": false,
"selection": [
{
"v": {
"v": "true",
"l": "true"
}
}
],
"selectBlank": false,
"selectError": false
}
],
"mode": "record-based"
}
}
]
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
record - a record - b record - c
1 2 3
0 0 0
$ \ '
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.xml" --recordPath "root" --recordPath "record"
${cmd} --apply "tmp/${t}/${t}.transform" "${t}"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
96
tests/create-xml-utf8.sh
Normal file
@ -0,0 +1,96 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.xml"
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<root>
|
||||
<record>
|
||||
<icon>⛲</icon>
|
||||
<code>1F347</code>
|
||||
<meaning>FOUNTAIN</meaning>
|
||||
</record>
|
||||
<record>
|
||||
<icon>⛳</icon>
|
||||
<code>1F349</code>
|
||||
<meaning>FLAG IN HOLE</meaning>
|
||||
</record>
|
||||
<record>
|
||||
<icon>⛵</icon>
|
||||
<code>1F352</code>
|
||||
<meaning>SAILBOAT</meaning>
|
||||
</record>
|
||||
</root>
|
||||
DATA
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.transform"
|
||||
[
|
||||
{
|
||||
"op": "core/column-reorder",
|
||||
"columnNames": [
|
||||
"root - record - icon",
|
||||
"root - record - code",
|
||||
"root - record - meaning"
|
||||
],
|
||||
"description": "Reorder columns"
|
||||
},
|
||||
{
|
||||
"op": "core/row-removal",
|
||||
"engineConfig": {
|
||||
"facets": [
|
||||
{
|
||||
"type": "list",
|
||||
"name": "Blank Rows",
|
||||
"expression": "(filter(row.columnNames,cn,isNonBlank(cells[cn].value)).length()==0).toString()",
|
||||
"columnName": "",
|
||||
"invert": false,
|
||||
"omitBlank": false,
|
||||
"omitError": false,
|
||||
"selection": [
|
||||
{
|
||||
"v": {
|
||||
"v": "true",
|
||||
"l": "true"
|
||||
}
|
||||
}
|
||||
],
|
||||
"selectBlank": false,
|
||||
"selectError": false
|
||||
}
|
||||
],
|
||||
"mode": "record-based"
|
||||
}
|
||||
}
|
||||
]
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
root - record - icon root - record - code root - record - meaning
|
||||
⛲ 1F347 FOUNTAIN
|
||||
⛳ 1F349 FLAG IN HOLE
|
||||
⛵ 1F352 SAILBOAT
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.xml"
|
||||
${cmd} --apply "tmp/${t}/${t}.transform" "${t}"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
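Review note on the ENVIRONMENT block shared by these test scripts: the test name `t` is taken from the script's own file name, and the working directory is switched to the script's directory so that the relative `tmp/${t}` paths resolve wherever the runner is started. A minimal sketch of the two expansions follows; the path `/repo/tests/create-xml-utf8.sh` is a hypothetical stand-in for `${BASH_SOURCE[0]}`.

```shell
#!/bin/sh
# Hypothetical script path standing in for "${BASH_SOURCE[0]}".
script="/repo/tests/create-xml-utf8.sh"

# basename strips the directory part and the given suffix.
t="$(basename "${script}" .sh)"

# ${var%/*} removes the shortest trailing match of "/*", i.e. the file name.
dir="${script%/*}"

echo "test name: ${t}"
echo "directory: ${dir}"
```

One consequence of `${var%/*}`: if the path contains no slash, nothing is removed, so a script started as `bash create-xml-utf8.sh` would try to `cd` into `create-xml-utf8.sh/` and hit the `|| exit 1` branch. Calling the scripts through a path that contains a directory part avoids this.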
96
tests/create-xml.sh
Normal file
@@ -0,0 +1,96 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.xml"
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<root>
|
||||
<record>
|
||||
<a>1</a>
|
||||
<b>2</b>
|
||||
<c>3</c>
|
||||
</record>
|
||||
<record>
|
||||
<a>0</a>
|
||||
<b>0</b>
|
||||
<c>0</c>
|
||||
</record>
|
||||
<record>
|
||||
<a>$</a>
|
||||
<b>\</b>
|
||||
<c>'</c>
|
||||
</record>
|
||||
</root>
|
||||
DATA
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.transform"
|
||||
[
|
||||
{
|
||||
"op": "core/column-reorder",
|
||||
"columnNames": [
|
||||
"root - record - a",
|
||||
"root - record - b",
|
||||
"root - record - c"
|
||||
],
|
||||
"description": "Reorder columns"
|
||||
},
|
||||
{
|
||||
"op": "core/row-removal",
|
||||
"engineConfig": {
|
||||
"facets": [
|
||||
{
|
||||
"type": "list",
|
||||
"name": "Blank Rows",
|
||||
"expression": "(filter(row.columnNames,cn,isNonBlank(cells[cn].value)).length()==0).toString()",
|
||||
"columnName": "",
|
||||
"invert": false,
|
||||
"omitBlank": false,
|
||||
"omitError": false,
|
||||
"selection": [
|
||||
{
|
||||
"v": {
|
||||
"v": "true",
|
||||
"l": "true"
|
||||
}
|
||||
}
|
||||
],
|
||||
"selectBlank": false,
|
||||
"selectError": false
|
||||
}
|
||||
],
|
||||
"mode": "record-based"
|
||||
}
|
||||
}
|
||||
]
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
root - record - a root - record - b root - record - c
|
||||
1 2 3
|
||||
0 0 0
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.xml"
|
||||
${cmd} --apply "tmp/${t}/${t}.transform" "${t}"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
44
tests/create-zip-includeFileSources.sh
Normal file
@@ -0,0 +1,44 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}-1.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
DATA
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}-2.csv"
|
||||
a,b,c
|
||||
4,5,6
|
||||
DATA
|
||||
|
||||
zip "tmp/${t}/${t}.zip" "tmp/${t}/${t}-1.csv" "tmp/${t}/${t}-2.csv"
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << DATA > "tmp/${t}/${t}.assert"
|
||||
File a b c
|
||||
tmp/${t}/${t}-1.csv 1 2 3
|
||||
tmp/${t}/${t}-2.csv 4 5 6
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.zip" --includeFileSources "true"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
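Review note on the heredocs: most scripts write their fixtures with a quoted delimiter (`<< "DATA"`), while the assertion in this script uses an unquoted one (`<< DATA`) so that `${t}` is expanded into the expected file paths. A minimal sketch of the difference, with hypothetical function names `literal` and `expanded`:

```shell
#!/bin/sh
t="create-zip-includeFileSources"

# Quoted delimiter: the body is written literally, nothing is expanded.
literal() {
  cat << "DATA"
tmp/${t}/${t}-1.csv
DATA
}

# Unquoted delimiter: ${t} is expanded before the body is written.
expanded() {
  cat << DATA
tmp/${t}/${t}-1.csv
DATA
}

literal_out="$(literal)"
expanded_out="$(expanded)"
echo "${literal_out}"
echo "${expanded_out}"
```

The quoted form is the safer default for fixtures that contain `$`, `\` or backticks, which is presumably why the other scripts use it for rows such as `$ \ '`.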
44
tests/create-zip.sh
Normal file
@@ -0,0 +1,44 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}-1.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
DATA
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}-2.csv"
|
||||
a,b,c
|
||||
4,5,6
|
||||
DATA
|
||||
|
||||
zip "tmp/${t}/${t}.zip" "tmp/${t}/${t}-1.csv" "tmp/${t}/${t}-2.csv"
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a b c
|
||||
1 2 3
|
||||
4 5 6
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.zip"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
11
tests/data/cli/duplicates.csv
Normal file
@@ -0,0 +1,11 @@
|
||||
email,name,state,gender,purchase,count,date
|
||||
danny.baron@example1.com,Danny Baron,CA,M,TV (UTF-8: 📺),1,"Wed, 4 Jul 2001"
|
||||
melanie.white@example2.edu,Melanie White,NC,F,<iPhone>,1,2001-07-04T12:08:56
|
||||
danny.baron@example1.com, D. ("Tab") Baron,CA,M,Winter jacket,1,2001-07-04
|
||||
ben.tyler@example3.org,Ben Tyler,NV,M,Flashlight,1,2001/07/04
|
||||
arthur.duff@example4.com,Arthur Duff,OR,M,Dining table,1,2001-07
|
||||
danny.baron@example1.com,Daniel Baron,,,Bike,1,2001
|
||||
jean.griffith@example5.org,Jean Griffith,WA,F,Power drill,1,2000
|
||||
melanie.white@example2.edu,Melanie White,NC,F,'iPad',1,1999
|
||||
ben.morisson@example6.org,Ben Morisson,FL,M,Amplifier,1,1998
|
||||
arthur.duff@example4.com,Arthur Duff,OR,M,Night table,1,1997
|
Can't render this file because it contains an unexpected character in line 4 and column 33.
|
92
tests/data/cli/duplicates.json
Normal file
@@ -0,0 +1,92 @@
|
||||
[
|
||||
{
|
||||
"email": "danny.baron@example1.com",
|
||||
"name": "Danny Baron",
|
||||
"state": "CA",
|
||||
"gender": "M",
|
||||
"purchase": "TV (UTF-8: 📺)",
|
||||
"count": 1,
|
||||
"date": "Wed, 4 Jul 2001"
|
||||
},
|
||||
{
|
||||
"email": "melanie.white@example2.edu",
|
||||
"name": "Melanie White",
|
||||
"state": "NC",
|
||||
"gender": "F",
|
||||
"purchase": "<iPhone>",
|
||||
"count": 1,
|
||||
"date": "2001-07-04T12:08:56"
|
||||
},
|
||||
{
|
||||
"email": "danny.baron@example1.com",
|
||||
"name": " D.\t(\"Tab\") Baron",
|
||||
"state": "CA",
|
||||
"gender": "M",
|
||||
"purchase": "Winter jacket",
|
||||
"count": 1,
|
||||
"date": "2001-07-04"
|
||||
},
|
||||
{
|
||||
"email": "ben.tyler@example3.org",
|
||||
"name": "Ben Tyler",
|
||||
"state": "NV",
|
||||
"gender": "M",
|
||||
"purchase": "Flashlight",
|
||||
"count": 1,
|
||||
"date": "2001/07/04"
|
||||
},
|
||||
{
|
||||
"email": "arthur.duff@example4.com",
|
||||
"name": "Arthur Duff",
|
||||
"state": "OR",
|
||||
"gender": "M",
|
||||
"purchase": "Dining table",
|
||||
"count": 1,
|
||||
"date": "2001-07"
|
||||
},
|
||||
{
|
||||
"email": "danny.baron@example1.com",
|
||||
"name": "Daniel Baron",
|
||||
"state": "",
|
||||
"gender": "",
|
||||
"purchase": "Bike",
|
||||
"count": 1,
|
||||
"date": 2001
|
||||
},
|
||||
{
|
||||
"email": "jean.griffith@example5.org",
|
||||
"name": "Jean Griffith",
|
||||
"state": "WA",
|
||||
"gender": "F",
|
||||
"purchase": "Power drill",
|
||||
"count": 1,
|
||||
"date": 2000
|
||||
},
|
||||
{
|
||||
"email": "melanie.white@example2.edu",
|
||||
"name": "Melanie White",
|
||||
"state": "NC",
|
||||
"gender": "F",
|
||||
"purchase": "'iPad'",
|
||||
"count": 1,
|
||||
"date": 1999
|
||||
},
|
||||
{
|
||||
"email": "ben.morisson@example6.org",
|
||||
"name": "Ben Morisson",
|
||||
"state": "FL",
|
||||
"gender": "M",
|
||||
"purchase": "Amplifier",
|
||||
"count": 1,
|
||||
"date": 1998
|
||||
},
|
||||
{
|
||||
"email": "arthur.duff@example4.com",
|
||||
"name": "Arthur Duff",
|
||||
"state": "OR",
|
||||
"gender": "M",
|
||||
"purchase": "Night table",
|
||||
"count": 1,
|
||||
"date": 1997
|
||||
}
|
||||
]
|
BIN
tests/data/cli/duplicates.ods
Normal file
Binary file not shown.
11
tests/data/cli/duplicates.tsv
Normal file
@@ -0,0 +1,11 @@
|
||||
email name state gender purchase count date
|
||||
danny.baron@example1.com Danny Baron CA M TV (UTF-8: 📺) 1 Wed, 4 Jul 2001
|
||||
melanie.white@example2.edu Melanie White NC F <iPhone> 1 2001-07-04T12:08:56
|
||||
danny.baron@example1.com "D. (""Tab"") Baron" CA M Winter jacket 1 2001-07-04
|
||||
ben.tyler@example3.org Ben Tyler NV M Flashlight 1 2001/07/04
|
||||
arthur.duff@example4.com Arthur Duff OR M Dining table 1 2001-07
|
||||
danny.baron@example1.com Daniel Baron Bike 1 2001
|
||||
jean.griffith@example5.org Jean Griffith WA F Power drill 1 2000
|
||||
melanie.white@example2.edu Melanie White NC F 'iPad' 1 1999
|
||||
ben.morisson@example6.org Ben Morisson FL M Amplifier 1 1998
|
||||
arthur.duff@example4.com Arthur Duff OR M Night table 1 1997
|
|
11
tests/data/cli/duplicates.txt
Normal file
@@ -0,0 +1,11 @@
|
||||
email name state gender purchase count date
|
||||
danny.baron@example1.com Danny Baron CA M TV (UTF-8: 📺) 1 Wed, 4 Jul 2001
|
||||
melanie.white@example2.edu Melanie White NC F <iPhone> 1 2001-07-04T12:08:5
|
||||
danny.baron@example1.com D. ("Tab") Baron CA M Winter jacket 1 2001-07-04
|
||||
ben.tyler@example3.org Ben Tyler NV M Flashlight 1 2001/07/04
|
||||
arthur.duff@example4.com Arthur Duff OR M Dining table 1 2001-07
|
||||
danny.baron@example1.com Daniel Baron Bike 1 2001
|
||||
jean.griffith@example5.org Jean Griffith WA F Power drill 1 2000
|
||||
melanie.white@example2.edu Melanie White NC F 'iPad' 1 1999
|
||||
ben.morisson@example6.org Ben Morisson FL M Amplifier 1 1998
|
||||
arthur.duff@example4.com Arthur Duff OR M Night table 1 1997
|
BIN
tests/data/cli/duplicates.xls
Normal file
Binary file not shown.
BIN
tests/data/cli/duplicates.xlsx
Normal file
Binary file not shown.
93
tests/data/cli/duplicates.xml
Normal file
@@ -0,0 +1,93 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<root>
|
||||
<record>
|
||||
<email>danny.baron@example1.com</email>
|
||||
<name>Danny Baron</name>
|
||||
<state>CA</state>
|
||||
<gender>M</gender>
|
||||
<purchase>TV (UTF-8: 📺)</purchase>
|
||||
<count>1</count>
|
||||
<date>Wed, 4 Jul 2001</date>
|
||||
</record>
|
||||
<record>
|
||||
<email>melanie.white@example2.edu</email>
|
||||
<name>Melanie White</name>
|
||||
<state>NC</state>
|
||||
<gender>F</gender>
|
||||
<purchase>&lt;iPhone&gt;</purchase>
|
||||
<count>1</count>
|
||||
<date>2001-07-04T12:08:56</date>
|
||||
</record>
|
||||
<record>
|
||||
<email>danny.baron@example1.com</email>
|
||||
<name> D. ("Tab") Baron</name>
|
||||
<state>CA</state>
|
||||
<gender>M</gender>
|
||||
<purchase>Winter jacket</purchase>
|
||||
<count>1</count>
|
||||
<date>2001-07-04</date>
|
||||
</record>
|
||||
<record>
|
||||
<email>ben.tyler@example3.org</email>
|
||||
<name>Ben Tyler</name>
|
||||
<state>NV</state>
|
||||
<gender>M</gender>
|
||||
<purchase>Flashlight</purchase>
|
||||
<count>1</count>
|
||||
<date>2001/07/04</date>
|
||||
</record>
|
||||
<record>
|
||||
<email>arthur.duff@example4.com</email>
|
||||
<name>Arthur Duff</name>
|
||||
<state>OR</state>
|
||||
<gender>M</gender>
|
||||
<purchase>Dining table</purchase>
|
||||
<count>1</count>
|
||||
<date>2001-07</date>
|
||||
</record>
|
||||
<record>
|
||||
<email>danny.baron@example1.com</email>
|
||||
<name>Daniel Baron</name>
|
||||
<state></state>
|
||||
<gender></gender>
|
||||
<purchase>Bike</purchase>
|
||||
<count>1</count>
|
||||
<date>2001</date>
|
||||
</record>
|
||||
<record>
|
||||
<email>jean.griffith@example5.org</email>
|
||||
<name>Jean Griffith</name>
|
||||
<state>WA</state>
|
||||
<gender>F</gender>
|
||||
<purchase>Power drill</purchase>
|
||||
<count>1</count>
|
||||
<date>2000</date>
|
||||
</record>
|
||||
<record>
|
||||
<email>melanie.white@example2.edu</email>
|
||||
<name>Melanie White</name>
|
||||
<state>NC</state>
|
||||
<gender>F</gender>
|
||||
<purchase>'iPad'</purchase>
|
||||
<count>1</count>
|
||||
<date>1999</date>
|
||||
</record>
|
||||
<record>
|
||||
<email>ben.morisson@example6.org</email>
|
||||
<name>Ben Morisson</name>
|
||||
<state>FL</state>
|
||||
<gender>M</gender>
|
||||
<purchase>Amplifier</purchase>
|
||||
<count>1</count>
|
||||
<date>1998</date>
|
||||
</record>
|
||||
<record>
|
||||
<email>arthur.duff@example4.com</email>
|
||||
<name>Arthur Duff</name>
|
||||
<state>OR</state>
|
||||
<gender>M</gender>
|
||||
<purchase>Night table</purchase>
|
||||
<count>1</count>
|
||||
<date>1997</date>
|
||||
</record>
|
||||
</root>
|
BIN
tests/data/cli/duplicates.zip
Normal file
Binary file not shown.
10
tests/data/cli/duplicates1.xml
Normal file
@@ -0,0 +1,10 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<record>
|
||||
<email>danny.baron@example1.com</email>
|
||||
<name>Danny Baron</name>
|
||||
<state>CA</state>
|
||||
<gender>M</gender>
|
||||
<purchase>TV (UTF-8: 📺)</purchase>
|
||||
<count>1</count>
|
||||
<date>Wed, 4 Jul 2001</date>
|
||||
</record>
|
BIN
tests/data/cli/duplicates2.ods
Normal file
Binary file not shown.
BIN
tests/data/cli/duplicates2.xls
Normal file
Binary file not shown.
BIN
tests/data/cli/duplicates2.xlsx
Normal file
Binary file not shown.
1315
tests/data/cli/dữ liệu biểu tượng cảm xúc.txt
Normal file
File diff suppressed because it is too large
6
tests/data/cli/evil-fruits.tsv
Normal file
@@ -0,0 +1,6 @@
|
||||
🔣 code meaning
|
||||
🍇 1F347 GRAPES
|
||||
🍉 1F349 WATERMELON
|
||||
🍒 1F352 CHERRIES
|
||||
🍓 1F353 STRAWBERRY
|
||||
🍍 1F34D PINEAPPLE
|
|
69
tests/data/duplicates-deletion.json
Normal file
@@ -0,0 +1,69 @@
|
||||
[
|
||||
{
|
||||
"op": "core/row-reorder",
|
||||
"description": "Reorder rows",
|
||||
"mode": "record-based",
|
||||
"sorting": {
|
||||
"criteria": [
|
||||
{
|
||||
"errorPosition": 1,
|
||||
"caseSensitive": false,
|
||||
"valueType": "string",
|
||||
"column": "email",
|
||||
"blankPosition": 2,
|
||||
"reverse": false
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"op": "core/column-addition",
|
||||
"description": "Create column count at index 1 based on column email using expression grel:facetCount(value, \"value\", \"email\")",
|
||||
"engineConfig": {
|
||||
"mode": "row-based",
|
||||
"facets": []
|
||||
},
|
||||
"newColumnName": "count",
|
||||
"columnInsertIndex": 1,
|
||||
"baseColumnName": "email",
|
||||
"expression": "grel:facetCount(value, \"value\", \"email\")",
|
||||
"onError": "set-to-blank"
|
||||
},
|
||||
{
|
||||
"op": "core/blank-down",
|
||||
"description": "Blank down cells in column email",
|
||||
"engineConfig": {
|
||||
"mode": "row-based",
|
||||
"facets": []
|
||||
},
|
||||
"columnName": "email"
|
||||
},
|
||||
{
|
||||
"op": "core/row-removal",
|
||||
"description": "Remove rows",
|
||||
"engineConfig": {
|
||||
"mode": "row-based",
|
||||
"facets": [
|
||||
{
|
||||
"omitError": false,
|
||||
"expression": "isBlank(value)",
|
||||
"selectBlank": false,
|
||||
"selection": [
|
||||
{
|
||||
"v": {
|
||||
"v": true,
|
||||
"l": "true"
|
||||
}
|
||||
}
|
||||
],
|
||||
"selectError": false,
|
||||
"invert": false,
|
||||
"name": "email",
|
||||
"omitBlank": false,
|
||||
"type": "list",
|
||||
"columnName": "email"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
]
|
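Review note on `duplicates-deletion.json`: read top to bottom, the four operations sort by `email`, add a per-email `count`, blank down repeated `email` values, and remove the rows whose `email` is then blank. The net effect appears to be "keep the first row for each email". The sketch below is only a shell analogue of that idea, not what OpenRefine executes; the sample rows are hypothetical and the `count` step is omitted.

```shell
#!/bin/sh
# Hypothetical two-column sample, not the repository's test data.
input='email,name
b@example.org,Ben
a@example.org,Ann
a@example.org,Annie
b@example.org,Benny
c@example.org,Cat'

# Keep the header aside, stable-sort the body by email, then keep only the
# first row seen per email (the rows a blank-down would leave non-blank).
header="$(printf '%s\n' "${input}" | head -n 1)"
body="$(printf '%s\n' "${input}" | tail -n +2 | sort -s -t, -k1,1 | awk -F, '!seen[$1]++')"

printf '%s\n%s\n' "${header}" "${body}"
```

The stable sort (`-s`) matters here: it preserves the original order among rows that share an email, so the row that survives is the one that came first in the input.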
BIN
tests/data/example.ods
Normal file
Binary file not shown.
BIN
tests/data/example.xls
Normal file
Binary file not shown.
BIN
tests/data/example.xlsx
Normal file
Binary file not shown.
36
tests/delete-utf8.sh
Normal file
@@ -0,0 +1,36 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << DATA > "tmp/${t}/${t}.assert"
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv" --projectName "${t} biểu tượng cảm xúc ⛲"
|
||||
${cmd} --list | grep "${t}" || exit 1
|
||||
${cmd} --delete "${t} biểu tượng cảm xúc ⛲"
|
||||
${cmd} --list | grep "${t}" | cut -d ':' -f 2 > "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
36
tests/delete.sh
Normal file
@@ -0,0 +1,36 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << DATA > "tmp/${t}/${t}.assert"
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv"
|
||||
${cmd} --list | grep "${t}" || exit 1
|
||||
${cmd} --delete "${t}"
|
||||
${cmd} --list | grep "${t}" | cut -d ':' -f 2 > "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
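Review note on the two delete tests: the assertion file is deliberately empty, and the test passes only if the project no longer shows up in the listing. A minimal sketch of that pattern, where `fake_list` is a hypothetical stand-in for `${cmd} --list` and the `id: name` layout is an assumption inferred from the `cut -d ':' -f 2` call:

```shell
#!/bin/sh
# Hypothetical stand-in for `${cmd} --list`; the "id: name" layout is assumed.
fake_list() {
  printf '%s\n' ' 1234567890123: some-other-project'
}

workdir="$(mktemp -d)"
: > "${workdir}/assert"   # empty expectation, like the empty heredoc

# grep matches nothing, so cut receives no input and the output file is empty.
fake_list | grep "delete" | cut -d ':' -f 2 > "${workdir}/output"

if diff -u "${workdir}/assert" "${workdir}/output"; then
  result="deleted"
else
  result="still listed"
fi
echo "${result}"
rm -r "${workdir}"
```

Because the pipeline's exit status is that of `cut`, a `grep` that finds nothing does not abort the script; the check is carried entirely by the final `diff`.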
21
tests/download.sh
Normal file
@@ -0,0 +1,21 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --download "https://git.io/fj5ju" --output "tmp/${t}/${t}.output"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "data/duplicates-deletion.json" "tmp/${t}/${t}.output"
|
44
tests/export-csv-utf8.sh
Normal file
@@ -0,0 +1,44 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.tsv"
|
||||
🔣 code meaning
|
||||
🍇 1F347 GRAPES
|
||||
🍉 1F349 WATERMELON
|
||||
🍒 1F352 CHERRIES
|
||||
🍓 1F353 STRAWBERRY
|
||||
🍍 1F34D PINEAPPLE
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
🔣,code,meaning
|
||||
🍇,1F347,GRAPES
|
||||
🍉,1F349,WATERMELON
|
||||
🍒,1F352,CHERRIES
|
||||
🍓,1F353,STRAWBERRY
|
||||
🍍,1F34D,PINEAPPLE
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.tsv"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t} biểu tượng cảm xúc 🍉.csv"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t} biểu tượng cảm xúc 🍉.csv"
|
40
tests/export-csv.sh
Normal file
@@ -0,0 +1,40 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.tsv"
|
||||
a b c
|
||||
1 2 3
|
||||
0 0 0
|
||||
$ \ '
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.tsv"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.csv"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.csv"
|
72
tests/export-html-utf8.sh
Normal file
@@ -0,0 +1,72 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
if [[ ${2} ]]; then
|
||||
majorversion="${2%%.*}"
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
⌨,code,meaning
|
||||
⛲,1F347,FOUNTAIN
|
||||
⛳,1F349,FLAG IN HOLE
|
||||
⛵,1F352,SAILBOAT
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
if [[ "$majorversion" = 2 ]]; then
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
<html>
|
||||
<head>
|
||||
<title>export-html-utf8</title>
|
||||
<meta charset="utf-8" />
|
||||
</head>
|
||||
<body>
|
||||
<table>
|
||||
<tr><th>⌨</th><th>code</th><th>meaning</th></tr>
|
||||
<tr><td>⛲</td><td>1F347</td><td>FOUNTAIN</td></tr>
|
||||
<tr><td>⛳</td><td>1F349</td><td>FLAG IN HOLE</td></tr>
|
||||
<tr><td>⛵</td><td>1F352</td><td>SAILBOAT</td></tr>
|
||||
</table>
|
||||
</body>
|
||||
</html>
|
||||
DATA
|
||||
else
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
<html>
|
||||
<head>
|
||||
<title>export-html-utf8</title>
|
||||
<meta charset="utf-8" />
|
||||
</head>
|
||||
<body>
|
||||
<table>
|
||||
<tr><th>⌨</th><th>code</th><th>meaning</th></tr>
|
||||
<tr><td>⛲</td><td>1F347</td><td>FOUNTAIN</td></tr>
|
||||
<tr><td>⛳</td><td>1F349</td><td>FLAG IN HOLE</td></tr>
|
||||
<tr><td>⛵</td><td>1F352</td><td>SAILBOAT</td></tr>
|
||||
</table>
|
||||
</body>
|
||||
</html>
|
||||
DATA
|
||||
fi
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.html"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.html"
|
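Review note on the version switch in `export-html-utf8.sh`: the optional second argument is reduced to its major version with `${2%%.*}` and then used to pick one of two expected outputs. A minimal sketch of that expansion, with a hypothetical version string in place of the argument:

```shell
#!/bin/sh
# Hypothetical version string standing in for the script's second argument.
version="3.4.1"

# ${var%%.*} removes the longest trailing match of ".*", leaving the major part.
majorversion="${version%%.*}"

if [ "${majorversion}" = 2 ]; then
  branch="assertion for 2.x"
else
  branch="assertion for later versions"
fi
echo "${majorversion}: ${branch}"
```

If no second argument is passed, `majorversion` stays unset and the comparison falls through to the `else` branch, which matches how the script above behaves when only `${1}` is given.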
50
tests/export-html.sh
Normal file
@@ -0,0 +1,50 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
<html>
|
||||
<head>
|
||||
<title>export-html</title>
|
||||
<meta charset="utf-8" />
|
||||
</head>
|
||||
<body>
|
||||
<table>
|
||||
<tr><th>a</th><th>b</th><th>c</th></tr>
|
||||
<tr><td>1</td><td>2</td><td>3</td></tr>
|
||||
<tr><td>0</td><td>0</td><td>0</td></tr>
|
||||
<tr><td>$</td><td>\</td><td>'</td></tr>
|
||||
</table>
|
||||
</body>
|
||||
</html>
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.html"
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.html"
|
43
tests/export-ods-utf8.sh
Normal file
@@ -0,0 +1,43 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
⌨,code,meaning
|
||||
⛲,1F347,FOUNTAIN
|
||||
⛳,1F349,FLAG IN HOLE
|
||||
⛵,1F352,SAILBOAT
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
⌨,code,meaning
|
||||
⛲,1F347,FOUNTAIN
|
||||
⛳,1F349,"FLAG IN HOLE"
|
||||
⛵,1F352,SAILBOAT
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.ods"
|
||||
(cd tmp/"${t}" &&
|
||||
ssconvert -S "${t}.ods" "${t}.csv" &&
|
||||
mv "${t}.csv.1" "${t}.output")
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
47
tests/export-ods.sh
Normal file
@@ -0,0 +1,47 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
if [[ -z "$(command -v ssconvert 2> /dev/null)" ]] ; then
|
||||
echo 1>&2 "ERROR: This test requires ssconvert (gnumeric)"
|
||||
exit 127
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
a,b,c
|
||||
1,2,3
|
||||
0,0,0
|
||||
$,\,'
|
||||
DATA
|
||||
|
||||
# ================================== ACTION ================================== #
|
||||
|
||||
${cmd} --create "tmp/${t}/${t}.csv"
|
||||
${cmd} --export "${t}" --output "tmp/${t}/${t}.ods"
|
||||
(cd tmp/"${t}" &&
|
||||
ssconvert -S "${t}.ods" "${t}.csv" &&
|
||||
mv "${t}.csv.1" "${t}.output")
|
||||
|
||||
# =================================== TEST =================================== #
|
||||
|
||||
diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
|
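Review note on the dependency guard in `export-ods.sh`: the script checks for `ssconvert` with `command -v` and exits with 127 when it is missing. As shown above, `export-ods-utf8.sh` calls `ssconvert` without this guard. A minimal sketch of the same check as a reusable function; `require_tool` and the missing tool name are hypothetical:

```shell
#!/bin/sh
# Report whether a tool is on PATH; 127 mirrors the shell's
# "command not found" exit status.
require_tool() {
  if [ -z "$(command -v "${1}" 2> /dev/null)" ]; then
    echo 1>&2 "ERROR: This test requires ${1}"
    return 127
  fi
}

require_tool sh && sh_status=0 || sh_status=$?
require_tool hypothetical-missing-tool-xyz && missing_status=0 || missing_status=$?
echo "sh: ${sh_status}, missing tool: ${missing_status}"
```

Returning a distinct status lets a runner tell "dependency missing" apart from "assertion failed", which a plain `exit 1` would not.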
44
tests/export-tsv-utf8.sh
Normal file
@@ -0,0 +1,44 @@
|
||||
#!/bin/bash
|
||||
|
||||
# =============================== ENVIRONMENT ================================ #
|
||||
|
||||
if [[ ${1} ]]; then
|
||||
cmd="${1}"
|
||||
else
|
||||
echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
|
||||
fi
|
||||
|
||||
t="$(basename "${BASH_SOURCE[0]}" .sh)"
|
||||
cd "${BASH_SOURCE%/*}/" || exit 1
|
||||
mkdir -p "tmp/${t}"
|
||||
|
||||
# =================================== DATA =================================== #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.csv"
|
||||
🔣,code,meaning
|
||||
🍇,1F347,GRAPES
|
||||
🍉,1F349,WATERMELON
|
||||
🍒,1F352,CHERRIES
|
||||
🍓,1F353,STRAWBERRY
|
||||
🍍,1F34D,PINEAPPLE
|
||||
DATA
|
||||
|
||||
# ================================= ASSERTION ================================ #
|
||||
|
||||
cat << "DATA" > "tmp/${t}/${t}.assert"
|
||||
🔣 code meaning
🍇 1F347 GRAPES
🍉 1F349 WATERMELON
🍒 1F352 CHERRIES
🍓 1F353 STRAWBERRY
🍍 1F34D PINEAPPLE
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" --output "tmp/${t}/${t}.tsv"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.tsv"
40
tests/export-tsv.sh
Normal file
@@ -0,0 +1,40 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
a b c
1 2 3
0 0 0
$ \ '
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" --output "tmp/${t}/${t}.tsv"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.tsv"
44
tests/export-utf8.sh
Normal file
@@ -0,0 +1,44 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
🔣,code,meaning
🍇,1F347,GRAPES
🍉,1F349,WATERMELON
🍒,1F352,CHERRIES
🍓,1F353,STRAWBERRY
🍍,1F34D,PINEAPPLE
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
🔣 code meaning
🍇 1F347 GRAPES
🍉 1F349 WATERMELON
🍒 1F352 CHERRIES
🍓 1F353 STRAWBERRY
🍍 1F34D PINEAPPLE
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" > "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
43
tests/export-xls-utf8.sh
Normal file
@@ -0,0 +1,43 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
⌨,code,meaning
⛲,1F347,FOUNTAIN
⛳,1F349,FLAG IN HOLE
⛵,1F352,SAILBOAT
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
⌨,code,meaning
⛲,1F347,FOUNTAIN
⛳,1F349,FLAG IN HOLE
⛵,1F352,SAILBOAT
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" --output "tmp/${t}/${t}.xls"
(cd tmp/"${t}" &&
  ssconvert -S "${t}.xls" "${t}.csv" &&
  mv "${t}.csv" "${t}.output")

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
43
tests/export-xls.sh
Normal file
@@ -0,0 +1,43 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" --output "tmp/${t}/${t}.xls"
(cd tmp/"${t}" &&
  ssconvert -S "${t}.xls" "${t}.csv" &&
  mv "${t}.csv" "${t}.output")

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
45
tests/export-xlsx-utf8.sh
Normal file
@@ -0,0 +1,45 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
⌨,code,meaning
⛲,1F347,FOUNTAIN
⛳,1F349,FLAG IN HOLE
⛵,1F352,SAILBOAT
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
⌨,code,meaning
⛲,1F347,FOUNTAIN
⛳,1F349,FLAG IN HOLE
⛵,1F352,SAILBOAT
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" --output "tmp/${t}/${t}.xlsx"
(cd tmp/"${t}" &&
  ssconvert -S "${t}.xlsx" "${t}.csv" &&
  mv "${t}.csv" "${t}.output")

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"


43
tests/export-xlsx.sh
Normal file
@@ -0,0 +1,43 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" --output "tmp/${t}/${t}.xlsx"
(cd tmp/"${t}" &&
  ssconvert -S "${t}.xlsx" "${t}.csv" &&
  mv "${t}.csv" "${t}.output")

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
40
tests/export.sh
Normal file
@@ -0,0 +1,40 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
a b c
1 2 3
0 0 0
$ \ '
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" > "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
40
tests/format-create-separator.sh
Normal file
@@ -0,0 +1,40 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.txt"
a;b;c
1;2;3
0;0;0
$;\;'
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
a b c
1 2 3
0 0 0
$ \ '
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.txt" --format "csv" --separator ";"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
41
tests/format-create.sh
Normal file
@@ -0,0 +1,41 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
Column 1
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv" --format "line-based"
${cmd} --export "${t}" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
40
tests/format-export-output.sh
Normal file
@@ -0,0 +1,40 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" --format "csv" --output "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
40
tests/format-export.sh
Normal file
@@ -0,0 +1,40 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
a,b,c
1,2,3
0,0,0
$,\,'
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --export "${t}" --format "csv" > "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
27
tests/help.sh
Normal file
@@ -0,0 +1,27 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# ================================= ASSERTION ================================ #

cat << "DATA" > "tmp/${t}/${t}.assert"
Script to provide a command line interface to an OpenRefine server.
DATA

# ================================== ACTION ================================== #

${cmd} --help | sed '3q;d' > "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
39
tests/info-utf8.sh
Normal file
@@ -0,0 +1,39 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.tsv"
🔣 code meaning
🍇 1F347 GRAPES
🍉 1F349 WATERMELON
🍒 1F352 CHERRIES
🍓 1F353 STRAWBERRY
🍍 1F34D PINEAPPLE
DATA

# ================================= ASSERTION ================================ #

cat << DATA > "tmp/${t}/${t}.assert"
column 001: 🔣
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.tsv"
${cmd} --info "${t}" | grep 'column 001' > "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
35
tests/info.sh
Normal file
@@ -0,0 +1,35 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
DATA

# ================================= ASSERTION ================================ #

cat << DATA > "tmp/${t}/${t}.assert"
column 002: b
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --info "${t}" | grep 'column 002' > "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
35
tests/list-utf8.sh
Normal file
@@ -0,0 +1,35 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
DATA

# ================================= ASSERTION ================================ #

cat << DATA > "tmp/${t}/${t}.assert"
${t} biểu tượng cảm xúc ⛲
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv" --projectName "${t} biểu tượng cảm xúc ⛲"
${cmd} --list | grep "${t}" | cut -d ':' -f 2 > "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
35
tests/list.sh
Normal file
@@ -0,0 +1,35 @@
#!/bin/bash

# =============================== ENVIRONMENT ================================ #

if [[ ${1} ]]; then
  cmd="${1}"
else
  echo 1>&2 "execute tests-cli.sh to run all tests"; exit 1
fi

t="$(basename "${BASH_SOURCE[0]}" .sh)"
cd "${BASH_SOURCE%/*}/" || exit 1
mkdir -p "tmp/${t}"

# =================================== DATA =================================== #

cat << "DATA" > "tmp/${t}/${t}.csv"
a,b,c
1,2,3
DATA

# ================================= ASSERTION ================================ #

cat << DATA > "tmp/${t}/${t}.assert"
${t}
DATA

# ================================== ACTION ================================== #

${cmd} --create "tmp/${t}/${t}.csv"
${cmd} --list | grep "${t}" | cut -d ':' -f 2 > "tmp/${t}/${t}.output"

# =================================== TEST =================================== #

diff -u "tmp/${t}/${t}.assert" "tmp/${t}/${t}.output"
Some files were not shown because too many files have changed in this diff.