{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Library Carpentry: OpenRefine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<https://librarycarpentry.org/lc-open-refine/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download the data from Library Carpentry"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-09-02 21:00:56-- https://github.com/LibraryCarpentry/lc-open-refine/raw/gh-pages/data/doaj-article-sample.csv\n",
"Resolving github.com (github.com)... 140.82.118.4\n",
"Connecting to github.com (github.com)|140.82.118.4|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://raw.githubusercontent.com/LibraryCarpentry/lc-open-refine/gh-pages/data/doaj-article-sample.csv [following]\n",
"--2019-09-02 21:00:57-- https://raw.githubusercontent.com/LibraryCarpentry/lc-open-refine/gh-pages/data/doaj-article-sample.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.112.133\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.112.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 524686 (512K) [text/plain]\n",
"Saving to: ‘doaj-article-sample.csv.1’\n",
"\n",
"doaj-article-sample 100%[===================>] 512,39K 962KB/s in 0,5s \n",
"\n",
"2019-09-02 21:00:58 (962 KB/s) - ‘doaj-article-sample.csv.1’ saved [524686/524686]\n",
"\n"
]
}
],
"source": [
"wget https://github.com/LibraryCarpentry/lc-open-refine/raw/gh-pages/data/doaj-article-sample.csv"
]
},
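{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output above shows `Saving to: ‘doaj-article-sample.csv.1’`: when the target file already exists, `wget` keeps it and saves the new download under a numbered name instead. A minimal sketch for re-running this step safely, assuming a hypothetical helper `fetch_once` that is not part of the lesson: it skips the download when the file is already present and deletes a partial file if `wget` fails. `wget -nc` (no-clobber) is a shorter alternative that also skips existing files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper (not from the lesson): download URL to FILE only if FILE is missing\n",
"fetch_once() {\n",
"  local url=\"$1\" file=\"$2\"\n",
"  if [ -f \"$file\" ]; then\n",
"    echo \"$file already exists, skipping download\"\n",
"    return 0\n",
"  fi\n",
"  # -O writes to exactly this file name, so no .1, .2, ... copies are created\n",
"  wget -O \"$file\" \"$url\" || { rm -f \"$file\"; return 1; }\n",
"}\n",
"\n",
"fetch_once https://github.com/LibraryCarpentry/lc-open-refine/raw/gh-pages/data/doaj-article-sample.csv doaj-article-sample.csv || echo \"Download failed\" >&2"
]
},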
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download OpenRefine and extract it into the `openrefine` folder"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-09-02 21:04:31-- https://github.com/OpenRefine/OpenRefine/releases/download/3.2/openrefine-linux-3.2.tar.gz\n",
"Resolving github.com (github.com)... 140.82.118.4\n",
"Connecting to github.com (github.com)|140.82.118.4|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/6220644/7dc2a280-afc0-11e9-9a64-d6b401ada2fa?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190902%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190902T190447Z&X-Amz-Expires=300&X-Amz-Signature=d35772d3dd43c57298ab8e2c7d28abfddeb5545257175e3bdc33a6f85b62caff&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dopenrefine-linux-3.2.tar.gz&response-content-type=application%2Foctet-stream [following]\n",
"--2019-09-02 21:04:32-- https://github-production-release-asset-2e65be.s3.amazonaws.com/6220644/7dc2a280-afc0-11e9-9a64-d6b401ada2fa?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190902%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190902T190447Z&X-Amz-Expires=300&X-Amz-Signature=d35772d3dd43c57298ab8e2c7d28abfddeb5545257175e3bdc33a6f85b62caff&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dopenrefine-linux-3.2.tar.gz&response-content-type=application%2Foctet-stream\n",
"Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.238.115\n",
"Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.238.115|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 106046195 (101M) [application/octet-stream]\n",
"Saving to: ‘openrefine-linux-3.2.tar.gz’\n",
"\n",
"openrefine-linux-3. 100%[===================>] 101,13M 3,82MB/s in 32s \n",
"\n",
"2019-09-02 21:05:06 (3,13 MB/s) - ‘openrefine-linux-3.2.tar.gz’ saved [106046195/106046195]\n",
"\n",
"Total bytes read: 125419520 (120MiB, 137MiB/s)\n"
]
}
],
"source": [
"wget https://github.com/OpenRefine/OpenRefine/releases/download/3.2/openrefine-linux-3.2.tar.gz\n",
"mkdir -p openrefine\n",
"tar -xzf openrefine-linux-3.2.tar.gz -C openrefine --strip-components=1 --totals\n",
"rm openrefine-linux-3.2.tar.gz"
]
},
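{
"cell_type": "markdown",
"metadata": {},
"source": [
"OpenRefine runs on Java, so the `openrefine/refine` start script needs a Java runtime on the `PATH`. A small sketch to check this and the extracted start script before launching, assuming a hypothetical helper `check_openrefine_setup` that is not part of the lesson:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper (not from the lesson): check the prerequisites for openrefine/refine\n",
"check_openrefine_setup() {\n",
"  local ok=0\n",
"  if command -v java >/dev/null 2>&1; then\n",
"    # java -version prints to stderr; show only its first line\n",
"    java -version 2>&1 | head -n 1\n",
"  else\n",
"    echo \"No 'java' on PATH; OpenRefine needs a Java runtime\" >&2\n",
"    ok=1\n",
"  fi\n",
"  if [ ! -x openrefine/refine ]; then\n",
"    echo \"openrefine/refine not found or not executable; check the extraction step\" >&2\n",
"    ok=1\n",
"  fi\n",
"  return \"$ok\"\n",
"}\n",
"\n",
"check_openrefine_setup || echo \"Fix the problems above before starting OpenRefine\" >&2"
]
},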
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start OpenRefine (http://127.0.0.1:3333 opens automatically in the browser)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You have 15961M of free memory.\n",
"Your current configuration is set to use 1400M of memory.\n",
"OpenRefine can run better when given more memory. Read our FAQ on how to allocate more memory here:\n",
"https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory\n",
"Starting OpenRefine at 'http://127.0.0.1:3333/'\n",
"\n",
"21:07:41.809 [ refine_server] Starting Server bound to '127.0.0.1:3333' (0ms)\n",
"21:07:41.810 [ refine_server] refine.memory size: 1400M JVM Max heap: 1407188992 (1ms)\n",
"21:07:41.819 [ refine_server] Initializing context: '/' from '/home/felix/notebooks/openrefine/webapp' (9ms)\n",
"SLF4J: Class path contains multiple SLF4J bindings.\n",
"SLF4J: Found binding in [jar:file:/home/felix/notebooks/openrefine/server/target/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n",
"SLF4J: Found binding in [jar:file:/home/felix/notebooks/openrefine/webapp/WEB-INF/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n",
"SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n",
"SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n",
"21:07:42.268 [ refine] Starting OpenRefine 3.2 [55c921b]... (449ms)\n",
"21:07:42.268 [ refine] initializing FileProjectManager with dir (0ms)\n",
"21:07:42.268 [ refine] /home/felix/.local/share/openrefine (0ms)\n",
"21:07:47.059 [ refine] POST /command/core/load-language (4791ms)\n",
"21:07:47.083 [ refine] GET /command/core/get-preference (24ms)\n",
"21:07:47.096 [ refine] POST /command/core/load-language (13ms)\n",
"21:07:47.104 [ refine] POST /command/core/load-language (8ms)\n",
"21:07:47.188 [ refine] POST /command/core/get-importing-configuration (84ms)\n",
"21:07:47.216 [ refine] GET /command/core/get-all-project-tags (28ms)\n",
"21:07:47.229 [ refine] GET /command/core/get-all-project-metadata (13ms)\n",
"21:07:47.348 [ refine] GET /command/core/get-languages (119ms)\n",
"21:07:47.447 [ refine] GET /command/database/saved-connection (99ms)\n",
"21:07:47.489 [ refine] GET /command/core/get-version (42ms)\n"
]
}
],
"source": [
"openrefine/refine"
]
}
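,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above keeps OpenRefine in the foreground, so the Bash kernel stays busy until the cell is interrupted, and its log reports a configured heap of only 1400M. As an alternative (run it instead of the cell above, not in addition, because both use port 3333), here is a sketch that starts OpenRefine in the background with a larger heap via the `-m` option of the `refine` script and then waits until the server answers. The 4096M heap, the log file name `openrefine.log`, the limit of 30 retries and the helper `wait_for_openrefine` are assumptions; `/command/core/get-version` is the endpoint that appears in the log above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Start OpenRefine in the background with a larger heap (4096M is an assumed value; adjust to the free memory)\n",
"openrefine/refine -m 4096M > openrefine.log 2>&1 &\n",
"\n",
"# Hypothetical helper: retry once per second until the server answers; give up after TIMEOUT retries\n",
"wait_for_openrefine() {\n",
"  local url=\"${1:-http://127.0.0.1:3333/command/core/get-version}\" timeout=\"${2:-30}\" waited=0\n",
"  until wget -q -t 1 -T 5 -O /dev/null \"$url\"; do\n",
"    if [ \"$waited\" -ge \"$timeout\" ]; then\n",
"      echo \"No answer from $url after $timeout retries\" >&2\n",
"      return 1\n",
"    fi\n",
"    sleep 1\n",
"    waited=$((waited + 1))\n",
"  done\n",
"  echo \"OpenRefine is up at $url\"\n",
"}\n",
"\n",
"# If the server never answers, show the end of the log to see why\n",
"wait_for_openrefine || tail -n 20 openrefine.log"
]
}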
],
"metadata": {
"kernelspec": {
"display_name": "Bash",
"language": "bash",
"name": "bash"
},
"language_info": {
"codemirror_mode": "shell",
"file_extension": ".sh",
"mimetype": "text/x-sh",
"name": "bash"
}
},
"nbformat": 4,
"nbformat_minor": 2
}