digitales-handwerkszeug/03-lc-openrefine.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Library Carpentry: OpenRefine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<https://librarycarpentry.org/lc-open-refine/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download the data from Library Carpentry"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-09-02 21:00:56-- https://github.com/LibraryCarpentry/lc-open-refine/raw/gh-pages/data/doaj-article-sample.csv\n",
"Resolving github.com (github.com)... 140.82.118.4\n",
"Connecting to github.com (github.com)|140.82.118.4|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://raw.githubusercontent.com/LibraryCarpentry/lc-open-refine/gh-pages/data/doaj-article-sample.csv [following]\n",
"--2019-09-02 21:00:57-- https://raw.githubusercontent.com/LibraryCarpentry/lc-open-refine/gh-pages/data/doaj-article-sample.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.112.133\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.112.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 524686 (512K) [text/plain]\n",
"Saving to: doaj-article-sample.csv.1\n",
"\n",
"doaj-article-sample 100%[===================>] 512,39K 962KB/s in 0,5s \n",
"\n",
"2019-09-02 21:00:58 (962 KB/s) - doaj-article-sample.csv.1 saved [524686/524686]\n",
"\n"
]
}
],
"source": [
"wget https://github.com/LibraryCarpentry/lc-open-refine/raw/gh-pages/data/doaj-article-sample.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download OpenRefine and extract it into the `openrefine` folder"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-09-02 21:04:31-- https://github.com/OpenRefine/OpenRefine/releases/download/3.2/openrefine-linux-3.2.tar.gz\n",
"Resolving github.com (github.com)... 140.82.118.4\n",
"Connecting to github.com (github.com)|140.82.118.4|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/6220644/7dc2a280-afc0-11e9-9a64-d6b401ada2fa?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190902%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190902T190447Z&X-Amz-Expires=300&X-Amz-Signature=d35772d3dd43c57298ab8e2c7d28abfddeb5545257175e3bdc33a6f85b62caff&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dopenrefine-linux-3.2.tar.gz&response-content-type=application%2Foctet-stream [following]\n",
"--2019-09-02 21:04:32-- https://github-production-release-asset-2e65be.s3.amazonaws.com/6220644/7dc2a280-afc0-11e9-9a64-d6b401ada2fa?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190902%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190902T190447Z&X-Amz-Expires=300&X-Amz-Signature=d35772d3dd43c57298ab8e2c7d28abfddeb5545257175e3bdc33a6f85b62caff&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dopenrefine-linux-3.2.tar.gz&response-content-type=application%2Foctet-stream\n",
"Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.238.115\n",
"Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.238.115|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 106046195 (101M) [application/octet-stream]\n",
"Saving to: openrefine-linux-3.2.tar.gz\n",
"\n",
"openrefine-linux-3. 100%[===================>] 101,13M 3,82MB/s in 32s \n",
"\n",
"2019-09-02 21:05:06 (3,13 MB/s) - openrefine-linux-3.2.tar.gz saved [106046195/106046195]\n",
"\n",
"Total bytes read: 125419520 (120MiB, 137MiB/s)\n"
]
}
],
"source": [
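"# Download the OpenRefine 3.2 Linux release from GitHub\n",
"wget https://github.com/OpenRefine/OpenRefine/releases/download/3.2/openrefine-linux-3.2.tar.gz\n",
"# Unpack into ./openrefine, dropping the archive's top-level directory\n",
"mkdir -p openrefine\n",
"tar -xzf openrefine-linux-3.2.tar.gz -C openrefine --strip-components=1 --totals\n",
"# Remove the archive once it has been extracted\n",
"rm openrefine-linux-3.2.tar.gz"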
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start OpenRefine (the address http://127.0.0.1:3333 opens automatically in the browser)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You have 15961M of free memory.\n",
"Your current configuration is set to use 1400M of memory.\n",
"OpenRefine can run better when given more memory. Read our FAQ on how to allocate more memory here:\n",
"https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory\n",
"Starting OpenRefine at 'http://127.0.0.1:3333/'\n",
"\n",
"21:07:41.809 [ refine_server] Starting Server bound to '127.0.0.1:3333' (0ms)\n",
"21:07:41.810 [ refine_server] refine.memory size: 1400M JVM Max heap: 1407188992 (1ms)\n",
"21:07:41.819 [ refine_server] Initializing context: '/' from '/home/felix/notebooks/openrefine/webapp' (9ms)\n",
"SLF4J: Class path contains multiple SLF4J bindings.\n",
"SLF4J: Found binding in [jar:file:/home/felix/notebooks/openrefine/server/target/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n",
"SLF4J: Found binding in [jar:file:/home/felix/notebooks/openrefine/webapp/WEB-INF/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n",
"SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n",
"SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n",
"21:07:42.268 [ refine] Starting OpenRefine 3.2 [55c921b]... (449ms)\n",
"21:07:42.268 [ refine] initializing FileProjectManager with dir (0ms)\n",
"21:07:42.268 [ refine] /home/felix/.local/share/openrefine (0ms)\n",
"21:07:47.059 [ refine] POST /command/core/load-language (4791ms)\n",
"21:07:47.083 [ refine] GET /command/core/get-preference (24ms)\n",
"21:07:47.096 [ refine] POST /command/core/load-language (13ms)\n",
"21:07:47.104 [ refine] POST /command/core/load-language (8ms)\n",
"21:07:47.188 [ refine] POST /command/core/get-importing-configuration (84ms)\n",
"21:07:47.216 [ refine] GET /command/core/get-all-project-tags (28ms)\n",
"21:07:47.229 [ refine] GET /command/core/get-all-project-metadata (13ms)\n",
"21:07:47.348 [ refine] GET /command/core/get-languages (119ms)\n",
"21:07:47.447 [ refine] GET /command/database/saved-connection (99ms)\n",
"21:07:47.489 [ refine] GET /command/core/get-version (42ms)\n"
]
}
],
"source": [
"openrefine/refine"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Bash",
"language": "bash",
"name": "bash"
},
"language_info": {
"codemirror_mode": "shell",
"file_extension": ".sh",
"mimetype": "text/x-sh",
"name": "bash"
}
},
"nbformat": 4,
"nbformat_minor": 2
}