Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

port to python 3.7 and use requests #8

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

wolfv
Copy link

@wolfv wolfv commented Dec 27, 2019

hey, I attempted to port the openrefine client to python3, and I have also swithced to use the requests library for communicating with the server (including uploading files).

@wolfv
Copy link
Author

wolfv commented Dec 30, 2019

let me know if you are happy (in general) with this PR, then I'll fix the codacy issues. Would be cool to move this project over to python3 as the clock is ticking for python 2 :)

@felixlohmeier
Copy link
Member

Hey @wolfv, Thank you very much for your contribution!

I did some quick tests based on https://github.com/opencultureconsulting/openrefine-client/blob/master/tests/cli_bash.ipynb and some basic functionalities seem to be broken.

[felix@tux Desktop]$ git clone -b py3_port https://github.com/wolfv/openrefine-client.git
Cloning into 'openrefine-client'...
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 1201 (delta 0), reused 3 (delta 0), pack-reused 1189
Receiving objects: 100% (1201/1201), 2.52 MiB | 1.55 MiB/s, done.
Resolving deltas: 100% (630/630), done.
[felix@tux Desktop]$ cd openrefine-client/
[felix@tux openrefine-client]$ python3 refine.py --download "https://git.io/fj5hF" --output=duplicates.csv
Traceback (most recent call last):
  File "refine.py", line 35, in <module>
    __main__.main()
  File "/home/felix/Desktop/openrefine-client/google/refine/__main__.py", line 239, in main
    cli.download(options.download, output_file=options.output)
  File "/home/felix/Desktop/openrefine-client/google/refine/cli.py", line 161, in download
    urllib.request.urlretrieve(url, output_file, context=context)
TypeError: urlretrieve() got an unexpected keyword argument 'context'
[felix@tux openrefine-client]$ python3 refine.py --create tests/data/cli/duplicates.csv
Traceback (most recent call last):
  File "refine.py", line 35, in <module>
    __main__.main()
  File "/home/felix/Desktop/openrefine-client/google/refine/__main__.py", line 251, in main
    cli.create(options.create, **kwargs)
  File "/home/felix/Desktop/openrefine-client/google/refine/cli.py", line 127, in create
    **kwargs)
  File "/home/felix/Desktop/openrefine-client/google/refine/refine.py", line 275, in new_project
    'create-project-from-upload', options, params, files=files
TypeError: urlopen() got an unexpected keyword argument 'files'

I am on parental leave and will probably not be able to take a closer look at it until March. @boerni667 is also working on a port to Python3 (https://github.com/boerni667/openrefine-client) and may want to check it out before then.

@wolfv
Copy link
Author

wolfv commented Jan 13, 2020

Hey @felixlohmeier thanks for taking a look. Indeed, there were quite a bunch of issues left ... I attempted to fix them (the CLI at least seems to work fine with the latest changes).
There is something going on with the mode thing though... maybe there was an API change in OpenRefine (I am testing with 3.2).
I think it would also be fine / nice to drop support for older OpenRefine versions.

@felixlohmeier
Copy link
Member

Great, I will take a closer look next month.

I had started to investigate this systematically a few months ago (paulmakepeace#19), but I lacked the time to complete it. Dropping support for older versions would be a pragmatic solution.

@felixlohmeier
Copy link
Member

@boerni667 and @jnauber, maybe you would like to join this discussion?

@jnauber
Copy link

jnauber commented Jan 31, 2020

After comparing the development branches from @wolfv and @boerni667 [1] and discussing the matter with @felixlohmeier, i do think that the best way to continue the development is to merge this pull request (after successful testing through @felixlohmeier) and afterwards open a new pull request with the changes in @boerni667 fork.

What do you think about this procedure?

[1] wolfv/openrefine-client@py3_port...boerni667:master

@wolfv
Copy link
Author

wolfv commented Jan 31, 2020

sounds good to me :) I have ticked the checkbox so that other maintainers can push to my branch as well.

@felixlohmeier
Copy link
Member

Sorry for the long delay. I will check and merge the port on Python3 by end of July.

@wolfv
Copy link
Author

wolfv commented Jul 1, 2020

No worries, I am happy to see this move forward.

@felixlohmeier
Copy link
Member

I'm working on it now. Great work! So far I've noticed only two deviations. The template option "suffixById=true" returns wrong file names and the connection error is so verbose that we better intercept it. I will take care of it.

@felixlohmeier
Copy link
Member

felixlohmeier commented Aug 1, 2020

Unfortunately code freezing with pyinstaller does not work well with python3 and requests. The executable file takes almost twice as long to start up:

[felix@tux virtualbox-share]$ time ./openrefine-client_py3
Usage: openrefine-client_py3 [--help | OPTIONS]


real	0m0,467s
user	0m0,420s
sys	0m0,040s

[felix@tux virtualbox-share]$ time ./openrefine-client_0-3-8_linux
Usage: openrefine-client_0-3-8_linux [--help | OPTIONS]


real	0m0,242s
user	0m0,190s
sys	0m0,030s

If I exclude requests, then the performance is much better. I will have a look if these parts can be implemented with urllib.

[felix@tux virtualbox-share]$ time ./openrefine-client_py3_no-requests
Usage: openrefine-client_py3_no-requests [--help | OPTIONS]


real	0m0,296s
user	0m0,250s
sys	0m0,041s

@wolfv
Copy link
Author

wolfv commented Aug 1, 2020

Thanks @felixlohmeier I would additionally be interested in having openrefine and htis library on conda-forge. Maybe we can make that happen :)

@felixlohmeier
Copy link
Member

I would additionally be interested in having openrefine and this library on conda-forge.

Yeah, basically a good idea. Actually, I was planning on just updating the CLI functionality. Do you also need the other parts of the library (the "upstream way")?

@felixlohmeier
Copy link
Member

felixlohmeier commented Dec 11, 2020

Hey @wolfv, it's been almost a year since you initiated the port to Python3. Thanks again for that!

In August I had written some tests to check the functionality of the command line interface. Thereby I noticed a few bugs. However, I had bigger problems with the creation of the one-file-executables in a Python3 environment. Here is a list:

  • Importing of non-utf8-encoded files fails with UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 144: invalid start byte
  • console_script openrefine-client is broken ModuleNotFoundError: No module named 'google'
  • templating option --suffixById=true produces wrong filenames. Seems to be a minor bug that only the first letter of the field content is used.
  • import of ODS, XLS and XLSX fails against OpenRefine 2.7 with unicode errors
  • several unicode errors with one-file-executable on Windows. PyInstaller unicode workaround does not work with Python3.
  • init time for one-file-executables has doubled (see above)

Over the holidays I finally have time to work on the openrefine-client again. Actually I had hoped to finish the port to Python3 first and then make it compatible with OpenRefine 3.3+ (csrf token). But I can't do both and OpenRefine 3.3 was released almost a year ago. Therefore I want to focus on one-file-executables supporting OpenRefine 3.3+ and stay with Python2 for now.

Do you want to continue and publish the port on your own? That would be really great, because a lot of people had shown interest in it. I'm sorry I can't contribute more right now.

@datatalking
Copy link

Hello @felixlohmeier, @wolfv, @jnauber I'm arriving late to the repo but would like to help where I can to see if we can update this for Python2 and possibly migrate this to Python 3.10? with a few new resources. I will probably be asking a bunch of questions that seem obvious but with Python Rule #2 'Explicit is better than implicit'.

Sometimes I regularly have people ask for open source projects they could assist with and who doesn't like hacking API's.

@felixlohmeier
Copy link
Member

Hi @datatalking, Help is very welcome here! I look forward to your questions!

I'm currently working on a new implementation in Bash (orcli) and was actually planning to discontinue this project (openrefine-client). See also: https://forum.openrefine.org/t/orcli-yet-another-client-for-openrefine-written-in-bash/753

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants