
How to ensure python binaries are on your path?


I am trying to use the Kaggle API. I installed kaggle with pip and moved kaggle.json to ~/.kaggle, but I can't run kaggle in Command Prompt: it is not recognized as a command. I suspect this is because I skipped the step "ensure python binaries are on your path", but honestly I am not sure what that means. Here is the error I get when I try to search for datasets from the Python shell:

>>> sys.version
'3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)]'
>>> import kaggle
>>> kaggle datasets list -s demographics
  File "<stdin>", line 1
    kaggle datasets list -s demographics
           ^
SyntaxError: invalid syntax


Solution

  • kaggle is a Python module, but pip should also install a script with the same name, kaggle, which you run in a console/terminal (PowerShell or cmd.exe) as

    kaggle datasets list -s demographics
    

    but this is NOT code you can run in the Python shell or in a Python script. That is why >>> reports SyntaxError: the shell tries to parse the command line as Python code.
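This is what "ensure python binaries are on your path" refers to: pip puts the kaggle script into the interpreter's scripts directory (PythonXY\Scripts inside the Python installation on Windows, or a per-user Scripts directory after pip install --user), and cmd.exe can only run it if that directory is listed in the PATH environment variable. Below is a minimal standard-library sketch to check this from Python. kaggle_script_status is a hypothetical helper name, and deriving the per-user scheme name (nt_user / posix_user) from os.name is an assumption that fits standard CPython builds on Windows and Linux.

```python
import os
import shutil
import sysconfig


def kaggle_script_status(name="kaggle"):
    """Report where pip installs console scripts and whether they are on PATH."""
    # Scripts directory for a normal install into this interpreter
    system_scripts = sysconfig.get_path("scripts")
    # Scripts directory for "pip install --user" (assumed scheme: nt_user / posix_user)
    user_scripts = sysconfig.get_path("scripts", f"{os.name}_user")

    # Normalize case and separators so equivalent spellings compare equal on Windows
    path_dirs = {
        os.path.normcase(os.path.normpath(entry))
        for entry in os.environ.get("PATH", "").split(os.pathsep)
        if entry
    }

    def on_path(directory):
        return os.path.normcase(os.path.normpath(directory)) in path_dirs

    return {
        "found_at": shutil.which(name),  # None: not found via PATH lookup
        "system_scripts": system_scripts,
        "system_scripts_on_path": on_path(system_scripts),
        "user_scripts": user_scripts,
        "user_scripts_on_path": on_path(user_scripts),
    }


if __name__ == "__main__":
    for key, value in kaggle_script_status().items():
        print(f"{key}: {value}")
```

If found_at is None, add the printed directory (system_scripts, or user_scripts if you installed with --user) to PATH through the Windows Environment Variables dialog, then open a new Command Prompt so it picks up the updated PATH.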

    If you find this kaggle script and open it in an editor, you can see that it imports main from kaggle.cli and runs main(). (On Windows pip creates kaggle.exe instead, a small launcher that does the same thing.)

    You can use the same entry point in your own script:

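    import sys
    from kaggle.cli import main
    
    # main() parses sys.argv[1:], so set the arguments explicitly
    # (argv[0] is only the program name)
    sys.argv = ['kaggle', 'datasets', 'list', '-s', 'demographics']
    main()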
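Setting sys.argv works because, as far as I can tell from the kaggle source, main() calls argparse's parse_args() without an argument list, and argparse then falls back to sys.argv[1:]. The sketch below shows that mechanism with toy_main, a hypothetical argparse-based stand-in for kaggle.cli.main, and run_cli, a hypothetical helper that restores sys.argv afterwards so later code does not see the fake arguments.

```python
import argparse
import sys


def toy_main():
    """Hypothetical stand-in for kaggle.cli.main(): it takes no arguments."""
    parser = argparse.ArgumentParser(prog="kaggle")
    commands = parser.add_subparsers(dest="command")
    datasets = commands.add_parser("datasets")
    actions = datasets.add_subparsers(dest="action")
    list_cmd = actions.add_parser("list")
    list_cmd.add_argument("-s", "--search")
    # No list passed to parse_args(), so argparse reads sys.argv[1:]
    return parser.parse_args()


def run_cli(main, cli_args):
    """Call an argparse-based main() as if from a shell, then restore sys.argv."""
    saved = sys.argv
    sys.argv = ["kaggle", *cli_args]  # argv[0] is the program name; argparse skips it
    try:
        return main()
    finally:
        sys.argv = saved


if __name__ == "__main__":
    print(run_cli(toy_main, ["datasets", "list", "-s", "demographics"]))
```

With kaggle installed, run_cli(kaggle.cli.main, ['datasets', 'list', '-s', 'demographics']) would run the real command the same way.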
    

    But this method prints the results directly on the screen/console, so to capture the text in a variable you need to assign your own file-like object to sys.stdout.

    Something like this:

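    import sys
    import kaggle.cli
    
    class Catcher:
        """Minimal file-like object that collects everything written to it."""
        def __init__(self):
            self.text = ''
    
        def write(self, text):
            self.text += text
            return len(text)
    
        def flush(self):
            pass  # nothing is buffered, but print(..., flush=True) calls this
    
        def close(self):
            pass
    
    catcher = Catcher()
    
    old_stdout = sys.stdout  # keep the original stdout
    sys.stdout = catcher     # send print() output to the catcher
    
    sys.argv = ['kaggle', 'datasets', 'list', '-s', 'demographics']
    try:
        kaggle.cli.main()    # prints its output; it does not return it
    finally:
        sys.stdout = old_stdout  # restore stdout (print() needs it), even if main() fails
    
    print(catcher.text)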
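The standard library already ships this pattern: contextlib.redirect_stdout temporarily replaces sys.stdout with any file-like object (here io.StringIO) and restores it when the with block exits, even on errors. Below is a sketch with a hypothetical capture_output helper; fake_listing is a hypothetical stand-in for kaggle.cli.main so the example runs without Kaggle credentials. It also catches SystemExit, on the assumption that a command-line entry point may call sys.exit() on errors.

```python
import contextlib
import io


def capture_output(func, *args, **kwargs):
    """Run func and return (its return value or exit code, text it printed)."""
    buffer = io.StringIO()
    # redirect_stdout swaps sys.stdout for buffer and restores it on exit
    with contextlib.redirect_stdout(buffer):
        try:
            result = func(*args, **kwargs)
        except SystemExit as exc:  # command-line entry points may call sys.exit()
            result = exc.code
    return result, buffer.getvalue()


def fake_listing():
    """Hypothetical stand-in for kaggle.cli.main(); prints instead of returning."""
    print("line 1")
    print("line 2")


if __name__ == "__main__":
    value, text = capture_output(fake_listing)
    print(repr(text))
```

With kaggle installed, the same helper would be called as capture_output(kaggle.cli.main) after setting sys.argv as above.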
    

    Digging into the source code behind the kaggle script (kaggle/cli.py), I see you can do the same with

    import kaggle.api
    
    kaggle.api.dataset_list_cli(search='demographics')
    

    but this also prints everything directly on the screen/console.


    EDIT:

    You can get the result as a list of objects, which you can then use in a for-loop:

    import kaggle.api
    
    result = kaggle.api.dataset_list(search='demographics')
                                     
    for item in result:
        print('title:', item.title)
        print('size:', item.size)
        print('last updated:', item.lastUpdated)
        print('download count:', item.downloadCount)
        print('vote count:', item.voteCount)
        print('usability rating:', item.usabilityRating)
        print('---')
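Because dataset_list() returns objects rather than text, you can sort, filter or export them without parsing console output. Below is a minimal sketch that turns them into dicts and CSV text. The helper names are hypothetical, the SimpleNamespace objects stand in for real results, and getattr(..., None) is used on the assumption that attribute names may differ between kaggle versions (check them against your installed version).

```python
import csv
import io
from types import SimpleNamespace

# Attribute names as used in the loop above
FIELDS = ["title", "size", "lastUpdated", "downloadCount", "voteCount", "usabilityRating"]


def datasets_to_rows(items, fields=FIELDS):
    """Convert dataset objects into plain dicts; missing attributes become None."""
    return [{name: getattr(item, name, None) for name in fields} for item in items]


def rows_to_csv(rows, fields=FIELDS):
    """Render the dicts as CSV text (None is written as an empty cell)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical stand-ins for kaggle.api.dataset_list(search='demographics')
    result = [SimpleNamespace(title="A", voteCount=3),
              SimpleNamespace(title="B", voteCount=10)]
    rows = datasets_to_rows(result)
    rows.sort(key=lambda row: row["voteCount"] or 0, reverse=True)  # most votes first
    print(rows_to_csv(rows))
```

With real results you would pass the list returned by kaggle.api.dataset_list(search='demographics') to datasets_to_rows instead of the stand-ins.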