python django api kaggle

Kaggle Dataset command returning wrong data


I am trying to search for datasets on Kaggle from my Django app. In my utils, I have this code:

import os

def search_kaggle(search_term):
    # Run the Kaggle CLI search and split its output into lines
    search_results = os.popen("kaggle datasets list -s "+search_term).read().splitlines()

    return search_results

On my view function, I have this:

def search_dataset(request):
    context = {}
    print('search dataset reached')
    if request.method == "POST":
        searchkey = request.POST["searchkey"]
        dtsite = request.POST["dtsite"]
        dtsnum = request.POST["dtsnum"]
        if searchkey != "":
            if dtsite == "kaggle":
                results = search_kaggle(dtsite)
                context['results'] = results
                print("Kaggle reached")
            if dtsite == "datagov":
                print("datagov")
            if dtsite == "uci":
                print("UCI")
            if dtsite == "googlepd":
                print("googlepd")
        else:
            messages.error(request, "You must select a search keyword!")

    return render(request, 'datasetsearch/dataset_results.html', context)

When I run the code, it does return data from Kaggle, but the data is completely different from what I get when I run the same command in the CLI:

kaggle datasets list -s 'fraud detection'

In the code above, search_term should be 'fraud detection', so I expected it to return the same data, but I am getting something different. The command-line result is the correct one.

Command-line result:

ref                                                               title                                                size  lastUpdated          downloadCount  v
----------------------------------------------------------------  --------------------------------------------------  -----  -------------------  -------------  -
mlg-ulb/creditcardfraud                                           Credit Card Fraud Detection                          66MB  2018-03-23 01:17:27         430457   
ealaxi/paysim1                                                    Synthetic Financial Datasets For Fraud Detection    178MB  2017-04-03 08:40:34          55698   
mishra5001/credit-card                                            Credit Card Fraud Detection                         112MB  2019-07-15 06:36:02           8706   
kartik2112/fraud-detection                                        Credit Card Transactions Fraud Detection Dataset    202MB  2020-08-05 15:20:55          13158   
rohitrox/healthcare-provider-fraud-detection-analysis              HEALTHCARE PROVIDER FRAUD DETECTION ANALYSIS        25MB  2019-05-09 19:50:55          11674   
rupakroy/online-payments-fraud-detection-dataset                  Online Payments Fraud Detection Dataset             178MB  2022-04-17 15:34:44           3985   
vagifa/ethereum-frauddetection-dataset                            Ethereum Fraud Detection Dataset                    923KB  2021-01-03 10:05:14           1418   
shayannaveed/credit-card-fraud-detection                          Credit Card Fraud Detection                          66MB  2019-12-24 08:07:24           1233   
shivamb/vehicle-claim-fraud-detection                             Vehicle Insurance Claim Fraud Detection             348KB  2021-12-20 04:26:36           2325   
saurabhbagchi/credit-card-fraud-detection                         Credit Card Fraud Detection                          28MB  2021-07-18 14:27:20            909   
volodymyrgavrysh/fraud-detection-bank-dataset-20k-records-binary  Fraud detection bank dataset 20K records binary     738KB  2021-08-08 15:12:01           2184   
isaikumar/creditcardfraud                                         Credit Card Fraud Detection Dataset                  66MB  2018-05-05 09:38:01           4386   
gopalmahadevan/fraud-detection-example                            Fraud Detection Example                               3MB  2021-08-01 02:31:29            652   
tanisha1416/promo-abuse-detection-for-payment-apps                Promo Code Abuse Detection (Fraud Detection)         25KB  2021-08-07 07:13:13            208   
ealtman2019/credit-card-transactions                              Credit Card Transactions                            263MB  2021-10-14 17:42:24           2542   
ealaxi/banksim1                                                   Synthetic data from a financial payment system       13MB  2017-07-11 14:48:56          23766   
dhanushnarayananr/credit-card-fraud                               Credit Card Fraud                                    29MB  2022-05-07 15:09:29           2833   
muhakabartay/yourallmodelsdata                                    IEEE-CIS Fraud Detection Models Data                 28MB  2019-09-18 07:57:04            125   
dileep070/anomaly-detection                                       Credit card fraud detection                          43MB  2019-06-19 06:00:05            962   
mrmorj/fraud-detection-in-electricity-and-gas-consumption         Fraud Detection in Electricity and Gas Consumption   87MB  2020-08-24 12:29:16           1205 

Python script result:

ref                                    title                                                size  lastUpdated          downloadCount  voteCount  usabilityRating
-------------------------------------  --------------------------------------------------  -----  -------------------  -------------  ---------  ---------------
kaggle/meta-kaggle                     Meta Kaggle                                           6GB  2022-08-01 06:39:59          10828        653  0.7647059
kaggle/kaggle-survey-2018              2018 Kaggle Machine Learning & Data Science Survey    4MB  2018-11-03 22:35:07          17710       1008  0.85294116
kaggle/world-development-indicators    World Development Indicators                        369MB  2017-05-01 17:50:44          62053       1604  0.7647059
kaggle/kaggle-survey-2017              2017 Kaggle Machine Learning & Data Science Survey    4MB  2017-10-27 22:03:03          25672        854  0.8235294
kaggle/sf-salaries                     SF Salaries                                          11MB  2019-12-05 23:30:07          54209        713  0.7058824
alsgroup/end-als                       End ALS Kaggle Challenge                             12GB  2021-04-08 12:16:37           1485        177  0.9375
kaggle/hillary-clinton-emails          Hillary Clinton's Emails                             12MB  2019-11-14 05:31:24          17379        288  0.7058824
kaggle/college-scorecard               US Dept of Education: College Scorecard             562MB  2017-11-09 18:03:11          14214        214  0.7647059
kaggle/recipe-ingredients-dataset      Recipe Ingredients Dataset                            2MB  2017-01-19 02:55:45          11082        195  0.75
kaggle/reddit-comments-may-2015        May 2015 Reddit Comments                             20GB  2019-06-04 10:06:44           9124        280  0.64705884
kaggle/us-baby-names                   US Baby Names                                       173MB  2017-11-21 22:18:15          29489        320  0.5882353
morriswongch/kaggle-datasets           Kaggle Datasets                                       3MB  2018-12-02 03:50:47           1819         72  0.8235294
kaggle/us-consumer-finance-complaints  US Consumer Finance Complaints                       84MB  2019-11-14 05:52:29          17837        286  0.5882353
pavlofesenko/titanic-extended          Titanic extended dataset (Kaggle + Wikipedia)       134KB  2019-03-06 09:53:24           9419        133  0.9411765
canggih/voted-kaggle-dataset           Upvoted Kaggle Datasets                               1MB  2018-02-26 10:10:34           1268         33  1.0
canggih/upvoted-kaggle-kernels         Upvoted Kaggle Kernels                              115KB  2018-02-26 16:52:28            207         27  1.0
jessevent/all-kaggle-datasets          Complete Kaggle Datasets Collection                 390KB  2018-01-16 12:32:58           2099        109  0.8235294
kaggle/no-data-sources                 No Data Sources                                      159B  2017-04-12 20:45:12           1144        139  0.4375
kaggle/kaggle-blog-winners-posts       Kaggle Blog: Winners' Posts                         519KB  2016-09-21 02:21:21            766         43  0.7058824
kaggle/2015-notebook-ux-survey         2015 Notebook UX Survey                             198KB  2017-05-01 17:56:25           1033         49  0.64705884

Solution

  • You are passing dtsite, which holds the string "kaggle", to search_kaggle() instead of the search term. The command therefore searches for "kaggle" rather than for what the user typed:

    if dtsite == "kaggle":
        results = search_kaggle(dtsite)

    Pass searchkey instead:

    if dtsite == "kaggle":
        results = search_kaggle(searchkey)
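
A related point: os.popen with string concatenation hands the term to a shell unquoted. A multi-word term such as fraud detection is then split into separate arguments, and raw input from request.POST ends up inside a shell command. Below is a minimal sketch of one way to avoid that, assuming Python 3.7+ and that the kaggle executable is on the PATH of the Django process. build_kaggle_search_command is a hypothetical helper name introduced here for illustration; it is not part of the original code.

```python
import subprocess


def build_kaggle_search_command(search_term):
    """Return the argument list for `kaggle datasets list -s <term>`.

    Hypothetical helper, added only for illustration.
    """
    # Each list element reaches the program as exactly one argument, so a
    # multi-word term such as "fraud detection" is not split on spaces and
    # no shell ever interprets the user's input.
    return ["kaggle", "datasets", "list", "-s", search_term]


def search_kaggle(search_term):
    """Run the Kaggle CLI search and return its output as a list of lines."""
    completed = subprocess.run(
        build_kaggle_search_command(search_term),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout.splitlines()
```

With this version, search_kaggle(searchkey) in the view passes the whole search key as a single -s value. check=True means a failing CLI call raises an exception rather than returning an empty list, so the view may want to catch subprocess.CalledProcessError and show a message.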