
Task queue for deleting entities fails - Python - Flask - Google App Engine


I don't know what I'm doing.

That's why my Google App Engine task queue isn't working.

The goal is simply to delete a million billion entities upon request... okay, not that many. I don't really want to succumb to using MapReduce, and I don't think I'll need to. I'm still pretty sure that a request without a task queue implementation will time out. Hence a task queue.

In my case, a task queue that fails, like this:

INFO 2018-01-28 18:48:21,129 module.py:788] default: "POST /del/text HTTP/1.1" 302 -
WARNING 2018-01-28 18:48:21,129 taskqueue_stub.py:1981] Task task5 failed to execute. This task will retry in 3600.000 seconds
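The 302 in that log is the telling detail. App Engine push queues consider a task complete only when its handler returns an HTTP status from 200 to 299; any other status, a redirect included, counts as a failure and the task is scheduled for a retry. The helper below is hypothetical (it is not part of the App Engine API) and only spells that rule out:

```python
def task_succeeded(status_code):
    """Return True if a push-queue task with this HTTP status counts as done.

    Hypothetical helper: it only encodes the rule that 2xx responses
    mark a task as complete; every other status leads to a retry.
    """
    return 200 <= status_code <= 299


# The logged 302 (a redirect) is treated as a failure, so the task is retried.
for code in (200, 204, 302, 500):
    print("{0} {1}".format(code, "ok" if task_succeeded(code) else "retry"))
```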

Here is my wonderful failure code:

@sign_in_
@app.route('/del/<entity_kind>', methods=["GET", "POST"])
def DeleteStuff(entity_kind):

  allowed_deletion = {
    'text': models.Text,
  }

  cursor = None
  bookmark = request.form.get('bookmark', '')
  if bookmark:
    cursor = ndb.Cursor.from_websafe_string(bookmark)

  query = allowed_deletion[entity_kind].query()
  entries, next_cursor, more = query.fetch_page(
    1000, 
    keys_only=True,
    start_cursor=cursor)

  ndb.delete_multi(entries)

  bookmark = None
  if more:
    bookmark = next_cursor.to_websafe_string()

  taskqueue.add(
    url='/del/'+entity_kind,
    params={'bookmark': bookmark}
    )
  return "{0} deleted".format(entity_kind)

However, I can't even get a task with just a return statement in it to execute:

In app.yaml:

- url: /del/*
  script: app.app
  login: admin

The "trigger" handler tied to app.yaml:

@sign_in_
@app.route('/del/<entity_kind>', methods=["GET", "POST"])
def DeleteStuff(entity_kind):

  allowed_deletion = {
    'text': models.Text,
    'call': models.Call,
    'voicemail': models.Voicemail,
    'image': models.Image,
    'email': models.Email,
  }


  taskqueue.add(
    url='/deleting/'+entity_kind,
    #params={'bookmark': bookmark}
    )
  return "{0} to be deleted".format(entity_kind)

worker.yaml:

runtime: python27
api_version: 1
threadsafe: true
service: worker

handlers:
- url: /deleting/.*
  script: worker.app
  login: admin

worker.py:

from flask import Flask, render_template, request, redirect, url_for, abort, make_response
from google.appengine.ext import ndb
import datetime


from app import app
from app import models

@app.route('/deleting/<entity_kind>',methods=['POST'])
def DeletingStuff(entity_kind):

  print "entered task queue"

  allowed_deletion = {
    'text': models.Text,
    'call': models.Call,
    'voicemail': models.Voicemail,
    'image': models.Image,
    'email': models.Email,
  }
  return

App structure:

app (folder)
    - __init__.py
    - app.py
    - worker.py
    - worker.yaml
    - app.yaml

Maybe worker should be in its own folder? I wouldn't know how to hook that up in Flask, though...

The print statement in the task handler isn't even firing.

All I'm getting in the terminal is output like this:

INFO 2018-01-28 22:11:21,432 module.py:788] default: "POST /deleting/image HTTP/1.1" 302 -
WARNING 2018-01-28 22:11:21,432 taskqueue_stub.py:1981] Task task25 failed to execute. This task will retry in 409.600 seconds
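The two retry delays in these logs (409.6 seconds here, 3600 seconds in the first log) look like exponential backoff with a cap. The sketch below is an assumption, not the dev server's code: it supposes a 0.1-second base that doubles with each failed attempt and is capped at 3600 seconds, values picked because they reproduce both logged numbers. On App Engine the backoff is configurable per queue through retry_parameters in queue.yaml.

```python
def retry_delay(failed_attempts, min_backoff=0.1, max_backoff=3600.0):
    """Assumed exponential backoff: min_backoff * 2**n, capped at max_backoff.

    The default values are guesses chosen to match the logged delays;
    they are not taken from App Engine's source.
    """
    return min(min_backoff * 2 ** failed_attempts, max_backoff)


print(retry_delay(12))  # 409.6, like "will retry in 409.600 seconds"
print(retry_delay(20))  # 3600.0, like "will retry in 3600.000 seconds"
```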

Finally, a working solution

@sign_in_
@app.route('/del/<entity_type>', methods=["GET", "POST"])
def DeleteStuff(entity_type):

  taskqueue.add(
    url    = "/execute_task",
    method = "POST",
    params = {
      "entity_type": entity_type,
    }
  )  # taskqueue comes from: from google.appengine.api import taskqueue
  # Response comes from: from flask import Response
  return Response("sending {} task to queue... Check logs".format(entity_type), mimetype='text/plain', status=200)


@sign_in_
@app.route('/execute_task', methods=["GET", "POST"])
def execute_task():
  allowed_deletion = {
    'a': models.SomemodelA,
    'b': models.SomemodelB,
    'c': models.SomemodelC,
    'd': models.SomemodelD,
  }
  entity_type = request.form.get("entity_type", '')
  bookmark = request.form.get('bookmark', '')

  cursor = None
  if bookmark:
    cursor = ndb.Cursor.from_websafe_string(bookmark)

  if entity_type:
    query = allowed_deletion[entity_type].query()
    entries, next_cursor, more = query.fetch_page(
      1000, 
      keys_only=True,
      start_cursor=cursor)

    ndb.delete_multi(entries)

    bookmark = None
    if more:
      bookmark = next_cursor.to_websafe_string()

      taskqueue.add(
        url='/execute_task',
        params={'entity_type': entity_type, 'bookmark': bookmark}
      )

  return Response("did it. You'll never see this message", mimetype='text/plain', status=200)
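To see why the `if more:` guard matters, here is a pure-Python simulation of the same fetch, delete, re-enqueue loop. It is only a sketch: `FakeStore` is a hypothetical stand-in for the Datastore, a `deque` stands in for the task queue, and the "cursor" is simply the last key returned, which mimics a Datastore cursor marking a position that is unaffected by deleting earlier entities. None of these names are App Engine APIs.

```python
import bisect
from collections import deque


class FakeStore(object):
    """Hypothetical in-memory stand-in for one Datastore kind."""

    def __init__(self, keys):
        self.keys = sorted(keys)

    def fetch_page(self, page_size, start_cursor=None):
        # Resume just after the cursor (the last key of the previous page).
        start = 0 if start_cursor is None else bisect.bisect_right(self.keys, start_cursor)
        page = self.keys[start:start + page_size]
        more = start + page_size < len(self.keys)
        next_cursor = page[-1] if page else start_cursor
        return page, next_cursor, more

    def delete_multi(self, keys):
        doomed = set(keys)
        self.keys = [k for k in self.keys if k not in doomed]


def execute_task(store, queue, params, page_size):
    """One task run: delete a page, and re-enqueue only if more remain."""
    entries, next_cursor, more = store.fetch_page(page_size, params.get("bookmark"))
    store.delete_multi(entries)
    if more:
        queue.append({"bookmark": next_cursor})


def run_queue(store, page_size):
    """Drain the simulated queue and return how many task runs it took."""
    queue = deque([{}])  # the first task carries no bookmark
    runs = 0
    while queue:
        execute_task(store, queue, queue.popleft(), page_size)
        runs += 1
    return runs


store = FakeStore(range(2500))
runs = run_queue(store, page_size=1000)
print("{0} task runs, {1} keys left".format(runs, len(store.keys)))  # 3 task runs, 0 keys left
```

The chain ends on its own because a task only schedules a successor when `more` is true. An unconditional `taskqueue.add`, as in the first version of the question, would keep scheduling tasks forever.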

Notes:

@sign_in_ can be on the task as well without throwing errors
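One possible reason it never throws is decorator order. Decorators apply bottom-up, and Flask's `app.route` registers the function it receives and returns it unchanged, so a decorator placed above `@app.route` wraps a function the router never sees. The sketch below imitates this with a hypothetical `route` registry and a hypothetical `sign_in_` that only records whether it ran; it is not Flask's code.

```python
import functools

ROUTES = {}  # hypothetical URL map, standing in for Flask's


def route(rule):
    """Imitates app.route: register the function as-is and return it unchanged."""
    def decorator(f):
        ROUTES[rule] = f
        return f
    return decorator


def sign_in_(f):
    """Hypothetical login decorator that just records that it ran."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        wrapper.checked = True
        return f(*args, **kwargs)
    wrapper.checked = False
    return wrapper


@sign_in_            # applied last, so it wraps what route() returned...
@route("/above")     # ...but route() already stored the bare function
def above():
    return "above"


@route("/below")     # the registered function is the wrapped one
@sign_in_
def below():
    return "below"


ROUTES["/above"]()
ROUTES["/below"]()
print(hasattr(ROUTES["/above"], "checked"))  # False: sign_in_ never ran
print(ROUTES["/below"].checked)              # True
```

To have a decorator like `@sign_in_` actually guard a Flask view, it has to sit below `@app.route`.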

Google App Engine throws failure errors with variable URLs for task queues; in other words, this will throw failure errors:

@app.route('/execute_task/<some_variable>', methods=["GET", "POST"])
def execute_task(some_variable):
    pass

Both URLs are listed in app.yaml, so there is no need for worker.py and worker.yaml:

- url: /del/*
  script: app.app
  login: admin
- url: /execute_task
  script: app.app
  login: admin

Many thanks to GAEfan for helping!! :) Also note: I was able to debug my way to the solution by starting over from the code he gave in another post of mine: "simple google app engine taskqueue tutoral? flask / python / GAE".


Solution

  • Start with:

    @app.route('/del/<entity_kind>', methods = ['GET', 'POST'])
    

    Next, request.args is for a query string. You want to handle the POST params:

    request.form.get('bookmark', '')
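    For a push task sent with method POST, taskqueue.add puts params into the request body as a URL-encoded form, which Flask exposes as request.form; request.args only holds the query string (which is where params go for a GET task). The standard library shows both encodings; the parameter values below are made up for illustration.

```python
try:  # Python 3
    from urllib.parse import urlencode, parse_qs, urlsplit
except ImportError:  # Python 2.7, the runtime used in the question
    from urllib import urlencode
    from urlparse import parse_qs, urlsplit

params = {"entity_type": "text", "bookmark": "abc123"}  # illustrative values

# POST task: params become the form-encoded body (Flask: request.form).
body = urlencode(sorted(params.items()))

# GET task: params are appended to the URL (Flask: request.args).
url = "/execute_task?" + urlencode(sorted(params.items()))

print(body)  # bookmark=abc123&entity_type=text
print(parse_qs(urlsplit(url).query)["entity_type"])  # ['text']
```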
    

    Next, your @sign_in_ decorator is causing the task queue's request to be redirected to a sign-in page (note the 302 in the error log). Remove it. Instead, try adding this to your app.yaml:

    - url: /del/.*
      script: whereisapplication.app
      login: admin
    

    That puts the handler behind your Google login (admins only), but the task queue can still access it.
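    The change from /del/* in the question to /del/.* here matters because the url field in app.yaml is a regular expression, not a shell glob: /* means "zero or more slashes". The check below assumes the pattern must match the whole request path; treat that anchoring as an assumption of this sketch, and handler_matches as a hypothetical helper.

```python
import re


def handler_matches(pattern, path):
    """True if an app.yaml-style url pattern matches the whole path.

    Assumption: patterns are anchored at both ends, so the pattern is
    wrapped in a group and the match must end at the end of the path.
    """
    return re.match(r"(?:%s)\Z" % pattern, path) is not None


for pattern in ("/del/*", "/del/.*"):
    print("{0} {1}".format(pattern, handler_matches(pattern, "/del/text")))
# /del/* False
# /del/.* True
```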

    Update to your update:

    A print statement in a task is meaningless; you will never see its output. Try:

    import logging
    logging.info("{0} to be deleted".format(entity_kind))
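    Outside App Engine, the same kind of logging call can be checked locally with a throwaway handler that collects records; on App Engine the messages would show up in the request logs instead. This is a local sketch: ListHandler and deleting_stuff are hypothetical names, and it uses logging's lazy %s formatting rather than str.format.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list (test helper)."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def deleting_stuff(entity_kind):
    # Log instead of print so the message ends up in the logs.
    logging.info("%s to be deleted", entity_kind)
    return "ok"


handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)  # the root logger drops INFO by default (WARNING level)

deleting_stuff("text")
print(handler.messages)  # ['text to be deleted']
```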
    

    or:

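    @app.route('/deleting/<entity_kind>/', methods=['GET', 'POST'])
    @app.route('/deleting/<entity_kind>', methods=['GET', 'POST'])
    def DeletingStuff(entity_kind):
        if request.method == 'GET':
            # Return the text so it shows up in the browser; a Flask view must return a response
            return "here"
        return "ok"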
    

    Then see if you can access the URL in the browser. Note the trailing slash in the first route: it helps diagnose the redirect issue.

    You could also skip the POST method completely and use a query string:

    /deleting/text?cursor=387123481246123469

    cursor = request.args.get("cursor", None)
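    If you take the query-string route, build the URL with proper encoding and leave the parameter out when there is no cursor, so request.args.get("cursor", None) yields None rather than the literal string "None" that formatting None into the URL would produce. Both helpers below are hypothetical; cursor_from_url only mimics what request.args.get does for a given URL.

```python
try:  # Python 3
    from urllib.parse import urlencode, urlsplit, parse_qs
except ImportError:  # Python 2.7
    from urllib import urlencode
    from urlparse import urlsplit, parse_qs


def deleting_url(entity_kind, cursor=None):
    """Hypothetical helper: build the GET task URL, omitting an empty cursor."""
    url = "/deleting/" + entity_kind
    if cursor:
        url += "?" + urlencode({"cursor": cursor})
    return url


def cursor_from_url(url):
    """Mimics request.args.get("cursor", None) for a given URL."""
    values = parse_qs(urlsplit(url).query).get("cursor")
    return values[0] if values else None


print(deleting_url("text", "387123481246123469"))  # /deleting/text?cursor=387123481246123469
print(deleting_url("text"))                        # /deleting/text
```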