Fast solution to Subset sum

Consider this way of solving the Subset sum problem:

def subset_summing_to_zero (activities):
  subsets = {0: []}
  for (activity, cost) in activities.iteritems():
      old_subsets = subsets
      subsets = {}
      for (prev_sum, subset) in old_subsets.iteritems():
          subsets[prev_sum] = subset
          new_sum = prev_sum + cost
          new_subset = subset + [activity]
          if 0 == new_sum:
              new_subset.sort()
              return new_subset
          else:
              subsets[new_sum] = new_subset
  return []

I have it from here:

http://news.ycombinator.com/item?id=2267392

There is also a comment which says that it is possible to make it "more efficient".

How?

Also, are there any other ways to solve the problem which are at least as fast as the one above?

Edit

I'm interested in any kind of idea which would lead to speed-up. I found:

https://en.wikipedia.org/wiki/Subset_sum_problem#cite_note-Pisinger09-2

which mentions a linear time algorithm. But I don't have the paper, perhaps you, dear people, know how it works? An implementation perhaps? Completely different approach perhaps?

Edit 2

There is now a follow-up:
Fast solution to Subset sum algorithm by Pisinger

Solution

While my previous answer describes the polytime approximate algorithm to this problem, a request was specifically made for an implementation of Pisinger's polytime dynamic programming solution when all x_i in x are positive:

from bisect import bisect

def balsub(X,c):
    """ Simple impl. of Pisinger's generalization of KP for subset sum problems
    satisfying xi >= 0, for all xi in X. Returns the state array "st", which may
    be used to determine if an optimal solution exists to this subproblem of SSP.
    """
    if not X:
        return False
    X = sorted(X)
    n = len(X)
    b = bisect(X,c)
    r = X[-1]
    w_sum = sum(X[:b])
    stm1 = {}
    st = {}
    for u in range(c-r+1,c+1):
        stm1[u] = 0
    for u in range(c+1,c+r+1):
        stm1[u] = 1
    stm1[w_sum] = b
    for t in range(b,n+1):
        for u in range(c-r+1,c+r+1):
            st[u] = stm1[u]
        for u in range(c-r+1,c+1):
            u_tick = u + X[t-1]
            st[u_tick] = max(st[u_tick],stm1[u])
        for u in reversed(range(c+1,c+X[t-1]+1)):
            for j in reversed(range(stm1[u],st[u])):
                u_tick = u - X[j-1]
                st[u_tick] = max(st[u_tick],j)
    return st

Wow, that was headache-inducing. This needs proofreading, because, while it implements balsub, I can't define the right comparator to determine if the optimal solution to this subproblem of SSP exists.