I have been doing some CPU profiling of my application, and I notice that one of the things taking a significant amount of time is the code that ensures I send no more than one query per second to a web service. The query itself and the handling of its results take little CPU time in comparison; there is an I/O component waiting for results, but the thing I am trying to reduce is CPU usage, since the application sometimes has to run on a single-CPU machine.
Using YourKit Profiler, the call that uses the significant amount of CPU is
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued()
My delay method is below:
import java.util.Date;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.logging.Logger;

public class SearchServer
{
    private static final Logger logger = Logger.getLogger(SearchServer.class.getName());
    private static final Lock delayLock = new ReentrantLock();
    private static final AtomicInteger queryQueue = new AtomicInteger();
    private static final AtomicLong queryDelay = new AtomicLong();
    private static final long delayInMilliseconds = 1000; // one query per second
    private static Date querySentDate = new Date(0);

    static void doDelayQuery()
    {
        delayLock.lock();
        try
        {
            if (isUserCancelled())
            {
                return;
            }
            // Ensure we only send one query a second
            Date currentDate = new Date();
            long delay = currentDate.getTime() - querySentDate.getTime();
            if (delay < delayInMilliseconds)
            {
                try
                {
                    long delayBy = delayInMilliseconds - delay;
                    queryDelay.addAndGet(delayBy);
                    Thread.sleep(delayBy);
                    logger.info(Thread.currentThread().getName() + ":Delaying for " + delayBy + " ms");
                }
                catch (InterruptedException ie)
                {
                    Thread.currentThread().interrupt();
                    throw new UserCancelException("User Cancelled whilst thread was delay sleeping");
                }
            }
        }
        finally
        {
            // We set before unlocking so that if another thread enters this method before we start the query,
            // we ensure it does not skip the delay just because the query this thread delayed for has started
            querySentDate = new Date();
            delayLock.unlock();
        }
    }
}
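The CPU cost probably comes from threads queueing on the ReentrantLock while the holder sleeps for up to a second: AbstractQueuedSynchronizer.acquireQueued() is where waiting threads spin briefly before parking. One way to sidestep the lock entirely is to claim a time slot with a CAS on an atomic timestamp and have each thread park on its own until its slot opens. Here is a minimal stdlib-only sketch of that idea; the class name, constructor parameter, and interval are illustrative, not part of the original code, and interruption is not handled:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical lock-free limiter: one permit per intervalMillis.
// Threads race with compareAndSet to claim the next free time slot
// instead of queueing on a lock, so no thread sleeps while holding one.
class CasRateLimiter
{
    private final long intervalNanos;
    private final AtomicLong nextFreeSlot = new AtomicLong(System.nanoTime());

    CasRateLimiter(long intervalMillis)
    {
        this.intervalNanos = intervalMillis * 1_000_000L;
    }

    void acquire()
    {
        long target;
        while (true)
        {
            long slot = nextFreeSlot.get();
            long now = System.nanoTime();
            target = Math.max(slot, now);
            if (nextFreeSlot.compareAndSet(slot, target + intervalNanos))
            {
                break; // this thread owns the slot starting at target
            }
            // CAS lost: another thread claimed the slot, retry
        }
        // Park until our slot opens; loop because parkNanos may return early
        long remaining;
        while ((remaining = target - System.nanoTime()) > 0)
        {
            LockSupport.parkNanos(remaining);
        }
    }
}
```

Each acquire() either succeeds immediately (if the next slot is already in the past) or parks for exactly the time until its claimed slot, so contention never costs more than a few CAS retries.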
Okay, using the Google Guava library it turned out to be surprisingly simple:
import com.google.common.util.concurrent.RateLimiter;

public class SearchServer
{
    private static final RateLimiter rateLimiter = RateLimiter.create(1.0d);

    static void doDelayQuery()
    {
        rateLimiter.acquire();
    }

    public void doQuery()
    ..................
}
A key difference, though, is that previously I measured the delay from the start of the previous call, so I didn't wait a full second between calls; to get similar throughput I changed the RateLimiter to use 2.0d.
Profiling no longer shows a CPU hit in this area.