Search code examples
phplaravellaravel-5bigdata

How to optimise handle of big data on laravel?


My task is: "To take transactions table, grouped row by transaction date and calculate statuses. This manipulations will be formed statistics, wich will be rendered on the page".

This is my method of this statistics generation

public static function getStatistics(Website $website = null)
{
    if($website == null) return [];

    $query = \DB::table('transactions')->where("website_id", $website->id)->orderBy("dt", "desc")->get();

    $transitions = collect(static::convertDate($query))->groupBy("dt");
    $statistics = collect();

    dd($transitions);

    foreach ($transitions as $date => $trans) {
        $subscriptions = $trans->where("status", 'subscribe')->count();
        $unsubscriptions = $trans->where("status", 'unsubscribe')->count();
        $prolongations = $trans->where("status", 'rebilling')->count();
        $redirections = $trans->where("status", 'redirect_to_lp')->count();
        $conversion = $redirections == 0 ? 0 : ((float) ($subscriptions / $redirections));
        $earnings = $trans->sum("pay");

        $statistics->push((object)[
            "date" => $date,
            "subscriptions" => $subscriptions,
            'unsubscriptions' => $unsubscriptions,
            'prolongations' => $prolongations,
            'redirections' => $redirections,
            'conversion' => round($conversion, 2),
            'earnings' => $earnings,
        ]);

    }

    return $statistics;
}

if count of transaction rows below 100,000 - it's all wright. But, if count is above 150-200k - nginx throw 502 bad gateway. What can you advise to me? I'm don't have any expierince in bigdata handling. May be, my impiments has fundamental error?


Solution

  • After several days of researching information on this question, I found the right answer:

    NOT to use PHP for handling raw data. It's better to use SQL!

    In my case, we are using PostgreSQL.

    Below, i'll write sql-query which worked for me, maybe it will help someone else.

    WITH
            cte_range(dt) AS
            (
                SELECT
                    generate_series('2016-04-01 00:00:00'::timestamp with time zone, '{$date} 00:00:00'::timestamp with time zone, INTERVAL '1 day')
            ),
    
            cte_data AS
            (
                SELECT
                    date_trunc('day', dt) AS dt,
                    COUNT(*) FILTER (WHERE status = 'subscribe') AS count_subscribes,
                    COUNT(*) FILTER (WHERE status = 'unsubscribe') AS count_unsubscribes,
                    COUNT(*) FILTER (WHERE status = 'rebilling') AS count_rebillings,
                    COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
                    SUM(pay) AS earnings,
                    CASE
                        WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0 THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
                        ELSE 0
                    END
                    AS conversion_percent
    
                FROM
                    transactions
    
                WHERE
                    website_id = {$website->id}
    
                GROUP BY
                    date_trunc('day', dt)
            )
    
            SELECT
                to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
                COALESCE(cte_data.count_subscribes, 0) AS count_subscribe,
                COALESCE(cte_data.count_unsubscribes, 0) AS count_unsubscribes,
                COALESCE(cte_data.count_rebillings, 0) AS count_rebillings,
                COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
                COALESCE(cte_data.conversion_percent, 0) AS conversion_percent,
                COALESCE(cte_data.earnings, 0) AS earnings
    
            FROM
                cte_range
    
            LEFT JOIN
                cte_data
                ON cte_data.dt = cte_range.dt
    
            ORDER BY
                cte_range.dt DESC