I have a table of float values and strings such as the following:
FltVal | Category |
---|---|
0.01 | A |
0.02 | A |
0.05 | B |
0.31 | B |
0.36 | A |
0.37 | C |
0.41 | B |
I would like to produce a new dataframe that bins the float values into overlapping bins of size 0.3 (so each bin is (x, x + 0.3]), with the bin edges stepping in 0.01 increments. Additionally, I'd like a column tallying the number of records falling into each bin, as well as a column tallying each 'Category' total for the bin.
I first tried to get a basic count going with groupby and pd.cut (before trying to find counts for each Category):
import pandas as pd
import numpy as np

floats = pd.Series([0.01, 0.02, 0.05, 0.31, 0.36, 0.37, 0.41])
categories = pd.Series(['A', 'A', 'B', 'B', 'A', 'C', 'B'])
data = {"FltVal": floats, "Category": categories}
df = pd.concat(data, axis=1)
grouped_vals = df.groupby(pd.cut(df['FltVal'], np.arange(df['FltVal'].min(), df['FltVal'].max(), 0.3))).count()
Output:
FltVal Category
FltVal
(0.01, 0.31] 3 3
I think the problem there was that np.arange stops before the max, so only a single bin got generated, never mind overlapping ones.
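Checking the edges on the sample data seems to confirm that:

np.arange(df['FltVal'].min(), df['FltVal'].max(), 0.3)
# array([0.01, 0.31]) -> just two edges, so pd.cut can only build the single bin (0.01, 0.31]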
Then, with the idea that I could just throw out nonsensical ranges (like negative ones) afterward, I tried the following:
FltVal_range = np.arange(df['FltVal'].min(), df['FltVal'].max(), 0.01)
FltVal_range_from = FltVal_range - 0.3
FltVal_range_to = FltVal_range
FltVal_intervals = pd.IntervalIndex.from_arrays(FltVal_range_from, FltVal_range_to)
binned_df = df.groupby(pd.cut(df['FltVal'], FltVal_intervals))
But this raised a ValueError:
Overlapping IntervalIndex is not accepted.
If pd.cut doesn't accept overlapping bins, I guess this approach is a non-starter.
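That restriction makes sense in hindsight: with overlapping intervals a single value belongs to many bins at once, so pd.cut can't assign one bin label per row. A quick check with two of the desired overlapping bins:

iv = pd.IntervalIndex.from_arrays([0.00, 0.01], [0.30, 0.31])
[0.05 in interval for interval in iv]
# [True, True] -> 0.05 falls in both (0.00, 0.30] and (0.01, 0.31]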
Regardless, the desired output is the following:
FloatBin | Count | Cat_A | Cat_B | Cat_C |
---|---|---|---|---|
0.00, 0.30 | 3 | 2 | 1 | 0 |
0.01, 0.31 | 3 | 1 | 2 | 0 |
0.02, 0.32 | 2 | 0 | 2 | 0 |
0.03, 0.33 | 2 | 0 | 2 | 0 |
0.04, 0.34 | 2 | 0 | 2 | 0 |
0.05, 0.35 | 1 | 0 | 1 | 0 |
0.06, 0.36 | 2 | 1 | 1 | 0 |
0.07, 0.37 | 3 | 1 | 1 | 1 |
0.08, 0.38 | 3 | 1 | 1 | 1 |
0.09, 0.39 | 3 | 1 | 1 | 1 |
0.10, 0.40 | 3 | 1 | 1 | 1 |
0.11, 0.41 | 4 | 1 | 2 | 1 |
An efficient approach is necessary, as the actual dataframe will have hundreds of thousands to millions of rows.
I'm thinking of something similar to this answer but suitable for finding counts for all my categories.
All help is much appreciated!
One option using janitor's conditional_join:
import janitor

FltVal_range = np.arange(df['FltVal'].min(), df['FltVal'].max(), 0.01)

# Per-value category counts, plus a 'Count' margin column holding each row's total
tmp = (pd.crosstab(df['FltVal'], df['Category'],
                   margins=True, margins_name='Count')
       .drop('Count')  # drop the margin row, keep the margin column
       )

# Pair every value with each (low, high] window that contains it,
# then sum the per-value counts within each window
out = (pd.DataFrame({'low': FltVal_range, 'high': FltVal_range + 0.3})
       .conditional_join(tmp.reset_index(),
                         ('low', 'FltVal', '<'),
                         ('high', 'FltVal', '>='))
       .groupby(['low', 'high'])[list(tmp)].sum()
       )
Output:
A B C Count
low high
0.01 0.31 1 2 0 3
0.02 0.32 0 2 0 2
0.03 0.33 0 2 0 2
0.04 0.34 0 2 0 2
0.05 0.35 0 1 0 1
0.06 0.36 1 1 0 2
0.07 0.37 1 1 1 3
0.08 0.38 1 1 1 3
0.09 0.39 1 1 1 3
0.10 0.40 1 1 1 3
0.11 0.41 1 2 1 4
0.12 0.42 1 2 1 4
0.13 0.43 1 2 1 4
0.14 0.44 1 2 1 4
0.15 0.45 1 2 1 4
...
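If you'd rather avoid the extra dependency, here is a minimal pandas/NumPy-only sketch of the same interval-count idea, using np.searchsorted on per-category sorted values (the Cat_ column names and the rounding to the 0.01 grid are my choices, not anything from the question):

import numpy as np
import pandas as pd

# Bin edges on a 0.01 grid; rounding guards against float drift in np.arange
lows = np.round(np.arange(df['FltVal'].min(), df['FltVal'].max(), 0.01), 2)
highs = np.round(lows + 0.3, 2)

out = pd.DataFrame({'low': lows, 'high': highs})
for cat, grp in df.groupby('Category'):
    vals = np.sort(grp['FltVal'].to_numpy())
    # count of values in (low, high] = rank of high (right) - rank of low (right)
    out[f'Cat_{cat}'] = (np.searchsorted(vals, highs, side='right')
                         - np.searchsorted(vals, lows, side='right'))

cat_cols = [c for c in out.columns if c.startswith('Cat_')]
out['Count'] = out[cat_cols].sum(axis=1)

After a one-off sort per category, each bin costs two binary searches, so this should stay tractable at the hundreds-of-thousands-to-millions scale mentioned in the question.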