VHDL: big slv array slicing indexed by integer (big mux)

I want to slice a std_logic_vector in VHDL, obtaining fixed-width parts of it.

The general problem is:

din  N*M bits 
dout M bits 
sel  clog2(N) bits

Expected behaviour, in pseudocode (for example, a 16-bit input sliced into 4 subvectors of 4 bits each, i.e. N = 4 and M = 4):

signal din  : std_logic_vector(N*M-1 downto 0);
signal sel  : integer range 0 to N-1;
signal dout : std_logic_vector(M-1 downto 0);
-- with sel = 0:   dout <= din(M-1 downto 0);
-- with sel = 1:   dout <= din(2*M-1 downto M);
-- with sel = 2:   dout <= din(3*M-1 downto 2*M);
-- ...
-- with sel = N-1: dout <= din(N*M-1 downto (N-1)*M);

I know a couple of ways to do this, but I don't know which one is best practice and gives the best synthesis results. The entity ports are:

din  : in  std_logic_vector(15 downto 0);
dout : out std_logic_vector(3 downto 0);
sel  : in  std_logic_vector(1 downto 0)

CASE STATEMENT

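process(din, sel)
begin
    case sel is
        when "00"   => dout <= din(3 downto 0);
        when "01"   => dout <= din(7 downto 4);
        when "10"   => dout <= din(11 downto 8);
        when "11"   => dout <= din(15 downto 12);
        when others => dout <= (others => '0');  -- only reached by metavalues ('U', 'X', ...)
    end case;
end process;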

This clearly implements a mux, but it is not generic at all, and if the input gets big it is really hard to write and to get full code coverage on.
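One way to keep the explicit, coverable structure of the case statement while making it generic is to unroll it as a loop of comparisons. This is a minimal sketch, not taken from the original post: the entity name loop_mux and the generics g_slices (N), g_slice_width (M) and g_sel_width are hypothetical. The default assignment before the loop plays the role of "when others", so select values without a matching slice give zeros.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: the case statement above, generalised to any
-- number of slices by looping over all possible select values.
entity loop_mux is
    generic (
        g_slices      : positive := 4;   -- N: number of slices
        g_slice_width : positive := 4;   -- M: bits per slice
        g_sel_width   : positive := 2    -- must satisfy 2**g_sel_width >= g_slices
    );
    port (
        din  : in  std_logic_vector(g_slices*g_slice_width-1 downto 0);
        sel  : in  std_logic_vector(g_sel_width-1 downto 0);
        dout : out std_logic_vector(g_slice_width-1 downto 0)
    );
end entity loop_mux;

architecture rtl of loop_mux is
begin
    process(din, sel)
    begin
        -- Default value: acts like "when others" for sel >= g_slices.
        dout <= (others => '0');
        -- At most one comparison can match, so the last assignment wins
        -- only for the slice whose index equals sel.
        for i in 0 to g_slices-1 loop
            if unsigned(sel) = i then
                dout <= din((i+1)*g_slice_width-1 downto i*g_slice_width);
            end if;
        end loop;
    end process;
end architecture rtl;
```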

INTEGER INDEXING

sel_int <= to_integer(unsigned(sel)); 
dout <= din(4*(sel_int+1) - 1 downto 4*sel_int);

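Extremely easy to write and to maintain, BUT it can fail when the number of slices is not a power of 2. For example, if I slice a 24-bit vector into chunks of 4 bits, there are 6 slices but sel needs 3 bits, so what happens when sel converts to 6 or 7? The resulting slices din(27 downto 24) and din(31 downto 28) lie outside the vector; in simulation this is an index-out-of-range error.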
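The integer-indexing style can be kept safe for non-power-of-2 sizes by checking the index before slicing. This is a minimal sketch under assumed names (the entity guarded_mux and its generics are hypothetical), using the 24-bit / 4-bit example: with g_slices = 6 and a 3-bit sel, the values 6 and 7 fall into an explicit else branch instead of producing an out-of-range slice.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: integer-indexed slice mux with an explicit range check.
entity guarded_mux is
    generic (
        g_slices      : positive := 6;   -- 24-bit input / 4-bit slices
        g_slice_width : positive := 4;
        g_sel_width   : positive := 3    -- encodes 0..7, but only 0..5 are valid
    );
    port (
        din  : in  std_logic_vector(g_slices*g_slice_width-1 downto 0);
        sel  : in  std_logic_vector(g_sel_width-1 downto 0);
        dout : out std_logic_vector(g_slice_width-1 downto 0)
    );
end entity guarded_mux;

architecture rtl of guarded_mux is
begin
    process(din, sel)
        variable v_sel : natural;
    begin
        v_sel := to_integer(unsigned(sel));
        if v_sel < g_slices then
            -- Only evaluated for indices that lie inside din.
            dout <= din((v_sel+1)*g_slice_width-1 downto v_sel*g_slice_width);
        else
            -- Plays the role of "when others" (sel = 6 or 7 with the defaults).
            dout <= (others => '0');
        end if;
    end process;
end architecture rtl;
```

Returning zeros for the unused select values is a design choice here; 'X' (don't care) or clamping to the last slice are equally possible.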

A STRANGE TRADEOFF

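type t_slice_array is array (0 to 3) of std_logic_vector(3 downto 0);  -- architecture declarative part
signal din_slice : t_slice_array;
sel_int <= to_integer(unsigned(sel));
gen_slices: for i in 0 to 3 generate  -- 16 bits / 4 = 4 slices, indices 0 to 3
    din_slice(i) <= din(4*(i+1)-1 downto 4*i);
end generate gen_slices;
dout <= din_slice(sel_int);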

I'm looking for a solution that is general enough to be used with various input/output width relationships and safe enough to synthesize consistently every time. The case statement is the only one with an others branch (which feels really safe); the other solutions rely on slv-to-integer conversion and indexing, which is really convenient but feels less reliable.

Which solution would you use?

Practical use case: I have a 250-bit std_logic_vector and I need to select 10 contiguous bits inside it, starting at a given position from 0 to 239. How can I do that in a way that is good for synthesis?
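Because the start position in this use case moves in 1-bit steps rather than whole slices, it can also be written as a per-bit loop with a variable offset, which avoids computing a variable-width slice. This is a minimal sketch, not the answer's method: the entity window_mux, its 8-bit start_i port (2**8 = 256 covers the 241 legal start positions 0 to 240) and the choice to clamp larger offsets to the last legal window are all assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: selects g_slice_width contiguous bits of d_i
-- starting at bit offset start_i.
entity window_mux is
    generic (
        g_data_width  : natural := 250;
        g_slice_width : natural := 10;
        g_sel_width   : natural := 8     -- 2**8 = 256 >= 241 start positions
    );
    port (
        d_i     : in  std_logic_vector(g_data_width-1 downto 0);
        start_i : in  std_logic_vector(g_sel_width-1 downto 0);
        d_o     : out std_logic_vector(g_slice_width-1 downto 0)
    );
end entity window_mux;

architecture rtl of window_mux is
begin
    p_window: process(d_i, start_i)
        variable v_start : natural;
    begin
        v_start := to_integer(unsigned(start_i));
        -- Clamp offsets past the last legal window (240 with the defaults)
        -- so that no index ever leaves d_i.
        if v_start > g_data_width - g_slice_width then
            v_start := g_data_width - g_slice_width;
        end if;
        -- Each output bit is a mux over the possible source bits.
        for i in 0 to g_slice_width-1 loop
            d_o(i) <= d_i(v_start + i);
        end loop;
    end process p_window;
end architecture rtl;
```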

Solution

First, extend the incoming data so that there are always as many bits as needed to connect every multiplexer input (see process p_extend in the code below). With the default generics this means 2**5 = 32 inputs of 10 bits, i.e. 320 bits taken from d_i starting at bit g_start_point, with every position beyond the top of d_i tied to '0'. This creates no logic at synthesis. Second, convert the resulting vector into an array that can be accessed by an index (see process p_create_array). Again, this creates no logic at synthesis. Finally, index this array with the select input (see the concurrent assignment labelled p_mux).

library ieee;
use ieee.std_logic_1164.all;
entity mux is
    generic (
        g_data_width  : natural := 250;
        g_slice_width : natural := 10;
        g_sel_width   : natural := 5;
        g_start_point : natural := 27
    );
    port (
        d_i   : in  std_logic_vector(g_data_width-1 downto 0);
        sel_i : in  std_logic_vector(g_sel_width-1 downto 0);
        d_o   : out std_logic_vector(g_slice_width-1 downto 0)
    );
end entity mux;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
architecture struct of mux is
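    -- d_i shifted by g_start_point and zero-padded to exactly 2**g_sel_width slices
    signal data : std_logic_vector(g_slice_width * 2**g_sel_width-1 downto 0);
    -- one array element per multiplexer input
    type t_std_logic_slice_array is array (natural range <>) of std_logic_vector(g_slice_width-1 downto 0);
    signal mux_in : t_std_logic_slice_array (2**g_sel_width-1 downto 0);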
begin
    p_extend: process(d_i)
    begin
        for i in 0 to g_slice_width * 2**g_sel_width-1 loop
            if i+g_start_point<g_data_width then
                data(i) <= d_i(i+g_start_point);
            else
                data(i) <= '0';
            end if;
        end loop;
    end process;
    p_create_array: process (data)
    begin
        for i in 0 to 2**g_sel_width-1 loop
            mux_in(i) <= data((i+1)*g_slice_width-1 downto i*g_slice_width);
        end loop;
    end process;
    p_mux: d_o <= mux_in(to_integer(unsigned(sel_i)));
end architecture;