
VHDL - String indexing - RAM usage and total logic elements increase by over 100% each


I'm hoping someone with more VHDL experience can enlighten me! To summarise, I have an LCD entity and a Main entity which instantiates it. The LCD takes an 84-character string ("msg"), which seems to cause huge problems as soon as I index it using a variable or signal. I have no idea why. The string displays hex values: each time I read a 16-bit value, I need to update 4 characters of the string, one for each nybble of that value. This doesn't need to happen in a single clock cycle, since a new value is only read after a large number of cycles. However, experimenting with incrementing a "t" variable and changing only one string character per "t" makes no difference, for whatever reason.

The error is: "Error (170048): Selected device has 26 RAM location(s) of type M4K. However, the current design needs more than 26 to successfully fit"

Here is the compilation report with the problem:

Flow Status Flow Failed - Tue Aug 08 18:49:21 2017
Quartus II 64-Bit Version   13.0.1 Build 232 06/12/2013 SP 1 SJ Web Edition
Revision Name   Revision1
Top-level Entity Name   Main
Family  Cyclone II
Device  EP2C5T144C6
Timing Models   Final
Total logic elements    6,626 / 4,608 ( 144 % )
Total combinational functions   6,190 / 4,608 ( 134 % )
Dedicated logic registers   1,632 / 4,608 ( 35 % )
Total registers 1632
Total pins  50 / 89 ( 56 % )
Total virtual pins  0
Total memory bits   124,032 / 119,808 ( 104 % )
Embedded Multiplier 9-bit elements  0 / 26 ( 0 % )
Total PLLs  1 / 2 ( 50 % )

The RAM summary table contains 57 rows, of "LCD:display|altsyncram:Mux####_rtl_0|altsyncram_####:auto_generated|ALTSYNCRAM"

Here is the LCD entity:

entity LCD is
  generic(
      delay_time : integer := 50000;
      half_period : integer := 7
  );
  port(
      clk        : in   std_logic;
      SCE  : out std_logic := '1';
      DC   : out std_logic := '1';
      RES  : out std_logic := '0';
      SCLK : out std_logic := '1';
      SDIN : out std_logic := '0';
      op   : in std_logic_vector(2 downto 0);
      msg  : in string(1 to 84);
      jx : in integer range 0 to 255 := 0;
      jy : in integer range 0 to 255 := 0;
      cx : in integer range 0 to 255 := 0;
      cy : in integer range 0 to 255 := 0
  );
end entity;

The following code is what causes the problem, where a, b, c and d are variables which are incremented by 4 after each read:

msg(a) <= getHex(data(3 downto 0));
msg(b) <= getHex(data(7 downto 4));
msg(c) <= getHex(data(11 downto 8));
msg(d) <= getHex(data(15 downto 12));

Removing some of these lines causes the memory and logic element usages to both drop, but they still seem absurdly high, and I don't understand the cause. Replacing a, b, c and d with integers, like 1, 2, 3 and 4 causes the problem to go away completely, with the logic elements at 22%, and RAM usage at 0%!

If anybody has any ideas at all, I'd be very grateful! I will post the full code below in case anybody needs it... but be warned, it's a bit messy, and I feel like the problem could be simple. Many thanks in advance!

Main.vhd LCD.vhd


Solution

  • There are a few issues here.

    The first is that HDL synthesis tools do an awful lot of optimization. In practice, this means that if you don't properly connect the input and output ports of something, it is likely (but not certain) to be eliminated by the optimizer.

    The second is that you have to be very careful with loops and functions. Loops are unrolled and functions are inlined, so a small amount of code can generate an awful lot of logic.
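
    As a minimal sketch of what unrolling means (the entity "unroll_demo" and its ports are made up for illustration and are not part of the code in the question), the loop below and the single expression after it should describe the same hardware: one XOR stage per iteration.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical demo entity, not taken from Main.vhd or LCD.vhd.
    entity unroll_demo is
      port(
        a          : in  std_logic_vector(3 downto 0);
        y_loop     : out std_logic;
        y_unrolled : out std_logic
      );
    end entity;

    architecture rtl of unroll_demo is
    begin
      -- Written as a loop: reads like one small statement...
      process(a)
        variable p : std_logic;
      begin
        p := '0';
        for i in a'range loop
          p := p xor a(i);  -- one XOR gate per iteration after unrolling
        end loop;
        y_loop <= p;
      end process;

      -- ...but the loop is expanded into a copy of its body for every index,
      -- which is equivalent to writing the chain out by hand.
      y_unrolled <= a(3) xor a(2) xor a(1) xor a(0);
    end architecture;
    ```

    Four iterations are harmless, but the same expansion applies to a loop with hundreds of iterations whose body contains a function call or a table lookup.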

    The third is that under some circumstances arrays will be translated to memory elements.

    As pointed out in a comment, this loop is the root cause of the large amount of memory usage.

    for j in 0 to 83 loop
         for i in 0 to 5 loop
             pixels((j*6) + i) <= getByte(msg(j+1), i);
         end loop;
    end loop;
    

    This can use a huge amount of memory resources. Each call to "getByte" requires a read port on (parts of) "ram", but block RAMs only have two read ports, so "ram" gets duplicated to satisfy the need for more. The inner loop reads different parts of the same location, so each iteration of the outer loop needs its own read port. With 84 outer iterations and two ports per copy, that is about 40 copies of "ram". According to the Cyclone II datasheet, each copy will require 2 M4K blocks.
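
    One way to avoid the duplication is to do a single table read per clock instead of all of them at once. The sketch below is an assumption-laden illustration, not a drop-in fix: the entity "pixel_fill", the "font" table (a stand-in for whatever "getByte" reads, filled with placeholder zeros here), and the "pixel_addr"/"pixel_data" outputs are all hypothetical names. Only "msg" and the 84 x 6 layout come from the question.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical entity: emits one pixel column per clock instead of
    -- computing all 504 columns in the same cycle.
    entity pixel_fill is
      port(
        clk        : in  std_logic;
        msg        : in  string(1 to 84);
        pixel_addr : out integer range 0 to 503;
        pixel_data : out std_logic_vector(7 downto 0)
      );
    end entity;

    architecture rtl of pixel_fill is
      -- Assumed font table: 6 column bytes for each of the 256 character codes.
      type font_t is array (0 to 256*6 - 1) of std_logic_vector(7 downto 0);
      constant font : font_t := (others => (others => '0'));  -- placeholder glyph data

      signal j : integer range 0 to 83 := 0;  -- character position in msg
      signal i : integer range 0 to 5  := 0;  -- column within the character
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          -- Exactly one read of the table per clock, so one read port is enough.
          pixel_data <= font(character'pos(msg(j + 1)) * 6 + i);
          pixel_addr <= j * 6 + i;

          -- Step through the columns of a character, then move to the next one.
          if i = 5 then
            i <= 0;
            if j = 83 then
              j <= 0;
            else
              j <= j + 1;
            end if;
          else
            i <= i + 1;
          end if;
        end if;
      end process;
    end architecture;
    ```

    The trade-off is time for area: a full refresh takes 504 clocks, which should be acceptable if, as the question says, new values only arrive after a large number of cycles. Note that "msg(j + 1)" is still a variable index, but it costs one multiplexer rather than dozens of memory copies.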

    So why doesn't this happen when you use numbers instead of the variables a, b, c and d?

    If the compiler can figure out that something is a constant, it can compute it at compile time. This would limit the number of calls to "getByte" that actually have to be translated to memory blocks rather than just having their result hardcoded. Still, I'm surprised it's dropping to zero.
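
    To illustrate the difference between the two kinds of index (the entity "index_demo" and all of its names are hypothetical, chosen only for this sketch), compare a write at a fixed position with a write at a position that is only known at run time:

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical demo entity, not taken from Main.vhd or LCD.vhd.
    entity index_demo is
      port(
        clk       : in  std_logic;
        ch        : in  character;
        a         : in  integer range 1 to 84;
        msg_const : out string(1 to 84);
        msg_var   : out string(1 to 84)
      );
    end entity;

    architecture rtl of index_demo is
      signal m_const : string(1 to 84) := (others => ' ');
      signal m_var   : string(1 to 84) := (others => ' ');
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          -- Constant index: only element 1 is ever written. Elements 2 to 84
          -- never change, so the tool can treat them as constants and fold
          -- away any logic that depends only on them.
          m_const(1) <= ch;

          -- Variable index: any element may be written, so all 84 characters
          -- must be kept as registers, each with a write enable decoded from "a".
          m_var(a) <= ch;
        end if;
      end process;

      msg_const <= m_const;
      msg_var   <= m_var;
    end architecture;
    ```

    In the first case, everything downstream of the unchanged characters can be evaluated at build time. In the second, every character is live data, so any lookup that depends on it has to exist in hardware.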

    I notice your code doesn't actually have any inputs other than the clock and an "rx" input that doesn't seem to be used for anything, so it is quite possible that the synthesizer is working out a great deal at build time. Often, eliminating one piece of code allows another to be eliminated, until you have nothing left.