vhdl, verilog, hdl, synthesis

Synthesis global instance count


I couldn't find any questions related to this, but it is possible that I just don't know what to search for. When using a synthesis tool (say Synplify, if you need a specific tool, although a standard-compliant approach that works across tools would be best), is it possible to keep track of the number of instances of a module and use that count to guide synthesis? I suspect not, but I can see many use cases for something like this. Let me give some examples of what I mean.

Some context: I am writing this with FPGA development in mind, but I bet it would have uses for ASIC designs as well.

Let's say that I have 10 multipliers on a device and I have some operation (like a complex multiplier) that I want to instantiate in many locations (not just in a generate loop, but all over the design). Let's say that I have one implementation using the dedicated multipliers for this function, but I also have a complex multiplier that uses the fabric. I would like my complex multipliers to exhaust the dedicated multipliers before going over to the fabric implementation.

Is it possible to instantiate the complex multipliers through a wrapper, and every time this wrapper is instantiated during elaboration, a global instance count is incremented so I can keep track of how many multipliers have been used? Furthermore, can I then use this global variable in a generate-if statement or some other construct to make a decision between the two implementations based on the number of instances of the multiplier module that have been used?

I'm using the multiplier as an example. I realize that I could just infer the multiplier to obtain this behavior. I imagine designs where I might want to infer different filter structures (for example, a tapped delay line FIR filter versus a distributed arithmetic FIR filter) based on the current value of this global variable. This would certainly help when porting code across to new FPGAs.

I've been looking for something like this for a while, but I suspect it doesn't exist. I realize I can do something very close to what I want by simply designing the architecture of my system the right way. The intention here is more about automating that process in my design so that future alterations to my design don't require me to refactor the whole system layout (unless timing or resource limitations come into play). I also see this as a means of helping me to keep my code portable among devices in the same family with very different resource allocations (this one has more DSP slices, that one has more LUTs, etc.).

If this only exists in one tool or one language, that would still be an acceptable solution to me. If you can provide definitive proof or a logical reason why it can't work, that would also be an acceptable solution to me.


Solution

  • Have you actually tried implementing a design that infers more multipliers than there are in the device? I would expect the tool to automatically start using LUT resources once it runs out of DSP blocks. However, assuming this does not happen:

    You can achieve this to an extent using Xilinx Vivado and a custom Tcl script. I won't detail the exact commands and script, as this would be time-consuming to get right, but the basic flow would look like this:

    1. Add generic parameters to the entities that implement multipliers, and tie them through to generic parameters on the top level of the design.
    2. The script sets an initial set of generic values that makes every entity use dedicated DSP blocks.
    3. The script runs synth_design, using the -generic switch to pass those values and so control DSP block usage.
    4. After synthesis completes, the script parses the output of report_utilization to determine whether the design needs more DSP blocks than the device has. If not, go to step 5. If so, change the generic values so that more entities use the alternative multiplier implementation, and go back to step 3.
    5. The synthesized design fits in the device, and the script proceeds with the implementation steps.

    An alternative to the above process without using generics would be to keep the same basic steps, but use the set_property command on specific multiplier instances in order to control their implementation, instead of setting generics.
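
The control loop in steps 2 to 5 can be sketched independently of the tool. The Python below is only an illustration of that loop, not a Vivado script: `fit_dsp_budget`, `fake_synthesize`, the instance names, and the figure of 3 DSP blocks per complex multiplier are all hypothetical. In a real flow, the `synthesize` callback would run synthesis with the chosen generics (or per-instance properties) and return the DSP count read from the utilization report.

```python
from typing import Callable, Dict, List


def fit_dsp_budget(
    instances: List[str],
    dsp_budget: int,
    synthesize: Callable[[Dict[str, bool]], int],
) -> Dict[str, bool]:
    """Return a map of instance name -> True (DSP) or False (fabric)."""
    # Step 2: start with every multiplier on dedicated DSP blocks.
    use_dsp = {name: True for name in instances}
    # Assumed policy: move the last-listed instances to fabric first.
    demote_order = list(reversed(instances))

    while True:
        # Steps 3-4: run synthesis and read back the DSP count.
        dsp_used = synthesize(use_dsp)
        if dsp_used <= dsp_budget:
            return use_dsp  # Step 5: the design fits.
        if not demote_order:
            raise RuntimeError(
                "over the DSP budget even with all multipliers in fabric"
            )
        # Step 4 (over budget): switch one more instance to fabric, retry.
        use_dsp[demote_order.pop(0)] = False


def fake_synthesize(use_dsp: Dict[str, bool]) -> int:
    """Stand-in for a synthesis run: assumes 3 DSP blocks per DSP instance."""
    return 3 * sum(use_dsp.values())
```

Calling `fit_dsp_budget(["cmult0", "cmult1", "cmult2", "cmult3", "cmult4"], 10, fake_synthesize)` keeps the first three instances on DSP blocks and moves the last two to fabric. Each pass through the loop stands for a full synthesis run, so with many instances it may be worth demoting several at a time based on how far over budget the previous run was.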