I have a Verilog design that compiles to ~15K LEs on a Cyclone IV (EP4CE22F17C6N). When I compile the same code on a Cyclone V (5CEFA2F23C8N), it takes ~8500 ALMs. Based on Altera's own LE equivalency for that particular Cyclone V, this would be ~20K LEs. Now, I realize that the estimates are going to be highly dependent on the particular design, but a 33% increase in "effective" resource utilization seems like a lot.
So it makes me wonder if there are design tips/tricks/etc. for making more efficient use of ALMs. In particular, I'm looking for Verilog constructs that would improve the register density, fabric density, dense packing, etc.
I would agree with the comments above that generally you shouldn't need to optimise. However, it's always important to check that your code actually maps well onto the chosen architecture. Specifically:
Reset
Using the wrong kind of reset for your architecture can cause problems. It's also very easy to accidentally cause the synthesis tool to insert logic to emulate a clock-enable. For full details see this answer. For Altera you should be using an asynchronous reset which is synchronously de-asserted.
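As a minimal sketch of what "asynchronous reset, synchronously de-asserted" can look like in Verilog: the module and signal names (`reset_sync`, `arst_n`, `rst_n`, `counter`) are my own placeholders, and the two-stage synchronizer depth is an assumption rather than something from the linked answer.

```verilog
// Reset synchronizer: asserts asynchronously, releases synchronously.
// All names here are illustrative placeholders.
module reset_sync (
    input  wire clk,
    input  wire arst_n,  // raw asynchronous reset, active low
    output wire rst_n    // asserted immediately, released on a clock edge
);
    reg [1:0] sync;

    always @(posedge clk or negedge arst_n) begin
        if (!arst_n)
            sync <= 2'b00;            // assert without waiting for a clock
        else
            sync <= {sync[0], 1'b1};  // release after two clock edges
    end

    assign rst_n = sync[1];
endmodule

// A register bank that uses the synchronized reset and a plain clock enable.
module counter (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       en,
    output reg  [7:0] q
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= 8'd0;        // reset is the only term in the sensitivity list besides clk
        else if (en)
            q <= q + 8'd1;    // enable sits directly below the reset
    end
endmodule
```

Written like this, the reset and enable should be able to use the register's dedicated clear and enable inputs instead of being built out of LUTs, but check the post-fit resource report to confirm that is what happened in your design.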
Priority of control signals
In Altera:
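As a sketch, assuming the register control-signal priority I recall from Altera's recommended HDL coding styles (asynchronous clear highest, then asynchronous load, clock enable, synchronous clear, synchronous load, and data lowest), a register coded in that order would look something like this. The module and port names are hypothetical, and asynchronous load is left out for brevity.

```verilog
// Control signals nested in the assumed Altera priority order:
// aclr > ena > sclr > sload > data. Names are illustrative only.
module prio_reg (
    input  wire       clk,
    input  wire       aclr,   // asynchronous clear (highest priority)
    input  wire       ena,    // clock enable
    input  wire       sclr,   // synchronous clear
    input  wire       sload,  // synchronous load
    input  wire [7:0] ldata,  // value loaded when sload is high
    input  wire [7:0] d,      // normal data input (lowest priority)
    output reg  [7:0] q
);
    always @(posedge clk or posedge aclr) begin
        if (aclr)
            q <= 8'd0;
        else if (ena) begin
            if (sclr)
                q <= 8'd0;
            else if (sload)
                q <= ldata;
            else
                q <= d;
        end
    end
endmodule
```

If the code instead lets `sclr` override `ena`, for example, the tool can no longer use the dedicated inputs directly and has to emulate the requested priority with extra logic in front of the register.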
Latches
Inferred latches are easy to grep for in the synthesis reports. Unless you're absolutely sure one is intentional, latches are generally bad, mmmkay.
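The usual way a latch sneaks in is a combinational block that doesn't assign its output on every path. A minimal sketch of the pattern and a common fix, with made-up module and signal names:

```verilog
// Unintended latch: y is not assigned when sel is 2'b00 or 2'b11,
// so synthesis must hold the previous value.
module mux_latch (
    input  wire [1:0] sel,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg  [3:0] y
);
    always @(*) begin
        case (sel)
            2'b01: y = a;
            2'b10: y = b;
        endcase
    end
endmodule

// Latch-free version: a default assignment at the top guarantees
// that y is driven on every path through the block.
module mux_no_latch (
    input  wire [1:0] sel,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg  [3:0] y
);
    always @(*) begin
        y = 4'd0;
        case (sel)
            2'b01: y = a;
            2'b10: y = b;
        endcase
    end
endmodule
```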
Synthesis
There are many options available for tweaking the behaviour of the synthesis process. Here are a few that will affect your results:
ALM_REGISTER_PACKING_EFFORT
This option guides the Fitter when packing registers into ALMs.
MUX_RESTRUCTURE
Allows the Compiler to reduce the number of logic elements required to implement multiplexers in a design.
OPTIMIZATION_TECHNIQUE
Specifies the overall optimization goal for Analysis & Synthesis: attempt to maximize performance, minimize logic usage, or balance high performance with minimal logic usage.
Bear in mind that if your device isn't getting too full, the tool won't have much "incentive" to minimise logic utilisation unless you explicitly tell it to.
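For reference, the three options above are set as global assignments in the project's .qsf file. The values below are just one area-biased choice I'd try first, not a recommendation from Altera, so compare the fitter reports before and after:

```tcl
# Area-oriented settings for the project .qsf file (values are an assumption; tune per design)
set_global_assignment -name OPTIMIZATION_TECHNIQUE AREA
set_global_assignment -name MUX_RESTRUCTURE ON
set_global_assignment -name ALM_REGISTER_PACKING_EFFORT HIGH
```

Expect a trade-off: pushing for area and denser packing can cost you timing margin, so re-check Fmax after changing these.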