The FPGA I'm using has about 4 Mb of block RAM distributed among 232 blocks. Each block has ports up to 36 bits wide and can be read and written simultaneously.
I would like to declare a 1,024-item array of 224-bit structures and read the data from the blocks as time-efficiently as possible. That is, I wouldn't want to just declare an array of 1024 224-bit items, for fear that each structure would be stored contiguously within a single block, so that it would take several sequential reads to pull it from memory when I index it. Also, the structure contains several variables wider than 36 bits, so even if Impulse C broke it up into its component data types, it would still take two cycles to pull each of those from block memory.
So my question is: how efficiently does Impulse C handle this if I just declare the array plainly?
Large, Efficient Local Memory Arrays?
Started by Jonathan, Jun 08 2009 01:36 PM
#1
Posted 08 June 2009 - 01:36 PM
#2
Posted 10 June 2009 - 10:36 AM
Hi Jonathan,
The synthesis/mapping tools from the FPGA device vendors will construct RAM modules from the device's hard resources for a (nearly) arbitrarily wide and deep Impulse C array. If you ask for a 224-bit-wide array with 1024 elements, Xilinx ISE, for example, will stitch together block RAMs as needed to give you a 224-bit data bus.
This RAM is still limited to two ports: one for reading, and another for reading or writing. So you can only get two elements per clock cycle from a given Impulse C array. To work around this limitation of the FPGA hardware, you can use multiple C arrays, from each of which you can read two elements per clock cycle. Alternatively, you could construct an array of great width, say (1024/2)*224 bits, and read two huge unsigned integers from it each clock, which you can then shift and mask to marshal into 224-bit structures; most of that work can be done in parallel with the array reads. This often works, but the synthesis tools are not always able to create a wide enough datapath.
Impulse C supports integer types of arbitrary width. See co_math.h for examples of how to define a type such as co_uint128. Handling such nonstandard C types in desktop simulation can be tricky, but the types are fully supported in hardware compilation.
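A minimal sketch of the multiple-array approach, assuming the 224-bit structure is split into seven 32-bit slices (seven slices are needed whether the slices are 32 bits or the blocks' 36-bit port width, since 224/36 is about 6.2). This is plain C, not Impulse C-specific code, and the names `record224`, `bank0` to `bank6`, `read_record` and `write_record` are hypothetical. Each slice lives in its own C array, so, following the reasoning above, the tools may map each array to a separate RAM and fetch all seven slices of one element in the same clock cycle rather than sequentially from a single RAM.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ITEMS  1024u
#define NUM_SLICES 7u   /* 7 x 32 bits = 224 bits */

/* Hypothetical 224-bit record, viewed as seven 32-bit slices. */
typedef struct {
    uint32_t w[NUM_SLICES];
} record224;

/* One separate array ("bank") per slice. Because these are distinct C
   arrays, each one can be given its own RAM and accessed independently. */
static uint32_t bank0[NUM_ITEMS], bank1[NUM_ITEMS], bank2[NUM_ITEMS],
                bank3[NUM_ITEMS], bank4[NUM_ITEMS], bank5[NUM_ITEMS],
                bank6[NUM_ITEMS];

/* Read element i: one access per bank, with no dependency between them. */
record224 read_record(unsigned i)
{
    record224 r;
    assert(i < NUM_ITEMS); /* bounds check for desktop simulation only */
    r.w[0] = bank0[i];
    r.w[1] = bank1[i];
    r.w[2] = bank2[i];
    r.w[3] = bank3[i];
    r.w[4] = bank4[i];
    r.w[5] = bank5[i];
    r.w[6] = bank6[i];
    return r;
}

/* Write element i: again one independent access per bank. */
void write_record(unsigned i, const record224 *r)
{
    assert(i < NUM_ITEMS); /* bounds check for desktop simulation only */
    bank0[i] = r->w[0];
    bank1[i] = r->w[1];
    bank2[i] = r->w[2];
    bank3[i] = r->w[3];
    bank4[i] = r->w[4];
    bank5[i] = r->w[5];
    bank6[i] = r->w[6];
}
```

Whether the compiler actually schedules the seven reads in a single cycle depends on the tools and is not something this sketch can demonstrate; it only shows the data layout that makes it possible.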
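For desktop simulation, the shift/mask marshaling, including fields wider than 36 bits, can be modeled without an arbitrary-width type. The sketch below is plain C99, not Impulse C: it represents a 224-bit value as four `uint64_t` limbs (a hypothetical `wide224` type) and provides hypothetical `extract_field`/`insert_field` helpers for a field of up to 64 bits at any bit offset, including fields that straddle a limb boundary. In hardware code you would presumably use a wide integer type defined as in co_math.h instead; those type definitions are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical desktop stand-in for a 224-bit unsigned integer:
   four 64-bit limbs, least significant first. The top 32 bits of
   limb[3] are unused. */
typedef struct {
    uint64_t limb[4];
} wide224;

/* Mask with the low `width` bits set (1 <= width <= 64). */
static uint64_t low_mask(unsigned width)
{
    return (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}

/* Return the `width`-bit field that starts at bit `lsb`. */
uint64_t extract_field(const wide224 *v, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width <= 64 && lsb + width <= 224);
    unsigned idx = lsb / 64, off = lsb % 64;
    uint64_t bits = v->limb[idx] >> off;
    /* If the field crosses into the next limb, bring in its upper part. */
    if (off != 0 && off + width > 64)
        bits |= v->limb[idx + 1] << (64 - off);
    return bits & low_mask(width);
}

/* Overwrite the `width`-bit field at bit `lsb` with `value`. */
void insert_field(wide224 *v, unsigned lsb, unsigned width, uint64_t value)
{
    assert(width >= 1 && width <= 64 && lsb + width <= 224);
    uint64_t mask = low_mask(width);
    unsigned idx = lsb / 64, off = lsb % 64;
    value &= mask;
    /* Lower part of the field, in limb idx. */
    v->limb[idx] = (v->limb[idx] & ~(mask << off)) | (value << off);
    /* Upper part of the field, if it spills into limb idx + 1. */
    if (off != 0 && off + width > 64) {
        unsigned sh = 64 - off;
        v->limb[idx + 1] = (v->limb[idx + 1] & ~(mask >> sh)) | (value >> sh);
    }
}
```

For example, a 40-bit field stored at bit 60 is split across `limb[0]` and `limb[1]`, and `extract_field` reassembles it with one extra shift and OR.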
Regards,
Ralph
Ralph Bodenner
Impulse Accelerated Technologies, Inc.