First, forgive me if the following post is either ill-formed or poorly conceived, as I am new to both FPGAs and Impulse C, but...
On a Faster Technology P6 PCI FPGA Card, I am attempting to implement an algorithm operating on the external memory (DDR2) that allows an arbitrary number of concurrent reads and writes without locking. Thinking in terms of C/C++, I came up with an algorithm that uses a compare-and-swap (CAS) instruction (like that found in the x86 architecture, though the Load-Link/Store-Conditional (LL/SC) of PowerPC would work as well). What I want to know is: first, if I have a number of pipelines all writing to external memory using co_memory_writeblock, are those writes concurrent? Second, what happens when they try to access the same memory location (in theory, reads and writes could be accessing the same location at the same time, too)? Third, how do I implement a CAS or LL/SC operation?
Thanks!
Jonathan
Concurrency Safe External Memory Access
Started by Jonathan, May 29 2009 10:25 AM
3 replies to this topic
#1
Posted 29 May 2009 - 10:25 AM
#2
Posted 29 May 2009 - 03:39 PM
Hi Jonathan,
The co_memory_*block operations will be concurrent if each uses a different co_memory object (different as created with co_memory_create). Each such object generally is used to represent a different memory device or a separate address space. I'm not sure how the Faster card's PSP treats the DDR memory--it may be that just one device is available, in which case memory operations will share a single DMA interface with first-come, first-served access from your various pipelines. In short, the operations will likely not overlap in time, even if their addresses do.
Implementing a compare-and-swap over a co_memory interface could be done with the help of a co_semaphore. The co_semaphore API is not well-documented with examples of its usage, and is only Beta-quality, with incomplete support among PSPs. Please report any issues with co_semaphore to us and we'll try to resolve them quickly.
Regards,
Ralph
Ralph Bodenner
Impulse Accelerated Technologies, Inc.
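To make the semaphore-guarded compare-and-swap suggested above concrete, here is a minimal sketch in plain C11 rather than Impulse C. Everything in it is an assumption for illustration: `sem_wait_model` and `sem_release_model` are hypothetical stand-ins for waiting on and releasing a binary co_semaphore (they are not the Impulse C API), `ext_mem` is an ordinary array standing in for the shared DDR2 region, and the array accesses mark where co_memory block reads and writes would go.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS 16

/* Plain array standing in for the shared external memory (assumption). */
static uint32_t ext_mem[MEM_WORDS];

/* Binary lock standing in for a co_semaphore (hypothetical model only). */
static atomic_flag mem_sem = ATOMIC_FLAG_INIT;

/* Model of "wait on the semaphore": spin until the flag is acquired. */
static void sem_wait_model(atomic_flag *sem) {
    while (atomic_flag_test_and_set_explicit(sem, memory_order_acquire)) {
        /* busy-wait until the current holder releases */
    }
}

/* Model of "release the semaphore". */
static void sem_release_model(atomic_flag *sem) {
    atomic_flag_clear_explicit(sem, memory_order_release);
}

/*
 * Compare-and-swap on one word, made atomic by holding the lock across
 * the read, the comparison, and the conditional write.
 * Returns true if the word equalled `expected` and was replaced.
 */
static bool guarded_cas(uint32_t index, uint32_t expected, uint32_t desired) {
    assert(index < MEM_WORDS);
    bool swapped = false;

    sem_wait_model(&mem_sem);
    uint32_t current = ext_mem[index];  /* a block read would go here */
    if (current == expected) {
        ext_mem[index] = desired;       /* a block write would go here */
        swapped = true;
    }
    sem_release_model(&mem_sem);

    return swapped;
}
```

The point of the sketch is only the ordering: the read, the compare, and the write all happen while the lock is held, so no other pipeline that honours the same lock can slip a write in between. A caller would typically retry with a freshly read `expected` value whenever `guarded_cas` returns false.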
#3
Posted 01 June 2009 - 08:13 AM
Thanks for the answer, but may I ask how the semaphore is implemented? I don't need an exact answer; I just want to be able to determine things such as whether it reads or writes to a memory block, which would be incredibly slow... Can I assume that waiting on and releasing a semaphore will take much less than a full clock cycle? If so, then yes, this definitely will allow me to do CAS. Also, I'd like to know how it's implemented because I want to know if it's more of a clue to the compiler, or if it is translated directly into HDL. If it's a clue, then it's the best I can do and I have to use it to implement CAS. But if it's just a macro for VHDL, then I should be able to use whatever constructs are used to implement the semaphore and make its "increment" and "decrement" operations atomic to do CAS more efficiently... Thanks.
-Jonathan
PS: It's a bit ironic to me as a software developer that to do non-synchronized style concurrency, I need to use a synchronizing object after all :)
QUOTE (RalphBodenner @ May 29 2009, 07:39 PM)
Hi Jonathan,
The co_memory_*block operations will be concurrent if each uses a different co_memory object (different as created with co_memory_create). Each such object generally is used to represent a different memory device or a separate address space. I'm not sure how the Faster card's PSP treats the DDR memory--it may be that just one device is available, in which case memory operations will share a single DMA interface with first-come, first-served access from your various pipelines. In short, the operations will likely not overlap in time, even if their addresses do.
Implementing a compare-and-swap over a co_memory interface could be done with the help of a co_semaphore. The co_semaphore API is not well-documented with examples of its usage, and is only Beta-quality, with incomplete support among PSPs. Please report any issues with co_semaphore to us and we'll try to resolve them quickly.
Regards,
Ralph
#4
Posted 01 June 2009 - 09:17 AM
Somebody's gotta do the locking somewhere 
You can see the HDL implementation of co_semaphore in the sema.vhd library file, found in %IMPULSEC_HOME%\Architectures\VHDL\Generic\lib. It uses registers internally, not a memory block.
Ralph
Ralph Bodenner
Impulse Accelerated Technologies, Inc.