Why is imageDMA so slow at doing DMA?
#1
Posted 24 April 2007 - 07:50 PM
In the following code, nothing takes a long time except the co_memory_readblock/co_memory_writeblock calls. I don't know how to resolve the problem!
Can you help me? Thank you!
do{
co_signal_wait(go, &status);
co_memory_readblock(imgmem, 0, img, DATA_NUM*2);  // read from imgmem into img
co_memory_writeblock(imgmem, 0, img, DATA_NUM*2); // write from img back to imgmem
co_signal_post(done, 1);
}while(1);
#2
Posted 24 April 2007 - 08:31 PM
Hi,
What times are you seeing, and how are you measuring the time of memcpy() vs. the DMA? Could you please post or email to support@impulsec.com the complete Impulse C project as well as any code related to the memcpy timing?
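For reference, one way to make a memcpy() vs. DMA timing comparison meaningful is to average over many iterations, since a single 512-byte copy can be shorter than a timer tick. The sketch below is a host-side C illustration only, assuming a C11 library that provides timespec_get(); the buffer names and ITERATIONS are hypothetical, and on the board a hardware time base would replace timespec_get().

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#define DATA_NUM   256     /* element count from the thread */
#define ITERATIONS 10000   /* hypothetical: repeats needed to rise above timer resolution */

/* Hypothetical stand-ins for the source and destination memory regions. */
int16_t src_buf[DATA_NUM];
int16_t dst_buf[DATA_NUM];

/* Nanoseconds between two timespec samples. */
static double elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e9 + (double)(b->tv_nsec - a->tv_nsec);
}

/* Average time of one memcpy() of DATA_NUM 16-bit elements, in nanoseconds.
 * Note: an aggressive optimizer may merge repeated identical copies, so check
 * the optimization settings used for the benchmark build. */
double time_memcpy_ns(void)
{
    struct timespec t0, t1;
    for (int i = 0; i < DATA_NUM; i++) {
        src_buf[i] = (int16_t)i;                  /* known test pattern */
    }
    timespec_get(&t0, TIME_UTC);
    for (int n = 0; n < ITERATIONS; n++) {
        memcpy(dst_buf, src_buf, DATA_NUM * sizeof(int16_t));
    }
    timespec_get(&t1, TIME_UTC);
    return elapsed_ns(&t0, &t1) / ITERATIONS;
}
```

Wrapping the go/done signal handshake in the same kind of repeated loop would give a comparable per-transfer figure for the DMA path.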
Thanks,
Ed
Impulse Accelerated Technologies, Inc.
#3
Posted 24 April 2007 - 11:00 PM
I have emailed the complete Impulse C project, as well as the code related to memcpy, to support@impulsec.com.
#4
Posted 25 April 2007 - 09:12 AM
Hi,
Thank you for sending in the code. I have not run the code on my ML403 (yet), but my initial thoughts are:
1) The data cache of the CPU is turned on. This will make the memcpy run much faster, because the writes are being cached and the timing will not include the time for the data to be written to SDRAM. Changing the line in main():
XCache_EnableDCache(0x80000001);
to:
XCache_EnableDCache(0x00000001);
will turn off data caching for the lower memory where the SDRAM is so the operations that the CPU and DMA are doing will be much more similar.
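To see why the mask value matters, here is a small C sketch that maps an address to its cacheability bit. It assumes the PowerPC 405 convention used by XCache_EnableDCache(): each of the 32 mask bits covers one 128 MB region, with the most significant bit covering 0x00000000-0x07FFFFFF. That assumption is consistent with the change above turning off caching for the low-memory SDRAM; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: PowerPC 405-style cacheability mask, 32 bits, one bit per
 * 128 MB region, most significant bit = region starting at 0x00000000. */
#define REGION_SHIFT 27u   /* 128 MB = 2^27 bytes */

/* Mask bit that controls caching for the region containing addr. */
uint32_t region_bit(uint32_t addr)
{
    uint32_t region = addr >> REGION_SHIFT;   /* region index 0..31 */
    return 0x80000000u >> region;
}

/* Nonzero if addr falls inside a region enabled by the given mask. */
int is_cacheable(uint32_t mask, uint32_t addr)
{
    return (mask & region_bit(addr)) != 0;
}
```

Under these assumptions, 0x80000001 enables the lowest and the highest 128 MB regions, while 0x00000001 enables only the highest (0xF8000000-0xFFFFFFFF), so SDRAM at low addresses becomes uncached.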
2) memcpy(), which takes the number of bytes to copy as its third parameter, doesn't appear to be copying the same amount of data. DATA_NUM is 256 and the target array in the hardware process is of type co_int16, so the memcpy() call should be changed from:
memcpy((int*)XPAR_SDRAM_8MX32_BASEADDR,(int*)0x0,100);
to:
memcpy((int*)XPAR_SDRAM_8MX32_BASEADDR,(int*)0x0,(DATA_NUM*sizeof(co_int16)));
in order to copy the same number of bytes.
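The arithmetic behind the corrected byte count, as a small C sketch. It assumes co_int16 is a 16-bit type (int16_t stands in for it) and uses DATA_NUM = 256 from the thread; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_NUM 256   /* value given in the thread */

/* Assumption: co_int16 is a 16-bit integer, so int16_t stands in for it. */
typedef int16_t co_int16_standin;

/* Bytes moved by each co_memory_readblock/writeblock call in the loop (DATA_NUM*2). */
size_t dma_bytes(void)    { return DATA_NUM * 2; }

/* Bytes the memcpy() has to copy to do the same amount of work. */
size_t memcpy_bytes(void) { return DATA_NUM * sizeof(co_int16_standin); }
```

With these values both come to 512 bytes, so the original memcpy() of 100 bytes was moving roughly a fifth of the data that each block transfer moves.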
3) Lastly, please note that the CPU's co_signal_wait() call polls the co_signal interface to the hardware process in a fairly tight loop. The CPU also has priority on the data bus, which may slow down the DMA's access through contention for the bus. Isolating the CPU's path to the co_signal interface from the DMA's path to memory will improve the throughput of the DMA. Because the DMA currently can only use the OPB, this can be done by keeping the SDRAM on the OPB while the CPU uses the PLB (Virtex-4 PLB PSP) or the APU (Virtex-4 APU PSP) to access the co_signal.
Hope this helps,
Ed
#5
Posted 25 April 2007 - 06:25 PM
Thank you for your reply.
1) The CPU's data cache is turned on. I think it is useful in any application, so turning off the cache is not an option.
2) As for memcpy() taking the number of bytes to copy as its third parameter: I have set both to copy the same amount, but it didn't help.
Can you suggest a way for an Impulse C process to work on data in BRAM directly, instead of passing data with co_memory_readblock/co_memory_writeblock or stream_read/stream_write?
#6
Posted 25 April 2007 - 07:47 PM
I only mentioned this for comparison; I wouldn't suggest doing it in the final application, since that would of course defeat the purpose of the data cache.
What times are you seeing for the DMA and memcpy()? How is the system configured, or could you please send your EDK project's .mhs file? It is also possible that the DMA is suffering from element size (co_int16) vs. bus width (32/64 bits depending on the bus), but I also suspect (from experience) that the CPU is dominating the bus while polling the signal and keeping the DMA off of it.
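If element size vs. bus width turns out to be the issue, one common workaround (not something confirmed in this thread) is to pack pairs of 16-bit elements into 32-bit words, so 256 co_int16 values travel as 128 words. A plain-C sketch, assuming PowerPC big-endian order with the first element in the upper half; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack n 16-bit elements into 32-bit words, two per word (first element in
 * the upper 16 bits). An odd trailing element is padded with zero.
 * Returns the number of words written. */
size_t pack16to32(const int16_t *in, size_t n, uint32_t *out)
{
    size_t words = (n + 1) / 2;
    for (size_t w = 0; w < words; w++) {
        uint32_t hi = (uint16_t)in[2 * w];
        uint32_t lo = (2 * w + 1 < n) ? (uint16_t)in[2 * w + 1] : 0u;
        out[w] = (hi << 16) | lo;
    }
    return words;
}

/* Inverse of pack16to32: recover n 16-bit elements from packed words. */
void unpack32to16(const uint32_t *in, size_t n, int16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t word = in[i / 2];
        uint16_t half = (i % 2 == 0) ? (uint16_t)(word >> 16)
                                     : (uint16_t)(word & 0xFFFFu);
        out[i] = (int16_t)half;
    }
}
```

The hardware side would need the matching unpacking logic, so this only pays off if the transfer count, not the per-element processing, is the bottleneck.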
There are a couple of ways, and all of them require writing and/or modifying VHDL code. The most correct way would be to tie a co_memory interface to one port of a BRAM and tie the other port to a bus via a BRAM controller in EDK. The co_memory side requires a co_memory-to-BRAM wrapper (the "SharedBRAM" example's 'shared_mem.vhd' tries to show this, but might be out of date in some respects), and the connections would need to be made from within EDK because EDK will have created the BRAM. A co_memory can be accessed directly from within a hardware process using a pointer, without requiring co_memory_read/writeblock().
Thanks,
Ed
#7
Posted 25 April 2007 - 09:07 PM
Could Impulse C add this feature? I am not familiar with VHDL or Verilog.
Thanks
#8
Posted 25 April 2007 - 09:56 PM
Hi,
This would require a fairly specific PSP to do everything automatically; however, there isn't a great demand for it at the moment (it would be nice to have, though). One of the many projects I am working on may benefit from a similar shared memory arrangement, but it would require extra steps in EDK, and I do not know when I would have something I could give you to use. The earliest would be next week, but I cannot make any promises at the moment. Please note that in a shared memory arrangement such as this, just like in a system with a DMA, the CPU's cache will need to be managed via flushing (to force writes) and invalidating (to force reads) in order to pass data correctly between the CPU and an external master that is reading/writing the same memory.
Typically, streams are used to populate arrays, and with the HW_STREAM_READ/WRITE() macros, data transfers are more like a memcpy() where the destination address doesn't change. This can be done from within Impulse C, either as part of the main process or as one or more separate processes that read/write global arrays.
In the meantime, it may be worthwhile to look at using the APU interface to separate the CPU's polling of the co_signal from the bus that the DMA is using.
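The stream-to-array pattern described above, sketched in plain C. A real design would read an Impulse C stream (for example with co_stream_read() or the HW_STREAM_READ() macro); here a hypothetical fake_stream FIFO stands in for the stream so the sketch runs on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software stand-in for an Impulse C stream: a simple FIFO. */
typedef struct {
    const int16_t *data;
    size_t len;
    size_t pos;
} fake_stream;

/* Pop one element; returns 0 when the stream is exhausted. */
static int fake_stream_read(fake_stream *s, int16_t *value)
{
    if (s->pos >= s->len) return 0;
    *value = s->data[s->pos++];
    return 1;
}

/* Fill an array from a stream. One end (the stream) stays fixed while the
 * array index advances, much like a memcpy() with a constant source address.
 * Returns the number of elements stored. */
size_t fill_array_from_stream(fake_stream *s, int16_t *array, size_t capacity)
{
    size_t n = 0;
    int16_t v;
    while (n < capacity && fake_stream_read(s, &v)) {
        array[n++] = v;
    }
    return n;
}
```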
Ed
#9
Posted 16 May 2007 - 11:02 PM
#10
Posted 17 May 2007 - 07:24 AM
I haven't gotten as far as I'd hoped, but I do have a little more time now to see what I can put together quickly. I still can't promise a timeframe just yet, but I'll do what I can. What would be the minimum that would be useful for what you are doing?
The closest I had come was a co_memory interface that uses only co_memory_ptr() and a pointer (2-cycle access of single words, no co_memory_read/writeblock() support) to directly access a 32-bit wide BRAM that is shared with the PowerPC over the PLB (slave mode). The intent was to use co_signals to communicate between the PowerPC and the hardware process that data is available for processing and that data has been processed. This arrangement currently takes a few extra steps in EDK, as well as a quick edit of the generated files to expose the co_memory interface, and it is also limited (to be corrected) to a single process accessing the co_memory.
Let me know if this might be useful to you,
Thanks,
Ed
#11
Posted 18 May 2007 - 02:37 AM
1. The CPU (PowerPC) copies about 4 KB of data from SDRAM to BRAM, then sends a signal to the FPGA IP core implemented in Impulse C.
2. The FPGA IP core runs the algorithm and sends a signal to the PowerPC when it completes its task.
3. The PowerPC uses the data in BRAM that has been processed by the FPGA IP core.
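The three steps above can be modeled in plain C to check the handshake order. This is a single-threaded sketch under stated assumptions: the start/done flags stand in for the two co_signals, bram[] stands in for the shared BRAM, hw_step() calls are interleaved into the CPU's wait loop instead of running concurrently, and the doubling "algorithm" is a placeholder. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BRAM_WORDS 1024   /* hypothetical: 4 KB of 32-bit words */

/* Software model of the shared BRAM plus the two signals. */
typedef struct {
    uint32_t bram[BRAM_WORDS];
    int start;   /* CPU -> hardware: data is ready */
    int done;    /* hardware -> CPU: processing finished */
} shared_system;

/* Hardware side: if started, run a placeholder algorithm, then signal done. */
static void hw_step(shared_system *sys, size_t n)
{
    if (!sys->start) return;                 /* nothing to do yet */
    sys->start = 0;
    for (size_t i = 0; i < n; i++) {
        sys->bram[i] = sys->bram[i] * 2u;    /* placeholder algorithm */
    }
    sys->done = 1;
}

/* CPU side: copy input into BRAM, signal, wait for done, read results back. */
size_t cpu_process(shared_system *sys, const uint32_t *in, uint32_t *out, size_t n)
{
    if (n > BRAM_WORDS) n = BRAM_WORDS;
    memcpy(sys->bram, in, n * sizeof(uint32_t));   /* step 1: SDRAM -> BRAM */
    sys->done = 0;
    sys->start = 1;                                /* step 1: signal the core */
    while (!sys->done) {
        hw_step(sys, n);                           /* step 2: concurrent in real hardware */
    }
    memcpy(out, sys->bram, n * sizeof(uint32_t));  /* step 3: use the processed data */
    return n;
}
```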
#12
Posted 20 May 2007 - 03:02 PM
Hi,
I was able to put together a quick PSP that avoids having to manually edit any files within EDK. Attached are the PSP (not part of the formal release; however, it also won't get overwritten) and instructions for installing it and running through the base Impulse C project, which is also attached and does what you're after using signals. Basically, the PSP is the same as the "Xilinx Virtex-4 PLB (VHDL)" PSP, except that for a co_memory it exposes a XIL_BRAM-type interface that can be connected directly to one side of a 'block_bram' in EDK; the other side can then be connected to just about anything. Access to the BRAM via the co_memory interface is limited to a pointer (no co_memory_read/writeblock support) from a single process; please see the beginning of the "HowTo" doc for more info.
Hope this helps get you going,
Thanks,
Ed
Attached File(s)
- PLB_XIL_BRAM_ImpulseC_Prj.zip (3.14K)
- README_PLB_XIL_BRAM_PSP_HowTo.zip (370.98K)
#13
Posted 20 May 2007 - 11:26 PM
I am still running some tests on your project.
Now I have the following problem.
In your "HowTo" doc:
6) Configure the ‘plb_bram_comem_block’:
a. Check the “Support PLB Burst and Cache Line Transfers” box:
b. Change c_baseaddr to 0xA0000000
c. Change c_highaddr to 0xA0003fff (this also determines how much BRAM is created)
I changed c_baseaddr to 0xffff0000 and c_highaddr to 0xffff3fff instead.
Given step a (checking the “Support PLB Burst and Cache Line Transfers” box), can I use
XCache_EnableICache(0x00000001); XCache_EnableDCache(0x00000001);
to make plb_bram_comem_block cacheable?
I found that it doesn't become cacheable, but I don't know why.
#14
Posted 21 May 2007 - 12:10 AM
Your "HowTo" doc says:
- co_memory access is limited to:
o 32-bit words ONLY
o Only the pointer returned from co_memory_ptr() may be used; co_memory_readblock() and co_memory_writeblock() are NOT supported
o Currently only ONE hardware process may access the co_memory
Could you modify it to support 16-bit accesses? I need 16-bit support.
Ideally it would support 8-, 16-, and 32-bit accesses.
#15
Posted 21 May 2007 - 09:19 AM
The shared memory must be uncacheable; otherwise it would require a lot of cache management to make sure the data written by the CPU is copied into the shared memory before the CPU sends the "start" signal to the hw process, and that the CPU isn't reading "stale" data from the cache. The CPU's cache controller is only aware of changes to memory made by the CPU, because the CPU goes through the cache controller when it reads/writes data to memory.
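A toy model of the problem described above, in plain C: one cached word sitting in front of one shared memory word. It is not a model of the real PowerPC cache; it only shows why CPU writes need a flush before the "start" signal and CPU reads need an invalidate after the "done" signal. All names are hypothetical.

```c
#include <stdint.h>

/* Toy write-back cache holding a single word of shared memory. */
typedef struct {
    uint32_t memory;   /* the shared BRAM word, also seen by the hw process */
    uint32_t line;     /* the CPU's cached copy */
    int valid;         /* cached copy present */
    int dirty;         /* cached copy newer than memory */
} toy_cache;

/* CPU write: lands in the cache only (write-back). */
void toy_cpu_write(toy_cache *c, uint32_t v) { c->line = v; c->valid = 1; c->dirty = 1; }

/* CPU read: a hit returns the cached copy, which may be stale. */
uint32_t toy_cpu_read(toy_cache *c)
{
    if (!c->valid) { c->line = c->memory; c->valid = 1; }  /* miss: fetch */
    return c->line;
}

/* Flush: push a dirty cached copy out to memory. */
void toy_flush(toy_cache *c)      { if (c->valid && c->dirty) { c->memory = c->line; c->dirty = 0; } }

/* Invalidate: drop the cached copy so the next read fetches from memory. */
void toy_invalidate(toy_cache *c) { c->valid = 0; c->dirty = 0; }

/* The hardware process bypasses the CPU cache entirely. */
uint32_t toy_hw_read(const toy_cache *c)    { return c->memory; }
void toy_hw_write(toy_cache *c, uint32_t v) { c->memory = v; }
```

In this model the hardware sees 0 after the CPU writes 5 until toy_flush() runs, and the CPU keeps reading 5 after the hardware writes 9 until toy_invalidate() runs, which is the bookkeeping an uncacheable shared region avoids.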
Ed
#16
Posted 21 May 2007 - 09:28 AM
This was a "quick" adaptation of something from a project that happened to use 32 bits (a convenient bus width), and it is very thin. Ideally (and as I have time to refine it) it would do dynamic bus sizing for widths from a single byte up to the maximum bus width, but that adds some complexity. When I get a chance, I'll look into making a version that supports 16-bit accesses (it may have to be a fixed size).
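For reference, the bookkeeping that dynamic bus sizing involves on a 32-bit word can be sketched in C. Assumptions (not taken from the PSP): bit i of the byte-enable mask selects the byte at offset i within the word, data is big-endian as on the PowerPC, and accesses are naturally aligned 1-, 2-, or 4-byte transfers. Function names are hypothetical.

```c
#include <stdint.h>

/* Byte-enable mask for an aligned access of size 1, 2, or 4 bytes at addr.
 * Bit i selects byte offset i within the 32-bit word. */
uint32_t byte_enables(uint32_t addr, uint32_t size)
{
    uint32_t offset = addr & 3u;
    return ((1u << size) - 1u) << offset;
}

/* Extract an aligned 8/16/32-bit value at addr from the 32-bit word holding
 * it, assuming big-endian lanes (offset 0 is the most significant byte). */
uint32_t extract_lane(uint32_t word, uint32_t addr, uint32_t size)
{
    uint32_t offset = addr & 3u;
    uint32_t shift = 8u * (4u - offset - size);
    uint32_t mask = (size == 4u) ? 0xFFFFFFFFu : ((1u << (8u * size)) - 1u);
    return (word >> shift) & mask;
}
```

A fixed 16-bit version only ever needs the offset-0 and offset-2 halfword cases, which is part of why a fixed size is simpler than full dynamic sizing.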
Ed
#17
Posted 21 May 2007 - 04:25 PM
I look forward to receiving it from you soon.
#18
Posted 21 May 2007 - 11:07 PM
Here you go: this PSP will do 8-, 16-, and 32-bit accesses. To do 16-bit, just change the pointer type in the previous example to:
co_int16 *memblkPtr;
All previous notes still apply as does the "How To" doc.
Ed
Attached File(s)
- PLB_XIL_BRAM_PSP_32_16_8bit.zip (19.74K)
#19
Posted 22 May 2007 - 12:16 AM
co_int16* p1;
co_int16* p2;
memblkPtr = co_memory_ptr(memblk);
p1 = memblkPtr;        // OK
p2 = memblkPtr + 256;  // Project -> Generate HDL fails with:
SharedMem_hw.c:60: Unexpected pointer assignment
iMake: *** [SharedBRAM.xic] Error 1
p2 = &memblkPtr[256];  // Project -> Generate HDL fails with:
Expecting a memory object: memblkPtr
iMake: *** [SharedBRAM.xhw] Error 1
Can you tell me how to assign &memblkPtr[256] to p2, as I would with an ordinary C compiler?
Thanks
#20
Posted 22 May 2007 - 09:48 AM
The co_memory interface, specifically the use of co_memory_ptr() and a pointer, is still fairly new and "special" due to the external interactions necessary. Though it is likely to be supported in the future more like pointers to local arrays are (see "Pointer Support in Hardware Processes" in the CoDeveloper User Guide), currently only offsets from the base pointer returned by co_memory_ptr() are allowed. What you're after can be done with offsets/indexes rather than pointers:
co_int16 *memblkPtr;
co_int16 p1_Idx;
co_int16 p2_Idx;
co_int16 tmp;
memblkPtr = co_memory_ptr(memblk);
p1_Idx = 0;               // p1 = memblkPtr
p2_Idx = 256;             // p2 = memblkPtr + 256
memblkPtr[p1_Idx++] = 5;  // *(p1++) = 5
tmp = memblkPtr[p2_Idx];  // tmp = *p2
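To convince yourself the index-based rewrite is equivalent, both forms can be compiled with an ordinary C compiler (not the Impulse C HDL generator). In this sketch a plain array stands in for the co_memory, int replaces co_int16 for the indexes, and the function names are hypothetical.

```c
#include <stdint.h>

/* Pointer style: the form the hardware compiler currently rejects. */
int16_t pointer_style(int16_t *mem)
{
    int16_t *p1 = mem;
    int16_t *p2 = mem + 256;
    *(p1++) = 5;
    return *p2;
}

/* Index style: offsets from the base pointer, as in the reply above. */
int16_t index_style(int16_t *mem)
{
    int p1_Idx = 0;       /* p1 = mem */
    int p2_Idx = 256;     /* p2 = mem + 256 */
    mem[p1_Idx++] = 5;    /* *(p1++) = 5 */
    return mem[p2_Idx];   /* tmp = *p2 */
}
```

Each pointer variable becomes an integer index, and every dereference becomes a subscript on the one base pointer, which is the only pointer the co_memory interface currently accepts.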
Ed