pragma CO PIPELINE
#1
Posted 01 July 2008 - 03:32 PM
I have been trying to use the pragma CO PIPELINE in a project with multiple arrays. The desktop simulation succeeds, but when I implement the project on a Xilinx board, the results are erroneous. Even in Stage Master Debugger, the results were OK. Are there restrictions on the data type or communication type (I used both streams and shared memory) when using this pragma?
Thank you
cls
#2
Posted 01 July 2008 - 05:49 PM
Hi,
If the Stage Master Debugger (SMD) ran correctly, then there is a fair chance the error occurs during synthesis, because SMD is essentially the same as HDL simulation. What do your results look like? Just 0's or something recognizable? In case you are experiencing a recently fixed bug, I would recommend downloading the latest version of CoDeveloper (v3.20.a.4, just released today) from
http://www.impulse-support.com/ReleaseFiles/
and retrying.
Ed
Impulse Accelerated Technologies, Inc.
#3
Posted 02 July 2008 - 08:25 AM
Hi,
Thank you for your reply. The results vary, depending on whether I use shared memory or streams.
I am using floating point, and in the stream case I get some data, but the results are wrong: I obtain some 1's, 0's and some floats.
In the shared memory case, the data seems to be shifted in the array and some data is also missing.
I was also wondering about another project that is not pipelined. When I use the Stage Master Debugger, I obtain the warning:
warning: right shift count is negative
which basically makes all my outputs 1 in the Stage Master Debugger simulation, but in the actual implementation the results are correct. What could be the cause?
Thank you
cls
#4
Posted 02 July 2008 - 04:20 PM
Hi,
What target are you running on? And have you tried v3.20.a.4?
Regarding the warning in SMD: the difference is essentially in how synthesis and the HDL simulator within SMD handle a negative shift. Negative shifts can normally be avoided from within your Impulse C code.
Ed
Impulse Accelerated Technologies, Inc.
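For readers hitting the same warning: in standard C, shifting by a negative count is undefined behavior, so it is not surprising that a simulator and a synthesis tool can disagree on the result. The sketch below shows one way to keep the count non-negative by choosing the shift direction from its sign. The helper name `shift_right_signed` and the 32-bit operand width are assumptions made for illustration; they do not come from the poster's code.

```c
#include <stdint.h>

/* Hypothetical helper: shift x right by n bits when n >= 0 and left by -n
 * bits when n < 0, so that no shift operator ever sees a negative count.
 * Counts of 32 or more in either direction return 0 instead of invoking
 * undefined behavior on a 32-bit operand. */
uint32_t shift_right_signed(uint32_t x, int n)
{
    if (n >= 32 || n <= -32) {
        return 0u;
    }
    if (n >= 0) {
        return x >> n;
    }
    return x << (-n);
}
```

Whether a left shift is the intended meaning of a negative count depends on the application; if a negative count is simply a bug, clamping it to zero may be the better guard.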
#5
Posted 03 July 2008 - 10:14 AM
Hi,
I am using a Virtex II Pro platform. I just tried v3.20.a.4, but I still obtain the same error. When I tried a single array multiplication, it worked fine, but when I started using multiple arrays, I obtained more errors. In one case, for example:
Correct output --- HyperTerminal output
15.200 --- 16.200
16.200 --- 17.199
17.200 --- 18.200
18.200 --- 19.200
19.200 --- 20.200
20.199 --- 21.200
21.199 --- 22.200
22.199 --- 23.200
23.199 --- 24.200
24.199 --- 0.0000
For some reason, one of the outputs is missing in this case, but with larger arrays and more operations, the results vary much more.
Also, I was trying the new ImageFilterDMA example for the Virtex-II Pro board, which uses CO PIPELINE, but I get timing errors when trying to create the bitstream for the example. Do you have any suggestions? I have been successful in implementing the previous non-pipelined ImageFilterDMA with CoDeveloper 2.10 and 3.10 on the Virtex board.
Thank you
cls
#6
Posted 03 July 2008 - 10:28 AM
Hi,
So is it really the first or last data being lost in that output? If the right column were shifted down one, they'd be closer.
Have you tried running without the "#pragma CO PIPELINE"? If so, are the results the same or different? If different, then there is likely a timing/data-dependency issue when using the separate arrays with their accesses being pipelined/parallelized. If possible, seeing the project would help figure out what is happening.
Regarding the timing errors, it's all a matter of seeing where the errors are and going from there. If the error is within the core logic, using "#pragma CO SET STAGEDELAY <x>" can add enough registering between operations to bring the clock up.
Ed
Impulse Accelerated Technologies, Inc.
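The "shifted down one" observation can be checked mechanically. The following is a minimal sketch, in plain C, that compares the board output against the expected output at several offsets and reports the offset with the smallest mean absolute difference. The function names `mean_abs_diff` and `best_shift` are hypothetical helpers introduced here, not part of any Impulse C library.

```c
#include <stddef.h>

/* Mean absolute difference between observed[i] and expected[i + shift],
 * taken over the indexes where both elements exist. Returns -1.0 when
 * the two arrays do not overlap at this shift. */
double mean_abs_diff(const double *expected, const double *observed,
                     size_t n, int shift)
{
    double total = 0.0;
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        long j = (long)i + shift;
        if (j < 0 || j >= (long)n) {
            continue;
        }
        double d = observed[i] - expected[j];
        total += (d < 0.0) ? -d : d;
        used++;
    }
    return used ? total / (double)used : -1.0;
}

/* Shift in [-max_shift, max_shift] with the smallest mean difference. */
int best_shift(const double *expected, const double *observed,
               size_t n, int max_shift)
{
    int best = 0;
    double best_err = mean_abs_diff(expected, observed, n, 0);
    for (int s = -max_shift; s <= max_shift; s++) {
        double err = mean_abs_diff(expected, observed, n, s);
        if (err >= 0.0 && (best_err < 0.0 || err < best_err)) {
            best_err = err;
            best = s;
        }
    }
    return best;
}
```

Fed the ten value pairs from the table in post #5, `best_shift` with `max_shift` set to 2 returns 1: each board value lines up with the next expected value, which suggests the first expected value (15.200) is the one that was lost and the trailing 0.0000 is padding.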
#7
Posted 03 July 2008 - 12:39 PM
Hi,
Without the "#pragma CO PIPELINE", the results are correct. Below is an extract of the code:
co_memory_readblock(Memory,0,INP,10*sizeof(float));
for (k = 0; k < 10; k++) {
    for (index = 0; index < 10; index++) {
#pragma CO PIPELINE
#pragma CO SET stageDelay 32
        input = INP[index];
        currentval = sum[k];
        testparameter = paramIn[count];
        temp = currentval + (input*parameter);
        sum[k] = temp;
        count++;
    }
}
Due to the multiple accesses to the sum array, I obtain a latency of 19, but that should not affect the result. Is there anything in the code that jumps out as wrong?
Thank you
cls
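When hardware and simulation disagree, a plain C reference for the same loop gives a known-good answer to compare against. The sketch below restates the extract as a self-contained function. It assumes that `count` starts at 0, that `sum[]` starts at zero, and that `parameter` in the extract is meant to be `testparameter`; the function name `matvec_reference` and the constant `DIM` are hypothetical. The Impulse C pragmas are left out so the code builds with any C compiler.

```c
#define DIM 10

/* Plain C reference for the posted loop: sum[k] accumulates
 * inp[index] * param_in[count], with count running from 0 to DIM*DIM-1
 * across both loops. Assumption: 'parameter' is 'testparameter'. */
void matvec_reference(const float *inp, const float *param_in, float *sum)
{
    int count = 0;
    for (int k = 0; k < DIM; k++) {
        sum[k] = 0.0f;
        for (int index = 0; index < DIM; index++) {
            float input = inp[index];
            float currentval = sum[k];
            float testparameter = param_in[count];
            sum[k] = currentval + (input * testparameter);
            count++;
        }
    }
}
```

With `inp` filled with 1.0 and `param_in[i]` set to `i`, the function produces `sum[0]` equal to 45 and `sum[9]` equal to 945, which makes a skipped or repeated `paramIn[]` element easy to spot in the output.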
#8
Posted 03 July 2008 - 01:11 PM
Hi,
Actually, you are getting rate=latency=19 because of the floating point operators being used, specifically the "higher latency, faster clock operator" option, which is checked under Project->Options, "Generate" tab. Using the low-latency version results in rate=latency=8, and there's a fair chance it will not affect your clock frequency. Have you tried the low-latency floating point library, and do you get the same results?
Otherwise, without knowing the application, nothing looks obviously incorrect. The only questions are:
- Are testparameter and paramIn[] really unused?
- When you see correct vs. incorrect values, it is ONLY this loop where you are turning the "#pragma CO PIPELINE" on/off, correct?
- What does the code for setting sum[] look like?
Ed
Impulse Accelerated Technologies, Inc.
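To put the rate and latency figures in perspective, here is a back-of-the-envelope cycle estimate using the usual textbook pipeline model: the first iteration completes after `latency` cycles and each later iteration starts `rate` cycles after the previous one. This is a generic model and an assumption on my part, not a number reported by CoDeveloper; the function name `pipeline_cycles` is hypothetical.

```c
/* Rough cycle count for a pipelined loop under the textbook model:
 * total = latency + (iterations - 1) * rate.
 * Not a figure produced by the tool; stalls and memory waits are ignored. */
long pipeline_cycles(long iterations, long rate, long latency)
{
    if (iterations <= 0) {
        return 0;
    }
    return latency + (iterations - 1) * rate;
}
```

For the 100 inner iterations in the posted code, rate=latency=19 comes to 1900 cycles and rate=latency=8 to 800. When rate equals latency, iterations do not overlap at all, which is what a loop-carried dependency such as the read-modify-write of sum[k] would be expected to produce. A hypothetical rate of 1 with latency 8 would need only 107 cycles.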
#9
Posted 03 July 2008 - 01:37 PM
Hi,
Yes, the "higher latency" option caused the latency of 19, and using "low latency" gives a rate of 8, but the SME also shows a rate of 14 for "recursive mem sum".
I just initialize sum[10] with zeros and in the loop fill it with the corresponding products as the result of the matrix multiplication. As I step through the code in SMD, I find that the values in the matrix paramIn[100] are accessed in a different order than the one I expected. I found a pattern: it reads the first 10 values of paramIn[100], then skips one and reads the next 10 values, then skips one again and reads 10 values, and so on. What can I do to prevent this from happening? Do I need to insert a delay?
The desktop simulation works fine, but in SMD the results for sum are wrong for all but the first value, sum[0].
Also, that is the only loop in which I am using the CO PIPELINE pragma.
When I do not use CO PIPELINE, all the calculations and array accesses are correct. Is there a specific way that arrays should be used with this pragma? In addition, why can I see the testparameter value in SMD when I do not use the pipeline pragma, while with the pipeline pragma testparameter remains 0.000?
Thank you
cls
#10
Posted 03 July 2008 - 02:01 PM
Hi,
What are the declarations for all the variables used as indexes, then? Are they all just type 'int' or are they co_uint*'s? The code you gave assigns paramIn[] to 'testparameter', but testparameter is not used, only 'parameter', which is not set in the loop. Is 'testparameter' really 'parameter'?
In general, the use of a #pragma should not affect the operation of your code unless the #pragma tells the compiler something that isn't true, such as a "#pragma CO NONRECURSIVE <array>" for an array that is in fact being used recursively. Unless the indexes are causing artificial truncations (such as by limiting their size), there may be an error in the compiler, which will show up when using SMD. You are using v3.20.a.4, correct?
When using #pragma CO PIPELINE, it is much like using a high optimization level on any compiler, in that variables may get lost and/or optimized out. In the case of pipelines, there may be additional instances of the same variable for each stage of the pipeline. Be sure to add watch variables from within the scope in which you want to watch them.
Ed
Impulse Accelerated Technologies, Inc.
#11
Posted 03 July 2008 - 02:12 PM
Hi,
Actually, parameter is testparameter. It was my typo, as I was trying to provide a clearer name for the variable; in my code testparameter is in the place of parameter. The variables used for indexing are all just type 'int'.
I am using v3.20.a.4.
I am just a bit confused as to why one of the paramIn[] values is skipped at every pass of the outer loop. Do you have any suggestions to prevent this from happening?
Thank you
cls
#12
Posted 03 July 2008 - 02:27 PM
Hi,
Temporary variables help "force" the compiler to take an extra step to do an assignment or operation, and might help. A "skip" can have to do with how and where the loop is being broken, and with the transfer of the final value from within a pipeline to the code afterwards (remember that pipelines have stages and need to keep data accordingly). There may also still be circumstances where SMD does not show a variable's value accurately; this is where true HDL simulation becomes key.
Ed
Impulse Accelerated Technologies, Inc.
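One possible reading of the temporary-variable advice, sketched in plain C: accumulate each output in a local scalar and write sum[k] once per outer pass, so the inner loop no longer reads and writes the sum array on every iteration. This restructuring is an assumption for illustration, not something confirmed in the thread, and the names `matvec_scalar_acc`, `acc`, `base`, and `DIM` are hypothetical. The paramIn[] index is derived from `k` instead of being carried in a running counter.

```c
#define DIM 10

/* Variant of the posted loop that accumulates in a scalar temporary.
 * The additions happen in the same order as in the array version, so the
 * results should match a straightforward implementation. */
void matvec_scalar_acc(const float *inp, const float *param_in, float *sum)
{
    for (int k = 0; k < DIM; k++) {
        float acc = 0.0f;        /* temporary accumulator for sum[k] */
        int base = k * DIM;      /* first paramIn[] index for this pass */
        for (int index = 0; index < DIM; index++) {
            acc += inp[index] * param_in[base + index];
        }
        sum[k] = acc;            /* single array write per outer pass */
    }
}
```

The scalar still carries a dependency from one iteration to the next, so this does not by itself shorten the pipeline rate; it only removes the per-iteration accesses to the sum array.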
#13
Posted 03 July 2008 - 02:41 PM
Hi,
Thank you for your responses. I am now trying to manually unroll the loop to eliminate the outer loop, but when I try to generate hardware I get the error: Undefined primitive: f_suif_acc. The only thing I have done is divide the paramIn[100] array into 10 arrays, paramIn0[10] ... paramIn9[10], and create 10 temporary variables, sum0 ... sum9. Do you know what "f_suif_acc" refers to?
Thank you
cls
#14
Posted 03 July 2008 - 02:50 PM
Hi,
Sorry, not offhand. I'd really need to see the code and HDL output to figure out where it is coming from. Try using Stage Master Explorer to line up the 'f_suif_acc' with the original code, assuming it appears there; it may be a temporary variable created for the pipeline. What PSP are you using, BTW?
Ed
Impulse Accelerated Technologies, Inc.
#15
Posted 03 July 2008 - 02:59 PM
Hi,
Just another thought: if it is just the indexes having incorrect values between runs through the inner loop, initialize all indexes explicitly before going into the pipelined loop instead of relying on previous values. So for 'count' (which I suspect is your main problem, now that I have looked at the code more in SME):
count_outter = 0;
for (k = 0; k < 10; k++) {
    count = count_outter;
    count_outter = count_outter + 10;
    for (index = 0; index < 10; index++) {
#pragma CO PIPELINE
        ...
        testparameter = paramIn[count];
        ...
        count++;
    }
}
Ed
Impulse Accelerated Technologies, Inc.
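The effect of this suggestion can be checked on the desktop with a small model of the index sequence. The sketch below records which paramIn[] index each inner iteration would use, first with `count` carried across outer passes and then with the explicit re-initialization shown above. The `drift` parameter is a hypothetical stand-in for the extra increment that appears to happen once per outer pass in SMD; the real cause inside the generated pipeline is not established in this thread. The function names and `DIM` are likewise assumptions.

```c
#define DIM 10

/* Index sequence when count is carried from one outer pass to the next.
 * 'drift' models a hypothetical extra increment per outer pass. */
void indexes_carried(int drift, int *out)
{
    int count = 0;
    int n = 0;
    for (int k = 0; k < DIM; k++) {
        for (int index = 0; index < DIM; index++) {
            out[n++] = count;
            count++;
        }
        count += drift;
    }
}

/* Same loop with count re-initialized from count_outter on every pass. */
void indexes_reinit(int drift, int *out)
{
    int count_outter = 0;
    int count;
    int n = 0;
    for (int k = 0; k < DIM; k++) {
        count = count_outter;
        count_outter = count_outter + DIM;
        for (int index = 0; index < DIM; index++) {
            out[n++] = count;
            count++;
        }
        count += drift;   /* overwritten at the top of the next pass */
    }
}
```

With `drift` set to 1, the carried version yields 0..9, then 11..20, then 22..31, which matches the "read 10, skip one" pattern described in post #9, while the re-initialized version yields 0..99 in order. Each function writes DIM*DIM entries, so `out` must hold at least 100 ints.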
#16
Posted 03 July 2008 - 03:00 PM
Hi,
I am using the Xilinx Virtex-II Pro PLB PSP. Also, when I modified the code a bit and unrolled the outer loop, the results looked good. I am going to implement it on the board and see the results.
Thank you for all your help
Have a nice day
cls
#17
Posted 03 July 2008 - 03:16 PM
Hi,
Thank you for your suggestion. I will look into it more; however, the final result also had the error, not just the inner loop at that particular place. The interesting thing is that once I unrolled the outer loop, I could rearrange the order in which each value in the sum array is computed. For example, I can calculate sum[3] before sum[0], and that seems to have fixed the previous error in one of the sum values. I will implement it on the board to see whether or not that works.
Thank you
cls
#18
Posted 03 July 2008 - 03:31 PM
I tried your suggestion, and indeed the error is gone in SMD. It seems that the count index was changed before going into the pipelined loop. Thank you so much. However, when I implement this on the board, the indexing error still occurs.
Thank you
cls












