Random stalling


8 replies to this topic

#1 Wren

    Member

  • Members
  • 9 posts

Posted 13 November 2007 - 02:01 PM

Hi -

I have a problem with a stall when implementing an Impulse C project. The programming model is straightforward: one hardware process and one software process, connected by two streams, one carrying data from the sw to the hw and the other carrying data back from the hw to the sw. Each stream is 32 bits wide and 100 elements deep. The sw process sends 30 4-byte floats to the hardware, then waits for a 4-byte int to be returned. The hardware process does 4 33-element dot products, then a couple of comparisons, then returns the 4-byte int. This process is repeated about 850,000 times.

The project simulates perfectly. However, when I target a Xilinx Virtex-4 platform (attached to the PLB), I get "random" (as in I can't find any pattern) stalling. Sometimes the stall occurs after only a few hundred iterations, sometimes when the process is 90% complete, and everywhere in between. When I ran the debugger, the hang was in the co_stream_write function at the line "while ((XIo_In32(stream->io_addr+8)&1)==0);", where the function is checking whether the software->hardware stream is full. I'm at a loss as to why the stream would be full, because the software should write exactly 30 floats and then block until it receives a single int, while the hardware expects to receive exactly 30 floats and then outputs a single int. Any advice as to what I'm doing wrong or how to proceed?

Thanks in advance,

Thomas
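
For anyone chasing a similar hang: the quoted loop spins forever once bit 0 of the status word stops being set, so the debugger can only show where the program is stuck, not on which iteration it got stuck. One option is a bounded version of the same poll that gives up after a fixed number of reads, so the caller can log the iteration and carry on investigating. The following is only a sketch under assumptions: `wait_ready_bounded`, `status_reader`, `mock_fifo` and `mock_status` are hypothetical names made up for illustration and are not part of Impulse C or the Xilinx libraries. Only the `& 1` test mirrors the line quoted above, and the mock stands in for the real register read so the logic can be tried on a host PC.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical callback that returns the stream status word. On the real
   target it would wrap the register read from the quoted line, i.e.
   XIo_In32(stream->io_addr + 8). */
typedef uint32_t (*status_reader)(void *ctx);

/* Poll until bit 0 of the status word is set (the condition the quoted
   loop waits for), or give up after max_polls reads.
   Returns the number of reads used, or -1 on timeout. */
long wait_ready_bounded(status_reader read_status, void *ctx, long max_polls)
{
    long polls;
    for (polls = 1; polls <= max_polls; polls++) {
        if ((read_status(ctx) & 1u) != 0) {
            return polls;
        }
    }
    return -1;
}

/* Host-side mock (an assumption, for testing only): reports "not ready"
   for a fixed number of reads, then "ready" from then on. */
struct mock_fifo {
    long busy_reads_left;
};

uint32_t mock_status(void *ctx)
{
    struct mock_fifo *f = (struct mock_fifo *)ctx;
    if (f->busy_reads_left > 0) {
        f->busy_reads_left--;
        return 0u;
    }
    return 1u;
}
```

Called once per write with a generous poll limit, a -1 return marks the iteration at which the stream stopped accepting data, which may make it easier to see whether the stalls follow any pattern.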

#2 etrexel

    Advanced Member

  • Impulse Staff
  • 260 posts

Posted 13 November 2007 - 07:24 PM

Hi,
I would really need to see the code to make any real guesses as to what may be happening, but just a thought on maybe something to try: Have you tried pacing (slower/faster) the writing of your data to see if it makes a difference or at least forms a pattern? And how deep is the outgoing stream that the CPU is reading from? And, of course, what version are you running?

Ed
Ed Trexel
Impulse Accelerated Technologies, Inc.

#3 Wren

    Member

  • Members
  • 9 posts

Posted 14 November 2007 - 09:59 AM

Hi Ed -

Thanks for the reply. I'm using CoDeveloper 2.20. Both of the streams are 100 elements deep. I've tried changing the depth of the streams, and I don't notice any difference in the stalling characteristics. I'm running the C code on the PPC at 100MHz, its lowest speed, and I think the bus speeds are also at 100MHz. I have tried putting in some dummy for loops (something like for(int i = 0; i < 10000; i++); ) after each stream read/write operation. It seems to help a little bit, in that the program doesn't stall as early in the process. It does, however, usually stall by the time it's gotten about 75% of the way through the program.

Thanks,

Thomas
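
A side note on the dummy loops described above: an empty `for` loop with a plain `int` counter has no visible effect, so an optimizing compiler may remove it entirely, and the amount of pacing then depends on the optimization level. A minimal sketch of a delay that survives optimization, assuming a simple busy-wait is acceptable, is shown below. `delay_loop` is a hypothetical helper name, not an Impulse C or Xilinx function, and the time each trip takes on the PPC is not something this sketch can promise.

```c
/* Busy-wait for `iterations` loop trips. Declaring the counter volatile
   forces the compiler to perform every read and write of it, so the loop
   cannot be optimized away. Returns the final counter value. */
unsigned long delay_loop(unsigned long iterations)
{
    volatile unsigned long i;
    for (i = 0; i < iterations; i++) {
        /* intentionally empty */
    }
    return i;
}
```

Calling `delay_loop(10000)` after each stream read/write would correspond to the pacing experiment described in the post.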

#4 etrexel

    Advanced Member

  • Impulse Staff
  • 260 posts

Posted 14 November 2007 - 04:19 PM

QUOTE (Wren @ Nov 14 2007, 10:59 AM)
Thanks for the reply. I'm using CoDeveloper 2.20. Both of the streams are 100

Hi Thomas,
That's 2.20.h.3, correct?
If at all possible, it would help to see the code. If you could please send the Impulse C project files (.icProj, .c's, and .h's) to support@impulsec.com, I and others could look at them for anything that might be suspicious. Meanwhile, just some other thoughts on what sometimes causes these things: Are you meeting timing? ISE/EDK do not flag a failure to meet timing as an error, only as a warning, and will still generate a bitstream. The 'implementation/*.par' file will show you the summary of the timing constraints and whether or not all were met.
Are you making use of #pragma CO PIPELINE? If so, have you tried turning it off?
And how about #pragma CO PRIMITIVE's? Are you using many primitive functions?

Ed
Ed Trexel
Impulse Accelerated Technologies, Inc.

#5 Wren

    Member

  • Members
  • 9 posts

Posted 15 November 2007 - 03:27 PM

Thanks a bunch. I'll send them along shortly. Incidentally, there was a timing problem, just as you suggested. However, it looks like the constraint that wasn't met was a clock going from one of the DCMs to the main memory (a DDR2 DIMM), and from running the debugger, it didn't look like that was where the stall was occurring. I am looking into the problem as we speak, though.

I'm not using either pipeline or primitive pragmas, just one or two unrolls. I had the problem before and after using the unrolls.

Thomas

#6 etrexel

    Advanced Member

  • Impulse Staff
  • 260 posts

Posted 15 November 2007 - 03:35 PM

QUOTE (Wren @ Nov 15 2007, 04:27 PM)
Thanks a bunch. I'll send them along shortly. Incidentally, there was a timing problem, just as you suggested. However, it looks like the constraint that wasn't met was a clock going from one of the DCMs to the main memory (a DDR2 DIMM), and


Hi Thomas,
I would recommend resolving the unmet timing constraint before anything else. If the unmet constraint is within the DDR2 realm and you are using the DDR2 for data and/or code, there's always the chance that data going in or out is being corrupted, which might explain some of the original randomness.

Ed
Ed Trexel
Impulse Accelerated Technologies, Inc.

#7 etrexel

    Advanced Member

  • Impulse Staff
  • 260 posts

Posted 15 November 2007 - 03:38 PM

BTW: What clock speed are you running the DDR2 at and do you happen to be using an ML410? If so, what revision level?

Ed
Ed Trexel
Impulse Accelerated Technologies, Inc.

#8 Wren

    Member

  • Members
  • 9 posts

Posted 16 November 2007 - 09:13 AM

The bus and processor were running at 100MHz, so I'm fairly sure the DDR2 is running at 200MHz (I used their default implementation for this part). Actually, I am using an ML410. I believe it's revision B.

Thomas

#9 etrexel

    Advanced Member

  • Impulse Staff
  • 260 posts

Posted 16 November 2007 - 11:35 AM

QUOTE (Wren @ Nov 16 2007, 10:13 AM)
The bus and processor were running at 100MHz, so I'm fairly sure the DDR2 is running at 200MHz (I used their default implementation for this part). Actually, I am using an ML410. I believe it's revision B.

Thomas

Just wanted to make sure that the DDR2 was running fast enough, because it has a minimum clock speed, usually around 125MHz. Especially if you are running both CPUs with the DDR and DDR2 controllers, you may want to turn the effort level up to 'high' for at least 'par' ('map' may help too) to help meet timing, if it isn't set that way already. This is done in the 'etc/fast_runtime.opt' file by adding the "-ol high" option under "Program par". On the last design I did on an ML410 Rev. E, I also had to add area constraints for the DDR2 controller to help it meet timing. From the UCF file:
AREA_GROUP "ddr2" RANGE=SLICE_X0Y222:SLICE_X34Y0;
INST "ddr2_sdram_32mx64" AREA_GROUP = "ddr2";

Hope that helps,
Ed
Ed Trexel
Impulse Accelerated Technologies, Inc.




