[clug] Looking for Linux on CBE programming expert help

David Tulloh david at tulloh.id.au
Sat Aug 29 11:40:00 MDT 2009


Sanders King wrote:
> I have been stalking the CLUG mailing list for 18 months without ever posting or answering a post, however today I introduce myself :-) and pose a question.
>   
Hi Sanders,

Nice to see a first time poster, hopefully we'll see you again.

Replying inline.  I don't have a solution but I do have some pointers.
>
> PS3 Cell Broadband Engine - Linux Programming
>
> I am trying to do a DMA transfer (mfc_get) from main memory to SPE memory.  I have done a successful transfer of the parameters structure, however when I try to transfer some data I run into trouble.
>
> In the SPE program I have declared and allocated local memory to take the data structure, like this:
>
>     float *p1 __attribute__ ((aligned (128)));
>
>     p1 = (float*) malloc(floatArrayByteCount + 128 + 128); // Note: floatArrayByteCount = 128.  // +128+128 is to allow for 128 byte boundary alignment (below)
>   
To allow for 128 byte boundary alignment you only need one +128.  
Strictly speaking +127 is enough, since the worst case is a pointer that 
sits one byte past a boundary.
> I am using the following to make sure that my pointers start on a 128 byte boundary.  I know that there must be a better way but I don't know it:
>
>     while ((unsigned int)p1 % 128 != 0) {p1 += 1;}
>   

p1 += 1 advances p1 by one float, that is sizeof(float) bytes, not by 
one byte.  To step a byte at a time, do the arithmetic on a char pointer 
and cast the result back (a cast on the left hand side of += is not 
valid C):

while ((unsigned int)p1 % 128 != 0) {p1 = (float*) ((char*) p1 + 1);}

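This can be written without the loop by adding 127 and masking off the 
low seven bits, which rounds up to the next multiple of 128 and leaves 
an already aligned pointer alone:

p1 = (float*) (((memory_t) p1 + 127) & ~(memory_t) 127);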

I've introduced memory_t here as a placeholder for an unsigned integer type wide enough to hold a pointer.  C99 provides exactly that as uintptr_t in <stdint.h>; the SDK may also have its own name for it.  The size of this type should be chosen with reference to the amount of memory supported and the native datatype of the processor.  Using int is often a bad idea in my book, but here doubly so.  If int is 32 bits you have limited yourself to 4 gigs of memory, if it's 16 bits you are in deep trouble.
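
Putting the allocation and the rounding together, here is a minimal sketch in plain C.  It assumes C99's uintptr_t stands in for my memory_t placeholder; align_up, alloc_aligned_floats and DMA_ALIGN are names I made up for illustration, not anything from the Cell SDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DMA_ALIGN 128  /* hypothetical name for the 128 byte boundary */

/* Round ptr up to the next multiple of align.
   align must be a power of two for the mask trick to work. */
static void *align_up(void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t)ptr;
    return (void *)((addr + (align - 1)) & ~((uintptr_t)align - 1));
}

/* Allocate room for count floats starting on a DMA_ALIGN boundary.
   *raw receives the pointer that must later be passed to free(). */
static float *alloc_aligned_floats(size_t count, void **raw)
{
    /* Worst case the block starts one byte past a boundary,
       so DMA_ALIGN - 1 bytes of padding is always enough. */
    *raw = malloc(count * sizeof(float) + (DMA_ALIGN - 1));
    if (*raw == NULL)
        return NULL;
    return align_up(*raw, DMA_ALIGN);
}
```

Keep the raw pointer around: the aligned pointer is usually not the one malloc returned, so it must not be handed to free().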



Note that the original loop only terminates because the starting address is already a multiple of sizeof(float): it steps 4 bytes at a time, so it can only land on a 128 byte boundary if it starts on a 4 byte one.  That holds on a natively 32bit system where the compiler and allocator put all data on 32 bit boundaries.  If it didn't, for example on a 16bit system, you would have gotten an infinite loop half the time.  I wouldn't suggest relying on this as an optimization unless it were critical path code, and even then it would have to be carefully researched and commented.
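
To make the infinite loop case concrete, here is a small sketch that runs the same stepping on plain integers instead of real pointers.  steps_to_boundary is a hypothetical helper of mine, and the 128 iteration cap is only there so the "never" case returns instead of hanging.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many steps of `step` bytes it takes for addr to reach a
   128 byte boundary, or return -1 if it never lands on one.
   addr % 128 repeats after at most 128 steps, so 128 tries is enough
   to prove that a boundary is never hit. */
static int steps_to_boundary(uintptr_t addr, unsigned step)
{
    assert(step > 0);
    for (int i = 0; i <= 128; i++) {
        if (addr % 128 == 0)
            return i;
        addr += step;
    }
    return -1;
}
```

Starting from a 4 byte aligned address such as 0x1004, stepping by 4 reaches a boundary.  Starting from 0x1002 it never does, while stepping by 1 always gets there.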


> I am lining up the main memory source pointer (passed from the PPE program) to a 128 byte boundary in a similar way.
>   
I assume that it's been allocated sufficient space in the PPE program 
for you to do this.
> Then I start the transfer with no problem, using:
>
>     mfc_get(p1, ppe_p1_start_address, floatArrayByteCount, 31, 0, 0); // ppe_p1_start_address
>   
This is an asynchronous operation, so reaching the next statement doesn't 
mean the transfer worked.  My guess is that this is where you are dying: 
one of your buffers is overflowing, probably the destination.

Personally I would first get rid of the floatArrayByteCount variable.  
If you have an array of floats then the variable should be its length 
in elements.  Manipulating it as bytes should be the exception, not the 
norm, and the byte size should be calculated on the fly using sizeof.  
Its current form obscures what's going on.
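
As a rough sketch of what I mean, with made up names (FLOAT_COUNT and float_bytes are mine, not from your code or the SDK), and assuming the 128 bytes in your post means 32 floats:

```c
#include <assert.h>
#include <stddef.h>

#define FLOAT_COUNT 32  /* hypothetical: number of floats, not bytes */

/* Byte size of count floats, computed at the point where a byte
   count is actually needed (malloc, the DMA size argument, ...). */
static size_t float_bytes(size_t count)
{
    /* Guard against the multiplication overflowing size_t. */
    assert(count <= (size_t)-1 / sizeof(float));
    return count * sizeof(float);
}
```

The element count is then the only number you carry around, and every place that needs bytes says so explicitly by calling float_bytes(FLOAT_COUNT).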

Once that is done, review all your lines that deal with these pointers.  
Ensure that they are clear, simple to understand and that you know 
exactly how the pointer is being manipulated within its allocated space.

> However when I wait for the transfer to finish the program stalls (apparently forever) with no error:
>
>     mfc_write_tag_mask(1<<31); 
>     mfc_read_tag_status_all(); // Stalls here
>
>   

I've never done any Cell programming, but C is C (most of the time).  
Hopefully this will lead you in the right direction.


David

PS: Is there a decent reference to the Cell SDK?  I couldn't find a 
coherent reference for the macros being used here.  Even the IBM site 
wouldn't give me a clear statement of what parameters they expect, 
their units etc.

