Multithreading

Budgie
Posts: 36
Joined: Thu Nov 26, 2009 9:22 am
Location: Beckenham

Multithreading

Post by Budgie »

Hi

Is Dyalog thinking of using "real" threads rather than the APL-controlled threads that are available at the moment?

This might not appear to be a great improvement if your CPU has only a few cores. So the natural follow-up question is: is Dyalog thinking of producing a version of APL that will run on NVIDIA GPUs under CUDA? I should point out that the NVIDIA Tesla S1070 has 960 cores, and stuff that runs on that goes like greased lightning.
Jane
MikeHughes
Posts: 86
Joined: Thu Nov 26, 2009 9:03 am
Location: Market Harborough, Leicestershire, UK

Re: Multithreading

Post by MikeHughes »

Hi Jane,

I have been working on a project with a Danish client, and we get speedups of up to 50 times on some of the calculations (depending on data size and so on). You are right: even handing over just some of the parallel computation to the GPU can give amazing results.

We are currently using OpenCL with Dyalog.
Budgie
Posts: 36
Joined: Thu Nov 26, 2009 9:22 am
Location: Beckenham

Re: Multithreading

Post by Budgie »

MikeHughes wrote: Hi Jane,

We are currently using OpenCL with Dyalog.


Can I ask you how that works? I am interested in the possibility of speedups to my factorisation programs.
Jane
MikeHughes
Posts: 86
Joined: Thu Nov 26, 2009 9:03 am
Location: Market Harborough, Leicestershire, UK

Re: Multithreading

Post by MikeHughes »

Jane,

Basically, we have an APL system which does a lot of counting of SNP markers (each coded 0, 1 or 2). These are continuously being added up and grouped to calculate groupings of patients with different patterns of SNPs.

We pack the data into 4 numbers per byte (possible as it's only 0 1 2) and use []NA to call the OpenCL API to load the matrix of data onto the GPU. We employed a CUDA expert to write some code, which is in a text file that we again load onto the GPU using the OpenCL API. Then we light the blue touch paper, so to speak, and keep writing and reading the pre and post sets of indices for the patterns.
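The packing step described above can be sketched as follows. This is only an illustration in Python, not the code from the project: the function names are hypothetical, and the bit order within each byte (first value in the lowest two bits) is an assumption, since the post does not say which layout was used.

```python
def pack_genotypes(values):
    """Pack a sequence of 0/1/2 codes into bytes, four 2-bit codes per byte.

    Assumed layout: value i goes into byte i // 4, at bit offset 2 * (i % 4).
    """
    packed = bytearray((len(values) + 3) // 4)  # round up to whole bytes
    for i, v in enumerate(values):
        if v not in (0, 1, 2):
            raise ValueError(f"marker code must be 0, 1 or 2, got {v!r}")
        packed[i // 4] |= v << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, count):
    """Recover the first `count` 2-bit codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


markers = [0, 1, 2, 2, 1, 0, 0, 2, 1]
blob = pack_genotypes(markers)          # 9 codes fit in 3 bytes
restored = unpack_genotypes(blob, len(markers))
```

The element count has to be carried separately, because the padding bits in the last byte are indistinguishable from genuine 0 codes.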

We built an APL model first and then used this to describe the problem and to test and compare the GPU results. We have a switch for GPU/CPU (APL) running, since for small examples the overhead of loading the GPU is significant.
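A minimal sketch of that arrangement, assuming a size threshold decides which path runs. Everything here is hypothetical: `count_markers_reference` stands in for the APL model, `accelerated` for whatever GPU wrapper is in use, and `min_cells` for a threshold that would have to be measured on real hardware.

```python
def count_markers_reference(rows):
    """Reference model: per-column totals of the 0/1/2 marker codes."""
    return [sum(column) for column in zip(*rows)]


def make_dispatcher(reference, accelerated, min_cells):
    """Return a function that uses `accelerated` only for large inputs.

    Inputs with fewer than `min_cells` elements stay on the reference
    path, where the cost of loading the accelerator would dominate.
    """
    def run(rows):
        cells = sum(len(row) for row in rows)
        if cells < min_cells:
            return reference(rows)
        return accelerated(rows)
    return run


def agrees_with_reference(reference, accelerated, rows):
    """Check an accelerated result against the reference model."""
    return reference(rows) == accelerated(rows)
```

Keeping the reference model callable in production, rather than only in tests, means the same check can be rerun whenever the accelerated code changes.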

The only issue we had was the cost of double copying the data (difference between APL by value and OpenCL by reference) but the speedups made it worthwhile even so.

I can show you the code if you turn up to either the Sep or Nov BAA meetings. I won't be there in Oct as I will be at the Dyalog conference.

Best regards
Michael
Morten|Dyalog
Posts: 460
Joined: Tue Sep 09, 2008 3:52 pm

Re: Multithreading

Post by Morten|Dyalog »

Budgie wrote: Is Dyalog thinking of using "real" threads rather than the APL-controlled threads that are available at the moment?

Version 14.0 will provide isolates: namespaces in which expressions will run on a separate REAL thread, although you can invoke the expressions using normal "dotted" expressions from your main thread. Together with futures, which allow you to create arrays that contain as yet uncomputed result items (which an isolate might be working on), we think this will make it straightforward to utilize the cores in a typical multi-core machine. You can perform structural operations on arrays containing futures and pass them as arguments to functions; they will "block" automatically when you call a primitive function which needs the actual value.
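For readers unfamiliar with futures, the general idea can be illustrated with Python's standard `concurrent.futures` module. This is an analogy only and says nothing about Dyalog's actual isolate syntax: `slow_square` and `demo` are hypothetical names, and in Python the blocking is explicit (a call to `result()`), whereas the post describes it happening automatically when a primitive needs the value.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for an expensive computation handed to a worker."""
    return x * x


def demo():
    with ThreadPoolExecutor(max_workers=4) as pool:
        # A list of futures: result items that may not be computed yet.
        pending = [pool.submit(slow_square, n) for n in range(6)]

        # Structural operations (reverse, take three) never touch the
        # values, so they do not wait for any computation to finish.
        selected = pending[::-1][:3]

        # Only here, where the actual values are needed, do we block.
        return [future.result() for future in selected]


values = demo()
```

The sketch shows the programming model rather than the performance: worker threads in CPython do not run Python bytecode in parallel.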

Budgie wrote: Is Dyalog thinking of producing a version of APL that will run on NVIDIA GPUs under CUDA?

Not necessarily CUDA; we'd like to aim for technologies which will allow us to target a variety of architectures. We are funding research at a university in the USA on an APL compiler that will target LLVM, which can produce code for GPUs and other highly parallel architectures. It has to be a compiler: the overhead of passing data to and from the GPU on each interpreted primitive would remove any advantage that you are likely to get from the hardware. It is too early to say whether this particular avenue is likely to succeed, but even if it does not, we see the cracking of this nut as one of our most important goals for the next 5 years.
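The argument about per-primitive overhead can be made concrete with a toy cost model. This is a simplification under stated assumptions, not a measurement: it assumes every primitive costs the same `kernel` time, every copy to or from the device costs the same `transfer` time, and a compiled expression needs only one copy in and one copy out. The function names and units are hypothetical.

```python
def interpreted_cost(n_primitives, transfer, kernel):
    """Each primitive copies its data in, runs, and copies the result out."""
    return n_primitives * (2 * transfer + kernel)


def compiled_cost(n_primitives, transfer, kernel):
    """One copy in, all primitives run on the device, one copy out."""
    return 2 * transfer + n_primitives * kernel


# Illustrative units only: a transfer five times as costly as a kernel.
per_primitive = interpreted_cost(10, transfer=5, kernel=1)
whole_expression = compiled_cost(10, transfer=5, kernel=1)
```

Under this model the interpreted path pays the transfer cost once per primitive, so its total grows with `n_primitives * transfer`, while the compiled path pays it only once per expression.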

<advertisement>
Make sure you come to Dyalog'13 to speak to the people working on the implementation of these new technologies! Michael will be there, too...
</advertisement>