Underutilization of CPU resources

jmosk
Posts: 69
Joined: Thu Jul 18, 2013 5:15 am

Underutilization of CPU resources

Post by jmosk »

I am running a CPU-intensive application under version 13.2, and it appears to be running slower than it should. There is no disk I/O involved and no other applications are running. The machine is an HP h8-1090t with an Intel Core i7 X990 CPU at 3.47GHz, 6 cores and 12GB of RAM, of which only 4.4GB is in use. My APL application is using only 8% of the CPU; 90% - 92% of the CPU time goes to the Windows 7 (64-bit) System Idle Process. This just does not seem right. I was expecting the app to use most of the CPU power. In the task monitor, at most I see one core running at perhaps 40% of its capacity.

Can anyone give me an idea of why I am seeing such a low overall CPU utilization of a CPU bound app?

[Attachment: CPU Intensive App Running.jpg]

[Attachment: Process Utilization.jpg]
+←--------------------------------------------------------------→
+ Jay Moskowitz
+←--------------------------------------------------------------→
+ http://www.linkedin.com/in/jay-moskowitz-5745b83
+
PhilGray
Posts: 50
Joined: Sat Mar 13, 2010 7:55 pm

Re: Underutilization of CPU resources

Post by PhilGray »

Are there any calls to Windows, e.g. GUI or other Windows calls?
I presume that the RAM modules are suitable (e.g. fast enough) for the CPU clock.
Remember that the CPU runs on the clock (GHz), and cores do not run "parallel", but sequentially,
i.e. the "power" is in the clock speed; the multiple cores provide task-swapping with more efficiency.
jmosk
Posts: 69
Joined: Thu Jul 18, 2013 5:15 am

Re: Underutilization of CPU resources

Post by jmosk »

PhilGray wrote: Are there any calls to Windows, e.g. GUI or other Windows calls?
I presume that the RAM modules are suitable (e.g. fast enough) for the CPU clock.
Remember that the CPU runs on the clock (GHz), and cores do not run "parallel", but sequentially,
i.e. the "power" is in the clock speed; the multiple cores provide task-swapping with more efficiency.


No - this is a totally compute-bound function doing calculations and array manipulation. There are no external calls; it only executes functions written by me. I saw a prior post on this board where someone showed 40% CPU utilization using the same processor (although they said their i7 has 4 cores, while mine has 6), so I don't know why they were showing 40% when I am stuck at 8%. The RAM is definitely fast enough for the CPU, as I have C programs of a similar nature that use 95% of the CPU. I tried running 4 instances of the APL interpreter, all running the same application, and the task manager shows each one using 8% of the CPU. So there is plenty of CPU power available to run faster, since the 4 instances together used 32% of the CPU.

I can't explain why I am being limited to such a low utilization on a relatively fast CPU/RAM combination.

Any ideas about what I can look at to find out what is keeping the process from running faster? No memory swapping is slowing things down: there is plenty of unused RAM, so nothing is being paged out to disk.
PhilGray
Posts: 50
Joined: Sat Mar 13, 2010 7:55 pm

Re: Underutilization of CPU resources

Post by PhilGray »

You might find some useful tools here, developed by Mark Russinovich and Bryce Cogswell:

http://technet.microsoft.com/en-us/sysi ... s/bb545027

Their "Process Explorer" is far superior to the standard Windows version.
http://technet.microsoft.com/en-us/sysi ... 96653.aspx

"Process Monitor" may also be useful.
http://technet.microsoft.com/en-us/sysi ... 96645.aspx
JoHo
Posts: 37
Joined: Sat Nov 28, 2009 12:51 pm
Location: Austria, EU

Re: Underutilization of CPU resources

Post by JoHo »

It would be interesting to create some dumb test functions/threads to see if it is possible to generate 100% CPU load on this multi-core machine using v13.2.
MikeHughes
Posts: 86
Joined: Thu Nov 26, 2009 9:03 am
Location: Market Harborough, Leicestershire, UK

Re: Underutilization of CPU resources

Post by MikeHughes »

My reply seems to have disappeared.

The shorter version is that the Dyalog interpreter is single-threaded: Dyalog threads are effectively time-sliced within a single core.

You will only be able to use one of your cores unless you use a feature like Peach. There are exceptions: Dyalog is experimenting with parallelising some primitives, and external DLLs can be system threaded, but generally it is all single core.
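
As a rough sanity check of the single-core explanation, the arithmetic can be sketched in a few lines of Python (not APL). The sketch assumes the i7 X990 presents 12 logical processors to Windows (6 cores with Hyper-Threading, a figure not stated in this thread), and the function name `single_thread_share` is made up for illustration. Under that assumption, one fully busy thread would show up as roughly 8% of total CPU, and four independent interpreters as roughly 33%, close to the 8% and 32% reported above.

```python
def single_thread_share(logical_cpus: int, busy_threads: int = 1) -> float:
    """Percent of total CPU reported when `busy_threads` threads each keep
    one logical processor fully busy (hypothetical helper for illustration)."""
    if logical_cpus < 1 or busy_threads < 0:
        raise ValueError("logical_cpus must be >= 1 and busy_threads >= 0")
    # A thread can occupy at most one logical processor at a time.
    return 100.0 * min(busy_threads, logical_cpus) / logical_cpus


# Assumption: 6 physical cores x 2 hardware threads = 12 logical processors.
LOGICAL_CPUS = 12

one_interpreter = single_thread_share(LOGICAL_CPUS)        # one APL session
four_interpreters = single_thread_share(LOGICAL_CPUS, 4)   # four APL sessions

print(f"one interpreter:   {one_interpreter:.1f}% of total CPU")
print(f"four interpreters: {four_interpreters:.1f}% of total CPU")
```

If the machine reports a different number of logical processors, substituting that number gives the ceiling to expect from a single interpreter thread.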
MikeBa
Posts: 27
Joined: Thu Mar 14, 2013 11:40 am

Re: Underutilization of CPU resources

Post by MikeBa »

I raised the same issue on Mar 15. See Morten's reply, it might help.
AndyS|Dyalog
Posts: 263
Joined: Tue May 12, 2009 6:06 pm

Re: Underutilization of CPU resources

Post by AndyS|Dyalog »

To add to Mike Hughes' comment:

The following scalar dyadic functions (+ - × * ÷ > ≥ = ≠ ≤ < ⍟ | ! ○ ∨ ∧) execute in parallel in separate system threads, each running on a separate CPU or core, when the argument size exceeds a configurable limit, the parallel execution threshold. They execute in parallel only for floating-point (but not for + - ×), DECF and (where appropriate) complex numbers. In all other cases the administrative overhead of parallelising these primitive functions has been found to outweigh the benefits.

Two I-Beams can be used to configure the point at which the above primitives are performed in parallel system threads, and the number of threads used: 1111⌶ specifies how many system threads to use (the default is the number of CPU/cores in the machine), and 1112⌶ defines the threshold at which parallel execution takes place.
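
One way to read the rule above is as a simple decision function. The following is only a Python model of the description in this post, not Dyalog code and not the interpreter's actual logic. The names `runs_in_parallel` and `THRESHOLD` are hypothetical, the threshold value is made up, and the "(where appropriate)" qualifier for complex numbers is simplified to "always eligible".

```python
# Scalar dyadic primitives listed in the post as candidates for parallel execution.
PARALLEL_PRIMITIVES = set("+-×*÷>≥=≠≤<⍟|!○∨∧")
# Per the post, these are not parallelised for floating-point arguments.
NOT_PARALLEL_FOR_FLOAT = set("+-×")


def runs_in_parallel(primitive: str, element_type: str,
                     n_elements: int, threshold: int) -> bool:
    """Model of the stated rule; element_type is 'int', 'float', 'decf' or 'complex'."""
    if primitive not in PARALLEL_PRIMITIVES:
        return False
    if n_elements <= threshold:          # argument size must exceed the threshold
        return False
    if element_type == "float":
        return primitive not in NOT_PARALLEL_FOR_FLOAT
    # Simplification: treat DECF and complex as always eligible.
    return element_type in {"decf", "complex"}


THRESHOLD = 1000  # hypothetical value; the real one is whatever 1112⌶ is set to

print(runs_in_parallel("÷", "float", 1_000_000, THRESHOLD))  # large float divide
print(runs_in_parallel("+", "float", 1_000_000, THRESHOLD))  # float add is excluded
print(runs_in_parallel("÷", "int", 1_000_000, THRESHOLD))    # integers are excluded
print(runs_in_parallel("÷", "float", 10, THRESHOLD))         # below the threshold
```

Under this reading, a workload dominated by integer arithmetic, small arrays or non-scalar primitives would stay on a single thread whatever the thread count is set to.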

The manuals are currently (as of 2013-07-22) rather thin on this information; I've raised an issue to flesh it out!

This feature was introduced in 12.1.

MikeBa was, I believe, referring to http://www.dyalog.com/forum/viewtopic.php?f=30&t=532&p=2046