Generators and Yield-- persuasive tools from Python &c.

No need to worry about going "off topic" here
Forum rules
This forum is for general chit-chat, not necessarily APL-related. However, it's not for spam or for offensive or illegal comments.
petermsiegel
Posts: 159
Joined: Thu Nov 11, 2010 11:04 pm

Generators and Yield-- persuasive tools from Python &c.

Post by petermsiegel »

There's some very good stuff out there in Python and JavaScript lands about the value and simplicity of generators as tools for handing asynchronous activities, complex state-ful calculations, and so on, but most elegantly as an easy-to-conceptualize alternative to callbacks, futures (in the general CS definition), etc. At their simplest, generators allow you to create functions that "return" values via a keyword "Yield," while seemingly "freezing" in place just after the Yield, ready to build on existing state for the next iteration. See esp. the tutorials by David Beazly here: http://www.dabeaz.com/generators/ (I mentioned generators several years ago in the Forum, but the tools have matured and the use cases have grown.)

So before I give a bit of background, is anyone else use these generator tools in languages other than APL and/or interested in how to make them fast and robust in Dyalog APL?

BACKGROUND and RATIONALE
I developed a simple utility to demonstrate generators in Dyalog APL. It's done using tokens (⎕TPUT and ⎕TGET) roughly with the schema at the end of this post. I think it is very powerful (though, like :While, :For and such, it can break up natural parallelism into chunks that are slow to interpret).

It is perfect for separating out logic for async. calls to web servers, random num generators, etc-- and many other tasks which are logically infinite (that is, whose termination is determined by logic in the caller AFTER they've been started up, e.g. based on content)-- e.g. "get me the next 100-digit prime after the one you already gave me, and keep going when I ask you to, until I have what I need."

But the drawback to my quick test implementation is that it is somewhat slow and at least under Mac/OS a tad buggy (interpreter dies when there is a lot of token action in miscellaneous threads, esp if normally recoverable errors occur). In Python, generator examples abound that speed up execution because of the decrease in overhead from not building up state and intermediate results (and the associated memory load). This same advantage SHOULD accrue in APL. This seems like a natural and simplifying concept for APL.

Another issue: while tokens work fine for this task (thank you), using tokens for production code where they are used by separately contributed utilities is risky. What if utility a and b both use token 12123? Perhaps separate token pools could be created or arbitrary token names (in place of numbers) could be used to make their use in utilities more rugged.

Anyone else experienced with generators in other languages? What do you think of their expressiveness and simplicity? Utility in APL?

SIMPLE SCHEMA

Code: Select all

⍝  <gen> is our generator. It is called via a utility called <generator>
⍝  which executes it in a separate thread, passing to it a namespace shared
⍝  with the caller-- which allows for the generator to Yield (send or receive
⍝  messages as if returning w/o losing state), to Signal errors and return.
⍝  It can also pass its generator namespace downstream, so its offspring can
⍝  do the heavy lifting in the same or a new thread...
   ∇r←genHandle gen initial
      genHandle.Yield¨'first' 'second' (3 'complex') 'This is the last Yield'
      r←0   ⍝ We're done
      :RETURN
      ⍝ Normally, you'd do something more substantial
      generate_a_lot_of_state_that_takes_time_to_recalculate
      :While 1  ⍝ Forever...
         genHandle.Yield some_very_complicated_result_based_on_that_state
      :EndWhile
      ⍝ We needn't ever "return", but could
   ∇
 
⍝ Here's the simplest client... 
   g←gen generator 'some data'⍝ Start <gen> above as a generator with startup data
   g.Next                     ⍝ Request the generator to yield its next result
first
   g.Next
second
   ...                        ⍝ Soon, our generator is exhausted.
   g.Next                     ⍝ No more data, either check g.Done (1) or
StopIteration Signalled       ⍝ ... deal with the trappable signal
   g.Next
   ∧
⍝ Here's a looping client ...
  ∇r← ClientForGen init; g
   g←gen generator init  ⍝ Start up the generator <gen>, <g> is the generator ns
                         ⍝ (pseudo-"class") for communicating...
   :TRAP 900     ⍝ Our signal number that the Generator is done...
   :WHILE 1      ⍝ In python, there's a Signal that ends an iteration
                 ⍝ If an iterator is active, it just catches the signal and ends.
       g.Next    ⍝ Read the data from-- or g.Send 'new data' to-- the generator
   :ENDWHILE     ⍝ If <gen> is done, the signal results in an error msg...
   :ELSE
       'We are done'
   :ENDTRAP
User avatar
StefanoLanzavecchia
Posts: 113
Joined: Fri Oct 03, 2008 9:37 am

Re: Generators and Yield-- persuasive tools from Python &c.

Post by StefanoLanzavecchia »

I hope my reply doesn't sound like an self endorsement, because that's not my intention. I've been in love with the concept of generators the first time I came across them. And right now I cannot really remember when that was.
I admit that when I implemented something that works exactly like you describe (a separate thread per generator, using []T tokens to flow information) in 2002/3, I did not notice the connection and unfortunately used a completely different terminology in my implementation, less abstract, hence less recognisable. And, like you say, the first thing I had to do was to put a discipline on the usage of token by building a set of tool that would partition the possible domain (positive integers) into separate pools, each one devoted to a different usage of tokens. A colleague of mine described what I did back then in a recent APL meeting: SwedAPL 2016 Copenhagen - Synchronising Sofia. As far as I know our token management toolset is now being prepared for publication on the APL Wiki. The fact that tokens come from a shared pool instead of a private one, means that unless an adopter of external libraries, also buys into their token management library (before using tokens anywhere else in its program!), he's going to have a hellish time. And integration with another subsequent library which in turns uses tokens in a different way is simply going to be impossible.
Anyways.
All I can say is that the strategy you describe works in practice, though our implementation is not exactly pretty to look at. Proof is: the code went into production in 2004/5 and it must have generated billions of values since. Some of our customers are very demanding and the system just met their expectations.
Caveats: I am pretty sure the usage of the []T primitives turns an APL Thread into an OS thread. Which means that you can barely have a thousand of them... Which is a pity, since one of the ideas behind generators is to reduce physical concurrency while increasing perceived concurrency (think Coroutines). I do wish Dyalog threads were much more lightweight and I could abuse them more than I do now.

Another interesting point: researchers working for Microsoft pointed out that the IEnumerable interface (which sits behind generators in C#') has a natural dual: IObservable. And they built something called Reactive extensions, which is a very interesting programming model. The difference between the two models is that generators are used in a pull configuration (the client asks for a new value), whereas Observables push values, like events.
Last edited by StefanoLanzavecchia on Sun Apr 24, 2016 4:17 pm, edited 1 time in total.
paulmansour
Posts: 431
Joined: Fri Oct 03, 2008 4:14 pm

Re: Generators and Yield-- persuasive tools from Python &c.

Post by paulmansour »

Stefano,

This:

      Caveats: I am pretty sure the usage of the []T primitives turns an APL Thread into an OS thread.


caught my eye.

Would you, or perhaps someone at Dyalog, elaborate on this?
User avatar
StefanoLanzavecchia
Posts: 113
Joined: Fri Oct 03, 2008 9:37 am

Re: Generators and Yield-- persuasive tools from Python &c.

Post by StefanoLanzavecchia »

I've just tried. And my claim seems to be false, at least in 14.1. Nevertheless, I think there are conditions where the interpreter decides to spawn a real OS thread. I may try a few more experiment.
In the meantime I tried:
      { ⎕tget ⍵}&¨⍳100


This doesn't spawn more OS threads. But it still consume 100% CPU time. Which is only marginally better than spawning 100 OS threads. It's actually the first time I notice this anomalous CPU consumption. It may be a bug introduced recently.
User avatar
StefanoLanzavecchia
Posts: 113
Joined: Fri Oct 03, 2008 9:37 am

Re: Generators and Yield-- persuasive tools from Python &c.

Post by StefanoLanzavecchia »

This
      {F←⎕NS'' ⋄ _←'F' ⎕WC'Form'('Visible' 0) ⋄ ⎕dq 'F' }&¨⍳100

still does not spawn extra OS threads, nor goes into a CPU usage frenzy.
User avatar
StefanoLanzavecchia
Posts: 113
Joined: Fri Oct 03, 2008 9:37 am

Re: Generators and Yield-- persuasive tools from Python &c.

Post by StefanoLanzavecchia »

This
      { ⎕dl 20 }&¨⍳100

puts one CPU at 100% usage AND creates 100 OS threads.
User avatar
StefanoLanzavecchia
Posts: 113
Joined: Fri Oct 03, 2008 9:37 am

Re: Generators and Yield-- persuasive tools from Python &c.

Post by StefanoLanzavecchia »

So, this:
      {⎕tget ⍵}&¨⍳60

doesn't create OS threads nor consumes visible CPU time.
This:
      {⎕tget ⍵}&¨⍳61

doesn't create OS threads but it saturates a CPU core.
I repeat: if I had noticed this behaviour before, I cannot remember now. But it looks like a somewhat serious limitation to the usage of []T's.
Is it related to the fact that WaitForMultipleObjects can only wait for a maximum of 64 object handles? (Constant MAXIMUM_WAIT_OBJECTS is 64).

By the way: I tried all my experiments using the 32 bits, classic version of the Windows interpreter. Right now I have no means (nor much interest to be honest :-)) to test on different OSes.
User avatar
Morten|Dyalog
Posts: 460
Joined: Tue Sep 09, 2008 3:52 pm

Re: Generators and Yield-- persuasive tools from Python &c.

Post by Morten|Dyalog »

This *is* pretty persuasive, I think I need to add the following implementation as a standard pattern to the isolate workspace. It uses an isolate to run the generator, and naïvely uses one million plus the thread number (in the isolate process) as the token number. The benefits of using (futures and) isolates are that your code does run in a separate OS thread... And the tokens used are in a separate process so they will not interfere with your application.

It is all thread safe, see sample output below the code.

Code: Select all

:Namespace Generators
⍝  Example of using isolates to create (and use) a "Generator"

    (⎕IO ⎕ML)←1 1

    ∇ Yield new
      new ⎕TPUT TOKEN
    ∇
   
    ∇ r←Next
      r←⊃⎕TGET TOKEN
    ∇
   
    ∇ r←time
      r←,'ZI2,<:>,ZI2,<:>,ZI2,<.>,ZI4'⎕FMT 1 4⍴3↓⎕TS
    ∇

    ∇ r←Gen arg;data;i
      TOKEN←1000000+1000000|⎕TID
      ⎕RL←0
     
      Yield¨(arg,' started using token ',⍕TOKEN)'first' 'second'(3 'complex')
     
      ⍝ Normally, you'd do something more substantial
      data←?size⍴3 ⍝ generate_a_lot_of_state_that_takes_time_to_recalculate
     
      :For i :In ⍳3  ⍝ Or forever...
          Yield data+.=i⊣⎕DL?4 ⍝ some_very_complicated_result_based_on_that_state
      :EndFor
     
      Yield ⍬ ⍝ The signal that there is no more
    ∇


    ∇ Demo dummy;size;value;count;is
     ⍝ Demonstrate use of the generator
     
      size←1000000 ⍝ Set up initial parameters
     
      is←#.isolate.New ⎕THIS ⍝ Clone current namspace
      ⎕←'Local thread ',(⍕⎕TID),' is using remote thread ',⍕is.(Gen&'Hello') ⍝ Launch the generator
      count←0
         
      :Repeat
          count+←1
          ⎕←time ⎕TID count(value←is.Next)
      :Until value≡⍬
    ∇

:EndNamespace


Sample output:

Code: Select all

      )load isolate
C:\Program Files\Dyalog\Dyalog APL-64 14.1 Unicode\ws\isolate.dws saved Fri Jul 24 20:19:02 2015
      ]load c:\devt\isolate\sources\samples\Generators
#.Generators
      isolate.New ⍬ ⍝ Create and discard one isolate (see below)
#.[isolate]
      Generators.Demo&¨⍬ ⍬
Local thread 1 is using remote thread 8
Local thread 2 is using remote thread 6
 14:29:52.0978  1 1  Hello started using token 1000008
 14:29:52.0980  2 1  Hello started using token 1000006
 14:29:53.0023  1 2  first
 14:29:53.0087  2 2  first
 14:29:53.0100  1 3  second
 14:29:53.0201  2 3  second
 14:29:53.0203  1 4  3  complex 
 14:29:53.0309  2 4  3  complex 
 14:29:53.0319  1 5 333395
 14:29:54.0063  1 6 332725
 14:29:53.0511  2 5 333140
 14:29:56.0032  1 7 333880
 14:29:58.0020  1 8   
 14:29:57.0242  2 6 333734
 14:30:00.0963  2 7 333126
 14:30:04.0013  2 8   


The creation of an isolate before we start should be unnecessary but this example uncovered a race condition if two threads try to initialise the Isolate mechanism at the same time(!) Note that the remote threads are running in separate processes.
petermsiegel
Posts: 159
Joined: Thu Nov 11, 2010 11:04 pm

Re: Generators and Yield-- persuasive tools from Python &c.

Post by petermsiegel »

I really appreciate all the insights (and code) shared. As the Dyalog community works to continue to broaden APL's appeal, showing examples of how we can do asynchronous tasks easily is paramount, but won't have as much impact as we'd like unless they end up in tutorials and primers. What I really like about the Python community is that there are a lot of very basic examples of almost everything; what I really don't like about the Python community is how hard you sometimes have to work to find the complex cases. To oversimplify, the APL community often does amazing things, but there aren't many examples documented for other users out there. Present company excepted (very nice talk by the way, Stefano).

The reason I ended up with a namespace (didn't really need a class) was that I wanted to track the tasks spawned, so that my exit routine could clean up the threads, if needed. I did that mostly because I ran into system errors spawning and managing a lot of generators.

I used Stefano's method first, but also used Morton's approach of having the tokens assigned based simply on the generator thread allocated-- that does free you from having to have a more sophisticated allocator, though I think you need the namespace or class approach so both the generator and the client know the tokens used AND-- more importantly-- the generator thread, so the CLIENT can clean up the tokens and kill the generator. As far as I can see, a very natural way to write generators is to generate an infinite loop or recursion, gated by the yields. While I issue a SIGNAL to the generator when the client is through with it, (a) some clients are sloppy on indicating they are done and (b) sometimes on the MAC the SIGNAL causes the system errors above while a lot of tokens are being waited for-- a TKILL does not. So, the client simply looks up the thread of the generator and if it's still running after getting told to quit, the client does the TKILL.

thanks again for some good ideas. It would be great to see some papers/demos/primers generated (so to speak) on Generators APL Style.

P.S. I decided not to report the bugs in part because the interpreter just died a terrible death-- eating all the CPU-- without any diagnostics in part because I figured I'd try it on the new release after a fashion. My sense is that the MAC version is less robust than the MSoft version on classes, tokens, and a few other things.

Best
Last edited by petermsiegel on Mon Apr 25, 2016 5:15 am, edited 3 times in total.
petermsiegel
Posts: 159
Joined: Thu Nov 11, 2010 11:04 pm

Re: Generators and Yield-- persuasive tools from Python &c.

Post by petermsiegel »

A nice (but not too simple) intro to Reactive extensions is here: https://gist.github.com/staltz/868e7e9bc2a7b8c1f754

Of course Microsoft has tons of documents that are likely easy to read once you have mastered the material, but not before.

I also tried {⎕tget ⍵}&¨⍳61 on a Macbook Pro 2.7 GHz Intel Core i5 and it was up around 98% of a CPU.
Post Reply