The ideas discussed in the paper and the workshop are not yet fully implemented in Dyalog APL, but we are working on them. So Paul is indeed correct that currently (v14.0), if x and y are character matrices, inverted-table index-of (8⌶) is the fastest way to do index-of on them:
a←(' ',⎕a,⎕d)[?1000 22⍴37]
x←a[?1e6⍴≢a;]
y←a[?1.1e6⍴≢a;]
cmpx 'x⍳y' 'x{(↓⍺)⍳↓⍵}y' '(,⊂x)(8⌶)(,⊂y)'
x⍳y → 3.42E¯1 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
x{(↓⍺)⍳↓⍵}y → 3.41E¯1 | -1% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(,⊂x)(8⌶)(,⊂y) → 8.70E¯2 | -75% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
x⍳y and x{(↓⍺)⍳↓⍵}y invoke the same code and so run at the same speed. (Timing differences of less than 5% are artifacts of the timing process and are neither significant nor repeatable.) 8⌶ implements the latest thinking, including the use of the CRC32 instruction introduced with SSE4.2 (available on Intel CPUs since the Nehalem generation of 2008). The faster code is expected to be used in ⍳ in v14.1, so employing the circumlocutory (,⊂x)(8⌶)(,⊂y) in place of x⍳y is not recommended.
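For readers new to inverted tables: an inverted table stores a table as a vector of columns, each column a vector (or, for strings, a matrix) with the same number of rows; (,⊂x) above is a one-column inverted table. The dfn below, named ivIota here purely for illustration, is a hypothetical plain-APL model of what inverted-table index-of computes, not how 8⌶ is implemented: each column is coded as small integers with ⍳, and the integer rows are then matched with the row-wise ⍳ of v14.0. It assumes ⎕IO←1 and exact comparison (⎕CT←0).

```apl
⍝ Hypothetical model of inverted-table index-of (illustration only, not the 8⌶ code).
⍝ ⍺, ⍵: inverted tables (vectors of columns). Needs v14.0+ (⍳ on matrices finds rows).
⍝ Each column of ⍺ and ⍵ is coded by ⍺'s column ⍳; a value absent from ⍺ gets 1+≢column,
⍝ which never matches a code of ⍺ itself. ⎕CT assigned in a dfn is local to it.
ivIota←{⎕CT←0 ⋄ (⍉↑⍺⍳¨⍺)⍳⍉↑⍺⍳¨⍵}
t←(1 2 3 2)(4 5 6 5)   ⍝ 4 records: (1 4)(2 5)(3 6)(2 5)
u←(2 3 9)(5 6 9)       ⍝ 3 records: (2 5)(3 6)(9 9)
t ivIota u             ⍝ 2 3 5   (5 = 1+number of records, i.e. not found)
```

Coding columns separately lets a character-matrix column and a float column share one table without ever building a nested matrix of rows.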
As mentioned above, we are working on implementing the latest ideas. The following benchmarks show the speed-ups that can be expected once the implementation is complete; f and g denote the new code for ⍳ and ∊ respectively:
x←?1e6⍴2e9
y←?1.1e6⍴2e9
cmpx 'x f y' 'x⍳y'
x f y → 5.78E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
x⍳y → 1.83E¯1 | +216% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
cmpx 'y g x' 'y∊x'
y g x → 4.61E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
y∊x → 1.74E¯1 | +276% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
x←?1e6⍴2e6
y←?1.1e6⍴2e6
cmpx 'x f y' 'x⍳y'
x f y → 1.64E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕
x⍳y → 7.62E¯2 | +365% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
cmpx 'y g x' 'y∊x'
y g x → 1.06E¯2 | 0% ⎕⎕⎕⎕⎕⎕
y∊x → 7.50E¯2 | +608% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
x←?1e6⍴4e6
y←?1.1e6⍴4e6
cmpx 'x f y' 'x⍳y'
x f y → 4.92E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
x⍳y → 1.58E¯1 | +221% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
cmpx 'y g x' 'y∊x'
y g x → 9.81E¯3 | 0% ⎕⎕⎕
y∊x → 1.50E¯1 | +1426% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
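Reading each percentage as a speed-up factor, that is, the v14.0 time divided by the time of the new code (f for ⍳, g for ∊), makes the comparisons that follow easier to check. The values are just ratios of the displayed timings, so they differ slightly from the printed percentages because of rounding:

```latex
\text{factor} = \frac{t_{\text{v14.0}}}{t_{\text{new}}}
\qquad
\begin{array}{l|ccc}
 & \text{range } 2\times10^{9} & 2\times10^{6} & 4\times10^{6} \\ \hline
x\,\iota\,y \ \text{vs}\ x\,f\,y & 0.183/0.0578 \approx 3.17 & 0.0762/0.0164 \approx 4.65 & 0.158/0.0492 \approx 3.21 \\
y \in x \ \text{vs}\ y\,g\,x & 0.174/0.0461 \approx 3.77 & 0.0750/0.0106 \approx 7.08 & 0.150/0.00981 \approx 15.3
\end{array}
```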
The benchmarks also show that benchmarking is tricky. Your faithful implementers are sneaky guys and will exploit properties of the data to effect faster computation. Here the small range of the data is exploited, with the range itself computed efficiently using vector instructions available on any non-ancient CPU. To give you an idea of the details involved: ∊ on values up to 2e9 uses hashing, whereas ∊ on small-range data uses a table, which is faster. For ∊, the factor on 2e6 is smaller than the factor on 4e6 because the v14.0 implementation already uses a table for range 2e6; the factor on 4e6 is bigger because there v14.0 uses hashing where v14.1 would use a table, and would do so without using more temporary space. I conjecture that in practice integer arguments to the ⍳ family are usually small-range, so that small range is a special case but not an unusual one. And on and on.
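To make the table idea concrete, here is a hypothetical dfn, smallRangeIn, sketching table-based membership for integer arguments of small range. It only illustrates the general technique in APL; it is not Dyalog's internal code, which works in C and finds the range with vector instructions. It assumes non-empty integer arguments and works for either ⎕IO.

```apl
⍝ Hypothetical sketch of ⍺∊⍵ by table lookup, for small-range integer arguments.
⍝ v: all values; lo: smallest value; t: one boolean per value in the range lo..⌈/v.
⍝ Mark every value occurring in ⍵, then look up each item of ⍺ (result has ⍺'s shape).
⍝ Memory grows with the range, so this only pays off when the range is small.
smallRangeIn←{v←(,⍺),,⍵ ⋄ lo←⌊/v ⋄ t←(1+(⌈/v)-lo)⍴0 ⋄ t[⎕IO+⍵-lo]←1 ⋄ t[⎕IO+⍺-lo]}
3 7 5 smallRangeIn 5 3 4   ⍝ 1 0 1, same as 3 7 5∊5 3 4
```

Unlike hashing, there are no collisions to resolve: each lookup is a single index into t, which is why the table wins whenever the range is small enough to allocate it.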
Stig's example is interesting and I am glad he brought it up. I have a suggestion regarding the handling of floats: unless you have reason to believe that ⎕ct matters for your data, try setting a local ⎕ct←0. Also, ⍳ could incorporate the same efficient code that is already available in 8⌶; x xFindn y could then be done using plain old x⍳y, with x⍳y being the faster. The following benchmark shows the upper bound on the amount of speed-up:
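xData←{(?⍵⍴100),a[?⍵⍴≢a←'Stig' 'Paul' 'Morten' 'John' 'Jay' 'Nick' 'Fi' 'Roger'],⍪0.1×?⍵⍴200}   ⍝ ⍵×3 nested matrix: integer, name, float per row
xConvert←{t←↓[⎕IO]⍵ ⋄ t[1+⎕IO]←↑¨t[1+⎕IO] ⋄ t}   ⍝ matrix → inverted table: vector of columns, names mixed into a char matrix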
x←xData 1e6
y←xData 1.1e6
x1←xConvert x
y1←xConvert y
cmpx 'x1(8⌶)y1' 'x xFindn y'
x1(8⌶)y1 → 2.34E¯1 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕
x xFindn y → 1.14E0 | +388% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
Of course, until ⍳ handles this internally, you have to account for the time needed for the conversion:
cmpx 'x1←xConvert x⊣y1←xConvert y'
5.03E¯1
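Adding this conversion time to the 8⌶ time from the previous benchmark shows that, on these timings, the inverted-table route still comes out ahead of xFindn even when the conversion is paid for on every call:

```latex
\underbrace{0.503}_{\text{conversion of x and y}} + \underbrace{0.234}_{\text{inverted-table index-of}}
= 0.737 \;<\; \underbrace{1.14}_{\text{x xFindn y}},
\qquad \frac{0.737}{1.14} \approx 0.65
```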
I should point out that compared to x1 (the "inverted table" format), x is extravagantly profligate in its use of space:
⎕size 'x' 'x1'
128000040 15000160
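Where the 15 MB of x1 goes can be reconstructed with a little arithmetic, assuming that each column is stored in the smallest suitable simple type: the integers from ?⍵⍴100 as 1-byte integers, the names mixed into a 1e6×6 matrix of 1-byte characters ('Morten', 6 characters, being the longest name), and the scaled values as 8-byte floats. The remaining 160 bytes are presumably array headers.

```latex
10^{6}\,(1 + 6 + 8) = 15\,000\,000 \ \text{bytes} \quad (\text{vs. } 15\,000\,160 \ \text{reported}),
\qquad \frac{128\,000\,040}{15\,000\,160} \approx 8.5
```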
Inefficient use of space leads directly to unavoidable inefficiencies in time. I will be in touch with Stig regarding additional details on the matrices in his application (not to twist his arm to change his data, but to get details on how best to enhance ⍳ on his data :-).
Finally, the extension to ⍳ in v14.0 ("looking for rows") greatly improved it as a tool of thought, and the possibilities take time to sink in. Using ⍳ (and improving its implementation) for Stig's application is one example; the blog post "A Speed-Up Story" (http://www.dyalog.com/blog/2014/11/a-speed-up-story-2/) is another.