Finding the nth occurrence of a number

For users of dfns, both novice and expert
Post Reply
michaelk
Posts: 34
Joined: Wed Feb 10, 2010 8:38 am

Finding the nth occurrence of a number

Post by michaelk »

Has anyone written an efficient Dfn to find the nth occurrence of a number in a numerical vector that examines only so much of that vector until the nth occurrence is found?
User avatar
Phil Last
Posts: 628
Joined: Thu Jun 18, 2009 6:29 pm
Location: Wessex

Re: Finding the nth occurrence of a number

Post by Phil Last »

I should have thought that almost any iterative process that looks at one item at a time is going to take longer than doing a scalar equality and a plus scan but I suppose given a sufficiently long vector, some kind of probabilistic approach might help. For the Nth occurrence of number M in vector V you could take the first N items of V and check those in case they're all equal to M. Then assuming you found H hits in the first N you look at the next (N-H)×N÷H+H=0 that'll look at a different length sub-vector proportional to the remainder required and inversely proportional to the frequency of hits so far. Of course the one you want might be the last item of V in which case rather a lot more effort has been expended than was entirely necessary.
michaelk
Posts: 34
Joined: Wed Feb 10, 2010 8:38 am

Re: Finding the nth occurrence of a number

Post by michaelk »

⍳ finds the first occurrence of a given number in a vector v, and I presume that the interpreter of this expression does not scan v in its entirety if the number occurs before the last element of v. What I am seeking is an efficient extension of ⍳.
User avatar
Phil Last
Posts: 628
Joined: Thu Jun 18, 2009 6:29 pm
Location: Wessex

Re: Finding the nth occurrence of a number

Post by Phil Last »

This:

nth←{
(req num)←⍺
max←⍴vec←⍵
{
try←⍵
(try≥max)∨req≤hit←+/res←num=try↑vec:req⍴res/⍳⍴res
∇⌈req×try÷hit+hit=0
}req
⍝ ⍺ req num
⍝ ⍵ vec
⍝ ← indices of first req instances of num in vec
}

gives identical results to and is generally quicker than this deliberately naive coding:

Nth←{
(req num)←⍺
req⍴{⍵/⍳⍴⍵}⍵=num
}
⎕RL←16807
z←?1000000⍴10
10 3 nth z
9 17 34 38 53 62 70 90 91 128
10 3 Nth z
9 17 34 38 53 62 70 90 91 128

mf compares results of 2 functions given the same arguments.
      10 3 nth mf Nth ?100000⍴10
1

cf compares timings of 2 functions given the same arguments.
      10 3 nth cf Nth ?100⍴10
2.41814946619217
10 3 nth cf Nth ?1000⍴10
1.48673946957878
10 3 nth cf Nth ?10000⍴10
0.223495702005731
10 3 nth cf Nth ?100000⍴10
0.032520325203252
10 3 nth cf Nth ?1000000⍴10
0

On tracing it turns out that more often than not the answer is found on the second iteration. Perhaps it's worth it after all so long as the target number is sufficiently frequent in the source.

Could be improved.
User avatar
Phil Last
Posts: 628
Joined: Thu Jun 18, 2009 6:29 pm
Location: Wessex

Re: Finding the nth occurrence of a number

Post by Phil Last »

Best I can do using iota is this abomination:


Nth←{
(req num)←⍺
vec←⍵
vec⍳{vec[vec⍳num]+←num}/⍳req
}

it doesn't work for num=0 although it can be made to with a time penalty and still it's no quicker than the naive coding above. Oh yes. And an index error ensues if there are insufficient instances of the target.

'nuff
Post Reply