Finding the nth occurrence of a number

michaelk · Post by **michaelk** » Mon Feb 27, 2012 12:43 am

Has anyone written an efficient Dfn to find the nth occurrence of a number in a numerical vector that examines only so much of that vector until the nth occurrence is found?

Phil Last · Post by **Phil Last** » Mon Feb 27, 2012 2:52 pm

I should have thought that almost any iterative process that looks at one item at a time is going to take longer than doing a scalar equality and a plus scan but I suppose given a sufficiently long vector, some kind of probabilistic approach might help. For the Nth occurrence of number M in vector V you could take the first N items of V and check those in case they're all equal to M. Then assuming you found H hits in the first N you look at the next (N-H)×N÷H+H=0 that'll look at a different length sub-vector proportional to the remainder required and inversely proportional to the frequency of hits so far. Of course the one you want might be the last item of V in which case rather a lot more effort has been expended than was entirely necessary.

michaelk · Post by **michaelk** » Mon Feb 27, 2012 10:42 pm

⍳ finds the first occurrence of a given number in a vector v, and I presume that the interpreter of this expression does not scan v in its entirety if the number occurs before the last element of v. What I am seeking is an efficient extension of ⍳.

Phil Last · Post by **Phil Last** » Mon Feb 27, 2012 11:36 pm

This:

      ∇
 nth←{
     (req num)←⍺
     max←⍴vec←⍵
     {
         try←⍵
         (try≥max)∨req≤hit←+/res←num=try↑vec:req⍴res/⍳⍴res
         ∇⌈req×try÷hit+hit=0
     }req
⍝ ⍺ req num
⍝ ⍵ vec
⍝ ← indices of first req instances of num in vec
 }

gives identical results to and is generally quicker than this deliberately naive coding:

      ∇
 Nth←{
     (req num)←⍺
     req⍴{⍵/⍳⍴⍵}⍵=num
 }
      ⎕RL←16807
      z←?1000000⍴10
      10 3 nth z
9 17 34 38 53 62 70 90 91 128
      10 3 Nth z
9 17 34 38 53 62 70 90 91 128

mf compares results of 2 functions given the same arguments.

      10 3 nth mf Nth ?100000⍴10
1

cf compares timings of 2 functions given the same arguments.

      10 3 nth cf Nth ?100⍴10
2.41814946619217
      10 3 nth cf Nth ?1000⍴10
1.48673946957878
      10 3 nth cf Nth ?10000⍴10
0.223495702005731
      10 3 nth cf Nth ?100000⍴10
0.032520325203252
      10 3 nth cf Nth ?1000000⍴10
0

On tracing it turns out that more often than not the answer is found on the second iteration. Perhaps it's worth it after all so long as the target number is sufficiently frequent in the source.

Could be improved.

Phil Last · Post by **Phil Last** » Tue Feb 28, 2012 12:17 am

Best I can do using iota is this abomination:

      ∇
 Nth←{
     (req num)←⍺
     vec←⍵
     vec⍳{vec[vec⍳num]+←num}/⍳req
 }

it doesn't work for num=0 although it can be made to with a time penalty and still it's no quicker than the naive coding above. Oh yes. And an index error ensues if there are insufficient instances of the target.

'nuff

Dyalog Forums

Finding the nth occurrence of a number

Finding the nth occurrence of a number

Re: Finding the nth occurrence of a number

Re: Finding the nth occurrence of a number

Re: Finding the nth occurrence of a number

Re: Finding the nth occurrence of a number