Non-overlapping text search

Learning APL or new to Dyalog? Ask "silly" questions here, without fear...
Post Reply
mvranic
Posts: 2
Joined: Fri Jan 22, 2010 9:37 am

Non-overlapping text search

Post by mvranic »

The result of Find function ⍷ can be over overlapped.

Function textSearch implementes :While loop to remove overlapping.

Code: Select all

∇b←srchfor textSearch srchin;size;b1;ix;⎕ML
 ⎕ML←0
 b←srchfor⍷srchin

 size←⊃⍴,srchfor
 :While ∨/b1←(1↓ix)<size+¯1↓ix←{⍵/⍳⍴⍵}b    ⍝ Check overlapping.
    b[⊃b1/1↓ix]←0
 :EndWhile

Code: Select all

'aba'  textSearchBool'babababababa' 
0 1 0 0 0 1 0 0 0 1 0 0

Is there faster way to remove overlapping (except to use regular expressions)?
User avatar
Phil Last
Posts: 628
Joined: Thu Jun 18, 2009 6:29 pm
Location: Wessex

Re: Non-overlapping text search

Post by Phil Last »

I just pulled this subfn out of my find/replace function:

      fi←{(⍴,⍺){⍵\⍺{⍵∊⊃⍺{⍵,⍺/⍨⍺≥⍺⍺+⌈/⍵}/⌽(-⍺),⍵}⍵/⍳⍴⍵}⍺⍷⍵}

I knew it worked but I've never done any speed testing on it. Nor can I remember the sequence of steps I went through to develop it. I guess it's readonly code 'though some insight can probably be gained by breaking it inside all the braces.

      ⍝ (or perhaps not.)
fi←{
(⍴,⍺){
⍵\⍺{
⍵∊⊃⍺{
⍵,⍺/⍨⍺≥⍺⍺+⌈/⍵
}/⌽(-⍺),⍵
}⍵/⍳⍴⍵
}⍺⍷⍵
}


      )copy dfns cf
C:\Program Files\Dyalog\Dyalog APL-64 14.0 Uni...

cf runs the two fns against the same arguments multiple times up to one second and compares (divides) their times. (fi ≡ textSearch) is a fork that runs both fns and matches their results.

      'aba'(fi ≡ textSearch) 'babababababa'
1
'aba'(fi cf textSearch) 'babababababa'
0.94419134396355
'aba'(fi ≡ textSearch) 1e3 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e3 ⍴ 'babababababa'
0.66764705882353
'aba'(fi ≡ textSearch) 1e4 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e4 ⍴ 'babababababa'
0.37480559875583
'aba'(fi ≡ textSearch) 1e5 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e5 ⍴ 'babababababa'
0.18665105386417
User avatar
Phil Last
Posts: 628
Joined: Thu Jun 18, 2009 6:29 pm
Location: Wessex

Re: Non-overlapping text search

Post by Phil Last »

p.s. the structure in the middle:
      {
...
}/⌽(-⍺),⍵

is a for loop.
Roger|Dyalog
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am

Re: Non-overlapping text search

Post by Roger|Dyalog »

A general solution to the problem requires solving a transitive closure problem. See the ^: entry in the J dictionary http://www.jsoftware.com/help/dictionary/d202n.htm and section 1.2 of the APL87 paper Some Uses of { and } http://www.jsoftware.com/papers/from.htm . The details are left as an exercise for the reader for now. :-)
Roger|Dyalog
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am

Re: Non-overlapping text search

Post by Roger|Dyalog »

I forgot that I’d written a J Wiki essay on this topic: http://www.jsoftware.com/jwiki/Essays/Non-Overlapping_Substrings. I’ll provide a translation of it into Dyalog APL. Watch this space!
Post Reply