Page 1 of 1

Non-overlapping text search

Posted: Fri Feb 13, 2015 4:07 pm
by mvranic
The result of Find function ⍷ can be over overlapped.

Function textSearch implementes :While loop to remove overlapping.

Code: Select all

∇b←srchfor textSearch srchin;size;b1;ix;⎕ML
 ⎕ML←0
 b←srchfor⍷srchin

 size←⊃⍴,srchfor
 :While ∨/b1←(1↓ix)<size+¯1↓ix←{⍵/⍳⍴⍵}b    ⍝ Check overlapping.
    b[⊃b1/1↓ix]←0
 :EndWhile

Code: Select all

'aba'  textSearchBool'babababababa' 
0 1 0 0 0 1 0 0 0 1 0 0

Is there faster way to remove overlapping (except to use regular expressions)?

Re: Non-overlapping text search

Posted: Fri Feb 13, 2015 5:09 pm
by Phil Last
I just pulled this subfn out of my find/replace function:

      fi←{(⍴,⍺){⍵\⍺{⍵∊⊃⍺{⍵,⍺/⍨⍺≥⍺⍺+⌈/⍵}/⌽(-⍺),⍵}⍵/⍳⍴⍵}⍺⍷⍵}

I knew it worked but I've never done any speed testing on it. Nor can I remember the sequence of steps I went through to develop it. I guess it's readonly code 'though some insight can probably be gained by breaking it inside all the braces.

      ⍝ (or perhaps not.)
fi←{
(⍴,⍺){
⍵\⍺{
⍵∊⊃⍺{
⍵,⍺/⍨⍺≥⍺⍺+⌈/⍵
}/⌽(-⍺),⍵
}⍵/⍳⍴⍵
}⍺⍷⍵
}


      )copy dfns cf
C:\Program Files\Dyalog\Dyalog APL-64 14.0 Uni...

cf runs the two fns against the same arguments multiple times up to one second and compares (divides) their times. (fi ≡ textSearch) is a fork that runs both fns and matches their results.

      'aba'(fi ≡ textSearch) 'babababababa'
1
'aba'(fi cf textSearch) 'babababababa'
0.94419134396355
'aba'(fi ≡ textSearch) 1e3 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e3 ⍴ 'babababababa'
0.66764705882353
'aba'(fi ≡ textSearch) 1e4 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e4 ⍴ 'babababababa'
0.37480559875583
'aba'(fi ≡ textSearch) 1e5 ⍴ 'babababababa'
1
'aba'(fi cf textSearch) 1e5 ⍴ 'babababababa'
0.18665105386417

Re: Non-overlapping text search

Posted: Fri Feb 13, 2015 5:15 pm
by Phil Last
p.s. the structure in the middle:
      {
...
}/⌽(-⍺),⍵

is a for loop.

Re: Non-overlapping text search

Posted: Sat Feb 14, 2015 7:03 pm
by Roger|Dyalog
A general solution to the problem requires solving a transitive closure problem. See the ^: entry in the J dictionary http://www.jsoftware.com/help/dictionary/d202n.htm and section 1.2 of the APL87 paper Some Uses of { and } http://www.jsoftware.com/papers/from.htm . The details are left as an exercise for the reader for now. :-)

Re: Non-overlapping text search

Posted: Sun Feb 15, 2015 5:59 am
by Roger|Dyalog
I forgot that I’d written a J Wiki essay on this topic: http://www.jsoftware.com/jwiki/Essays/Non-Overlapping_Substrings. I’ll provide a translation of it into Dyalog APL. Watch this space!