Extract parts of the string between brackets

alexeyv · Post by **alexeyv** » Thu Mar 03, 2016 10:29 pm

Hi,

I'm toying with a simple function that should extract the parts of a string between brackets (e.g. HTML tag names) into a nested array of strings:

      str←'hel<o><worl>d'
      DISPLAY parse_brackets str '<' '>'
┌→───────────┐
│ ┌→┐ ┌→───┐ │
│ │o│ │worl│ │
│ └─┘ └────┘ │
└∊───────────┘

My (naive) implementation of parse_brackets is the following:

      R←parse_brackets(str br1 br2);open;close
⍝ APL2 compatibility to test with GNU APL
⎕ML←3
⍝ Indexes of the open bracket br1
open←(str=br1)/⍳⍴str
⍝ Indexes of the close bracket br2 - 1
⍝ ¯1 to exclude closing bracket
close←¯1+(str=br2)/⍳⍴str
⍝ build a 2-row matrix: start positions in the
⍝ first row, lengths of the extracted words in
⍝ the second row;
⍝ split it into columns;
⍝ for each (start; length) pair,
⍝ drop up to the start and take the length
R←{⍵[2]↑⍵[1]↓str}¨⊂[1]open,[0.5]close-open

But I feel this implementation is rather clumsy, and there must be several more elegant ways to do it. Any criticism and ideas on how to implement this in a more APLish way are welcome (here I feel I am still programming in the 'functional' way, i.e. transforming the data to a list and applying a lambda function to each element of that list).
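For reference, the intermediate values for the sample string can be traced in the session as follows. This is only a sketch: it assumes ⎕IO←1 and the literal brackets '<' and '>' in place of br1 and br2, and it relies on the brackets being balanced and not nested.

```apl
      str←'hel<o><worl>d'
      open←(str='<')/⍳⍴str       ⍝ positions of the opening brackets
      open
4 7
      close←¯1+(str='>')/⍳⍴str   ⍝ positions just before the closing brackets
      close
5 11
      open,[0.5]close-open       ⍝ row 1: start positions, row 2: lengths
4 7
1 4
```

Each column is one (start; length) pair: dropping 4 characters and taking 1 gives 'o', and dropping 7 and taking 4 gives 'worl'.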

Morten|Dyalog · Post by **Morten|Dyalog** » Thu Mar 03, 2016 11:08 pm

Here are a few variations:

      txt←'hel<o><worl>d'      
      {1↓¨(⍵=⊃⍵)⊂⍵}{(+\1 ¯1 0['<>'⍳⍵])/⍵}txt
 o  worl 
      {1↓¨({⍵×⌈\⍵}+\1 ¯1 0['<>'⍳⍵])⊂⍵}txt              ⍝ IBM style ⊂, requires ⎕ML←3
 o  worl
      {(¯1+⍵⍳¨'>')↑¨⍵}{1↓¨(⍵='<')⊂⍵}txt
 o  worl 
      ('<(\w+)>' ⎕S '\1')txt
 o  worl

The last is perhaps not "very APL-ish", but in this case the REGEX is pretty elegant, IMHO.
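The phrase shared by the first two variations can be unpacked one step at a time. The trace below is a sketch assuming ⎕IO←1: characters that are neither '<' nor '>' get index 3 and therefore map to 0, while '<' maps to 1 and '>' to ¯1.

```apl
      txt←'hel<o><worl>d'
      '<>'⍳txt                    ⍝ 1 for '<', 2 for '>', 3 for anything else
3 3 3 1 3 2 1 3 3 3 3 2 3
      1 ¯1 0['<>'⍳txt]            ⍝ +1 at each opening, ¯1 at each closing bracket
0 0 0 1 0 ¯1 1 0 0 0 0 ¯1 0
      +\1 ¯1 0['<>'⍳txt]          ⍝ running sum: 1 while inside a bracket pair
0 0 0 1 1 0 1 1 1 1 1 0 0
      (+\1 ¯1 0['<>'⍳txt])/txt    ⍝ keep only the marked characters
<o<worl
```

The opening brackets stay in the compressed result, which is why the first variation can cut the string wherever an element equals the first one and then drop the leading character of each piece.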

P.S. We are toying with the idea of using ⊆ to denote the APL2-style partitioned enclose in Dyalog v16.0 (with monadic ⊆ becoming "enclose if simple"). We'd like to make all the useful functionality from different migration levels available with ⎕ML=1.

Roger|Dyalog · Post by **Roger|Dyalog** » Fri Mar 04, 2016 6:09 pm

The phrase +\1 ¯1 0['<>'⍳⍵] is number 5 in my list of Sixteen APL Amuse-Bouches. It has an ancient pedigree and stars in an amusing anecdote.

I am in awe of the phrasing (and the sarcasm) "... who runs computing in Bavaria from his headquarters in Munich, and ... who runs computing all over the world from his headquarters in Holland."

alexeyv · Post by **alexeyv** » Fri Mar 04, 2016 9:05 pm

Wow, great replies! So far I've only studied the reply with IBM's notation, because after I posted this question I was looking through Mastering Dyalog APL and remembered that IBM's ⊂ function allows splitting a vector into a nested array.

It took me a while to understand this second answer, and in particular the

      (1 ¯1 0)['<>'⍳⍵]

construction, which marks the beginning and end of words with 1 and ¯1.

Next, it was interesting to see how the +\ scan fills the array between 1 and ¯1 with 1s.

But I failed to see the need for the {⍵×⌈\⍵} function. To me it looks like the task is already solved even without it:

      str
hel<o><worl>d
      1↓¨(+\(1 ¯1 0)['<>'⍳str])⊂str
 o  worl

Morten|Dyalog · Post by **Morten|Dyalog** » Sat Mar 05, 2016 9:35 am

alexeyv wrote: But I failed to see the need for the {⍵×⌈\⍵} function.

I can't see it either - now - I must have been moving a bit too fast and confused myself.
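A quick session check suggests why the extra function makes no difference here. The sketch below assumes ⎕IO←1 and brackets that are balanced and not nested, so that the running sum only takes the values 0 and 1; the name m is introduced purely for illustration.

```apl
      txt←'hel<o><worl>d'
      m←+\1 ¯1 0['<>'⍳txt]        ⍝ the mask from the second variation
      m
0 0 0 1 1 0 1 1 1 1 1 0 0
      ⌈\m                          ⍝ running maximum: 1 from the first '<' onwards
0 0 0 1 1 1 1 1 1 1 1 1 1
      m×⌈\m                        ⍝ multiplying by it leaves the mask unchanged
0 0 0 1 1 0 1 1 1 1 1 0 0
      m≡{⍵×⌈\⍵}m
1
```

Wherever m is 1 the running maximum is also 1, and wherever m is 0 the product is 0, so for a 0/1 mask like this one the result is the mask itself.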
