Extract parts of the string between brackets


Post by alexeyv »

Hi,

I'm toying with a simple function that should extract the parts of a string between brackets (e.g. HTML tag names) into a nested array of strings:

      str←'hel<o><worl>d'
      DISPLAY parse_brackets str '<' '>'
┌→───────────┐
│ ┌→┐ ┌→───┐ │
│ │o│ │worl│ │
│ └─┘ └────┘ │
└∊───────────┘


My (naive) implementation of parse_brackets is as follows:

      R←parse_brackets(str br1 br2);open;close
⍝ APL2 compatibility to test with GNU APL
⎕ML←3
⍝ Indexes of the open bracket br1
open←(str=br1)/⍳⍴str
⍝ Indexes of the close bracket br2 - 1
⍝ ¯1 to exclude closing bracket
close←¯1+(str=br2)/⍳⍴str
⍝ build a 2-row matrix with the start positions
⍝ in the first row and the lengths of the
⍝ extracted words in the second row;
⍝ split it into columns;
⍝ for each pair (start of the word; length)
⍝ drop up to the start and take the length
R←{⍵[2]↑⍵[1]↓str}¨⊂[1]open,[0.5]close-open


But I feel this implementation is rather clumsy, and there must be several more elegant ways to do it. Any criticism or ideas on how to implement this in a more APL-ish way would be welcome. (I feel I am still programming in the 'functional' way here, i.e. transforming the data into a list and applying a lambda function to each element of that list.)

Post by Morten|Dyalog »

Here are a few variations:

      txt←'hel<o><worl>d'      
      {1↓¨(⍵=⊃⍵)⊂⍵}{(+\1 ¯1 0['<>'⍳⍵])/⍵}txt
 o  worl
      {1↓¨({⍵×⌈\⍵}+\1 ¯1 0['<>'⍳⍵])⊂⍵}txt              ⍝ IBM style ⊂, requires ⎕ML←3
 o  worl
      {(¯1+⍵⍳¨'>')↑¨⍵}{1↓¨(⍵='<')⊂⍵}txt
 o  worl
      ('<(\w+)>' ⎕S '\1')txt
 o  worl


The last is perhaps not "very APL-ish", but in this case the REGEX is pretty elegant, IMHO.
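
For readers who want to see why the first variation works, it can be split into its two steps. This is only a sketch: it assumes Dyalog's default ⎕ML←1 (so that ⊃ is "first" and ⊂ is the Dyalog-style partitioned enclose), and the name kept is introduced here purely for illustration.

```apl
      txt←'hel<o><worl>d'
      ⍝ Step 1: compress, keeping each '<' and the text up to (not including) its '>'
      kept←(+\1 ¯1 0['<>'⍳txt])/txt
      kept
<o<worl
      ⍝ Step 2: every occurrence of the first character ('<') starts a new partition
      kept=⊃kept
1 0 1 0 0 0 0
      ⍝ Partitioned enclose, then drop the leading '<' from each piece
      1↓¨(kept=⊃kept)⊂kept
 o  worl
```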

P.S. We are toying with the idea of using ⊆ to denote the APL2-style partitioned enclose in Dyalog v16.0 (with monadic ⊆ becoming "enclose if simple"). We'd like to make all the useful functionality from different migration levels available with ⎕ML=1.
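
As a rough sketch of what that would look like in use: assuming dyadic ⊆ behaves as the APL2-style partition described above (which is how it works in Dyalog 16.0 and later), the second variation could be written without changing ⎕ML. The name mask is introduced here for illustration only.

```apl
      ⍝ Assumes dyadic ⊆ is APL2-style partition and ⎕ML←1
      txt←'hel<o><worl>d'
      mask←+\1 ¯1 0['<>'⍳txt]     ⍝ 0 0 0 1 1 0 1 1 1 1 1 0 0
      1↓¨mask⊆txt                 ⍝ two items: 'o' and 'worl'
```

The 0 that the running sum leaves at each '>' is what separates one partition from the next.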

Post by Roger|Dyalog »

The phrase +\1 ¯1 0['<>'⍳⍵] is number 5 in my list of Sixteen APL Amuse-Bouches. It has an ancient pedigree and stars in an amusing anecdote.

I am in awe of the phrasing (and the sarcasm) "... who runs computing in Bavaria from his headquarters in Munich, and ... who runs computing all over the world from his headquarters in Holland."

Post by alexeyv »

Wow, great replies! So far I've only studied the reply with IBM's notation, because after I posted this question I was looking through Mastering Dyalog APL and remembered that IBM's ⊂ function can split a vector into a nested array.

It took me a while to understand this second answer, and in particular the
      (1 ¯1 0)['<>'⍳⍵]
construction, which marks the beginning and end of words with 1 and ¯1.

Next, it was interesting to see how the array is filled with 1s between each 1 and ¯1.
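
The construction can be traced one step at a time in the session. The sketch below assumes ⎕IO←1 and reuses the example string from the question:

```apl
      str←'hel<o><worl>d'
      '<>'⍳str                  ⍝ 1 for '<', 2 for '>', 3 for any other character
3 3 3 1 3 2 1 3 3 3 3 2 3
      (1 ¯1 0)['<>'⍳str]        ⍝ map those to 1 (open), ¯1 (close), 0 (other)
0 0 0 1 0 ¯1 1 0 0 0 0 ¯1 0
      +\(1 ¯1 0)['<>'⍳str]      ⍝ running sum: 1 from each '<' up to just before its '>'
0 0 0 1 1 0 1 1 1 1 1 0 0
```

The running sum is already 1 at each '<', which is why every extracted piece still starts with '<' and needs the final 1↓¨.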

But I fail to see the need for the {⍵×⌈\⍵} function. To me it looks like the task is already solved without it:
      str
hel<o><worl>d
      1↓¨(+\(1 ¯1 0)['<>'⍳str])⊂str
o worl

Post by Morten|Dyalog »

alexeyv wrote: But I failed to see the need of {⍵×⌈\⍵} function.

I can't see it either - now - I must have been moving a bit too fast and confused myself.
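
A quick check in the session supports this. The sketch below uses the running-sum mask for the example string (the name m is introduced here for illustration) and shows that {⍵×⌈\⍵} leaves it unchanged:

```apl
      m←0 0 0 1 1 0 1 1 1 1 1 0 0   ⍝ +\ mask for 'hel<o><worl>d'
      ⌈\m                            ⍝ max-scan: 0 until the first 1, then all 1s
0 0 0 1 1 1 1 1 1 1 1 1 1
      m≡{⍵×⌈\⍵}m                     ⍝ multiplying by m restores the original 0s
1
```

This holds for any mask of 0s and 1s. It would not hold for nested brackets, where the running sum can exceed 1.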