How many changes has ⎕R made?

Post by Budgie »

Is there a nice easy way to determine how many changes have been made by a call to ⎕R?
Jane

Re: How many changes has ⎕R made?

Post by ArrayMac227 »

Looking in the help file for ⎕R (Replace) I found:

('.at' ⎕S {⍵.((1↑Offsets),1↑Lengths)}) 'The cat sat on the mat'
4 3 8 3 19 3 ⍝ 3 items

Does this help?

Re: How many changes has ⎕R made?

Post by Phil Last »

Never used it but -

You would have to do the search for the original substring before the replacement. Then you would know how many were about to be replaced. Looking for the frequency of the replacement substring after the change might find instances that were already in the source.

Of course others may know if further information regarding the number of changes is available during the actual call to ⎕R as apparently we can specify text and/or other data to be returned.

Re: How many changes has ⎕R made?

Post by ArrayMac227 »

It is worthwhile to read the documentation on ⎕R and ⎕S. It is the first system function I've seen that works with data items outside the 'usual' numeric, character and enclosed domains.

≢⎕←'.at'⎕S⊢'The cat sat on the mat'
#.[⎕S match info] #.[⎕S match info] #.[⎕S match info]
3

Essentially, regular expressions are not only a new sub-language all on their own, but the Dyalog interface breaks some new ground.

Re: How many changes has ⎕R made?

Post by Morten|Dyalog »

Not the first system function, the first system OPERATOR (of any description). In languages with strong support for REGEX, it appeared to us (well, to me anyway :-)) that regular expressions were used as a control structure, invoking a block of code for each match. The closest equivalent to that in APL is an operator with a user-defined function. So rather than go for a classical ⎕SS style function, we decided to make ⎕R/⎕S the first "system operators".

Re: How many changes has ⎕R made?

Post by Budgie »

ArrayMac227 wrote:
> It is worthwhile to read documentation on ⎕R and ⎕S. It is the first system function I've seen that works with data items outside the 'usual' numeric, character and enclosed domains.
>
> ≢⎕←'.at'⎕S⊢'The cat sat on the mat'
> #.[⎕S match info] #.[⎕S match info] #.[⎕S match info]
> 3
>
> Essentially, regular expressions are not only a new sub-language all on their own, but the Dyalog interface breaks some new ground.

I have read the documentation, which is why I am asking here. In my application the variable being processed is a vector of (typically 20,000) character vectors. Can you guarantee that what is returned by ⎕S is exactly what will be processed by ⎕R in all circumstances?
Jane

Re: How many changes has ⎕R made?

Post by ArrayMac227 »

> I have read the documentation, which is why I am asking here. In my application the
> variable being processed is a vector of (typically 20,000) character vectors. Can you
> guarantee that what is returned by ⎕S is exactly what will be processed by ⎕R in all
> circumstances?

@Budgie I'm not sure whether you're asking anything beyond whether ⎕R has bugs. The guarantees are whatever comes with the software, I presume. This is always something you can test.

Re: How many changes has ⎕R made?

Post by Richard|Dyalog »

It is not clear to me what is meant by the number of changes that have been made. For example, both of these examples will change 'Hello' to 'HELLO', but the first will do this with one change and the second will do it with five smaller ones:

('.+' ⎕r '\u0')'Hello'
('.' ⎕r '\u0')'Hello'

If you consider this to be one change in both cases then the only way to determine this would be to do some clever analysis of the before and after text; the following assumes you want to know how many times the pattern matched the supplied text and thus how many separate changes were made.

Both ⎕R and ⎕S have the same form, shown here for ⎕S:

R ← (patterns ⎕S transformations ⍠ options) document

Both look in the document for matches to the pattern(s), and when they find them they generate either replacement text (⎕R) or a single element of the result (⎕S) using the given transformation.

With identical patterns and options, there should be exactly the same number of elements in the result of ⎕S as there are replacements made to the document by ⎕R. Some options can only be used with one of ⎕R or ⎕S, so providing identical options to each may not be possible, and running the search twice is quite an inefficient way of determining the number of replacements. But for a non-time-critical analysis it may well be adequate.
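
A minimal sketch of the ⎕S-based count, reusing the '.at' pattern and the sample sentence from earlier in the thread. The names text, pat, hits and new are made up for illustration, and no options are set, so both calls run with the defaults. The numeric transformation code 0 asks ⎕S for the offset of each match, so the tally of its result is the number of matches:

```apl
      text←'The cat sat on the mat'   ⍝ sample document (illustrative)
      pat←'.at'                       ⍝ same pattern for the search and the replace
      hits←≢(pat ⎕S 0)text            ⍝ one offset per match; the tally is the match count
      new←(pat ⎕R 'og')text           ⍝ the replacement itself, same pattern, default options
      hits
3
```

The same expression should apply unchanged when text is a vector of character vectors, since only the tally of the ⎕S result is used.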

Alternatively, the transformations supported by ⎕R are (1) a character vector containing replacement text and simple patterns which can reference the matched text, or (2) a more powerful function call-out which can do anything it wants to generate the replacement text. You could therefore count the replacements by using the function call-out to generate the text and updating a count on each invocation, although if you are currently using the non-function form you would need to implement the APL function. If you went this route you could also do additional analysis of the lengths and positions of the matches, if you wished.
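
A rough sketch of the call-out approach, under the assumption that the operand function is invoked once per match. The names count, Swap and new are hypothetical. The sketch relies on modified assignment inside a dfn updating the existing variable outside it rather than creating a local one:

```apl
      count←0                         ⍝ counter lives outside the dfn
      Swap←{count+←1 ⋄ 'og'}          ⍝ hypothetical helper: bump the counter, then return the replacement text
      new←('.at' ⎕R Swap)'The cat sat on the mat'
      count                           ⍝ should now hold the number of times Swap was called
```

Inside Swap, ⍵ is the match-information namespace, so fields such as ⍵.Offsets and ⍵.Lengths (used in the help-file example quoted earlier in the thread) could be collected in the same way if positions are wanted as well as a count.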

Re: How many changes has ⎕R made?

Post by Budgie »

This is supposed to be a simple application of searching and replacing text that has been generated by OCR. As you know, OCR is not perfect, and there are lots of false readings, perhaps influenced by a built-in spelling dictionary. When the language you are OCR-ing from is not the same as the spelling dictionary, and when the OCR has been done by somebody else so you don't have any control over it, you get even more trouble. What I am trying to do is correct large numbers of these errors automatically, before going through a proof-reading stage and manually correcting those that still exist. Knowing how many hits have been made for a particular replace operation gives an indication of how useful that particular change will be next time round.
Jane

Re: How many changes has ⎕R made?

Post by DanB|Dyalog »

I don't know if this can help here but have you looked into ]locate?

]locate can perform replacements and tell how many changes were made and show you, if you wish, where the changes were made.