
How many changes has ⎕R made?

Posted: Wed Sep 07, 2016 2:47 pm
by Budgie
Is there a nice easy way to determine how many changes have been made by a call to ⎕R?

Re: How many changes has ⎕R made?

Posted: Wed Sep 07, 2016 3:11 pm
by ArrayMac227
Looking in the help file for ⎕R (Replace) I found:

('.at' ⎕S {⍵.((1↑Offsets),1↑Lengths)}) 'The cat sat on the mat'
4 3 8 3 19 3 ⍝ 3 items

Does this help?

Re: How many changes has ⎕R made?

Posted: Thu Sep 08, 2016 9:58 am
by Phil Last
I've never used it, but:

You would have to do the search for the original substring before the replacement. Then you would know how many were about to be replaced. Looking for the frequency of the replacement substring after the change might find instances that were already in the source.
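
A minimal sketch of this count-first approach, using a made-up sample string. It assumes that ⎕S with transformation code 0 returns one offset per match, so its tally is the number of replacements ⎕R is about to make with the same pattern. The last line shows the overcount that comes from searching for the replacement text afterwards.

```apl
text←'a cat and a dog'           ⍝ made-up sample text
before←≢('cat'⎕S 0)text          ⍝ 1: matches about to be replaced
new←('cat'⎕R'dog')text           ⍝ 'a dog and a dog'
after←≢('dog'⎕S 0)new            ⍝ 2: also counts the 'dog' that was already there
```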

Of course, others may know whether more information about the number of changes is available during the call to ⎕R itself, since apparently we can specify text and/or other data to be returned.

Re: How many changes has ⎕R made?

Posted: Thu Sep 08, 2016 1:22 pm
by ArrayMac227
It is worthwhile to read the documentation on ⎕R and ⎕S. It is the first system function I've seen that works with data items outside the 'usual' numeric, character and enclosed domains.

≢⎕←'.at'⎕S⊢'The cat sat on the mat'
#.[⎕S match info] #.[⎕S match info] #.[⎕S match info]
3

Essentially, regular expressions are not only a sub-language in their own right; the Dyalog interface to them also breaks some new ground.

Re: How many changes has ⎕R made?

Posted: Fri Sep 09, 2016 7:41 am
by Morten|Dyalog
Not the first system function: the first system OPERATOR (of any description). In languages with strong support for regular expressions, it appeared to us (well, to me anyway :-)) that regular expressions were used as a control structure, invoking a block of code for each match. The closest equivalent to that in APL is an operator with a user-defined function as its operand. So rather than go for a classical ⎕SS-style function, we decided to make ⎕R/⎕S the first "system operators".

Re: How many changes has ⎕R made?

Posted: Fri Sep 09, 2016 12:13 pm
by Budgie
ArrayMac227 wrote:
> It is worthwhile to read documentation on ⎕R and ⎕S. It is the first system function I've seen that works with data items outside the 'usual' numeric character and enclosed domains.
>
> ≢⎕←'.at'⎕S⊢'The cat sat on the mat'
> #.[⎕S match info] #.[⎕S match info] #.[⎕S match info]
> 3
>
> Essentially, regular expressions are not only a new sub-language all on their own, but the Dyalog interface breaks some new ground.

I have read the documentation, which is why I am asking here. In my application the variable being processed is a vector of (typically 20,000) character vectors. Can you guarantee that what is returned by ⎕S is exactly what will be processed by ⎕R in all circumstances?

Re: How many changes has ⎕R made?

Posted: Fri Sep 09, 2016 12:41 pm
by ArrayMac227
> I have read the documentation, which is why I am asking here. In my application the
> variable being processed is a vector of (typically 20,000) character vectors. Can you
> guarantee that what is returned by ⎕S is exactly what will be processed by ⎕R in all
> circumstances?

@Budgie I'm not sure whether you're asking anything beyond whether ⎕R has bugs. The guarantees are whatever comes with the software, I presume. This is always something you can test.

Re: How many changes has ⎕R made?

Posted: Fri Sep 09, 2016 12:48 pm
by Richard|Dyalog
It is not clear to me what is meant by the number of changes that have been made. For example, both of these examples will change 'Hello' to 'HELLO', but the first will do this with one change and the second will do it with five smaller ones:

('.+' ⎕r '\u0')'Hello'
('.' ⎕r '\u0')'Hello'
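
Counting matches with ⎕S makes the difference concrete. A sketch, assuming default options and transformation code 0 (the offset of each match): the two patterns produce the same text under ⎕R but very different match counts.

```apl
≢('.+'⎕S 0)'Hello'     ⍝ 1: a single match spans the whole word
≢('.'⎕S 0)'Hello'      ⍝ 5: one match per character
('.+'⎕R'\u0')'Hello'   ⍝ HELLO
('.'⎕R'\u0')'Hello'    ⍝ HELLO
```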

If you consider both cases to be a single change, the only way to determine that would be some clever analysis of the before and after text. The following assumes that you want to know how many times the pattern matched the supplied text, and thus how many separate changes were made.

Both functions have the form:

R ← (patterns ⎕S transformations ⍠ options) document

Both look in the document for matches to the pattern(s), and when they find them they generate either replacement text (⎕R) or a single element of the result (⎕S) using the given transformation.

With identical patterns and options, there should be exactly the same number of elements in the result of ⎕S as there are replacements made to the document by ⎕R. Some options can be used only with ⎕R or only with ⎕S, so providing identical options to each may not be possible; in addition, this would be quite an inefficient way of determining the number of replacements. But for a non-time-critical analysis it may well be adequate.
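
For a vector of character vectors like Budgie's, that check might look like the sketch below. The two-line document and the 'IC' (ignore case) option are illustrative assumptions; what matters is that the same pattern and the same options go to both calls, and that ⎕S returns one item per match across all lines of the document.

```apl
doc←'The Cat sat' 'on the MAT'    ⍝ illustrative two-line document
opts←'IC' 1                       ⍝ identical options for both calls (case-insensitive)
n←≢('.at'⎕S 0⍠opts)doc            ⍝ 3: one offset per match, over all lines
new←('.at'⎕R'dog'⍠opts)doc        ⍝ 'The dog dog' 'on the dog'
```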

Alternatively, the transformations supported by ⎕R are (1) a character vector containing replacement text and simple patterns which can reference the matched text, or (2) a more powerful function callout, which can do anything it wants to generate the replacement text. Thus you could count the number of replacements by using the function callout to generate the text and updating a count on each invocation, although if you are currently using the non-function form you would need to implement the APL function. If you went this route, you would also be able to analyse the lengths and positions of the matches, if you wished.
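
A sketch of the callout approach, reusing the help-file sentence. The dfn is called once per match: it increments a counter, records the offset and length of the match from the match-info namespace (as in the help-file ⎕S example), and returns the replacement text. The names count and where are hypothetical, and the sketch relies on modified assignment (+← and ,←) inside a dfn updating the existing outer variable rather than creating a local one.

```apl
count←0                 ⍝ hypothetical counter, updated by the callout
where←⍬                 ⍝ hypothetical list of (offset length) pairs
new←('.at'⎕R{count+←1 ⋄ where,←⊂⍵.((1↑Offsets),1↑Lengths) ⋄ 'dog'})'The cat sat on the mat'
count                   ⍝ 3
where                   ⍝ offset/length pairs 4 3, 8 3 and 19 3
```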

Re: How many changes has ⎕R made?

Posted: Fri Sep 09, 2016 1:11 pm
by Budgie
This is supposed to be a simple application: searching and replacing text that has been generated by OCR. As you know, OCR is not perfect, and there are lots of false readings, perhaps influenced by a built-in spelling dictionary. When the language you are OCR-ing is not the language of the spelling dictionary, and the OCR has been done by somebody else so you have no control over it, you get even more trouble. What I am trying to do is correct large numbers of these errors automatically, before a proof-reading stage in which I manually correct those that remain. Knowing how many hits a particular replace operation has made gives an indication of how useful that change will be next time round.
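
For a list of OCR corrections, the callout idea extends to one counter per pattern. A sketch under stated assumptions: the misreadings, corrections and document are hypothetical; ⍵.PatternNum is taken to be the origin-zero index of the pattern that matched, and adding ⎕IO keeps the code independent of the index origin. Counts are kept with modified assignment on the outer variable hits.

```apl
pats←'\bteh\b' '\badn\b'          ⍝ hypothetical OCR misreadings
reps←'the' 'and'                  ⍝ corresponding corrections
hits←(≢pats)⍴0                    ⍝ one counter per pattern
⍝ i: which pattern matched; bump its counter; return its correction
fix←{i←⍵.PatternNum+⎕IO ⋄ hits+←(⍳≢pats)=i ⋄ i⊃reps}
doc←'teh cat adn teh dog' 'adn so on'
new←(pats ⎕R fix)doc              ⍝ 'the cat and the dog' 'and so on'
hits                              ⍝ 2 2
```

Patterns whose counters stay at zero over several runs are candidates for dropping from the list.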

Re: How many changes has ⎕R made?

Posted: Sat Sep 10, 2016 1:32 pm
by DanB|Dyalog
I don't know whether this can help here, but have you looked into ]locate?

]locate can perform replacements, tell you how many changes were made and, if you wish, show you where they were made.