
⎕AVU changes

Posted: Thu Jun 02, 2011 9:05 am
by Phil Last
Hidden away in:

Version 12.1 Release Notes
Performance Improvements
Miscellaneous
Changes to ⎕AVU

we were told:

In version 12.1, there is a small change to the default ⎕AVU as follows:

⎕AVU[⎕IO+219 237]←180 96

Previously, these two elements had the value 8217 8218. [sic]

The Unicode code points involved are:

0096: U+0060 GRAVE ACCENT
0180: U+00B4 ACUTE ACCENT
8216: U+2018 LEFT SINGLE QUOTATION MARK
8217: U+2019 RIGHT SINGLE QUOTATION MARK

-----
Equally obscurely in:

Version 13.0 Release Notes
Overview
Interoperability and Compatibility
AVU Changes

we are told:

The implementation of the function "Right" in Version 13.0 led to the discovery that ⎕AVU incorrectly defined ⎕AV[59+⎕IO] as ¤ (⎕UCS 164) rather than ⊢ (Right Tack, ⎕UCS 8866). This error has been corrected in the default ⎕AVU and in workspace AVU.dws. If you are operating in a mixed Unicode/Classic environment, this error will have caused earlier Classic editions to map ⎕AV[59+⎕IO] to the wrong Unicode character (¤). This may cause TRANSLATION ERRORs when a Version 13.0 Classic system attempts to read the data, as it will not be able to represent ¤ in the Atomic Vector.

-----

The wording of the V13.0 change implies that a correction is being made. How can the removal of a character ("¤", the so-called "currency sign") that has been in the same position in ⎕AV at least since V11.0 be described as the correction of an error?

In fact, it is the very change being described that will inevitably cause the errors you predict, errors that would not have occurred had the change not been made. In the same way, the earlier unnecessary change from V12.0 to V12.1 has cost me several days' work on several different occasions, and continues to do so as we come across data saved in 12.0 before there was the merest hint that any such change would ever be contemplated.

Users of Unicode Dyalog who don't have to worry about Classic will not be affected by any removal or non-removal, addition or non-addition to ⎕AV or ⎕AVU, because both are redundant for them and they have no need to reference either of them anywhere in their code.

So who is supposed to benefit from the above changes, which necessitate file changes, code changes or both, when all that users of Classic are doing is delaying an inevitable change until they can switch resources to make the necessary simplification to their systems and switch to Unicode?

Presumably this is due to some misguided assumption that all APL characters should be in ⎕AV. As the new Variant operator has just been added in V13.0 using '⍠' (U+2360, ⎕UCS 9056), which is also not in ⎕AV, can we expect another round of unnecessary TRANSLATION ERRORs to plague us when we get to V13.1?

Someone will answer this question by suggesting that users of Classic should not be penalised by being unable to use Right (⊢) or Variant (⍠). We couldn't use them before. But if we want to, we can make the change to ⎕AVU, and all the concomitant data and/or code changes, ourselves when we are ready, without forcing that on everyone else.

And please, in future, if you are going to make devastating changes to something that has remained the same for decades, don't hide them on the antepenultimate page of the release notes, so that we are locked into the change with incompatible data before we even know about it.

Re: ⎕AVU changes

Posted: Fri Jun 03, 2011 3:28 pm
by Morten|Dyalog
Phil,

I'm sorry to hear that the adjustments to the translate tables have caused you so much grief. You need not worry that we are planning further changes; the change in 13.0 was only indirectly related to the introduction of right tack, in that this is what led us to discover the error (as we now see it).

I'm writing a comprehensive response which I hope will explain the reasoning behind the changes, but this may take a few days as I need to try and make sure I have all the facts right.

Regards,

Morten

Re: ⎕AVU changes

Posted: Tue Jun 07, 2011 11:10 am
by Morten|Dyalog
OK, here goes. It's a rather long story, but I can’t see how to make it any shorter. I hope that it helps explain what happened to ⎕AVU!

I can see that we should have made more of an issue of the fact that anyone operating in a mixed Classic/Unicode environment would need to be “actively aware” of the effect of changing ⎕AVU, and re-emphasized this in the 12.1 and 13.0 release notes. Since ⎕AVU is a variable that is under your control, you do not have to use the defaults supplied by Dyalog – if you have good reason to use your own settings. You could have stayed with the v12.0 defaults, if you felt that they were better. I’m not sure that this would have been a wise choice, but we should have discussed the options in more detail in our documentation.
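As an illustration of "using your own settings", here is a minimal sketch, assuming a Dyalog 13.0 session that starts with the default ⎕AVU. It uses the same indexed-assignment form quoted from the 12.1 release notes and reverses only the single change described in the 13.0 notes (position 59+⎕IO back from ⊢, ⎕UCS 8866, to ¤, ⎕UCS 164). Reverting this way means Classic can again store ¤ but not ⊢, and it presumably has to be done consistently in every Classic and Unicode session that shares the data; where ⎕AVU should be set (workspace, namespace or start-up code) is an assumption to check against the ⎕AVU documentation.

```apl
⍝ Characters currently mapped at the positions changed in 12.1 and 13.0
⍝ (with the 13.0 defaults, per the release notes, these are ´ ` and ⊢)
⎕UCS ⎕AVU[⎕IO+219 237 59]

⍝ Reverse only the 13.0 change: map ⎕AV[59+⎕IO] to ¤ (⎕UCS 164) instead of ⊢ (⎕UCS 8866)
⎕AVU[⎕IO+59]←164
```

The 12.1 quote changes are deliberately left alone here, since reverting them would require the exact 12.0 values for each position.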

I sincerely hope that we will not be making further changes to ⎕AVU, and I am ready to promise that we will not do so without prior debate. Your guess that we are doing this because we think all APL symbols need to be in ⎕AV is (fortunately) incorrect, as I will explain below. New symbols that we introduce, like Variant, will NOT be added to ⎕AV (and thus will NOT be available to Classic users).

This sad story DOES emphasise the fact that translate tables are a bad thing, and that people should look at moving to Unicode as soon as they can. Once they have moved they can forget all about translation of character data, and confusion of the kind that we are talking about here will become impossible. I'll be doing a workshop on Unicode conversions at Dyalog'11 in Boston: If you have not looked at Unicode yet, plan to bring your code (or questions) along for some "hands on" tutoring. Most applications move to Unicode much more easily than you expect. (Phil should note that although they only mention “fresh” produce in the rules, importing rotten fruit and vegetables to the USA is probably also prohibited).

For anyone not familiar with the topic: ⎕AVU allows users of different Classic character sets (“Std”, “Alt”, “Russian”, or custom character sets) to tell APL how to map Classic data to Unicode, in order to achieve a smooth transition during the inevitable phase where you will want to use both systems for a while. Unfortunately, even though the tables only have 256 elements, it has proven very hard to get them right from the start. The first two errors, relating to quotes, were spotted immediately by an early adopter, and we decided to correct them straight away, as the number of users was deemed to be quite small at the time (Phil was obviously one of them, and I am sorry that some early adopters were made to suffer).

I also believed that the second change, the recognition that ⎕AV[59+⎕IO] should be right tack rather than the currency symbol, was a “slam dunk”. In the default APL font that Dyalog APL has used since 2007, when Windows Vista forced us to switch to a new font (“Dyalog Symbol”) in order to have all the APL characters displayed, this character does display as right tack. I was actually pleasantly surprised by this when we did the v13.0 design, as I had also believed that we had lost the right tack to the Euro symbol (€). Around the year 2000, the Euro symbol had to be inserted at font (and ⎕AV) position ⎕IO+124, because this is where it appeared in the common Windows fonts. At that time, Adrian wisely (in my opinion) decided to move right tack to the position previously occupied by the international currency symbol (¤). Adrian tells me that he made this change to all his fonts in the year 2000. Since the right tack had apparently been in this position for a decade, I felt that this correction to ⎕AVU was “obviously” the right thing to do, even though it might cause a few problems for early adopters.

In my investigation over the past few days, I have discovered that Dyalog (Dyadic at the time) was slow to distribute Adrian’s new fonts. The Unicode translate table embedded in the product, which formed the basis for ⎕AVU, was also not updated accordingly. Since version 10.1 in 2004, we have distributed the new TrueType fonts, but we still have not distributed new versions of the Dyalog Alt and Dyalog Std bitmap fonts (not even in v13.0: we missed this again, and my excuse is that the fonts are not really in use any longer). Since 2007, we have provided “Dyalog Symbol”, which correctly reflects the change. I can see that anyone who has continued to use the Std or Alt bitmap fonts as distributed by Dyalog might believe that ⎕AV[59+⎕IO] was still ¤.

To summarize: I think the changes were both correct and necessary, but I do agree that we should have emphasized the fact, and it would have been even better if we had not made the errors in the first place.

Phil Last wrote:
So who is supposed to benefit from the above changes, which necessitate file changes, code changes or both, when all that users of Classic are doing is delaying an inevitable change until they can switch resources to make the necessary simplification to their systems and switch to Unicode?

The intention is that everyone who has yet to start converting applications to Unicode will benefit from Classic APL having a “correct” Unicode Translate table (⎕AVU) when the conversion begins. I believe that the only people who have been “burned” by the change are those who have been using a mixture of Classic and Unicode across more than one release since 12.0. We will do what we can to ensure that there will be no more changes.

I hope this makes you feel a little better about our reasoning, even if it won’t give you back the time that you have lost in dealing with the issue!

Best regards,
Morten

Re: ⎕AVU changes

Posted: Sat Jun 11, 2011 9:09 pm
by Phil Last
Morten wrote:
I hope this makes you feel a little better about our reasoning, even if it won’t give you back the time that you have lost in dealing with the issue!


It certainly does. Thanks for an honest and comprehensive reply.

Phil