APL FILES - can I reduce the bytes used?

APL FILES - can I reduce the bytes used?

Post by woody »

I have an APL file.
I collect stock market data, and APPEND the new array to the next component.

I am trying to save some space ... and NULL OUT earlier components ...
So I keep a rolling 10 days of stock data.

I see the []FSIZE in bytes used.
I REPLACE one of the current components (250K in that component)
with a NULL value.

But, when I check []FSIZE ... the used file space bytes do not get smaller.

I have replaced 7 of the older components (each 250K bytes) with a NULL value,
and yet the APL file space remains the same.

Can I compress the APL FILE to gain back the disk space?

Or must I create a NEW APL FILE ...
and reconstruct the data ... (writing NULL in the older components) ...
and then RENAME the new file to the old file name ... in order to recover the bytes ?

Thanks in advance for your thoughts on this.

//W
Woodley Butler
Automatonics, Inc.
"Find your head in the APL Cloud"
http://www.APLcloud.com

Re: APL FILES - can I reduce the bytes used?

Post by Morten|Dyalog »

Hi Woodley! It would be incredibly inefficient if the file system always tried to reclaim space, so this requires an action on your part. You can use monadic (⎕FRESIZE TN) to reclaim unused space from a component file.
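For illustration, the sequence might look like the session sketch below. The file name and the component number are placeholders rather than anything from the original post, and the component being replaced is assumed to exist already.

```apl
      tn←'c:\tmp\stocks' ⎕FTIE 0   ⍝ hypothetical file name; 0 = let APL pick the tie number
      ⎕FSIZE tn                    ⍝ 3rd element = bytes currently used by the file
      ⍬ ⎕FREPLACE tn 1             ⍝ overwrite component 1 with an empty vector
      ⎕FSIZE tn                    ⍝ 3rd element is typically unchanged at this point
      ⎕FRESIZE tn                  ⍝ monadic form: compact the file, size limit unchanged
      ⎕FSIZE tn                    ⍝ 3rd element should now reflect the reclaimed space
      ⎕FUNTIE tn
```

Compacting a large file can take a while, so it is probably better done occasionally than after every replace.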

You might also want to experiment with creating your files with the 'Z' property, so that components will be compressed. This costs a little bit of extra CPU but, depending on the speed of your network, can sometimes actually speed things up due to significantly reduced network traffic:

Code: Select all

      data←(↓⎕A[?100000 3⍴26]),⍪100+0.01×?100000⍴10000
      4↑data
 ZVB  112.07
 VNV  142.62
 MFT  114.26
 OKL  195.82
      ⎕size 'data' ⍝ commas inserted manually in output for readability
8,800,040
      'c:\tmp\compressed' (⎕FCREATE⍠'Z' 1) 1
      data ⎕fappend 1
      ⎕fsize 1
1 2 1,331,056 1.844674407E19


Of course, in the above example, much of the compression comes from having a relatively inefficient representation: a nested array in which each item is an enclosed array. If you use a more compact form, the compression gain is smaller. This example is a bit of a "worst case", in that the data is completely random, but you still save roughly a third, going by the ⎕SIZE and ⎕FSIZE figures below:

Code: Select all

      data←(⎕A[?100000 3⍴26]) (100+0.01×?100000⍴10000)
      4↑¨data
 WDH  190.71 120.67 194.57 165.09
 GAW                             
 TMQ                             
 KDS                             
      ⎕size 'data'
1,100,120
      data ⎕fappend 1
      ⎕fsize 1
1 2 716,560 1.844674407E19


Finally, if your components are all exactly the same size (or you can easily pad them to ensure this), you might consider a "circular" set of components, where you keep a directory of which dates are in the file, and then always overwrite the oldest component with a new one of exactly the same size. Let me know if that doesn't make sense and I'll explain in more detail. And of course, if you enable compression, there is (probably) no way to guarantee that the components will have exactly the same size, so this strategy cannot be combined with compression.
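A minimal sketch of such a circular scheme follows. The layout, the function name AddDay and the integer date format are all assumptions made for illustration; the only thing taken from the thread is the rolling window of 10 days.

```apl
    ∇ AddDay(tn date data);dir;slot
      ⍝ Assumed layout (not from the original post), with ⎕IO←1:
      ⍝   component 1      10-element vector of dates, 0 marking an unused slot
      ⍝   components 2-11  one day's data per slot, all padded to the same size
      dir←⎕FREAD tn 1
      slot←dir⍳⌊/dir               ⍝ first unused slot, else the slot with the oldest date
      data ⎕FREPLACE tn(1+slot)    ⍝ replace that slot's component with the new day
      dir[slot]←date               ⍝ e.g. an integer such as 20240102 (hypothetical format)
      dir ⎕FREPLACE tn 1           ⍝ save the updated directory
    ∇
```

The file would be set up once by appending a directory of ten zeros as component 1, followed by ten placeholder components of the padded size. After that, AddDay tn date data never appends, so the number of components stays fixed at eleven.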

Re: APL FILES - can I reduce the bytes used?

Post by woody »

Excellent!

Sincere thanks.

Re: APL FILES - can I reduce the bytes used?

Post by Richard|Dyalog »

A couple more things to note:

When you replaced your large components with small ones, the areas of the file originally occupied by the large components were "freed", leaving large unused gaps in the file. That is why you did not see any immediate reduction in file size. You could use ⎕FRESIZE to shuffle the file contents and remove the gaps, which would shrink the file. But if you simply leave the gaps there, they will be reused when you add new components, and adding new components will not make the file any bigger until the gaps are used up.

So, if you adopt your policy of clearing out old components by replacing them with small ones when you add new ones, you should see the file increase in size only slowly over time.
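That policy could be wrapped in a small function along these lines. This is only a sketch: the name AppendDay is hypothetical, and the window of 10 components is taken from the original question.

```apl
    ∇ cn←tn AppendDay data;old
      cn←data ⎕FAPPEND tn          ⍝ append the new day; the result is its component number
      old←cn-10                    ⍝ component that has just fallen out of the 10-day window
      :If old≥⊃⎕FSIZE tn           ⍝ 1st element of ⎕FSIZE = first component number in the file
          ⍬ ⎕FREPLACE tn old       ⍝ shrink it, freeing its space for reuse by later appends
      :EndIf
    ∇
```

Comparing the third element of ⎕FSIZE tn before and after a few calls should show whether the freed space is being reused as described.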

Morten's suggestion of replacing a component with another of exactly the same size is tricky to get right - especially as a result of work that we've done to make the file handling more robust. However, that doesn't really matter because space eventually gets reused, as noted above. If you do use his technique of cycling through a circular set of components you would see some benefit because you would not be filling the file with small components, and the only size increases you would get over time would then be the result of (a) components actually being larger, and (b) wasted space due to fragmentation. You could occasionally ⎕FRESIZE the file to eliminate the fragmentation.