Shortcomings in the current scheme of Version Control???
Off and on I have spent some time wrapping my head around Git and SALT and version control. I can't say it has been easy. Although very powerful, Git is hard to grok.
First of all the philosophy: It appears that the idea is to create scripts from APL objects and then add those scripts to Git (or some other VCS) in a local repository which can be uploaded to GitHub. The first problem arises in granularity - a namespace is too big and a single function too small. An application containing 20000 functions could all reside in a single namespace. If one is using namespace granularity then the file gets way too big. If each function is a script then one has to manage 20000 files. There does not appear to be a way in which a bunch of functions can be put into a file.
Let me explain further: Back at SunGard we used an overlay mechanism in which a function and its metadata were 'PUT' into the overlay, and a release was an ordered set of overlays, starting with a base system that was patched by these overlays. The beauty was that one could trace how a particular function developed over time or when it came into being for the very first time. An overlay was a unit of work that contained a bunch of functions that implemented some new functionality. There seems to be no way in the SALT/Git framework that you can ask the question "Show me how this function has developed over time or who has worked on this function in the past". This is so because there is no metadata in the script files. Git has no knowledge as to what the file contains. There is also no concept of a changing group of functions being added to a script file, with that file then being managed to create a release.
I would like to draw a parallel to the Relational Database/Columnar Component files. While a relational database gives us a lot of things and makes us mainstream, it just doesn't provide what a component in a component file system provides.
What am I missing? Are we losing something really vital and central to the APL way by going the SALT/Git route?
Re: Shortcomings in the current scheme of Version Control???
I believe there is something reasonable between one namespace with 20000 functions
and 20000 namespaces with single functions... Even if we do not use OOP, namespaces
give us a great tool to divide a complex system into blocks or modules and simplify
system maintenance dramatically.
There were a couple of attempts to build a homemade VCS using component files.
The very bad thing about component files is that they are binary. We may access
them from APL only and we must build all the VCS functionality ourselves.
I disagree that we're going the "SALT/Git route". The real revolution was not the introduction of SALT.
It was the ability to ]load a namespace from a **text** file! And even to write a namespace
in, let's say, notepad.exe and then ]load it into APL. So, we are going "text_files/any_VCS_you_like". And this is **mainstream**, because all programmers, whatever language they write their software
in, use git (the default standard today), Mercurial, Bazaar and so on.
There is no "Metadata in the script files", but all the needed metadata is in the hidden
directory .git, which appears in your project directory after you initialize it with git.
Have a look at a brief example, which I made with bzr (Bazaar), but it is practically identical to the git workflow (and to most command names):
Make a directory for a project and change into it
Code: Select all
BIGMAC:try sasha$ mkdir baz
BIGMAC:try sasha$ cd baz/
Initialize a bzr repository (pay attention to .bzr)
Code: Select all
BIGMAC:baz sasha$ bzr init
Created a standalone tree (format: 2a)
BIGMAC:baz sasha$ ls -A
.bzr
Create the first file for a project and put it under version control (bzr add)
Code: Select all
BIGMAC:baz sasha$ cat > x.txt
the first line created
BIGMAC:baz sasha$ bzr add x.txt
adding x.txt
Make yourself known to the system
Code: Select all
BIGMAC:baz sasha$ bzr whoami "Sasha <askom@obninsk.com>"
Commit your great work (it will be revision 1) and go for lunch
Code: Select all
BIGMAC:baz sasha$ bzr commit -m "initial version"
Committing to: /Users/sasha/try/baz/
added x.txt
Committed revision 1.
Come back and continue working. Put new lines into x.txt (note that cat > replaces the previous content of the file)
Code: Select all
BIGMAC:baz sasha$ cat > x.txt
the second line is added here
and also the third one
Commit your changes as revision 2 and go home
Code: Select all
BIGMAC:baz sasha$ bzr commit -m "added two more line"
Committing to: /Users/sasha/try/baz/
modified x.txt
Committed revision 2.
This evening Jim decided to work longer. He introduced himself to bazaar,
Code: Select all
BIGMAC:baz sasha$ bzr whoami "Jim <jim@hotmail.com>"
inspected the working file and decided to change x.txt
Code: Select all
BIGMAC:baz sasha$ sed -i '' 's/line/row/g' x.txt
BIGMAC:baz sasha$ cat x.txt
the second row is added here
and also the third one
After he changed all "line"s to "row"s, he decided to add a new file, y.txt, to the project
Code: Select all
BIGMAC:baz sasha$ cat > y.txt
this is the new file in a project
BIGMAC:baz sasha$ bzr add y.txt
adding y.txt
He committed revision 3 and went home
Code: Select all
BIGMAC:baz sasha$ bzr commit -m "lines where changed to rows and y.txt was added"
Committing to: /Users/sasha/try/baz/
modified x.txt
added y.txt
Committed revision 3.
In the morning Sasha runs the log command and sees all changes
Code: Select all
BIGMAC:baz sasha$ bzr log
------------------------------------------------------------
revno: 3
committer: Jim <jim@hotmail.com>
branch nick: baz
timestamp: Mon 2014-12-08 01:57:31 +0300
message:
lines where changed to rows and y.txt was added
------------------------------------------------------------
revno: 2
committer: Sasha <askom@obninsk.com>
branch nick: baz
timestamp: Mon 2014-12-08 01:28:06 +0300
message:
added two more line
------------------------------------------------------------
revno: 1
committer: Sasha <askom@obninsk.com>
branch nick: baz
timestamp: Mon 2014-12-08 01:26:24 +0300
message:
initial version
Sasha finished with revision 2 and he wants to see the detailed differences between it and revision 3 (the current one).
Code: Select all
BIGMAC:baz sasha$ bzr diff -r2
=== modified file 'x.txt'
--- x.txt 2014-12-07 22:28:06 +0000
+++ x.txt 2014-12-07 22:39:34 +0000
@@ -1,2 +1,2 @@
-the second line is added here
+the second row is added here
and also the third one
=== added file 'y.txt'
--- y.txt 1970-01-01 00:00:00 +0000
+++ y.txt 2014-12-07 22:55:36 +0000
@@ -0,0 +1,1 @@
+this is the new file in a project
BIGMAC:baz sasha$
Well, you see the exact info of all changes:
1. in file x.txt the line "the second line is added here" changed to "the second row is added here"
2. file y.txt has been added with the single line "this is the new file in a project".
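For readers who use git, the same session can be sketched roughly as follows. This is only an illustration: it works in a temporary directory, sets the two identities per repository instead of with bzr whoami, and replaces the BSD-specific sed -i '' with a portable temp-file step. The file contents are the ones from the bzr example.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory so nothing existing is touched.
work=$(mktemp -d)
cd "$work"
git init -q .

# Sasha introduces himself (per-repository identity).
git config user.name "Sasha"
git config user.email "askom@obninsk.com"

# Revision 1: the first file.
printf 'the first line created\n' > x.txt
git add x.txt
git commit -q -m "initial version"

# Revision 2: new content for x.txt (overwrites, as cat > did above).
printf 'the second line is added here\nand also the third one\n' > x.txt
git commit -q -a -m "added two more lines"

# Jim takes over on the same machine.
git config user.name "Jim"
git config user.email "jim@hotmail.com"
sed 's/line/row/g' x.txt > x.tmp && mv x.tmp x.txt
printf 'this is the new file in a project\n' > y.txt
git add x.txt y.txt
git commit -q -m "lines were changed to rows and y.txt was added"

# Who did what, newest first.
git log --format='%h %an: %s'

# Detailed differences between the previous and the current revision.
git diff HEAD~1 HEAD
```

The log lists three commits (one by Jim, two by Sasha), and the diff shows the changed line in x.txt plus the added y.txt, just like bzr log and bzr diff -r2.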
Regards,
Sasha.
Re: Shortcomings in the current scheme of Version Control???
Sasha:
Thanks but that is all simple stuff.
How do you know x.txt contains, say, INITGUI (an APL function), and how will you trace the history of development of just this function alone when this function was worked on by multiple people?
What if a programmer needs to work on 5 functions which are spread across 5 different files?
I think loading a namespace from text file was implemented by SALT. ⎕SE.SALT.Load '...' etc, I believe.
How do you deploy a solution to a customer and how do you update it at the customer site? How do you know which version of x.txt a customer has?
Am I missing something obvious here?
Neeraj
- Morten|Dyalog
Re: Shortcomings in the current scheme of Version Control???
There *is* "something missing" here... However, I think that we MUST use industry standard source code management systems if we are to stand a chance of attracting the next generation of users. If we stick to building our own binary source code management systems, it makes it one or two orders of magnitude harder to demonstrate the coolness of APL to new users.
In my mind, the thing that is missing is indeed some kind of "metadata layer" that describes your application. I think that, at the lowest level, you do need to have one file per function, if that is the granularity at which you want to be able to track changes. Personally, I find it natural to manage namespace or class scripts and have changes tracked at that level, but I recognise that this will depend on how you work - in particular if what you are starting with is a flat workspace with 20,000 functions in it ;-).
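As a small illustration of tracking at function granularity, here is a sketch with git, assuming one file per function. The path src/gui/INITGUI.dyalog, the file contents and the e-mail addresses are made up for the example; only the function name INITGUI comes from the question earlier in the thread.

```shell
#!/bin/sh
set -e

# Throwaway repository for the illustration.
work=$(mktemp -d)
cd "$work"
git init -q .
mkdir -p src/gui

# First author creates the function file (hypothetical path and content).
printf 'INITGUI\n⍝ build the main form\n' > src/gui/INITGUI.dyalog
git add src/gui/INITGUI.dyalog
git -c user.name="Sasha" -c user.email="sasha@example.com" \
    commit -q -m "INITGUI: first version"

# A second author changes the same function later.
printf 'INITGUI\n⍝ build the main form\n⍝ add the menu bar\n' > src/gui/INITGUI.dyalog
git -c user.name="Jim" -c user.email="jim@example.com" \
    commit -q -a -m "INITGUI: add menu bar"

# History of this one function: commit, author, date, message.
# --follow keeps tracking the file even if it is renamed later.
git log --follow --format='%h %an %ad %s' --date=short -- src/gui/INITGUI.dyalog

# Line-by-line attribution: who last touched each line.
git blame -- src/gui/INITGUI.dyalog
```

With this layout, "show me how this function has developed over time, and who worked on it" becomes a question about one path, which git can answer without knowing anything about APL.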
What I think we need is another set of files, also managed by GIT or Subversion, which define how to "build" the application: These will include group, or "module" descriptions. We need a system that will build the runtime code workspaces or files from the individual functions (or namespaces and classes), using the metadata. Such a system would also manage dependencies on tools like SQAPL, Conga or other utilities provided by Dyalog. I.P.Sharp provided a tool called LOGOS that worked along these lines. LOGOS was probably a bit more than we need; I think we can build a system which is simpler than that, but will give us the full benefits of GIT & friends without compromising our "way of life".
SALT is a tool which can load Unicode files into Dyalog APL. However, it only takes about one line of code to do THAT. In my mind, the important thing that SALT adds is that it puts a tag in your loaded code, identifying the source file, and hooks into a callback in the editor, which means that the external file is updated every time you make a change. If you combine this with an external source code management system, you get something which is already quite nice, I think. Aside: It is our intention to eventually roll the ]LOAD / ]SAVE part of SALT into the interpreter itself, but we are not quite ready yet.
We (Dyalog) will be looking at the design of a "project description and build" tool over the next few months. We will do what we can to publish initial drafts here so that everyone has an opportunity to comment.
Re: Shortcomings in the current scheme of Version Control???
Morten|Dyalog wrote:I think that, at the lowest level, you do need to have one file per function, if that is the granularity at which you want to be able to track changes.
Yes indeed. And if several developers each change twenty functions in a project where the function is the level of granularity there's very little chance of any kind of clash. Store large collections in scripts and it's almost inevitable. In addition to making them much harder to edit and trace. So you lose in both ways.
I'm afraid the ability to use some other editor to edit APL leaves me cold. Why would I want to when I can come out of the function editor and run my code immediately?
The meta-data required to build an application from a bunch of files - one per function - is nothing more than their qualified names.
Re: Shortcomings in the current scheme of Version Control???
20,000 programs is a large number.
Just as you (I anyway) wouldn't keep a 20,000-line program and would rather divide it into smaller programs/functions, 20,000 programs should be regrouped into modules (namespaces).
SALT allows you to save all your programs individually and keep a local version. Like this you can keep the history of all your changes. And if your operating system records who did what you can trace it to the individual who wrote the file. Of course if your programmers follow company rules and you require that they identify themselves in the code then you can also use APL to track their work with dates, comments, etc.
Once you've reorganized your code into working modules, the command ]SNAP allows you to distribute your code into files in folders (one per namespace, recursively). It also allows you to create a program that will bring back all this lovely code at once. Since this is regular APL code you can modify that program too to better suit your needs.
Since this is all text, once in a while you can commit this to your favorite CMS, branch it out, etc.
This is not a complete solution but I think this is a step forward, and in the right direction.
Re: Shortcomings in the current scheme of Version Control???
The facility to convert a function or namespace to a script is lovely and desirable, especially for sharing code and possibly attracting new people. For a lone programmer these facilities may be adequate. However, I do not see how the current system can be used by a team of 10 or more programmers to create and maintain a significant piece of software. The effort involved in version control will be too much.
At SunGard, we had about 24000 or so functions in a flat APL*PLUS workspace and we were able to very easily create a custom release for each one of our clients. I do not yet see how that is possible in the current system. Here are some key insights - In VB or VC# the basic entity is a script. You work in-situ in the script and the IDE takes control of where in the script you are working. Rarely will a script be a single function. In APL, the function is the basic unit of work.
Git assumes that you are going to stage and commit newer versions of a file under the same name. There is no concept of adding a differently named file containing newer versions of functions that existed in a previously named script file. I think this creates a huge issue given how APLers see the world. This is my experience. I COULD be wrong.
Re: Shortcomings in the current scheme of Version Control???
With acre we currently manage over 8000 items in FlipDB, 12500 in CAS and smaller numbers in a dozen or so other projects, some related and some not.
Every change to every array, function, operator, namespace or class a developer makes locally is available to be undone or reinstated until he or she checks in.
Every checked-in change to every item is remembered by the RDBS: when, by whom, type, size, APLVersion and upload group, and is available for reinstatement singly or by upload.
The latest database version is available to each developer with and without his or her local changes that haven't yet been checked in. Change conflicts are handled at the item level, it being impossible to unknowingly overwrite a newer version uploaded since the version one is working on was downloaded.
Re: Shortcomings in the current scheme of Version Control???
neeraj wrote:... However, I do not see how the current system can be used by a team of 10 or more programmers to create and maintain a significant piece of software. The effort involved in version control will be too much.
... In APL, the function is the basic unit of work.
If you keep all your code and data in separate files - which you can do with SALT - then you would achieve the same result. 10 programmers are unlikely to be working on the SAME piece of code at the same time, and if they do they will encounter the same problem in any CMS - whether it is SunGard's, ACRE or SALT: a clash. How the CMS handles this varies from one to the other. In SALT you will be warned that there is a date (or version, if versioning is ON) discrepancy and you will be given the choice of whether or not to overwrite the file. If you don't overwrite it you can compare your code with the one on file and address the problem as you would with any other CMS.
Re: Shortcomings in the current scheme of Version Control???
Dan:
I would like you to show how to use the SALT framework in conjunction with Git to be able to simulate a real company with releases, multiple customers, with different customers on different releases. I, as a software vendor, should be able to create a test environment in less than 30 minutes to replicate what a customer has. For that I need to know what a customer has.
I know there are bits and pieces here but I have spent an ungodly amount of time piecing this together.
Phil:
Is there more detailed documentation on ACRE than is available on APLWiki?
Thanks
Neeraj
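On the git side alone, one possible way to know what a customer has is to tag each shipped release and rebuild a copy of it from the tag. The sketch below is only that, a sketch: the tag names v1.0 and v2.0, the customer names in the tag messages, the builder identity and the directory names are all assumptions for the illustration, and it says nothing about how SALT would then load the files.

```shell
#!/bin/sh
set -e

# Throwaway repository standing in for the application source.
work=$(mktemp -d)
cd "$work"
git init -q app
cd app
git config user.name "Builder"
git config user.email "builder@example.com"

# First release, shipped to one customer (recorded in the tag message).
printf 'version one\n' > x.txt
git add x.txt
git commit -q -m "release 1.0 content"
git tag -a v1.0 -m "shipped to customer A"

# Development continues; a later release goes to another customer.
printf 'version two\n' > x.txt
git commit -q -a -m "release 2.0 content"
git tag -a v2.0 -m "shipped to customer B"

# List the releases together with their notes.
git tag -n1

# Recreate exactly what customer A has, in a separate directory,
# without disturbing the current working copy.
mkdir ../customerA
git archive v1.0 | tar -xf - -C ../customerA
cat ../customerA/x.txt
```

The last command prints "version one", i.e. the file as it was shipped in v1.0, while the working copy stays at the latest revision.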