Iceberg model and inner workings

Details about the inner workings of Iceberg, to understand how it commits, merges, and handles conflicts.

Basic Iceberg Model

Iceberg contains multiple IceRepository instances. They are stored inside a registry. The class is defined as:

Object subclass: #IceRepository
	instanceVariableNames: 'name workingCopy index commitsInPackageCache'
	classVariableNames: 'Registry RepositoryClass'
	package: 'Iceberg-Core'

icebergRepository := IceRepository  registry 
  detect: [ :each | each name = 'gtoolkit' ].
  

A repository has an IceWorkingCopy that manages the status of all code loaded from the repository. It keeps track of what is loaded into the image (packages, commits, etc.). The class is defined as:

Object subclass: #IceWorkingCopy
	instanceVariableNames: 'repository packages referenceCommit shouldIgnoreNotifications project properties'
	classVariableNames: ''
	package: 'Iceberg-Core'

workingCopy := icebergRepository workingCopy
  

The working copy maintains a list of IcePackage instances from the repository. The class is defined as:

Object subclass: #IcePackage
	instanceVariableNames: 'package repository isDirty'
	classVariableNames: ''
	package: 'Iceberg-Core'

workingCopy packages
  

A working copy can also be in several states. The state is computed using IceWorkingCopy>>#workingCopyState:

workingCopyState
	"The working copy can be in different states depending on the repository and the package.
	It is the working copy state reponsibility to decide wether we can commit, if we are on a merge, and so on...

	The working copy state can be obtained through the message #workingCopyState.

		workingCopy workingCopyState.

	The working copy state is calculated every time that it is called.
	This is because the state of the repository can be modified from outside the system (e.g., the command line or another tool).
	In any case, calculating the working copy state is fast enough to be executed on-line even for big repositories such as Pharo's.

	The working copy state is calculated from the status of each of its packages.
	It was decided like this because it may happen that somebody downloads a package from different commits.
	If this situation changes in the future, this is a good point for simplification."

	"This method obtains the head commit once and sends it as argument as an optimization.
	This is because asking for the head commit is expensive.
	Check the commits of #packageState"

	referenceCommit isCollection
		ifTrue: [ ^ IceInMergeWorkingCopy repository: repository ].
	referenceCommit isUnknownCommit
		ifTrue: [ ^ IceUnknownVersionWorkingCopy repository: repository ].
	referenceCommit isNoCommit
		ifTrue: [ ^ IceEmptyWorkingCopy repository: repository ].
	^ IceAttachedSingleVersionWorkingCopy repository: repository

workingCopy workingCopyState
  

States are subclasses of IceWorkingCopyState:

Object subclass: #IceWorkingCopyState
	instanceVariableNames: 'repository'
	classVariableNames: ''
	package: 'Iceberg-WorkingCopy'

The four concrete states are direct subclasses of IceWorkingCopyState, declare no instance or class variables of their own, and live in the package 'Iceberg-WorkingCopy':

- IceInMergeWorkingCopy: indicates that a merge is in progress.

- IceUnknownVersionWorkingCopy: indicates that the reference commit is an unknown commit.

- IceEmptyWorkingCopy: indicates that the reference commit is an IceNoCommit (an IceCommitish subclass with no instance variables, in the package 'Iceberg-Core').

- IceAttachedSingleVersionWorkingCopy: indicates that the reference commit is a valid commit.
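
To see which of these states a working copy is currently in, a playground snippet along the following lines may help. This is only a sketch: it assumes a repository named 'gtoolkit' is registered, as in the earlier examples, and it replays the order of the checks in #workingCopyState instead of relying on any Iceberg API not shown on this page. The description strings are illustrative.

"Sketch: describe the state of a working copy by replaying the checks of #workingCopyState."
workingCopy := (IceRepository registry
	detect: [ :each | each name = 'gtoolkit' ]) workingCopy.
referenceCommit := workingCopy referenceCommit.
description := referenceCommit isCollection
	ifTrue: [ 'merge in progress' ]
	ifFalse: [
		referenceCommit isUnknownCommit
			ifTrue: [ 'unknown reference commit' ]
			ifFalse: [
				referenceCommit isNoCommit
					ifTrue: [ 'empty working copy' ]
					ifFalse: [ 'attached to a single commit' ] ] ].
"Pair the description with the name of the state class Iceberg computes."
description -> workingCopy workingCopyState class name

The checks are nested in the same order as in the method because a reference commit that is a collection (the merge case) is tested first, before any commit-specific predicate is sent to it.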

Overall a repository can be in several states:

- Unknown commit: The head commit of the repository is an IceUnknownCommit (an IceCommitish subclass with instance variables 'id datetime', in the package 'Iceberg-Core'). This can be a repository created as a placeholder. A fetch could be required to load the actual repository.

- Detached Working Copy: The head commit in the repository is not the same as the head commit inside the image

- Detached HEAD: The head of the repository is a commit instead of a branch. When cloning a repository, if the latest version is cloned, the head is set to a branch. If a particular commit is cloned, then the head points to that commit, and Iceberg considers the repository detached.

- No Project Found: Missing setting for configuring the location of the code and its format

- Not loaded: No code is loaded.

- Uncommitted changes with outgoing or incoming commits.

Creating diffs

Changes are committed based on an IceDiff. A diff contains the changes between two Iceberg commitish objects. This can be a diff between two commits, or a diff between the current working copy and a commit. The class is defined as:

Object subclass: #IceDiff
	instanceVariableNames: 'tree source target writerClass mergedTree'
	classVariableNames: ''
	package: 'Iceberg-Changes'

workingCopy diffToReferenceCommit
  

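For the rest of the demo we define the source and target commitish. The three snippets below are alternative pairings (the working copy against its reference commit, the working copy against the first ancestor of the reference commit, and the reference commit against its first ancestor); the pair evaluated last is the one used by the snippets that follow: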

sourceCommitish := workingCopy.
targetCommitish := workingCopy referenceCommit 
  
sourceCommitish := workingCopy.
targetCommitish := workingCopy referenceCommit ancestors first
  
sourceCommitish := workingCopy referenceCommit.
targetCommitish := workingCopy referenceCommit ancestors first
  
sourceCommitish diffTo: targetCommitish
  

To compute a diff (from the class comment):

- Ask the repository for the list of changed files/packages between the two versions. These are obtained, for example, from the Monticello dirty flags and the list of modified files provided by Git.

- The first step in computing a diff is to detect the type of changes between the source and the destination. Each one is a high-level change, a subclass of IceChange, indicating the type of entity that changed (package) and where the change occurred (git or image). IceChange itself is an Object subclass with no instance variables, in the package 'Iceberg-Changes'. Its subclasses are:

  - IceImageChange (instance variable 'package'; package 'Iceberg-Changes'): a change coming from the image (in contrast to a change coming from git).

  - IceGitChange (instance variable 'filePathString'; package 'Iceberg-Libgit-Changes'): a change coming from git (in contrast to a change coming from the image).

  - IceProjectChange (no instance variables; package 'Iceberg-Project'): the fact that the project changed.

  - IceCypressPropertiesChange (no instance variables; package 'Iceberg-Changes').

changes := sourceCommitish changesTo: targetCommitish
  

- Based on these high-level changes, the diff calculates two trees of IceDefinition. Those trees are represented as compositions of IceNode. These definitions are the logical entities at the level of the code model. The two classes are defined as:

Object subclass: #IceDefinition
	instanceVariableNames: 'name'
	classVariableNames: ''
	package: 'Iceberg-Changes'

IceAbstractNode subclass: #IceNode
	instanceVariableNames: 'parent childrenDictionary value'
	classVariableNames: ''
	package: 'Iceberg-Changes'

Diff between the working copy and a commit

When committing, a diff is made between the working copy and the reference commit. The same mechanism can be used to get a diff between the working copy and another commit.

Diff between two commits

One type of diff is between two commits. In this case Iceberg performs the diff at the file level.

Every file that changes is modeled as an IceGitChange. The importer IceChangeImporter has a dedicated method, #visitGitChange:, that uses a dedicated IceGitChangeImporter to create nodes with the appropriate definitions:

Object subclass: #IceChangeImporter
	instanceVariableNames: 'parentNode diff version'
	classVariableNames: ''
	package: 'Iceberg-Changes'

visitGitChange: anIceGitChange
	| importer |
	importer := IceGitChangeImporter new
		path: anIceGitChange path;
		diff: diff;
		version: version;
		yourself.
	importer importOn: parentNode.

Object subclass: #IceGitChangeImporter
	instanceVariableNames: 'path diff version'
	classVariableNames: ''
	package: 'Iceberg-Libgit-Changes'

The IceGitChangeImporter creates IceDirectoryDefinition and IceFileDefinition objects for all normal files, until it encounters a folder that contains a package. In that case it creates an MCSnapshot from the version of that package inside the commit and uses an IceMCPackageImporter to create Iceberg definitions for the content of that package. These definitions are created using IceMCDefinitionImporter. The classes involved are defined as:

IceFileSystemDefinition subclass: #IceDirectoryDefinition
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'Iceberg-Changes'

IceFileSystemDefinition subclass: #IceFileDefinition
	instanceVariableNames: 'contents'
	classVariableNames: ''
	package: 'Iceberg-Changes'

Object subclass: #MCSnapshot
	instanceVariableNames: 'definitions classDefinitionCache'
	classVariableNames: ''
	package: 'Monticello-Base'

Object subclass: #IceMCPackageImporter
	instanceVariableNames: 'version package'
	classVariableNames: ''
	package: 'Iceberg-Changes'

Object subclass: #IceMCDefinitionImporter
	instanceVariableNames: 'packageNode snapshot'
	classVariableNames: ''
	package: 'Iceberg-Changes'

To determine the list of IceGitChange instances between two versions, Iceberg gets the tree of files in each commit and the diff between them using LibGit.

icebergRepository  
	changedFilesBetween: sourceCommitish and: targetCommitish.
  
fromTree := (LGitCommit 
	of: icebergRepository repositoryHandle 
	fromHexString: sourceCommitish id) tree.
  
toTree := (LGitCommit 
	of: icebergRepository repositoryHandle 
	fromHexString: targetCommitish id) tree.
  
gitTreeDiff := fromTree diffTo: toTree.
  
gitTreeDiff files collect: [ :each | IceGitChange on: each ]
  

Steps for performing the diff

diff := IceDiff new
	sourceVersion: sourceCommitish;
	targetVersion: targetCommitish;
	yourself
  
leftTree := IceNode value: IceRootDefinition new.
changes do: [ :aChange | 
	aChange accept: (IceChangeImporter new
		version: sourceCommitish;
		diff: diff;
		parentNode: leftTree;
		yourself) ].
leftTree
  
rightTree := IceNode value: IceRootDefinition new.
changes do: [ :change | 
	change accept: (IceChangeImporter new
		version: targetCommitish;
		diff: diff;
		parentNode: rightTree;
		yourself) ].
rightTree
  

- Then, the two trees are diff'd using IceDiff>>#diff:with:, and a tree of differences is obtained. This tree is also a composition of IceNode objects, but contains IceOperation objects instead (additions, deletions and modifications):

diff: leftTree with: rightTree
	^ (self mergedTreeOf: leftTree with: rightTree)
		select: [ :operation | operation hasChanges ]

Object subclass: #IceOperation
	instanceVariableNames: 'definition'
	classVariableNames: ''
	package: 'Iceberg-Changes'

mergedTree := diff mergedTreeOf: leftTree with: rightTree.
  
tree := mergedTree select: [ :operation | operation hasChanges ].
  

Committing changes

The main entry point for performing a commit is IceWorkingCopy>>#commitChanges:withMessage:force:

commitChanges: aDiff withMessage: message force: forcing
	"Creates a commit with the given changes using the comment given as argument.
	The forcing parameter allows to create an empty commit. This is used by the merge.

	NOTICE that commits can only be done if the following is true:
	- HEAD is a branch
	- the working copy reference commit is the same commit as #headCommit"

	| newCommit |
	self validateCanCommit.
	self repository index
		updateDiskWorkingCopy: aDiff;
		updateIndex: aDiff.
	(forcing not and: [repository index isEmpty])
		ifTrue: [ IceNothingToCommit signal ].
	newCommit := self repository
		commitIndexWithMessage: message
		andParents: (self workingCopyState referenceCommits
			reject: [ :each | each isNoCommit ]).
	^ newCommit

This method:

- writes changes to the in-image git index. Code is written to the index only when committing, not when the user is typing the code or saving the image.

- performs a commit with the changes in the index.

This both writes the actual code changes to the on-disk git index and performs the commit.
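
In a playground, the entry point could be exercised roughly as follows. This is a sketch, not a recommended workflow: it assumes the repository named 'gtoolkit' from the earlier examples, it assumes that the diff returned by #diffToReferenceCommit is the one to commit, and the commit message is a placeholder. Evaluating it creates a real commit in the repository when there are changes.

"Sketch: commit the current image changes through the main entry point."
workingCopy := (IceRepository registry
	detect: [ :each | each name = 'gtoolkit' ]) workingCopy.
diff := workingCopy diffToReferenceCommit.
newCommit := [ workingCopy
		commitChanges: diff
		withMessage: 'Example commit'
		force: false ]
	on: IceNothingToCommit
	do: [ :exception | "Signalled when the index is empty and forcing is false." nil ].
newCommit

The handler answers nil when there is nothing to commit. Any error raised by #validateCanCommit (for example when HEAD is not a branch) is deliberately left unhandled here, so that it shows up in the debugger.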

fullDiff := IceDiff 
	from: sourceCommitish
	to: targetCommitish
  

To perform a commit, code changes are written by Iceberg directly to the in-image git index. In the image this is an instance of IceGitIndex, which maintains a list of modified file paths and can write them to the git index. The class is defined as:

IceIndex subclass: #IceGitIndex
	instanceVariableNames: 'modifiedFilePaths'
	classVariableNames: ''
	package: 'Iceberg-Libgit-Core'

icebergRepository index
  

IceGitIndex>>#updateDiskWorkingCopy: uses an IceGitWorkingCopyUpdateVisitor to write the code changes to disk, without changing the in-image index:

updateDiskWorkingCopy: anIceDiff
	anIceDiff tree
		accept: (IceGitWorkingCopyUpdateVisitor new
			repository: repository;
			index: self;
			diff: anIceDiff)

IceTreeVisitor subclass: #IceGitWorkingCopyUpdateVisitor
	instanceVariableNames: 'repository diff index'
	classVariableNames: ''
	package: 'Iceberg-Libgit-Commit'

icebergRepository index updateDiskWorkingCopy: fullDiff
  

IceIndex>>#updateIndex: adds the changed locations to the in-image git index:

updateIndex: anIceDiff
	anIceDiff tree
		accept: (IceIndexUpdateVisitor new
			index: self;
			diff: anIceDiff).

icebergRepository index updateIndex: fullDiff
  

After changes are written to disk and the in-image git index is updated, a commit can be done. This happens in the method IceRepository>>#commitIndexWithMessage:andParents:

commitIndexWithMessage: message andParents: parentCommitishList
	"Low level. Commit what is saved in the index"

	| newCommit |
	newCommit := index commitWithMessage: message andParents: parentCommitishList.
	index := self newIndex.
	self workingCopy referenceCommit: newCommit.
	self workingCopy refreshDirtyPackages.
	^ newCommit

newCommit := icebergRepository
	commitIndexWithMessage: 'Example commit'
	andParents: (workingCopy workingCopyState 
		referenceCommits reject: [ :each | each isNoCommit ]).
  

First, changes are written to disk using IceGitIndex>>#addToGitIndex:

addToGitIndex
	repository addFilesToIndex: modifiedFilePaths.

Second, a new empty index is created and installed in the repository.

Third, the state of the working copy and of all packages is updated based on the new commit.

Merging

The merge between two versions is implemented by IceMerge. This computes a merge tree with the changes that should be applied during the merge. The tree contains as nodes IceNode objects whose values are instances of subclasses of IceOperationMerge. The two classes are defined as:

Object subclass: #IceMerge
	instanceVariableNames: 'mergeTree repository mergeCommit imageCommit changesToWorkingCopyTree'
	classVariableNames: ''
	package: 'Iceberg-Changes'

Object subclass: #IceOperationMerge
	instanceVariableNames: 'chosen'
	classVariableNames: ''
	package: 'Iceberg-Changes'

There are only two such types of operations:

- IceConflictingOperation (an IceOperationMerge subclass with instance variables 'leftOperation rightOperation'; package 'Iceberg-Changes'): a conflict between two operations that can be solved by using #selectLeft (chosen := leftOperation) or #selectRight (chosen := rightOperation).

- IceNonConflictingOperation (an IceOperationMerge subclass with instance variable 'operation'; package 'Iceberg-Changes'): a non-conflict between two operations that can be solved automatically. The user can still override the automatic choice using #selectLeft and #selectRight.

otherBranch := icebergRepository branchNamed: 'release'.
  
mergeAction := IceMerge new
	repository: icebergRepository;
	mergeCommit: otherBranch commit;
	yourself
  

The commit in case of a merge uses the same logic as a normal user commit: the method IceWorkingCopy>>#commitChanges:withMessage:force: is used as well. The difference is that the first parameter is now an instance of IceMerge instead of an IceDiff. The commit logic, implemented by IceGitWorkingCopyUpdateVisitor and IceIndexUpdateVisitor, can visit both normal IceOperation and IceOperationMerge objects. IceIndexUpdateVisitor is defined as:

IceTreeVisitor subclass: #IceIndexUpdateVisitor
	instanceVariableNames: 'diff index'
	classVariableNames: ''
	package: 'Iceberg-Libgit-Commit'

Keeping the model up to date

Iceberg registers to system announcers in IceSystemEventListener>>#registerSystemAnnouncements and triggers an IceRepositoryModified event for the Iceberg repository that should be updated:

registerSystemAnnouncements
	self unregisterSystemAnnouncements.
	SystemAnnouncer uniqueInstance weak
		when: ClassAnnouncement send: #handleClassChange: to: self;
		when: MethodAnnouncement send: #handleMethodChange: to: self;
		when: ClassTagAnnouncement send: #handlePackageChange: to: self;
		when: MCVersionLoaderStopped send: #handleVersionLoaded: to: self.

IceRepositoryAnnouncement subclass: #IceRepositoryModified
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'Iceberg-Announcements'

To detect which repository should be updated, Iceberg traverses the repositories looking for one that contains the changed package.
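
As an illustration of that lookup, the sketch below searches the registry for a repository whose working copy includes a given package. It is not Iceberg's own implementation; it only reuses #includesInWorkingCopyPackageNamed:, the check that appears in #notifyPackageModified:, and 'MyPackage' is a hypothetical package name.

"Sketch: find the repository whose working copy contains a package."
owner := IceRepository registry
	detect: [ :each |
		each workingCopy includesInWorkingCopyPackageNamed: 'MyPackage' ]
	ifNone: [ nil ].
owner

The result is nil when no registered repository has the package in its working copy.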

If the package is loaded into the image, it is marked as dirty in IceWorkingCopy>>#notifyPackageModified:

notifyPackageModified: aString
	<gtPharoPatch: #Pharo>
	self flag: #pharoTodo. "we cannot use #includesPackageNamed: as is because it can happen that a package is present in a commit but not in image yet?"
	self shouldIgnoreNotifications ifTrue: [ ^ false ].
	(self includesInWorkingCopyPackageNamed: aString)
		ifTrue: [ | package |
			package := self packageNamed: aString.
			package isDirty ifFalse: [ package beDirty ].
			^ true ].
	^ false
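
To check the effect of these notifications, the packages currently flagged as dirty can be listed. A minimal sketch, assuming the 'gtoolkit' repository from the earlier examples and using only #packages and #isDirty, both of which appear on this page:

"Sketch: list the packages of a working copy that are currently marked dirty."
workingCopy := (IceRepository registry
	detect: [ :each | each name = 'gtoolkit' ]) workingCopy.
workingCopy packages select: [ :each | each isDirty ]

Modifying a method in one of the repository's loaded packages and evaluating the snippet again should, under these assumptions, add that package to the result.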