Working with collections in Pharo

TL;DR

This page introduces the most important and useful classes in the collection hierarchy and how to use them to express queries.

Overview

Collections are heavily used in Smalltalk code, so it is important to gain some familiarity with them. In particular, collections enable a very expressive functional programming style embedded within the object-oriented paradigm of Smalltalk.

Here we see an extract of the hierarchy. The abstract root is Collection, which defines the common API of the library.

The key sub-hierarchies are the Sequenceable collections, such as Arrays, Strings and OrderedCollections, and Dictionaries. Also of interest, but less commonly used, are Sets and Bags.

There are actually a vast number of classes in the Collection hierarchy, but we will focus just on these few, critical ones.

Collection withAllSubclasses size
  

Before taking a closer look at the collections API, let's quickly review the key classes.

Sequenceable collections

There are several kinds of collections that implement sequences of elements.

The most important ones to remember are Array, ByteString, Symbol, OrderedCollection and Interval.

Pharo provides built-in syntax for literal and dynamic arrays, as well as for (byte) strings and literal symbols. Subclasses of ArrayedCollection are all collections of fixed length, starting at position 1.

('hello' at: 1) = $h
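
As a brief sketch of the built-in syntax mentioned above (the element values are arbitrary illustrations, not taken from the original page): a literal array is fixed when the code is compiled, whereas a dynamic array evaluates each of its expressions when the snippet runs.

```smalltalk
"Literal array: every element must itself be a literal."
#( 1 $a 'two' #three ) size.               "4"

"Dynamic array: expressions separated by periods, evaluated at run time."
{ 1 + 2 . 'two' size . #three } first.     "3"

"Indexing starts at 1 for strings and symbols as well."
#hello at: 1                               "$h"
```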
  

If you need a growable collection, create an instance of OrderedCollection. This will be your go-to collection for most purposes.

Note that if you want to sort any kind of collection, just send it the message #sorted.

'hello' sorted
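
A small sketch of how #sorted behaves, assuming a throwaway variable named letters: it answers a new collection and leaves the receiver untouched, and the variant #sorted: takes a two-argument block when you need a different order.

```smalltalk
| letters |
letters := 'hello' asOrderedCollection.
letters sorted.                        "a new collection: $e $h $l $l $o"
letters sorted: [ :a :b | a >= b ].    "descending: $o $l $l $h $e"
letters                                "unchanged: $h $e $l $l $o"
```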
  

If, on the other hand, you want a growable collection that stays sorted, then you can convert it to a SortedCollection.

'hello' asSortedCollection
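
To illustrate what "stays sorted" means, here is a minimal sketch with made-up numbers: elements added later are placed at their sorted position rather than at the end.

```smalltalk
| numbers |
numbers := #( 5 1 3 ) asSortedCollection.
numbers add: 2; add: 4.
numbers asArray     "#(1 2 3 4 5)"
```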
  

Sets, Bags and Dictionaries

Sets, Bags and Dictionaries form an odd implementation hierarchy. They are all unordered collections.

Each element of a Set occurs only once, while a Bag may contain the same element multiple times.

'hello' asSet
  
'hello' asBag
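
The difference can be checked by counting, as in this short sketch: the set keeps a single $l, while the bag remembers both occurrences.

```smalltalk
'hello' asSet size.                "4: $l is kept once"
'hello' asBag size.                "5: both $l are counted"
'hello' asBag occurrencesOf: $l    "2"
```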
  

Sets and bags support union (|) and intersection (&) operators.

'hello' asSet & 'there' asSet
  

A Dictionary maps keys to values. A MethodDictionary, for example, is a kind of dictionary that is keyed on message selectors and maps them to compiled methods.

Set methodDict at: #=
  

An IdentityDictionary is a kind of dictionary that uses #== instead of #= to compare keys, that is, by object identity instead of equality.
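
The following sketch contrasts the two kinds of lookup. The variable names are illustrative, and the strings are copied on purpose so that they are equal but not identical objects.

```smalltalk
| key byEquality byIdentity |
key := 'abc' copy.
byEquality := Dictionary new.
byIdentity := IdentityDictionary new.
byEquality at: key put: 1.
byIdentity at: key put: 1.

byEquality includesKey: 'abc' copy.    "true: an equal string is enough"
byIdentity includesKey: 'abc' copy.    "false: equal, but a different object"
byIdentity includesKey: key            "true: the very same object"
```

Symbols are unique objects, so a symbol key behaves the same in both kinds of dictionary.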

The Collections API

Now we'll take a closer look at the API supported by most of the Collections classes. These messages are particularly useful for formulating queries.

Creating collections

There are essentially three ways to create collections:

1. Use the built-in syntax for creating arrays and strings, and then convert the result to the kind of collection you want.

2. Use one of the various #with: messages to initialize a collection with some values.

3. Create a new, empty collection and incrementally add new elements to it.

Pharo has built-in syntax for strings and literal and dynamic arrays. If you don't want an Array, just send one of the #as* messages to convert it to a different kind of collection.

#( a b c ) asOrderedCollection
  
{ 3 + 4 . 6 factorial . 6 * 7 } asSet
  

A convenient way to initialize a dictionary is to convert a dynamic array of key -> value associations.

{ #foo -> (3+4) . #bar -> 6 factorial } asDictionary
  

You can also use one of the messages #with:, #with:with:, #with:with:with:, #with:with:with:with:, or #withAll: (or #newFrom:) to create a collection.

OrderedCollection with: $a with: 42 with: #foobar
  
Set withAll: 'hello'
  
Dictionary with: #a -> 1 with: #b -> 2
  

Finally, you can build up a collection step by step.

set := Set new.
set add: #foo.
set add: #bar.
set addAll: 'hello'.
set
  

Caveat: if you use a cascade to create a collection, be sure to send the message #yourself at the end of the cascade to get the collection as the final value. By default, the #add: method returns the element being added, not the collection. (Try to delete the last message sent and see the result.)

Dictionary new
	add: #a -> 1;
	add: #b -> 2;
	yourself
  

Accessing and updating collections

You can access sequenceable collections by position, and dictionaries by key, with #at: or one of its variants.

'hello' at: 1
  
{ $a -> 1 . $b -> 2 } asDictionary at: $a
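
Two of the variants are sketched here, assuming a temporary named dict: #at:ifAbsent: supplies a fallback value instead of signalling an error for a missing key or index, and #at:ifAbsentPut: additionally stores that value in a dictionary.

```smalltalk
| dict |
dict := { $a -> 1 . $b -> 2 } asDictionary.
dict at: $z ifAbsent: [ 0 ].         "0; the dictionary is unchanged"
dict at: $z ifAbsentPut: [ 26 ].     "26; the entry is now stored"
dict at: $z.                         "26"
'hello' at: 10 ifAbsent: [ $? ]      "$?"
```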
  

Although it is common to compose and transform Smalltalk collections in a functional style, many collections are mutable. Use #at:put: and its variants to update a collection in place.

#( 1 2 3 ) asOrderedCollection
	at: 1 put: 0;
	yourself
  

NB: As with #add:, both #at: and #at:put: return an element, not the collection, so send #yourself at the end of a cascade to get the collection (or assign it to a variable first).

Caveat: literal arrays and strings are read-only, and arrays and strings in general cannot grow. If you want to update a literal, you have to work on a copy, or convert it to an OrderedCollection and back again.

String newFrom: ('hello' asOrderedCollection at: 2 put: $u; yourself)
  

You can remove elements from a collection with #remove: and friends.

(1 to: 20) asSet removeAll: (Integer primesUpTo: 20); yourself
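
Among the "friends" of #remove:, #remove:ifAbsent: is worth knowing, because plain #remove: signals an error when the element is missing. A minimal sketch with illustrative values:

```smalltalk
| numbers |
numbers := #( 1 2 3 ) asOrderedCollection.
numbers remove: 2.                       "answers the removed element, 2"
numbers remove: 99 ifAbsent: [ nil ].    "answers nil instead of signalling an error"
numbers asArray                          "#(1 3)"
```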
  

Querying and enumerating collections

To illustrate the Collections API, we will use a running example of querying the Collections hierarchy itself. In Smalltalk, we can ask a class if it is abstract by sending it the message #isAbstract. The same message can also be sent to a (compiled) method. We are curious to know which classes in the Collections hierarchy are abstract, which methods are abstract, and also whether there are classes with abstract methods that are not themselves declared as abstract.

Iterating with #do:

You can use the #do: message to iterate over a collection in an imperative style.

Consider the following, rather verbose way of finding all the abstract subclasses of Collection.

abstractCollections := OrderedCollection new.
Collection withAllSubclasses do: [ :each | 
	each isAbstract ifTrue: [ abstractCollections add: each ] ].
abstractCollections
  

Filtering with #select:, #reject: and #detect:

Smalltalk collections really shine, however, when you write queries in a functional style. Consider this:

Collection withAllSubclasses select: [ :each | each isAbstract ]
  

The message #select: uses a Boolean block to select those elements of the collection satisfying the block.

We can do even better. By means of some clever duck-typing, a symbol behaves like a one-argument block that sends itself as a message to the argument. As a consequence, we can simply write:

Collection withAllSubclasses select: #isAbstract
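
The duck-typing can be observed directly: a symbol understands #value:, just as a one-argument block does. The snippet below is a small sketch of that equivalence, using #squared as an arbitrary unary message.

```smalltalk
"Sending #value: to a symbol performs it on the argument."
(#isAbstract value: Collection) = Collection isAbstract.    "true"

"So these two expressions answer the same result."
#( 1 2 3 ) collect: #squared.                               "#(1 4 9)"
#( 1 2 3 ) collect: [ :each | each squared ]                "#(1 4 9)"
```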
  

Every time you find yourself sending the message #do: you should ask yourself if there is a more elegant, functional way to do the same thing.

If you want the classes that are not abstract, you can reverse the boolean:

Collection withAllSubclasses select: [ :each | each isAbstract not ]
  

Or, more elegantly, use #reject:

Collection withAllSubclasses reject: #isAbstract
  

If you just want the first element that satisfies a condition, use #detect: to avoid iterating over the whole collection. For example, we can look for the first method of the Collection class that is abstract:

Collection methods detect: #isAbstract
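
Note that #detect: signals an error when no element matches. When a miss is a normal outcome, the variant #detect:ifNone: lets you supply a fallback, as in this sketch with arbitrary numbers:

```smalltalk
(1 to: 10) detect: [ :each | each > 3 ].                     "4"
(1 to: 10) detect: [ :each | each > 20 ] ifNone: [ nil ]     "nil"
```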
  

Now we can put all the pieces together to see if there are any classes in the Collections hierarchy that define abstract methods, but are not themselves declared as abstract.

concreteClassWithAbstractMethods := 
	(Collection withAllSubclasses reject: #isAbstract)
		select: [ :each | 
			each methods anySatisfy: #isAbstract]
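
The query above relies on #anySatisfy:, which answers a Boolean rather than a collection. It has two siblings, #allSatisfy: and #noneSatisfy:, sketched here on small literal arrays chosen only for illustration:

```smalltalk
#( 1 2 3 ) anySatisfy: #even.      "true: 2 is even"
#( 1 2 3 ) allSatisfy: #even.      "false: 1 and 3 are odd"
#( 1 3 5 ) noneSatisfy: #even      "true"
```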
  

Interestingly, several classes, such as String, appear to be concrete even though they have abstract methods.

Transforming collections with #collect: and #flatCollect:

If you evaluate the snippet above, you will see several concrete classes with abstract methods, but which are these methods?

Sending #collect: to a collection will transform each element using the argument block. Let's extract the list of abstract methods for each class.

concreteClassWithAbstractMethods collect: [ :each | 
	each methods select: #isAbstract ]
  

This yields a collection of arrays of methods. If we want just a flat collection, we can simply send #flatCollect: instead.

concreteClassWithAbstractMethods flatCollect: [ :each | 
	each methods select: #isAbstract ]
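
The difference between the two messages is easier to see on a tiny input. In this sketch (the strings are arbitrary), #collect: answers one array per element, while #flatCollect: concatenates those arrays into a single collection.

```smalltalk
"Two nested arrays, one per string."
#( 'ab' 'cd' ) collect: [ :each | each asArray ].

"One flat collection containing $a $b $c $d."
#( 'ab' 'cd' ) flatCollect: [ :each | each asArray ]
```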
  

Folding collections with #inject:into:

Sometimes the result you want from a query is not another collection, but a single value that depends on all the elements. While this can be done with a #do: loop, often a more elegant solution can be achieved by sending #inject:into: to a collection.

The first argument is an initial seed value, and the second is a two-argument block that transforms the seed using each element of the collection.

Standard examples are to compute the sum or product of a list of numbers:

(1 to: 10) inject: 0 into: [ :sum :each | sum + each ]
  
(1 to: 10) inject: 1 into: [ :product :each | product * each ]
  

Suppose we want to learn which class in the Collection hierarchy defines the largest number of methods. This is simple, but somewhat verbose when written as a do-loop. Instead we can use #inject:into: as follows:

Collection withAllSubclasses
	inject: Collection
	into: [ :biggest :each | 
		biggest methods size > each methods size
			ifTrue: [ biggest ]
			ifFalse: [ each ] ]
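
For this particular kind of fold, Pharo's Collection protocol also offers #detectMax:, which answers the element for which the block yields the largest value. The sketch below is an alternative formulation, not the original page's approach; when several classes tie for the maximum, it may answer a different one than the fold above.

```smalltalk
Collection withAllSubclasses
	detectMax: [ :each | each methods size ]
```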
  

What's next?

To learn more about querying object models, see: