Working with collections in Pharo
TL;DR
This page introduces the most important and useful classes in the collection hierarchy and how to use them to express queries.
Overview
Collections are heavily used in Smalltalk code, so it is important to gain some familiarity with them. In particular, collections enable a very expressive functional programming style embedded within the object-oriented paradigm of Smalltalk.
Here we see an extract of the hierarchy. The abstract root is Collection, which defines the common API of the library.
The key sub-hierarchies are the Sequenceable collections, such as Arrays, Strings and OrderedCollections, and Dictionaries. Also of interest, but less commonly used, are Sets and Bags.
There are actually a vast number of classes in the Collection hierarchy, but we will focus just on these few, critical ones.
Collection withAllSubclasses size
Before taking a closer look at the collections API, let's quickly review the key classes.
Sequenceable collections
There are several kinds of collections that implement sequences of elements.
The most important ones to remember are Array, ByteString, Symbol, OrderedCollection and Interval.
Pharo provides built-in syntax for literal and dynamic arrays, as well as for (byte) strings and literal symbols. Subclasses of ArrayedCollection are all collections of fixed length, starting at position 1.
('hello' at: 1) = $h
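As a small sketch of the built-in syntax mentioned above (the variable names literal and dynamic are only illustrative), the following can be evaluated in a Playground:

```smalltalk
| literal dynamic |
"Literal array: its elements are literals fixed at compile time; a bare word becomes a symbol."
literal := #( 1 $a 'two' foo ).
"Dynamic array: each expression between the periods is evaluated at run time."
dynamic := { 1 + 1 . 'abc' size . 3 factorial }.
literal last.      "#foo"
dynamic first.     "2"
dynamic last.      "6"
(1 to: 5) asArray. "an Interval converted to the Array #(1 2 3 4 5)"
```

Positions start at 1 in all of these, so `literal at: 1` answers the first element.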
If you need a growable collection, create an instance of OrderedCollection. This will be your go-to collection for most purposes.
Note that if you want to sort any kind of collection, just send it the message #sorted.
'hello' sorted
If, on the other hand, you want a growable collection that stays sorted, then you can convert it to a SortedCollection.
'hello' asSortedCollection
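To make the difference concrete, here is a minimal sketch (the names words and staysSorted are illustrative). Sending #sorted: with a two-argument comparison block answers a sorted copy once, whereas a SortedCollection keeps its order as elements are added:

```smalltalk
| words staysSorted |
words := #( 'pear' 'fig' 'banana' ).
"One-off sort by length; the receiver itself is left unchanged."
words sorted: [ :a :b | a size <= b size ].   "#('fig' 'pear' 'banana')"

"A SortedCollection inserts each new element at its sorted position."
staysSorted := #( 5 1 3 ) asSortedCollection.
staysSorted add: 2.
staysSorted asArray.                          "#(1 2 3 5)"
```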
Sets, Bags and Dictionaries
Sets, Bags and Dictionaries form an odd implementation hierarchy. They are all unordered collections.
Each element of a Set occurs only once, while a Bag may contain the same element multiple times.
'hello' asSet
'hello' asBag
Sets and bags support union (|) and intersection (&) operators.
'hello' asSet & 'there' asSet
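The following sketch spells out what these snippets compute, using sizes and counts so the results do not depend on element order (the variable name letters is illustrative):

```smalltalk
| letters |
letters := 'hello' asSet.
letters size.                       "4: the duplicate $l is stored only once"
'hello' asBag size.                 "5: a Bag keeps every occurrence"
'hello' asBag occurrencesOf: $l.    "2"
(letters & 'there' asSet) size.     "2: $h and $e are in both sets"
(letters | 'there' asSet) size.     "6: h e l o t r"
```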
A Dictionary maps keys to values. A MethodDictionary, for example, is a kind of dictionary that is keyed on message selectors and maps them to compiled methods.
Set methodDict at: #=
An IdentityDictionary is a kind of dictionary that uses #== instead of #= to compare keys by object identity instead of equality.
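A minimal sketch of the difference, using two distinct but equal strings as keys (the variable names are illustrative; the strings are copied so that they are guaranteed to be separate objects):

```smalltalk
| key byEquality byIdentity |
key := 'abc' copy.
byEquality := Dictionary new.
byIdentity := IdentityDictionary new.
byEquality at: key put: 1.
byIdentity at: key put: 1.

"Lookup with an equal, but not identical, string."
byEquality includesKey: 'abc' copy.   "true: keys are compared with #="
byIdentity includesKey: 'abc' copy.   "false: keys are compared with #=="
byIdentity includesKey: key.          "true: the very same object"
```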
The Collections API
Now we'll take a closer look at the API supported by most of the Collections classes. These messages are particularly useful for formulating queries.
Creating collections
There are essentially three ways to create collections:
1. Use the built-in syntax for creating arrays and strings, and then convert the result to the kind of collection you want.
2. Use one of the various #with: messages to initialize a collection with some values.
3. Create a new, empty collection and incrementally add new elements to it.
Pharo has built-in syntax for strings and literal and dynamic arrays. If you don't want an Array, just send one of the #as* messages to convert it to a different kind of collection.
#( a b c ) asOrderedCollection
{ 3 + 4 . 6 factorial . 6 * 7 } asSet
A convenient way to initialize a dictionary is to convert a dynamic array of key -> value associations.
{ #foo -> (3+4) . #bar -> 6 factorial } asDictionary
You can also use one of the messages #with:, #with:with:, #with:with:with:, #with:with:with:with:, or #withAll: (or #newFrom:) to create a collection.
OrderedCollection with: $a with: 42 with: #foobar
Set withAll: 'hello'
Dictionary with: #a -> 1 with: #b -> 2
Finally, you can build up a collection step by step.
set := Set new.
set add: #foo.
set add: #bar.
set addAll: 'hello'.
set
Caveat: if you use a cascade to create a collection, be sure to send the message #yourself at the end of the cascade to get the collection as the final value. By default, the #add: method returns the element being added, not the collection. (Try to delete the last message sent and see the result.)
Dictionary new add: #a -> 1; add: #b -> 2; yourself
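The effect of the final #yourself can be seen side by side in this small sketch (the variable names are illustrative). A cascade answers the result of its last message, so without #yourself the value is the last element added:

```smalltalk
| withoutYourself withYourself |
"The cascade ends with add: 2, so the value is the element 2."
withoutYourself := OrderedCollection new add: 1; add: 2.
"The cascade ends with yourself, so the value is the collection."
withYourself := OrderedCollection new add: 1; add: 2; yourself.
withoutYourself.        "2"
withYourself asArray.   "#(1 2)"
```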
Accessing and updating collections
You can access sequenceable collections by position, and dictionaries by key, with #at: or one of its variants.
'hello' at: 1
{ $a -> 1 . $b -> 2 } asDictionary at: $a
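A few of the variants, as a hedged sketch (the variable names dict and letters are illustrative). #at:ifAbsent: avoids an error when the key or index is missing, and #first and #last are shortcuts for the ends of a sequence:

```smalltalk
| dict letters |
dict := { $a -> 1 . $b -> 2 } asDictionary.
letters := 'hello'.
letters first.                     "$h, the same as letters at: 1"
letters last.                      "$o"
dict at: $a ifAbsent: [ 0 ].       "1"
dict at: $z ifAbsent: [ 0 ].       "0, because $z is not a key"
letters at: 9 ifAbsent: [ $? ].    "$?, because index 9 is out of bounds"
```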
Although it is common to compose and transform Smalltalk collections in a functional style, many collections are mutable. Use #at:put: and its variants to update a collection in place.
#( 1 2 3 ) asOrderedCollection at: 1 put: 0; yourself
NB: As with #add:, both #at: and #at:put: return an element, not the collection, so send #yourself at the end of a cascade to get the collection (or assign it to a variable first).
Caveat: array and string literals cannot be updated in place (recent Pharo versions treat literals as read-only). To change one, convert it to an OrderedCollection and back again, or work on a copy.
String newFrom: ('hello' asOrderedCollection at: 2 put: $u; yourself)
You can remove elements from a collection with #remove: and friends.
(1 to: 20) asSet removeAll: (Integer primesUpTo: 20); yourself
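Some of the "friends" of #remove: are sketched below on an OrderedCollection (the variable name numbers is illustrative). Like #add:, these messages answer the removed element rather than the collection:

```smalltalk
| numbers |
numbers := #( 1 2 3 ) asOrderedCollection.
numbers remove: 2.                        "answers 2; numbers now holds 1 and 3"
numbers remove: 99 ifAbsent: [ nil ].     "answers nil instead of signalling an error"
numbers removeFirst.                      "answers 1"
numbers asArray.                          "#(3)"
```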
Querying and enumerating collections
To illustrate the Collections API, we will use a running example of querying the Collections hierarchy itself. In Smalltalk, we can ask a class if it is abstract by sending it the message #isAbstract. The same message can also be sent to a (compiled) method. We are curious to know which classes in the Collections hierarchy are abstract, which methods are abstract, and whether there are classes with abstract methods that are not themselves declared as abstract.
Iterating with #do:
You can use the #do: message to iterate over a collection in an imperative style.
Consider the following, rather verbose way of finding all the abstract subclasses of Collection.
abstractCollections := OrderedCollection new.
Collection withAllSubclasses do: [ :each |
	each isAbstract ifTrue: [ abstractCollections add: each ] ].
abstractCollections
Filtering with #select:, #reject: and #detect:
Smalltalk collections really shine, however, when you write queries in a functional style. Consider this:
Collection withAllSubclasses select: [ :each | each isAbstract ]
The message #select: uses a Boolean block to select those elements of the collection satisfying the block.
We can do even better. By means of some clever duck-typing, a symbol behaves like a one-argument block that sends itself as a message to the argument. As a consequence, we can simply write:
Collection withAllSubclasses select: #isAbstract
Every time you find yourself sending the message #do: you should ask yourself if there is a more elegant, functional way to do the same thing.
If you want the classes that are not abstract, you can reverse the boolean:
Collection withAllSubclasses select: [ :each | each isAbstract not ]
Or, more elegantly, use #reject:
Collection withAllSubclasses reject: #isAbstract
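The same messages can be tried on plain data, where the results are easy to check by eye. This is only a sketch; the variable name numbers is illustrative:

```smalltalk
| numbers |
numbers := #( 1 2 3 4 5 6 ).
numbers select: [ :each | each even ].   "#(2 4 6)"
numbers select: #even.                   "#(2 4 6), with the symbol standing in for the block"
numbers reject: #even.                   "#(1 3 5)"
```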
If you just want the first element that satisfies a condition, use #detect: to avoid iterating over the whole collection. For example, we can look for the first method of the Collection class that is abstract:
Collection methods detect: #isAbstract
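When no element satisfies the condition, plain #detect: signals an error. A hedged sketch of the #detect:ifNone: variant, which lets you supply a fallback value instead:

```smalltalk
#( 1 4 6 ) detect: #even.                    "4, the first even element"
#( 1 3 5 ) detect: #even ifNone: [ nil ].    "nil, since no element is even"
```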
Now we can put all the pieces together to see if there are any classes in the Collections hierarchy that define abstract methods, but are not themselves declared as abstract.
concreteClassWithAbstractMethods := (Collection withAllSubclasses reject: #isAbstract) select: [ :each | each methods anySatisfy: #isAbstract]
Interestingly, several classes, such as String, appear to be concrete, though they have abstract methods.
Transforming collections with #collect: and #flatCollect:
If you evaluate the snippet above, you will see several concrete classes with abstract methods, but which are these methods?
Sending #collect: to a collection will transform each element using the argument block. Let's extract the list of abstract methods for each class.
concreteClassWithAbstractMethods collect: [ :each | each methods select: #isAbstract ]
This yields a collection of arrays of methods. If we want just a flat collection, we can simply send #flatCollect: instead.
concreteClassWithAbstractMethods flatCollect: [ :each | each methods select: #isAbstract ]
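The difference between the two messages is easier to see on small data. In this sketch (the variable name numbers is illustrative), the block answers a two-element array for each number; #collect: keeps the nesting, while #flatCollect: concatenates the results:

```smalltalk
| numbers |
numbers := #( 1 2 3 ).
numbers collect: [ :each | each * each ].                 "#(1 4 9)"
"One nested array per element."
numbers collect: [ :each | { each . each * 10 } ].
"The same elements, flattened into a single collection: 1 10 2 20 3 30."
(numbers flatCollect: [ :each | { each . each * 10 } ]) asArray.
```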
Folding collections with #inject:into:
Sometimes you want as a result of a query not another collection, but a value that depends on all the elements. While this can be done with a #do: loop, often a more elegant solution can be achieved by sending #inject:into: to a collection.
The first argument is an initial seed value, and the second is a two-argument block that transforms the seed using each element of the collection.
Standard examples are to compute the sum or product of a list of numbers:
(1 to: 10) inject: 0 into: [ :sum :each | sum + each ]
(1 to: 10) inject: 1 into: [ :product :each | product * each ]
Suppose we want to learn which class in the Collection hierarchy defines the largest number of methods. This is simple, but somewhat verbose when written as a do-loop. Instead we can use #inject:into: as follows:
Collection withAllSubclasses
	inject: Collection
	into: [ :biggest :each |
		biggest methods size > each methods size
			ifTrue: [ biggest ]
			ifFalse: [ each ] ]
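The same "keep the best so far" fold can be sketched on a small array of strings, where the answer is easy to verify (the names words and longest are illustrative). For this particular pattern, Pharo collections also understand #detectMax:, which expresses the query more directly:

```smalltalk
| words longest |
words := #( 'fig' 'banana' 'pear' ).
"Fold: carry the longest string seen so far through the collection."
longest := words
	inject: words first
	into: [ :best :each |
		best size >= each size
			ifTrue: [ best ]
			ifFalse: [ each ] ].
longest.                                  "'banana'"
"Shorter alternative for the same query."
words detectMax: [ :each | each size ].   "'banana'"
```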
What's next?
To learn more about querying object models, see: