How to detect the encoding of a text file
Some files declare their encoding in a header, but that information can be imprecise or missing.
One way to check whether a file uses a given encoding is to read its content and try decoding it with that encoding.
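As a minimal sketch of that check, the snippet below tries to decode a byte array with one named encoding and answers whether decoding succeeded. The block name canDecode and the sample byte arrays are illustrative assumptions, not part of the original example. The sketch also assumes the default UTF-8 encoder is strict and signals a ZnCharacterEncodingError (or a subclass) on malformed input.

```smalltalk
| canDecode |
"Hypothetical helper: answer true if bytes decode cleanly with the named encoding."
canDecode := [ :bytes :encodingName |
	[ (ZnCharacterEncoder newForEncoding: encodingName) decodeBytes: bytes.
	true ]
		on: ZnCharacterEncodingError
		do: [ false ] ].

"Bytes produced by the UTF-8 encoder should decode as UTF-8."
canDecode value: 'élève' utf8Encoded value: 'utf8'.

"The same word in Latin-1 bytes is not a valid UTF-8 sequence."
canDecode value: #[233 108 232 118 101] value: 'utf8'.
```

A successful decode only shows that the bytes are valid in that encoding, not that the file was actually written with it.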
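ZnCharacterEncoder class>>#detectEncoding: detects the encoding of a byte array by trying known encodings one after another in a fixed order, and answers the first encoder that decodes the bytes without error. If none matches, it signals a ZnCharacterEncodingError. This is still a heuristic and can be unreliable: several encodings may decode the same bytes without error, so the first match is not necessarily the encoding the file was written in.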
filename := 'text-encoding.txt'.
filename asFileReference ensureDelete.
"Write plain ASCII content..."
filename asFileReference writeStreamDo: [ :aStream | aStream nextPutAll: 'Only letters' ].
"...or, to test with non-ASCII content, write emoji instead."
filename asFileReference writeStreamDo: [ :aStream | aStream nextPutAll: '🙂😊😀😁' ].
"Read the raw bytes and ask Zinc for an encoding; answer nil if none is detected."
encoding := [ filename asFileReference binaryReadStreamDo: [ :in |
		(ZnCharacterEncoder detectEncoding: in upToEnd) identifier ] ]
	on: ZnCharacterEncodingError
	do: [ nil ].
self assert: encoding notNil.
"Read the file again as text, using the detected encoding."
filename asFileReference readStreamEncoded: encoding do: [ :stream | stream contents ]