How to detect the encoding of a text file

Some files declare their encoding in a header (for example an XML declaration or a byte order mark), but this information can be imprecise or missing altogether.

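One way to check whether a file uses a given encoding is to read its raw bytes and try to decode them with that encoding. If decoding fails, the file is not in that encoding. If it succeeds, the encoding is only a plausible candidate.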
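A minimal sketch of this check for a single encoding, assuming a standard Pharo image with the Zinc encoders: it asks ZnCharacterEncoder for an encoder by name, decodes the bytes, and treats a ZnCharacterEncodingError as "not this encoding". The names bytes and decodesAs are illustrative only.

```smalltalk
| bytes decodesAs |
"Illustrative helper block: answer true if someBytes decode cleanly with the named encoding"
decodesAs := [ :someBytes :encodingName |
	[ (ZnCharacterEncoder newForEncoding: encodingName) decodeBytes: someBytes.
	  true ]
		on: ZnCharacterEncodingError
		do: [ false ] ].

"The UTF-8 encoding of an emoji: four bytes, all above 127"
bytes := '🙂' utf8Encoded.

"Valid UTF-8, so the utf8 decoder accepts it"
decodesAs value: bytes value: 'utf8'.
"Bytes above 127 are not 7-bit ASCII, so the ascii decoder rejects it"
decodesAs value: bytes value: 'ascii'.
```

Note that a true answer does not prove the file was written in that encoding; it only rules out encodings that answer false.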

ZnCharacterEncoder>>#detectEncoding: provides support for detecting the encoding of a file by successively trying known encodings in a specific order: first ascii, utf8 and iso88591, then the other known byte encodings, and finally any remaining known encodings. It answers an encoder for the first encoding that decodes all the bytes without error. If none succeeds, it signals a ZnCharacterEncodingError, an Error subclass defined in the package Zinc-Character-Encoding-Core. This is still a heuristic and can be unreliable: a successful decode only shows that the bytes are valid in that encoding, not that the file was written in it.

Its implementation reads:

detectEncoding: bytes
	"Return one of my instances capable of decoding bytes.
	This is done by successively trying known encodings in a specific order.
	If no one is found, signal ZnCharacterEncodingError.
	This is a heuristic and unreliable [https://en.wikipedia.org/wiki/Charset_detection]."

	| candidates |
	"Set up an ordered candidates list, 7-bit ascii and utf8 are reasonably reliable, iso88591 is a reasonable default"
	candidates := #(ascii utf8 iso88591).
	candidates := candidates , (ZnByteEncoder knownEncodingIdentifiers difference: candidates).
	candidates := candidates , (self knownEncodingIdentifiers difference: candidates).
	"Try each and return the first one that succeeds."
	candidates do: [ :identifier |
		| encoder |
		encoder := self newForEncoding: identifier.
		[ ^ encoder decodeBytes: bytes; yourself ]
			on: ZnCharacterEncodingError
			do: [ ] ].
	ZnCharacterEncodingError signal: 'No suitable encoder found'
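Because the candidates are tried in order, the answer is simply the first encoding that happens to decode the bytes. The sketch below, assuming a standard Pharo image with Zinc, calls detectEncoding: directly on two illustrative byte arrays: 'café' encoded as UTF-8, and the same word encoded as ISO-8859-1, where é is the single byte 233. The second array is rejected by ascii and utf8 and falls through to the iso88591 candidate, even though other single-byte encodings would accept it too.

```smalltalk
| utf8Bytes latin1Bytes |
"é becomes the two bytes 195 169 in UTF-8"
utf8Bytes := 'café' utf8Encoded.
"café in ISO-8859-1: é is the single byte 233"
latin1Bytes := #[99 97 102 233].

"ascii fails on bytes above 127, utf8 then succeeds"
(ZnCharacterEncoder detectEncoding: utf8Bytes) decodeBytes: utf8Bytes.

"233 starts an incomplete UTF-8 sequence, so ascii and utf8 both fail
 and the iso88591 candidate is used"
(ZnCharacterEncoder detectEncoding: latin1Bytes) decodeBytes: latin1Bytes.
```

Both calls decode to the same text here, but only because the bytes happen to be unambiguous for this word; for arbitrary single-byte data the detected encoding is a guess.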

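The following example writes a small test file, detects its encoding, and then reads it back with the detected encoding.

filename := 'text-encoding.txt'.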
  
filename asFileReference ensureDelete.
  
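"Write one of the following two contents: plain 7-bit ASCII text..."
filename asFileReference writeStreamDo: [ :aStream |
	aStream nextPutAll: 'Only letters' ].
  
"...or text containing emoji, which cannot be represented in ASCII"
filename asFileReference writeStreamDo: [ :aStream |
	aStream nextPutAll: '🙂😊😀😁' ].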
  
"Detect the encoding from the raw bytes; answer nil if no known encoding fits"
encoding := [ filename asFileReference
		binaryReadStreamDo: [ :in |
			(ZnCharacterEncoder detectEncoding: in upToEnd) identifier ] ]
	on: ZnCharacterEncodingError
	do: [ nil ].
  
self assert: encoding notNil.

"Read the file back as text, decoding it with the detected encoding"
filename asFileReference
	readStreamEncoded: encoding
	do: [ :stream | stream contents ]