What is tokenization?medium
Tokenization splits text into units such as words, subwords, or characters.
Tokenization converts raw text into model-readable pieces. Modern models often use subword tokenization to handle rare words and multiple languages.