ForBo7 // Salman Naqvi

Tokenization

Splitting a document into smaller units called tokens, usually its component words.

Note

If a word is too long or very uncommon, the word itself may be split into smaller pieces, known as subwords. Take the word “supercalifragilisticexpialidocious” as an example. It could be split into “super”, “cali”, “fragilistic”, “expi”, “ali”, and “docious”.
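
To make both ideas concrete, here is a minimal sketch in Python. It is not how any particular tokenizer library works: the `VOCAB` set, and the functions `split_words` and `split_subwords`, are hypothetical names made up for this illustration, and the vocabulary is chosen by hand only so that it reproduces the split above. Real tokenizers learn their vocabulary from data.

```python
import re

# Hypothetical subword vocabulary, hand-picked to match the example above.
VOCAB = {"super", "cali", "fragilistic", "expi", "ali", "docious"}


def split_words(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_subwords(word, vocab):
    """Greedily take the longest vocabulary piece at each position."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, then shrink it one character at a time.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


print(split_words("Splitting a document into words."))
print(split_subwords("supercalifragilisticexpialidocious", VOCAB))
```

The first call splits a sentence into words, and the second splits the long word into the pieces listed in the note. The single-character fallback is one simple design choice among many; it just guarantees that every word can be tokenized, even when no vocabulary piece matches.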


ForBo7 // Salman Naqvi © 2022–2025 to ∞ and ForBlog™ by Salman Naqvi
