Top-performance tokenization and small-ruleset regular expression matching: A quantitative performance analysis and optimization study on the Cell/B.E. processor
Published in: International Journal of Parallel Programming
Keyword(s): Cell processor, Regular expression, Tokenization (Extra: Architecture designs, Business analytics, CELL processor, Dedicated hardware, Enterprise applications, High throughput, Language processors, Multi core, Multiple threads, Number of threads, Optimization studies, Performance analysis, Regular expressions, Regular-expression matching, Resource utilizations, Tokenization, Tokenizer, Unstructured data, Wikipedia, Computer architecture, Data handling, Optimization, Search engines, Throughput, Parallel processing systems)
Top-performance tokenization and small-ruleset regular expression matching: A quantitative performance analysis and optimization study on the Cell/B.E. processor is a 2011 journal article written in English by Scarpazza D.P. and published in International Journal of Parallel Programming.
In the last decade, the volume of unstructured data that Internet and enterprise applications create and consume has been growing at impressive rates. The tools we use to process these data are search engines, business analytics suites, natural-language processors and XML processors. These tools rely on tokenization, a form of regular expression matching aimed at extracting words and keywords from a character stream. The further growth of unstructured data-processing paradigms depends critically on the availability of high-performance tokenizers. Despite the impressive amount of parallelism that the multi-core revolution has made available (in terms of multiple threads and wider SIMD units), most applications employ tokenizers that do not exploit this parallelism. I present a technique for designing tokenizers that exploit multiple threads and wide SIMD units to process multiple independent streams of data at high throughput. The technique benefits indefinitely from any future scaling in the number of threads or SIMD width. I show the approach's viability by presenting a family of tokenizer kernels optimized for the Cell/B.E. processor that deliver a performance seen, so far, only on dedicated hardware. These kernels deliver a peak throughput of 14.30 Gbps per chip, and a typical throughput of 9.76 Gbps on Wikipedia input. They also achieve almost-ideal resource utilization (99.2%). The approach is applicable to any SIMD-enabled processor and matches well the trend toward wider SIMD units in contemporary architecture design.
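The core idea the abstract describes — running one independent input stream per SIMD lane and classifying characters branch-free so all lanes advance in lockstep — can be illustrated with a small sketch. This is not the paper's actual Cell/B.E. kernel: the lane loop, the `LANES` constant, and the simplified word/delimiter character classes are illustrative assumptions, and a real SPU implementation would replace the inner loop with a single SIMD operation over a 16-byte vector.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16  /* one independent input stream per (emulated) SIMD lane */

/* Branch-free classifier: 1 if the byte is a word character (letter or
   digit), 0 otherwise. A hypothetical simplification of the character
   classes a real tokenizer ruleset would use. */
static uint8_t is_word(uint8_t c) {
    return (uint8_t)((uint32_t)((c | 32) - 'a') < 26u ||
                     (uint32_t)(c - '0') < 10u);
}

/* Tokenize LANES equal-length streams in lockstep: for each lane, count
   tokens by detecting 0->1 transitions of the word-class bit. The inner
   loop models the SIMD lanes; no lane ever branches on its own data, so
   all lanes stay synchronized regardless of content. */
static void count_tokens(const uint8_t *streams[LANES], size_t len,
                         uint32_t counts[LANES]) {
    uint8_t prev[LANES] = {0};
    memset(counts, 0, LANES * sizeof(uint32_t));
    for (size_t i = 0; i < len; i++) {
        for (int l = 0; l < LANES; l++) {
            uint8_t w = is_word(streams[l][i]);
            counts[l] += (uint32_t)(w & (uint8_t)~prev[l] & 1u);
            prev[l] = w;
        }
    }
}
```

Because every lane executes the identical instruction sequence, throughput scales with lane count — which is why the abstract notes the technique benefits from any future growth in SIMD width.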
This publication has been cited 3 times, but no citing articles are available in WikiPapers.