class BufferedTokenizer

BufferedTokenizer takes a delimiter upon instantiation, or operates on lines by default. It lets input be fed in piecemeal from an outside source that receives arbitrary-length datagrams, which may or may not contain the token that delimits entities. In this respect it is ideally paired with something like EventMachine (rubyeventmachine.com/).

Public Class Methods

new(delimiter = $/)

New BufferedTokenizers will operate on lines delimited by a delimiter, which is by default the global input delimiter $/ ("\n").

The input buffer is stored as an array. This is by far the most efficient approach given language constraints (in C a linked list would be a more appropriate data structure). Segments of input data are stored in a list which is only joined when a token is reached, substantially reducing the number of objects required for the operation.

# File lib/em/buftok.rb, line 15
def initialize(delimiter = $/)
  @delimiter = delimiter
  @input = []
  @tail = ''
  @trim = @delimiter.length - 1
end
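
The array-based buffering described above can be illustrated in isolation. This is a minimal sketch with made-up input pieces, not code from the library: segments are appended to an array as they arrive and joined into a single string only once the token is complete.

```ruby
# Hypothetical input: one logical line arriving in three pieces.
pieces = ["partial ", "line ", "content"]

segments = []
pieces.each do |piece|
  segments << piece      # append only; no intermediate strings are built
end

token = segments.join    # a single join once the token is complete
segments.clear           # reuse the same array for the next token
```

Appending to the array avoids creating a new, longer string for every incoming piece, which is the saving the description refers to.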

Public Instance Methods

extract(data)

Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract. This makes for easy processing of datagrams using a pattern like:

tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...

Using -1 makes split return "" if the token is at the end of the string, meaning the last element is always the start of the next chunk.

# File lib/em/buftok.rb, line 30
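def extract(data)
  # With a multi-character delimiter, move the last (delimiter length - 1)
  # characters of the previous tail onto the new data, so a delimiter split
  # across two calls is still found.
  if @trim > 0
    tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
    data = tail_end + data if tail_end
  end

  @input << @tail
  entities = data.split(@delimiter, -1)
  @tail = entities.shift # text before the first delimiter, if any

  unless entities.empty?
    # At least one delimiter was found: the buffered segments plus @tail
    # form the first complete token.
    @input << @tail
    entities.unshift @input.join
    @input.clear
    @tail = entities.pop # text after the last delimiter stays buffered
  end

  entities
end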
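
The effect of the -1 limit mentioned in the description can be seen with plain String#split, independent of the tokenizer. The sample strings below are made up for illustration.

```ruby
data = "alpha\nbeta\n"

without_limit = data.split("\n")      # trailing empty field is dropped
with_limit    = data.split("\n", -1)  # trailing empty field is kept

# When the input ends mid-token, the last element is the unfinished remainder.
partial = "alpha\nbe".split("\n", -1)

p without_limit  # ["alpha", "beta"]
p with_limit     # ["alpha", "beta", ""]
p partial        # ["alpha", "be"]
```

Because the last element is always "whatever follows the final delimiter", extract can pop it unconditionally and keep it as the start of the next chunk.
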
flush()

Flush the contents of the input buffer, i.e. return the buffered input even though a token has not yet been encountered.

# File lib/em/buftok.rb, line 52
def flush
  @input << @tail
  buffer = @input.join
  @input.clear
  @tail = "" # @tail.clear is slightly faster, but not supported on 1.8.7
  buffer
end
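
A short end-to-end sketch of extract and flush working together follows. The class body is repeated from the listings on this page only so the snippet runs without EventMachine installed; with the gem available, the assumption is that `require 'em/buftok'` would provide the class instead. The chunk contents are invented for illustration.

```ruby
# Local copy of the class as listed above, so the example is self-contained.
class BufferedTokenizer
  def initialize(delimiter = $/)
    @delimiter = delimiter
    @input = []
    @tail = ''
    @trim = @delimiter.length - 1
  end

  def extract(data)
    if @trim > 0
      tail_end = @tail.slice!(-@trim, @trim)
      data = tail_end + data if tail_end
    end
    @input << @tail
    entities = data.split(@delimiter, -1)
    @tail = entities.shift
    unless entities.empty?
      @input << @tail
      entities.unshift @input.join
      @input.clear
      @tail = entities.pop
    end
    entities
  end

  def flush
    @input << @tail
    buffer = @input.join
    @input.clear
    @tail = ""
    buffer
  end
end

# Line-based use: chunks arrive at arbitrary boundaries.
tokenizer = BufferedTokenizer.new("\n")
lines = []
["GET / HT", "TP/1.0\nHost: ex", "ample\n", "trailing"].each do |chunk|
  lines.concat(tokenizer.extract(chunk))
end
leftover = tokenizer.flush  # data received after the last delimiter

# Multi-character delimiter that is itself split across two chunks.
crlf = BufferedTokenizer.new("\r\n")
crlf_tokens = crlf.extract("one\r") + crlf.extract("\ntwo\r\n")

p lines        # ["GET / HTTP/1.0", "Host: example"]
p leftover     # "trailing"
p crlf_tokens  # ["one", "two"]
```

The second tokenizer shows why the constructor records the delimiter length minus one: the trailing "\r" of the first chunk is moved back onto the next chunk so the "\r\n" boundary is still recognized.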