Class CodeRay::Scanners::Scanner
In: lib/coderay/scanner.rb
Parent: StringScanner

Scanner

The base class for all Scanners.

It is a subclass of Ruby's great StringScanner, which makes it easy to access the scanning methods inside.

It is also Enumerable, so you can use it like an Array of Tokens:

  require 'coderay'

  c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"

  for text, kind in c_scanner
    puts text if kind == :operator
  end

  # prints: (*==)++;

OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
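The Enumerable behaviour can be tried without CodeRay installed. The following is a minimal sketch using only Ruby's standard library. TinyScanner and its three token kinds are hypothetical stand-ins, not CodeRay's C scanner, so the token boundaries differ from the output shown above.

```ruby
require 'strscan'

# Hypothetical stand-in for a CodeRay scanner: yields [text, kind] pairs.
class TinyScanner < StringScanner
  include Enumerable

  def each
    return to_enum(:each) unless block_given?
    reset # always iterate from the start of the string
    until eos?
      if (text = scan(/\s+/))
        yield [text, :space]
      elsif (text = scan(/\w+/))
        yield [text, :ident]
      else
        yield [getch, :operator] # any other single character
      end
    end
  end
end

scanner = TinyScanner.new("if (*p == '{') nest++;")

operators   = scanner.select { |_text, kind| kind == :operator }.map(&:first).join
has_ident   = scanner.any? { |_text, kind| kind == :ident }
first_ident = scanner.find { |_text, kind| kind == :ident }
longest     = scanner.sort_by { |text, _kind| -text.size }.first
```

Because each is the only method the class defines, every other call here (select, any?, find, sort_by) comes from Enumerable.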

Methods

Included Modules

Enumerable

Constants

ScanError        =  Class.new StandardError            Raised if a Scanner fails while scanning.
DEFAULT_OPTIONS  =  { }                                The default options for all scanner classes. Define @default_options for subclasses.
KINDS_NOT_LOC    =  [:comment, :doctype, :docstring]

Attributes

state  [RW] 

Public Class methods

The encoding used internally by this scanner.

[Source]

    # File lib/coderay/scanner.rb, line 89
    def encoding name = 'UTF-8'
      @encoding ||= defined?(Encoding.find) && Encoding.find(name)
    end

The typical filename suffix for this scanner's language.

[Source]

    # File lib/coderay/scanner.rb, line 84
    def file_extension extension = lang
      @file_extension ||= extension.to_s
    end

The lang of this Scanner class, which is equal to its Plugin ID.

[Source]

    # File lib/coderay/scanner.rb, line 94
    def lang
      @plugin_id
    end

Create a new Scanner.

  • code is the input String and is handled by the superclass StringScanner.
  • options is a Hash with Symbols as keys. It is merged with the default options of the class, so you can override default options here.

If options[:tokens] is given, scanned tokens are stored in that object. Otherwise, a new Tokens object is used.

[Source]

    # File lib/coderay/scanner.rb, line 143
    def initialize code = '', options = {}
      if self.class == Scanner
        raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
      end

      @options = self.class::DEFAULT_OPTIONS.merge options

      super self.class.normalize(code)

      @tokens = options[:tokens] || Tokens.new
      @tokens.scanner = self if @tokens.respond_to? :scanner=

      setup
    end
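Two behaviours of initialize can be sketched in plain Ruby: refusing to instantiate the abstract base class, and merging per-call options over the class defaults. BaseScanner, JsonishScanner and the :indent and :strict options are hypothetical names chosen for illustration only.

```ruby
# Hypothetical abstract base class, loosely modelled on the constructor above.
class BaseScanner
  DEFAULT_OPTIONS = {}.freeze

  attr_reader :options

  def initialize(options = {})
    if self.class == BaseScanner
      raise NotImplementedError, 'use a subclass'
    end
    # self.class:: resolves the constant on the subclass first,
    # so each subclass can supply its own defaults.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
  end
end

# Hypothetical subclass with its own defaults.
class JsonishScanner < BaseScanner
  DEFAULT_OPTIONS = { indent: 2, strict: false }.freeze
end

scanner = JsonishScanner.new(strict: true)
merged  = scanner.options # caller's :strict wins, :indent keeps its default

base_refused =
  begin
    BaseScanner.new
    false
  rescue NotImplementedError
    true
  end
```

Note that NotImplementedError descends from ScriptError rather than StandardError, so a bare rescue would not catch it. It has to be named explicitly, as in the sketch.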

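Normalizes the given code into a string with UNIX newlines, in the scanner's internal encoding, with invalid and undefined characters replaced by placeholders. In the source shown here, the input object itself is returned when no conversion is needed (for example, for an empty string), so callers should not rely on always receiving a new object.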

[Source]

    # File lib/coderay/scanner.rb, line 69
    def normalize code
      # original = code
      code = code.to_s unless code.is_a? ::String
      return code if code.empty?

      if code.respond_to? :encoding
        code = encode_with_encoding code, self.encoding
      else
        code = to_unix code
      end
      # code = code.dup if code.eql? original
      code
    end
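The effect of normalization can be reproduced with String#encode alone. This is a simplified sketch, assuming the source encoding of the input is already known. normalize_sketch is a hypothetical helper and skips the encoding guess that the real class performs for invalid input.

```ruby
# Hypothetical helper: transcode to the target encoding, convert CRLF and CR
# to LF, and replace invalid or undefined characters with placeholders.
def normalize_sketch(code, target = Encoding::UTF_8)
  code = code.to_s
  return code if code.empty?
  code.encode(target, code.encoding,
              universal_newline: true, undef: :replace, invalid: :replace)
end

# "\xE9" is the letter e with an acute accent in ISO-8859-1.
latin1     = "caf\xE9\r\nbar\r".dup.force_encoding(Encoding::ISO_8859_1)
normalized = normalize_sketch(latin1)
```

After the call, normalized is a UTF-8 string in which both the Windows line ending and the lone carriage return have become "\n".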

Protected Class methods

[Source]

    # File lib/coderay/scanner.rb, line 100
    def encode_with_encoding code, target_encoding
      if code.encoding == target_encoding
        if code.valid_encoding?
          return to_unix(code)
        else
          source_encoding = guess_encoding code
        end
      else
        source_encoding = code.encoding
      end
      # print "encode_with_encoding from #{source_encoding} to #{target_encoding}"
      code.encode target_encoding, source_encoding, :universal_newline => true, :undef => :replace, :invalid => :replace
    end

[Source]

    # File lib/coderay/scanner.rb, line 118
    def guess_encoding s
      #:nocov:
      IO.popen("file -b --mime -", "w+") do |file|
        file.write s[0, 1024]
        file.close_write
        begin
          Encoding.find file.gets[/charset=([-\w]+)/, 1]
        rescue ArgumentError
          Encoding::BINARY
        end
      end
      #:nocov:
    end

[Source]

    # File lib/coderay/scanner.rb, line 114
    def to_unix code
      code.index(?\r) ? code.gsub(/\r\n?/, "\n") : code
    end

Public Instance methods

The string in binary encoding.

To be used with pos, which is the index of the byte the scanner will scan next.

[Source]

    # File lib/coderay/scanner.rb, line 243
    def binary_string
      @binary_string ||=
        if string.respond_to?(:bytesize) && string.bytesize != string.size
          #:nocov:
          string.dup.force_encoding('binary')
          #:nocov:
        else
          string
        end
    end
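The reason for a binary copy is that StringScanner#pos counts bytes, not characters, so indexing a multibyte string by pos would land on the wrong character. A small standard-library sketch (the sample input is arbitrary):

```ruby
require 'strscan'

# "\u00E9" (e with acute accent) occupies two bytes in UTF-8.
scanner = StringScanner.new("\u00E9=1")
scanner.scan(/\u00E9/)

byte_pos   = scanner.pos             # byte offset after the match
char_count = scanner.string.size     # number of characters
byte_count = scanner.string.bytesize # number of bytes

# Indexing a binary-encoded copy by byte offset finds the next character.
binary    = scanner.string.dup.force_encoding(Encoding::BINARY)
next_byte = binary[byte_pos]
```

Here the character and byte counts differ by one, which is the condition the method above checks before creating the binary copy.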

The current column position of the scanner, starting with 1. See also: line.

[Source]

    # File lib/coderay/scanner.rb, line 234
    def column pos = self.pos
      return 1 if pos <= 0
      pos - (binary_string.rindex(?\n, pos - 1) || -1)
    end

Traverse the tokens.

[Source]

    # File lib/coderay/scanner.rb, line 217
    def each &block
      tokens.each(&block)
    end

The default file extension for this scanner.

[Source]

    # File lib/coderay/scanner.rb, line 178
    def file_extension
      self.class.file_extension
    end

The Plugin ID for this scanner.

[Source]

    # File lib/coderay/scanner.rb, line 173
    def lang
      self.class.lang
    end

The current line position of the scanner, starting with 1. See also: column.

Beware, this is implemented inefficiently. It should be used for debugging only.

[Source]

    # File lib/coderay/scanner.rb, line 227
    def line pos = self.pos
      return 1 if pos <= 0
      binary_string[0...pos].count("\n") + 1
    end
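The arithmetic behind line and column can be checked on a plain string. The sketch below assumes ASCII-only input, so that byte and character indices coincide. line_at and column_at are hypothetical free-standing versions of the two methods.

```ruby
# 1-based line number: count the newlines before pos. This copies the
# prefix on every call, which is why it is only suitable for debugging.
def line_at(text, pos)
  return 1 if pos <= 0
  text[0...pos].count("\n") + 1
end

# 1-based column: distance from the last newline before pos,
# or from index -1 when pos is on the first line.
def column_at(text, pos)
  return 1 if pos <= 0
  pos - (text.rindex("\n", pos - 1) || -1)
end

text     = "ab\ncd"
position = text.index('d')
where    = [line_at(text, position), column_at(text, position)]
```

For this input, 'd' sits at index 4, one newline precedes it, and the last newline is at index 2, so where holds line 2, column 2.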

Resets the scanner to its initial state. Subclasses should redefine the reset_instance method instead of this one.

[Source]

    # File lib/coderay/scanner.rb, line 160
    def reset
      super
      reset_instance
    end

Set a new string to be scanned.

[Source]

    # File lib/coderay/scanner.rb, line 166
    def string= code
      code = self.class.normalize(code)
      super code
      reset_instance
    end

Scans the code and returns all tokens in a Tokens object.

[Source]

    # File lib/coderay/scanner.rb, line 183
    def tokenize source = nil, options = {}
      options = @options.merge(options)
      @tokens = options[:tokens] || @tokens || Tokens.new
      @tokens.scanner = self if @tokens.respond_to? :scanner=
      case source
      when Array
        self.string = self.class.normalize(source.join)
      when nil
        reset
      else
        self.string = self.class.normalize(source)
      end

      begin
        scan_tokens @tokens, options
      rescue => e
        message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
        raise_inspect e.message, @tokens, message, 30, e.backtrace
      end

      @cached_tokens = @tokens
      if source.is_a? Array
        @tokens.split_into_parts(*source.map { |part| part.size })
      else
        @tokens
      end
    end

Caches the result of tokenize.

[Source]

    # File lib/coderay/scanner.rb, line 212
    def tokens
      @cached_tokens ||= tokenize
    end
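The caching idiom used by tokens, together with the invalidation done when the scanner is reset, can be sketched as follows. CachingScanner is a hypothetical class that splits on whitespace instead of really scanning. The runs counter exists only to make the caching observable.

```ruby
# Hypothetical class demonstrating memoization with explicit invalidation.
class CachingScanner
  attr_reader :runs

  def initialize(text)
    @text = text
    @runs = 0
    @cached_tokens = nil
  end

  # The expensive step: counts how often it actually runs.
  def tokenize
    @runs += 1
    @cached_tokens = @text.split
  end

  # Returns the cached result, computing it on first use.
  def tokens
    @cached_tokens ||= tokenize
  end

  # Drops the cache so that the next call to tokens scans again.
  def reset_instance
    @cached_tokens = nil
  end
end

scanner = CachingScanner.new('a b c')
scanner.tokens
scanner.tokens # second call reuses the cache
runs_before_reset = scanner.runs
scanner.reset_instance
scanner.tokens # cache was cleared, so tokenize runs again
runs_after_reset = scanner.runs
```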

Protected Instance methods

Raises a ScanError with additional status information.

[Source]

    # File lib/coderay/scanner.rb, line 281
    def raise_inspect msg, tokens, state = self.state || 'No state given!', ambit = 30, backtrace = caller
      raise ScanError, "\n\n***ERROR in %s: %s (after %d tokens)\n\ntokens:\n%s\n\ncurrent line: %d  column: %d  pos: %d\nmatched: %p  state: %p\nbol? = %p,  eos? = %p\n\nsurrounding code:\n%p  ~~  %p\n\n\n***ERROR***\n\n" % [
        File.basename(caller[0]),
        msg,
        tokens.respond_to?(:size) ? tokens.size : 0,
        tokens.respond_to?(:last) ? tokens.last(10).map { |t| t.inspect }.join("\n") : '',
        line, column, pos,
        matched, state, bol?, eos?,
        binary_string[pos - ambit, ambit],
        binary_string[pos, ambit],
      ], backtrace
    end

Resets the scanner.

[Source]

    # File lib/coderay/scanner.rb, line 274
    def reset_instance
      @tokens.clear if @tokens.respond_to?(:clear) && !@options[:keep_tokens]
      @cached_tokens = nil
      @binary_string = nil if defined? @binary_string
    end

Shorthand for scan_until(/\z/). This method also avoids a JRuby 1.9 mode bug.

[Source]

    # File lib/coderay/scanner.rb, line 315
    def scan_rest
      rest = self.rest
      terminate
      rest
    end
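The same rest-then-terminate sequence can be tried on a bare StringScanner. This is a standard-library sketch with an arbitrary sample input.

```ruby
require 'strscan'

scanner = StringScanner.new('key = value')
scanner.scan(/key/)

rest = scanner.rest # everything from the current position to the end
scanner.terminate   # move the scan pointer to the end of the string
at_end = scanner.eos?
```

After these calls rest holds the unscanned remainder and the scanner reports end of string, which is the state a call to scan_rest leaves behind.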

This is the central method, and commonly the only one a subclass implements.

Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!

[Source]

    # File lib/coderay/scanner.rb, line 269
    def scan_tokens tokens, options  # :doc:
      raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
    end
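The contract described above (store tokens only with <<, then return the collection) can be illustrated without CodeRay. In this sketch MiniScanner and WordScanner are hypothetical classes, a plain Array stands in for the Tokens object, and the token kinds are invented for the example.

```ruby
require 'strscan'

# Hypothetical base class; the real Scanner does considerably more.
class MiniScanner < StringScanner
  def tokenize
    scan_tokens([], {})
  end

  protected

  # Template method: subclasses supply the actual scanning loop.
  def scan_tokens(tokens, options)
    raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
  end
end

# Hypothetical subclass: splits input into words, spaces and other characters.
class WordScanner < MiniScanner
  protected

  def scan_tokens(tokens, _options)
    until eos?
      if (match = scan(/\s+/))
        tokens << [match, :space]
      elsif (match = scan(/\w+/))
        tokens << [match, :word]
      else
        tokens << [getch, :error] # consume one character so the loop advances
      end
    end
    tokens
  end
end

tokens = WordScanner.new('hi there!').tokenize
```

The final else branch matters: every pass through the loop must consume at least one character, otherwise a scanner that meets unexpected input never reaches the end of the string.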

Can be implemented by subclasses to do some initialization that has to be done once per instance.

Use reset for initialization that has to be done once per scan.

[Source]

    # File lib/coderay/scanner.rb, line 261
    def setup  # :doc:
    end
