Class | CodeRay::Scanners::Scanner |
In: | lib/coderay/scanner.rb |
Parent: | StringScanner |
The base class for all Scanners.
It is a subclass of Ruby's great StringScanner, which makes it easy to access the scanning methods inside.
It is also Enumerable, so you can use it like an Array of Tokens:
  require 'coderay'

  c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"

  for text, kind in c_scanner
    puts text if kind == :operator
  end

  # prints: (*==)++;
OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
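The Enumerable methods named above can be tried without CodeRay installed. The following is a minimal sketch in which a hand-written array of [text, kind] pairs stands in for what a scanner yields; the pairs and their kinds are assumptions for illustration, not actual output of the C scanner.

```ruby
# Hypothetical stand-in for scanner output: [text, kind] pairs.
# The kinds are illustrative assumptions, not real CodeRay results.
pairs = [
  ['if', :keyword], ['(', :operator], ['*', :operator], ['p', :ident],
  ['==', :operator], [')', :operator], ['nest', :ident], ['++', :operator],
  [';', :operator]
]

# select + map: collect the text of every operator pair.
operators = pairs.select { |_text, kind| kind == :operator }
                 .map { |text, _kind| text }

# any?: is there at least one keyword?
has_keyword = pairs.any? { |_text, kind| kind == :keyword }

# find: the first identifier pair.
first_ident = pairs.find { |_text, kind| kind == :ident }

# sort_by: order the pairs by text length, longest first.
longest = pairs.sort_by { |text, _kind| -text.size }.first

puts operators.join
```

Because a scanner is Enumerable, the same calls should apply to a scanner object directly, with each block receiving a text and a kind.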
ScanError | = | Class.new StandardError | Raised if a Scanner fails while scanning |
DEFAULT_OPTIONS | = | { } | The default options for all scanner classes. Define @default_options for subclasses. |
KINDS_NOT_LOC | = | [:comment, :doctype, :docstring] |
state | [RW] |
The typical filename suffix for this scanner's language.

  # File lib/coderay/scanner.rb, line 84
  def file_extension extension = lang
    @file_extension ||= extension.to_s
  end

If options[:tokens] is not given, a new Tokens object is used.

  # File lib/coderay/scanner.rb, line 143
  def initialize code = '', options = {}
    if self.class == Scanner
      raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
    end

    @options = self.class::DEFAULT_OPTIONS.merge options

    super self.class.normalize(code)

    @tokens = options[:tokens] || Tokens.new
    @tokens.scanner = self if @tokens.respond_to? :scanner=

    setup
  end

Normalizes the given code into a string with UNIX newlines, in the scanner's internal encoding, with invalid and undefined characters replaced by placeholders. Always returns a new object.

  # File lib/coderay/scanner.rb, line 69
  def normalize code
    # original = code
    code = code.to_s unless code.is_a? ::String
    return code if code.empty?

    if code.respond_to? :encoding
      code = encode_with_encoding code, self.encoding
    else
      code = to_unix code
    end
    # code = code.dup if code.eql? original
    code
  end

  # File lib/coderay/scanner.rb, line 100
  def encode_with_encoding code, target_encoding
    if code.encoding == target_encoding
      if code.valid_encoding?
        return to_unix(code)
      else
        source_encoding = guess_encoding code
      end
    else
      source_encoding = code.encoding
    end
    # print "encode_with_encoding from #{source_encoding} to #{target_encoding}"
    code.encode target_encoding, source_encoding, :universal_newline => true, :undef => :replace, :invalid => :replace
  end

  # File lib/coderay/scanner.rb, line 118
  def guess_encoding s
    #:nocov:
    IO.popen("file -b --mime -", "w+") do |file|
      file.write s[0, 1024]
      file.close_write
      begin
        Encoding.find file.gets[/charset=([-\w]+)/, 1]
      rescue ArgumentError
        Encoding::BINARY
      end
    end
    #:nocov:
  end

  # File lib/coderay/scanner.rb, line 114
  def to_unix code
    code.index(?\r) ? code.gsub(/\r\n?/, "\n") : code
  end

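The newline and replacement handling that normalize relies on can be sketched with core String methods alone. The helper name to_unix_sketch and the sample strings are assumptions for illustration; the encode options are the ones passed in encode_with_encoding above.

```ruby
# Hypothetical helper mirroring the to_unix logic shown above.
def to_unix_sketch(code)
  code.index("\r") ? code.gsub(/\r\n?/, "\n") : code
end

# Both "\r\n" and a lone "\r" become "\n".
unix = to_unix_sketch("a\r\nb\rc\n")

# Transcode Latin-1 bytes to UTF-8, converting newlines on the way.
latin1 = "caf\xE9\r\n".b.force_encoding(Encoding::ISO_8859_1)
utf8 = latin1.encode(Encoding::UTF_8, Encoding::ISO_8859_1,
                     universal_newline: true,
                     undef: :replace, invalid: :replace)

p unix
p utf8
```

Passing the source encoding explicitly, as encode_with_encoding does, lets the same call handle a string whose declared encoding already equals the target but whose bytes are invalid in it.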
The string in binary encoding.
To be used with pos, which is the index of the byte the scanner will scan next.

  # File lib/coderay/scanner.rb, line 243
  def binary_string
    @binary_string ||=
      if string.respond_to?(:bytesize) && string.bytesize != string.size
        #:nocov:
        string.dup.force_encoding('binary')
        #:nocov:
      else
        string
      end
  end

The default file extension for this scanner.

  # File lib/coderay/scanner.rb, line 178
  def file_extension
    self.class.file_extension
  end

The current line position of the scanner, starting with 1. See also: column.
Beware, this is implemented inefficiently. It should be used for debugging only.

  # File lib/coderay/scanner.rb, line 227
  def line pos = self.pos
    return 1 if pos <= 0
    binary_string[0...pos].count("\n") + 1
  end

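The line computation slices binary_string rather than the character string because StringScanner#pos is a byte offset. A minimal sketch of the difference, using only the standard strscan library; the sample input and variable names are assumptions for illustration.

```ruby
require 'strscan'

# Two 2-byte characters, then two newlines, then an ASCII character.
source = "\u00e9\u00e9\n\nx"
scanner = StringScanner.new(source)
scanner.scan(/\u00e9\u00e9/)

byte_pos = scanner.pos # byte offset, not a character index

# Slicing a binary copy by byte offset stops before the newlines.
binary = source.dup.force_encoding('binary')
line_from_bytes = binary[0...byte_pos].count("\n") + 1

# Slicing the character string with a byte offset overshoots
# and counts newlines the scanner has not reached yet.
line_from_chars = source[0...byte_pos].count("\n") + 1

puts "pos=#{byte_pos} bytes=#{line_from_bytes} chars=#{line_from_chars}"
```

The scanner is still on the first line here, which only the byte-based count reports.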
Resets the scanner to its initial position. Subclasses should redefine the reset_instance method instead of this one.

  # File lib/coderay/scanner.rb, line 160
  def reset
    super
    reset_instance
  end

Scans the code and returns all tokens in a Tokens object.

  # File lib/coderay/scanner.rb, line 183
  def tokenize source = nil, options = {}
    options = @options.merge(options)
    @tokens = options[:tokens] || @tokens || Tokens.new
    @tokens.scanner = self if @tokens.respond_to? :scanner=
    case source
    when Array
      self.string = self.class.normalize(source.join)
    when nil
      reset
    else
      self.string = self.class.normalize(source)
    end

    begin
      scan_tokens @tokens, options
    rescue => e
      message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
      raise_inspect e.message, @tokens, message, 30, e.backtrace
    end

    @cached_tokens = @tokens
    if source.is_a? Array
      @tokens.split_into_parts(*source.map { |part| part.size })
    else
      @tokens
    end
  end

Raises a ScanError with additional status information.

  # File lib/coderay/scanner.rb, line 281
  def raise_inspect msg, tokens, state = self.state || 'No state given!', ambit = 30, backtrace = caller
    raise ScanError, "\n\n***ERROR in %s: %s (after %d tokens)\n\ntokens:\n%s\n\ncurrent line: %d column: %d pos: %d\nmatched: %p state: %p\nbol? = %p, eos? = %p\n\nsurrounding code:\n%p ~~ %p\n\n\n***ERROR***\n\n" % [
      File.basename(caller[0]),
      msg,
      tokens.respond_to?(:size) ? tokens.size : 0,
      tokens.respond_to?(:last) ? tokens.last(10).map { |t| t.inspect }.join("\n") : '',
      line, column, pos,
      matched, state, bol?, eos?,
      binary_string[pos - ambit, ambit],
      binary_string[pos, ambit],
    ], backtrace
  end

Resets the scanner.

  # File lib/coderay/scanner.rb, line 274
  def reset_instance
    @tokens.clear if @tokens.respond_to?(:clear) && !@options[:keep_tokens]
    @cached_tokens = nil
    @binary_string = nil if defined? @binary_string
  end

Shorthand for scan_until(/\z/). This method also avoids a JRuby 1.9 mode bug.

  # File lib/coderay/scanner.rb, line 315
  def scan_rest
    rest = self.rest
    terminate
    rest
  end

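The same rest-then-terminate sequence can be tried on a plain StringScanner. A minimal sketch using only the standard strscan library; the sample input is an assumption for illustration, and a second scanner runs scan_until(/\z/) for comparison.

```ruby
require 'strscan'

input = 'int x; /* tail'

scanner = StringScanner.new(input)
scanner.scan(/int x; /)

# Take everything that is left, then move the scan pointer to the end.
rest = scanner.rest
scanner.terminate

# The regex-based form that scan_rest is described as a shorthand for.
other = StringScanner.new(input)
other.scan(/int x; /)
via_regex = other.scan_until(/\z/)

p rest
p scanner.eos?
p via_regex == rest
```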
This is the central method, and commonly the only one a subclass implements.
Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!

  # File lib/coderay/scanner.rb, line 269
  def scan_tokens tokens, options # :doc:
    raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
  end
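
The shape of a scan_tokens implementation can be sketched without CodeRay. In the following, ToyScanner is a hypothetical class that inherits from StringScanner directly, and a plain Array stands in for the Tokens object; the token kinds are likewise assumptions for illustration. It only shows the contract described above: consume the input, store results with <<, and return the tokens.

```ruby
require 'strscan'

# Hypothetical stand-in for a scanner subclass; not part of CodeRay.
class ToyScanner < StringScanner
  # Appends [text, kind] pairs to tokens using only <<, then returns it.
  def scan_tokens(tokens, _options = {})
    until eos?
      if (text = scan(/\s+/))
        tokens << [text, :space]
      elsif (text = scan(/\d+/))
        tokens << [text, :integer]
      elsif (text = scan(/[A-Za-z_]\w*/))
        tokens << [text, :ident]
      else
        # Anything else is consumed one character at a time.
        tokens << [getch, :operator]
      end
    end
    tokens
  end
end

tokens = ToyScanner.new('x = 42').scan_tokens([])
p tokens
```

A real subclass would inherit from CodeRay::Scanners::Scanner and be driven through tokenize, which passes in the Tokens object and the merged options.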