module Linguist::BlobHelper
DEPRECATED Avoid mixing into Blob classes. Prefer functional interfaces like `Linguist.detect` over `Blob#language`. Functions are much easier to cache and compose.
Avoid adding additional bloat to this module.
BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.
Constants
- DETECTABLE_TYPES
- DocumentationRegexp
- MEGABYTE
- VendoredRegexp
Public Instance Methods
Internal: Lookup mime type for extension.
Returns a MIME::Type
    # File lib/linguist/blob_helper.rb, line 32
    def _mime_type
      if defined? @_mime_type
        @_mime_type
      else
        guesses = ::MIME::Types.type_for(extname.to_s)

        # Prefer text mime types over binary
        @_mime_type = guesses.detect { |type| type.ascii? } ||
          # Otherwise use the first guess
          guesses.first
      end
    end
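The method above memoizes with `defined?` rather than `||=`, so a `nil` result (no MIME guess for the extension) is cached as well and the lookup is not repeated. A minimal sketch of that difference, using a hypothetical `Lookup` class and a fake `expensive_guess` method in place of the real `MIME::Types` call:

```ruby
# Hypothetical class illustrating nil-safe memoization; not part of Linguist.
class Lookup
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Stand-in for an expensive lookup that finds nothing.
  def expensive_guess
    @calls += 1
    nil
  end

  # Caches the result even when it is nil.
  def with_defined
    if defined? @with_defined
      @with_defined
    else
      @with_defined = expensive_guess
    end
  end

  # Re-runs the lookup whenever the cached value is nil or false.
  def with_or_equals
    @with_or_equals ||= expensive_guess
  end
end

a = Lookup.new
2.times { a.with_defined }    # lookup runs once

b = Lookup.new
2.times { b.with_or_equals }  # lookup runs twice
```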
Public: Is the blob binary?
Return true or false
    # File lib/linguist/blob_helper.rb, line 130
    def binary?
      # Large blobs aren't even loaded into memory
      if data.nil?
        true

      # Treat blank files as text
      elsif data == ""
        false

      # Charlock doesn't know what to think
      elsif encoding.nil?
        true

      # If Charlock says its binary
      else
        detect_encoding[:type] == :binary
      end
    end
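The order of the branches matters: unloaded data counts as binary, blank data counts as text, and only then is the detector consulted. A rough stand-alone sketch of the same decision order, where `binary_data?` is a hypothetical helper and `detection` stands in for the Hash (or nil) returned by `detect_encoding`:

```ruby
# Hypothetical helper mirroring the branch order of binary?.
# `detection` stands in for the result of detect_encoding (a Hash or nil).
def binary_data?(data, detection)
  return true  if data.nil?   # blob was too large to load
  return false if data == ""  # blank files count as text
  # The detector returned nothing, or no encoding: assume binary.
  return true  if detection.nil? || detection[:encoding].nil?

  detection[:type] == :binary
end

unloaded = binary_data?(nil, nil)
blank    = binary_data?("", nil)
unknown  = binary_data?("abc", nil)
text     = binary_data?("abc", { encoding: "UTF-8", type: :text })
binary   = binary_data?("\x00\x01", { encoding: "BINARY", type: :binary })
```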
Internal: Is the blob binary according to its mime type
Return true or false
    # File lib/linguist/blob_helper.rb, line 60
    def binary_mime_type?
      _mime_type ? _mime_type.binary? : false
    end
Public: Get the Content-Type header value
This value is used when serving raw blobs.
Examples
    # => 'text/plain; charset=utf-8'
    # => 'application/octet-stream'
Returns a content type String.
    # File lib/linguist/blob_helper.rb, line 83
    def content_type
      @content_type ||= (binary_mime_type? || binary?) ? mime_type :
        (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
    end
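The nested ternary is easier to follow when written out as branches. A minimal sketch of the same decision as a pure function; `content_type_for` and its keyword arguments are hypothetical, and the caller supplies values that the real method derives from the blob:

```ruby
# Hypothetical pure-function version of the content_type decision.
# All inputs are supplied by the caller; nothing here touches Linguist.
def content_type_for(binary:, mime_type:, encoding:)
  if binary
    mime_type                                   # serve binary blobs with their own type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"  # text with a detected encoding
  else
    "text/plain"                                # text, encoding unknown
  end
end

utf8 = content_type_for(binary: false, mime_type: "application/x-ruby", encoding: "UTF-8")
raw  = content_type_for(binary: true, mime_type: "application/octet-stream", encoding: nil)
bare = content_type_for(binary: false, mime_type: "text/plain", encoding: nil)
```

Note that text blobs are always served as `text/plain`, whatever their MIME type lookup returned.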
Public: Is this blob a CSV file?
Return true or false
    # File lib/linguist/blob_helper.rb, line 180
    def csv?
      text? && extname.downcase == '.csv'
    end
Try to guess the encoding
Returns a Hash with :encoding, :confidence and :type keys, or nil if an error occurred during detection or no valid encoding could be found.
    # File lib/linguist/blob_helper.rb, line 123
    def detect_encoding
      @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
    end
Public: Get the Content-Disposition header value
This value is used when serving raw blobs.
    # => "attachment; filename=file.tar"
    # => "inline"
Returns a content disposition String.
    # File lib/linguist/blob_helper.rb, line 96
    def disposition
      if text? || image?
        'inline'
      elsif name.nil?
        "attachment"
      else
        "attachment; filename=#{EscapeUtils.escape_url(name)}"
      end
    end
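A rough sketch of the three outcomes, runnable without the EscapeUtils gem. `disposition_for` is a hypothetical helper, and the standard library's `URI.encode_www_form_component` is only a stand-in for `EscapeUtils.escape_url`; the two may escape some characters differently:

```ruby
require "uri"

# Hypothetical helper; `inline` stands in for the `text? || image?` check.
def disposition_for(name, inline:)
  if inline
    "inline"
  elsif name.nil?
    "attachment"
  else
    # Assumption: stdlib escaping used here in place of EscapeUtils.escape_url.
    "attachment; filename=#{URI.encode_www_form_component(name)}"
  end
end

shown   = disposition_for("README.md", inline: true)
unnamed = disposition_for(nil, inline: false)
tarball = disposition_for("file.tar", inline: false)
```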
Public: Is the blob in a documentation directory?
Documentation files are ignored by language statistics.
See “documentation.yml” for a list of documentation conventions that match this pattern.
Return true or false
    # File lib/linguist/blob_helper.rb, line 250
    def documentation?
      path =~ DocumentationRegexp ? true : false
    end
Public: Is the blob empty?
Return true or false
    # File lib/linguist/blob_helper.rb, line 152
    def empty?
      data.nil? || data == ""
    end

    # File lib/linguist/blob_helper.rb, line 106
    def encoding
      if hash = detect_encoding
        hash[:encoding]
      end
    end
Public: Get the extname of the path
Examples
blob(name='foo.rb').extname # => '.rb'
Returns a String
    # File lib/linguist/blob_helper.rb, line 25
    def extname
      File.extname(name.to_s)
    end
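Because `extname` delegates to `File.extname`, it inherits that method's edge cases: only the last extension is returned, and dotfiles and a nil name yield an empty string. A short illustration using the standard library directly (the file names are made up):

```ruby
# File.extname behavior that extname inherits.
results = {
  "foo.rb"         => File.extname("foo.rb"),          # ".rb"
  "archive.tar.gz" => File.extname("archive.tar.gz"),  # ".gz"
  "Makefile"       => File.extname("Makefile"),        # ""
  ".gitignore"     => File.extname(".gitignore"),      # ""
  nil              => File.extname(nil.to_s)           # "" (nil.to_s is "")
}
```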
Public: Is the blob a generated file?
Generated source code is suppressed in diffs and is ignored by language statistics.
May load Linguist::Blob#data
Return true or false
    # File lib/linguist/blob_helper.rb, line 318
    def generated?
      @_generated ||= Generated.generated?(path, lambda { data })
    end

Internal: Does the blob have a high ratio of long lines?
Return true or false
    # File lib/linguist/blob_helper.rb, line 210
    def high_ratio_of_long_lines?
      return false if loc == 0
      size / loc > 5000
    end
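The check compares the average number of bytes per line against 5000. Assuming `size` and `loc` are both Integers, Ruby's `/` truncates, so the average has to reach 5001 before the check passes. A stand-alone sketch with a hypothetical `long_lines?` helper:

```ruby
# Hypothetical stand-alone version; size is in bytes, loc is a line count.
def long_lines?(size, loc)
  return false if loc == 0  # avoid dividing by zero for empty blobs
  size / loc > 5000         # Integer division truncates the average
end

at_limit   = long_lines?(10_001, 2)  # 10_001 / 2 is 5000, not above the limit
over_limit = long_lines?(10_002, 2)  # 10_002 / 2 is 5001
no_lines   = long_lines?(0, 0)
```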
Public: Is the blob a supported image format?
Return true or false
    # File lib/linguist/blob_helper.rb, line 166
    def image?
      ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
    end
Internal: Should this blob be included in repository language statistics?
    # File lib/linguist/blob_helper.rb, line 339
    def include_in_language_stats?
      !vendored? &&
      !documentation? &&
      !generated? &&
      language && DETECTABLE_TYPES.include?(language.type)
    end
Public: Detects the Language of the blob.
May load Linguist::Blob#data
Returns a Language or nil if none is detected
    # File lib/linguist/blob_helper.rb, line 327
    def language
      @language ||= Linguist.detect(self)
    end
Public: Is the blob too big to load?
Return true or false
    # File lib/linguist/blob_helper.rb, line 196
    def large?
      size.to_i > MEGABYTE
    end
Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database.
Return true or false
    # File lib/linguist/blob_helper.rb, line 69
    def likely_binary?
      binary_mime_type? && !Language.find_by_filename(name)
    end
Public: Get each line of data
Requires Linguist::Blob#data
Returns an Array of lines
    # File lib/linguist/blob_helper.rb, line 259
    def lines
      @lines ||=
        if viewable? && data
          # `data` is usually encoded as ASCII-8BIT even when the content has
          # been detected as a different encoding. However, we are not allowed
          # to change the encoding of `data` because we've made the implicit
          # guarantee that each entry in `lines` is encoded the same way as
          # `data`.
          #
          # Instead, we re-encode each possible newline sequence as the
          # detected encoding, then force them back to the encoding of `data`
          # (usually a binary encoding like ASCII-8BIT). This means that the
          # byte sequence will match how newlines are likely encoded in the
          # file, but we don't have to change the encoding of `data` as far as
          # Ruby is concerned. This allows us to correctly parse out each line
          # without changing the encoding of `data`, and
          # also--importantly--without having to duplicate many (potentially
          # large) strings.
          begin
            encoded_newlines = ["\r\n", "\r", "\n"].
              map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) }

            data.split(Regexp.union(encoded_newlines), -1)
          rescue Encoding::ConverterNotFoundError
            # The data is not splittable in the detected encoding. Assume it's
            # one big line.
            [data]
          end
        else
          []
        end
    end
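The splitting step on its own can be tried with plain strings. This simplified sketch leaves out the re-encoding of the newline sequences and assumes the sample data and the newline literals already share an encoding; the sample string is made up:

```ruby
# Split on any of the three newline conventions. "\r\n" is listed first so
# it is matched as one separator rather than as "\r" followed by "\n".
newlines = Regexp.union(["\r\n", "\r", "\n"])
data = "one\r\ntwo\rthree\n"

with_limit    = data.split(newlines, -1)  # ["one", "two", "three", ""]
without_limit = data.split(newlines)      # ["one", "two", "three"]
```

The `-1` limit keeps trailing empty strings, so data that ends in a newline yields a final empty entry in `lines`.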
Public: Get number of lines of code
Requires Linguist::Blob#data
Returns Integer
    # File lib/linguist/blob_helper.rb, line 297
    def loc
      lines.size
    end
Public: Get the actual blob mime type
Examples
    # => 'text/plain'
    # => 'text/html'
Returns a mime type String.
    # File lib/linguist/blob_helper.rb, line 53
    def mime_type
      _mime_type ? _mime_type.to_s : 'text/plain'
    end
Public: Is the blob a PDF?
Return true or false
    # File lib/linguist/blob_helper.rb, line 187
    def pdf?
      extname.downcase == '.pdf'
    end

    # File lib/linguist/blob_helper.rb, line 112
    def ruby_encoding
      if hash = detect_encoding
        hash[:ruby_encoding]
      end
    end
Public: Is the blob safe to colorize?
Return true or false
    # File lib/linguist/blob_helper.rb, line 203
    def safe_to_colorize?
      !large? && text? && !high_ratio_of_long_lines?
    end
Public: Get number of source lines of code
Requires Linguist::Blob#data
Returns Integer
    # File lib/linguist/blob_helper.rb, line 306
    def sloc
      lines.grep(/\S/).size
    end
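`loc` counts every entry in `lines`, while `sloc` counts only those containing at least one non-whitespace character. A small illustration with a made-up array of lines:

```ruby
# Sample lines, including an empty line, a whitespace-only line and the
# trailing empty entry that a final newline produces.
sample = ["def foo", "", "  ", "  bar", "end", ""]

loc  = sample.size              # 6: every entry counts
sloc = sample.grep(/\S/).size   # 3: only "def foo", "  bar" and "end" match
```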
Public: Is the blob a supported 3D model format?
Return true or false
    # File lib/linguist/blob_helper.rb, line 173
    def solid?
      extname.downcase == '.stl'
    end
Public: Is the blob text?
Return true or false
    # File lib/linguist/blob_helper.rb, line 159
    def text?
      !binary?
    end
Internal: Get the TextMate compatible scope for the blob
    # File lib/linguist/blob_helper.rb, line 332
    def tm_scope
      language && language.tm_scope
    end
Public: Is the blob in a vendored directory?
Vendored files are ignored by language statistics.
See “vendor.yml” for a list of vendored conventions that match this pattern.
Return true or false
    # File lib/linguist/blob_helper.rb, line 235
    def vendored?
      path =~ VendoredRegexp ? true : false
    end
Public: Is the blob viewable?
Non-viewable blobs will just show a “View Raw” link
Return true or false
    # File lib/linguist/blob_helper.rb, line 220
    def viewable?
      !large? && text?
    end