diff --git a/README.md b/README.md index 240efd1..d9e0089 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,16 @@ # Shrimp [![Build Status](https://travis-ci.org/adjust/shrimp.png?branch=master)](https://travis-ci.org/adjust/shrimp) -Creates PDFs from URLs using phantomjs +Creates PDFs from web pages using PhantomJS -Read our [blogpost](http://big-elephants.com/2012-12/pdf-rendering-with-phantomjs/) about how it works. +Read our [blog post](http://big-elephants.com/2012-12/pdf-rendering-with-phantomjs/) about how it works. ## Installation Add this line to your application's Gemfile: - gem 'shrimp' +```ruby +gem 'shrimp' +``` And then execute: @@ -18,14 +20,13 @@ Or install it yourself as: $ gem install shrimp +### PhantomJS -### Phantomjs - - See http://phantomjs.org/download.html on how to install phantomjs +See http://phantomjs.org/download.html for instructions on how to install PhantomJS. ## Usage -``` +```ruby require 'shrimp' url = 'http://www.google.com' options = { :margin => "1cm"} @@ -33,56 +34,79 @@ Shrimp::Phantom.new(url, options).to_pdf("~/output.pdf") ``` ## Configuration -``` +Here is a list of configuration options that you can set. Unless otherwise noted in comments, the +value shown is the default value. + +Many of these options correspond to a property of the [WebPage module] +(https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage) in PhantomJS. Refer to that +[documentation](https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage) for more information +about what those options do. + +```ruby Shrimp.configure do |config| - # The path to the phantomjs executable - # defaults to `where phantomjs` - # config.phantomjs = '/usr/local/bin/phantomjs' + # The path to the phantomjs executable. Defaults to the path returned by `which phantomjs`. + config.phantomjs = '/usr/local/bin/phantomjs' - # the default pdf output format - # e.g. 
"5in*7.5in", "10cm*20cm", "A4", "Letter" - # config.format = 'A4' + # The paper size/format to use for the generated PDF file. Examples: "5in*7.5in", "10cm*20cm", + # "A4", "Letter". (See https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage#papersize-object + # for a list of valid options.) + config.format = 'A4' - # the default margin - # config.margin = '1cm' + # The page margin to use (part of paperSize in PhantomJS) + config.margin = '1cm' - # the zoom factor - # config.zoom = 1 + # The zoom factor (zoomFactor in PhantomJS) + config.zoom = 1 - # the page orientation 'portrait' or 'landscape' - # config.orientation = 'portrait' + # The page orientation ('portrait' or 'landscape') (part of paperSize in PhantomJS) + config.orientation = 'portrait' - # a temporary dir used to store tempfiles - # config.tmpdir = Dir.tmpdir + # The directory where temporary files are stored, including the generated PDF files. + config.tmpdir = Dir.mktmpdir('shrimp'), - # the default rendering time in ms - # increase if you need to render very complex pages - # config.rendering_time = 1000 + # How long to wait (in ms) for PhantomJS to load the web page before saving it to a file. + # Increase this if you need to render very complex pages. + config.rendering_time = 1_000 - # change the viewport size. If you rendering pages that have - # flexible page width and height then you may need to set this - # to enforce a specific size - # config.viewport_width = 600 - # config.viewport_height = 600 + # The timeout for the phantomjs rendering process (in ms). This needs always to be higher than + # rendering_time. If this timeout expires before the job completes, it will cause PhantomJS to + # abort and exit with an error. + config.rendering_timeout = 90_000 - # the timeout for the phantomjs rendering process in ms - # this needs always to be higher than rendering_time - # config.rendering_timeout = 90000 + # Change the viewport size. 
If you are rendering a page that adapts its layout based on the + # page width and height then you may need to set this to enforce a specific size. (viewportSize + # in PhantomJS) + config.viewport_width = 600 + config.viewport_height = 600 - # maximum number of redirects to follow - # by default Shrimp does not follow any redirects which means that - # if the server responds with non HTTP 200 an error will be returned + # Maximum number of redirects to follow + # By default Shrimp does not follow any redirects, which means that if the server responds with + # something other than HTTP 200 (for example, 302), an error will be returned. Setting this > 0 + # causes it to follow that many redirects and only raise an error if the number of redirects exceeds + # this. # config.max_redirect_count = 0 - # the path to a json configuration file for command-line options - # config.command_config_file = "#{Rails.root.join('config', 'shrimp', 'config.json')}" + # The path to a json configuration file containing command-line options to be used by PhantomJS. + # Refer to https://github.com/ariya/phantomjs/wiki/API-Reference for a list of valid options. + # The default options are listed in the Readme. To use your own file from + # config/shrimp/config.json in Rails app, you could do this: + config.command_config_file = Rails.root.join('config/shrimp/config.json') + + # Enable if you want to see details such as the phantomjs command line that it's about to execute. + config.debug = false end ``` -### Command Configuration +### Default PhantomJS Command-line Options -``` +These are the PhantomJS options that will be used by default unless you set the +`config.command_config_file` option. + +See the PhantomJS [API-Reference](https://github.com/ariya/phantomjs/wiki/API-Reference) for a +complete list of valid options. 
+ +```js { "diskCache": false, "ignoreSslErrors": false, @@ -94,98 +118,159 @@ end ## Middleware -Shrimp comes with a middleware that allows users to get a PDF view of any page on your site by appending .pdf to the URL. +Shrimp comes with a middleware that allows users to generate a PDF file of any page on your site +simply by appending .pdf to the URL. + +For example, if your site is [example.com](http://example.com) and you go to +http://example.com/report.pdf, the middleware will detect that a PDF is being requested and will +automatically convert the web page at http://example.com/report into a PDF and send that PDF as the +response. + +If you only want to allow this for some pages but not all of them, see below for how to add +conditions. ### Middleware Setup **Non-Rails Rack apps** - # in config.ru - require 'shrimp' - use Shrimp::Middleware +```ruby +# in config.ru +require 'shrimp' +use Shrimp::Middleware +``` **Rails apps** - # in application.rb(Rails3) or environment.rb(Rails2) - require 'shrimp' - config.middleware.use Shrimp::Middleware +```ruby +# in application.rb or an initializer (Rails 3) or environment.rb (Rails 2) +require 'shrimp' +config.middleware.use Shrimp::Middleware +``` **With Shrimp options** - # options will be passed to Shrimp::Phantom.new - config.middleware.use Shrimp::Middleware, :margin => '0.5cm', :format => 'Letter' - -**With conditions to limit routes that can be generated in pdf** +```ruby +# Options will be passed to Shrimp::Phantom.new +config.middleware.use Shrimp::Middleware, :margin => '0.5cm', :format => 'Letter' +``` - # conditions can be regexps (either one or an array) - config.middleware.use Shrimp::Middleware, {}, :only => %r[^/public] - config.middleware.use Shrimp::Middleware, {}, :only => [%r[^/invoice], %r[^/public]] +**With conditions to limit which paths can be requested in PDF format** - # conditions can be strings (either one or an array) - config.middleware.use Shrimp::Middleware, {}, :only => '/public' - 
config.middleware.use Shrimp::Middleware, {}, :only => ['/invoice', '/public'] +```ruby +# conditions can be regexps (either one or an array) +config.middleware.use Shrimp::Middleware, {}, :only => %r[^/public] +config.middleware.use Shrimp::Middleware, {}, :only => [%r[^/invoice], %r[^/public]] - # conditions can be regexps (either one or an array) - config.middleware.use Shrimp::Middleware, {}, :except => [%r[^/prawn], %r[^/secret]] +# conditions can be strings (either one or an array) +config.middleware.use Shrimp::Middleware, {}, :only => '/public' +config.middleware.use Shrimp::Middleware, {}, :only => ['/invoice', '/public'] - # conditions can be strings (either one or an array) - config.middleware.use Shrimp::Middleware, {}, :except => ['/secret'] +# conditions can be regexps (either one or an array) +config.middleware.use Shrimp::Middleware, {}, :except => [%r[^/prawn], %r[^/secret]] +# conditions can be strings (either one or an array) +config.middleware.use Shrimp::Middleware, {}, :except => ['/secret'] +``` ### Polling -To avoid deadlocks Shrimp::Middleware renders the pdf in a separate process retuning a 503 Retry-After response Header. -you can setup the polling interval and the polling offset in seconds. +To avoid tying up the web server while waiting for the PDF to be rendered (which could create a +deadlock) Shrimp::Middleware starts PDF generation in the background in a separate thread and +returns a 503 (Service Unavailable) response immediately. + +It also adds a [Retry-After](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) response +header, which tells the user's browser that the requested PDF resource is not available yet, but +will be soon, and instructs the browser to try again after a few seconds. When the same page is +requested again in a few seconds, it will again return a 503 if the PDF is still in the process of +being generated. 
This process will repeat until eventually the rendering has completed, at which +point the middleware returns a 200 (OK) response with the PDF itself. + +You can adjust both the `polling_offset` (how long to wait before the first retry; default is 1 +second) and the `polling_interval` (how long in seconds to wait between retries; default is 1 +second). Example: - config.middleware.use Shrimp::Middleware, :polling_interval => 1, :polling_offset => 5 +```ruby + config.middleware.use Shrimp::Middleware, :polling_offset => 5, :polling_interval => 1 +``` ### Caching -To avoid rendering the page on each request you can setup some the cache ttl in seconds +To improve performance and avoid having to re-generate the PDF file each time you request a PDF +resource, the existing PDF (that was generated the *first* time a certain URL was requested) will be +reused and sent again immediately if it already exists (for the same requested URL) and was +generated within the TTL. + +The default TTL is 1 second, but can be overridden by passing a different `cache_ttl` (in seconds) +to the middleware: +```ruby config.middleware.use Shrimp::Middleware, :cache_ttl => 3600, :out_path => "my/pdf/store" +``` +To disable this caching entirely and force it to re-generate the PDF again each time a request comes +in, set `cache_ttl` to 0. + +### Header/Footer + +You can specify a header or footer callback, which can even include page numbers. Example: + +```html + + + +``` ### Ajax requests -To include some fancy Ajax stuff with jquery +Here's an example of how to initiate an Ajax request for a PDF resource (using jQuery) and keep +polling the server until it either finishes successfully or returns with a 504 error code. 
```js - - var url = '/my_page.pdf' - var statusCodes = { - 200: function() { - return window.location.assign(url); - }, - 504: function() { - console.log("Shit's being wired") - }, - 503: function(jqXHR, textStatus, errorThrown) { - var wait; - wait = parseInt(jqXHR.getResponseHeader('Retry-After')); - return setTimeout(function() { - return $.ajax({ - url: url, - statusCode: statusCodes - }); - }, wait * 1000); - } + var url = '/my_page.pdf' + var statusCodes = { + 200: function() { + return window.location.assign(url); + }, + 504: function() { + console.log("Sorry, the request timed out.") + }, + 503: function(jqXHR, textStatus, errorThrown) { + var wait; + wait = parseInt(jqXHR.getResponseHeader('Retry-After')); + return setTimeout(function() { + return $.ajax({ + url: url, + statusCode: statusCodes + }); + }, wait * 1000); + } } $.ajax({ url: url, statusCode: statusCodes }) - ``` ## Contributing -1. Fork it +1. Fork this repository 2. Create your feature branch (`git checkout -b my-new-feature`) 3. Commit your changes (`git commit -am 'Add some feature'`) 4. Push to the branch (`git push origin my-new-feature`) -5. Create new Pull Request +5. Create a pull request (`git pull-request` if you've installed [hub](https://github.com/github/hub)) ## Copyright -Shrimp is Copyright © 2012 adeven (Manuel Kniep). It is free software, and may be redistributed under the terms -specified in the LICENSE file. + +Shrimp is Copyright © 2012 adeven (Manuel Kniep). It is free software, and may be redistributed +under the terms specified in the LICENSE file. 
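Reviewer note on the polling flow documented in the README changes above: the 503 plus `Retry-After` handshake can also be driven from a non-browser client. The following is only a minimal sketch under stated assumptions. `poll_for_pdf` is a hypothetical helper that is not part of Shrimp, and the `fetch` callable (returning `[status, headers, body]`) and the `sleeper` callable are injected placeholders so the loop can run without a network connection.

```ruby
# Hypothetical helper (not part of Shrimp): keeps requesting a PDF URL while the
# middleware answers 503, waiting the number of seconds given in Retry-After.
#
# fetch   - assumed callable taking a URL and returning [status, headers, body]
# sleeper - assumed callable taking a number of seconds (defaults to Kernel#sleep)
def poll_for_pdf(url, fetch, max_attempts: 10, sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do
    status, headers, body = fetch.call(url)
    case status
    when 200
      # Rendering has finished; the body is the PDF itself.
      return body
    when 503
      # Still rendering. Fall back to 1 second if the header is missing.
      sleeper.call(Integer(headers.fetch('Retry-After', 1)))
    else
      # Any other status (for example the 504 handled in the Ajax example) is treated as a failure.
      raise "PDF rendering failed with HTTP #{status}"
    end
  end
  raise "Gave up after #{max_attempts} attempts"
end

# Usage with canned responses standing in for a real HTTP client.
responses = [
  [503, { 'Retry-After' => '5' }, ''],
  [503, { 'Retry-After' => '1' }, ''],
  [200, { 'Content-Type' => 'application/pdf' }, 'PDF-BYTES']
]
waits = []
pdf = poll_for_pdf('/report.pdf', ->(_url) { responses.shift }, sleeper: ->(s) { waits << s })
```

Injecting the fetcher keeps the sketch independent of any particular HTTP library; in a real client it would wrap whatever HTTP call the application already uses.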
diff --git a/lib/shrimp.rb b/lib/shrimp.rb index f12222d..3f92c44 100644 --- a/lib/shrimp.rb +++ b/lib/shrimp.rb @@ -2,4 +2,5 @@ require 'shrimp/source' require 'shrimp/phantom' require 'shrimp/middleware' +require 'shrimp/synchronous_middleware' require 'shrimp/configuration' diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb new file mode 100644 index 0000000..cd45387 --- /dev/null +++ b/lib/shrimp/base_middleware.rb @@ -0,0 +1,131 @@ +module Shrimp + class BaseMiddleware + def initialize(app, options = { }, conditions = { }) + @app = app + @options = Shrimp.config.to_h.merge(options) + @conditions = conditions + end + + def render_as_pdf? + request_path_is_pdf = !!@request.path.match(%r{\.pdf$}) + + if request_path_is_pdf && @conditions[:only] + rules = [@conditions[:only]].flatten + rules.any? do |pattern| + if pattern.is_a?(Regexp) + @request.path =~ pattern + else + @request.path[0, pattern.length] == pattern + end + end + elsif request_path_is_pdf && @conditions[:except] + rules = [@conditions[:except]].flatten + rules.map do |pattern| + if pattern.is_a?(Regexp) + return false if @request.path =~ pattern + else + return false if @request.path[0, pattern.length] == pattern + end + end + return true + else + request_path_is_pdf + end + end + + def call(env) + if @options[:thread_safe] + dup._call(env) + else + _call(env) + end + end + + def _call(env) + @request = Rack::Request.new(env) + if render_as_pdf? + render_as_pdf(env) + else + @app.call(env) + end + end + + def render_to + file_name = Digest::MD5.hexdigest(@request.path) + ".pdf" + file_path = @options[:out_path] + "#{file_path}/#{file_name}" + end + + def render_to_done + "#{render_to}.done" + end + + # The URL for the HTML-formatted web page that we are converting into a PDF. 
+ def html_url + @request.url.sub(%r<\.pdf(\?|$)>, '\1') + end + + private + + def render_pdf + log_render_pdf_start + Phantom.new(html_url, @options, @request.cookies).tap do |phantom| + @phantom = phantom + phantom.to_pdf(render_to) + log_render_pdf_completion + File.open(render_to_done, 'w') { |f| f.write('done') } unless @phantom.error? + end + end + + def log_render_pdf_start + return unless @options[:debug] + puts %(#{self.class}: Converting web page at #{(html_url).inspect} into a PDF ...) + end + + def log_render_pdf_completion + return unless @options[:debug] + puts "#{self.class}: Finished converting web page at #{(html_url).inspect} into a PDF" + if @phantom.error? + puts "#{self.class}: Error: #{@phantom.error}" + else + puts "#{self.class}: Saved PDF to #{render_to}" + end + end + + def pdf_body + file = File.open(render_to, "rb") + body = file.read + file.close + body + end + + def default_pdf_options + { + :type => 'application/octet-stream'.freeze, + :disposition => 'attachment'.freeze, + } + end + + def pdf_headers(body, options = {}) + { }.tap do |headers| + headers["Content-Length"] = (body.respond_to?(:bytesize) ? body.bytesize : body.size).to_s + headers["Content-Type"] = "application/pdf" + + # Based on send_file_headers! from actionpack/lib/action_controller/metal/data_streaming.rb + options = default_pdf_options.merge(@options).merge(options) + [:type, :disposition].each do |arg| + raise ArgumentError, ":#{arg} option required" if options[arg].nil? 
+ end + + disposition = options[:disposition] + disposition += %(; filename="#{options[:filename]}") if options[:filename] + + headers.merge!( + 'Content-Disposition' => disposition, + 'Content-Transfer-Encoding' => 'binary' + ) + end + end + + end +end diff --git a/lib/shrimp/configuration.rb b/lib/shrimp/configuration.rb index eacd238..0c47d8d 100644 --- a/lib/shrimp/configuration.rb +++ b/lib/shrimp/configuration.rb @@ -2,38 +2,53 @@ module Shrimp class Configuration - attr_accessor :default_options - attr_writer :phantomjs + def initialize + @options = { + :format => 'A4', + :margin => '1cm', + :zoom => 1, + :orientation => 'portrait', + :tmpdir => Dir.mktmpdir('shrimp'), + :rendering_timeout => 90000, + :rendering_time => 1000, + :command_config_file => File.expand_path('../config.json', __FILE__), + :viewport_width => 600, + :viewport_height => 600, + :debug => false, + :thread_safe => true, + :max_redirect_count => 0 + } + end - [:format, :margin, :zoom, :orientation, :tmpdir, :rendering_timeout, :rendering_time, :command_config_file, :viewport_width, :viewport_height, :max_redirect_count].each do |m| + def to_h + @options + end + + [:format, :margin, :zoom, :orientation, :tmpdir, :rendering_timeout, :rendering_time, :command_config_file, :viewport_width, :viewport_height, :debug, :thread_safe, :max_redirect_count].each do |m| define_method("#{m}=") do |val| - @default_options[m]=val + @options[m] = val end - end - def initialize - @default_options = { - :format => 'A4', - :margin => '1cm', - :zoom => 1, - :orientation => 'portrait', - :tmpdir => Dir.tmpdir, - :rendering_timeout => 90000, - :rendering_time => 1000, - :command_config_file => File.expand_path('../config.json', __FILE__), - :viewport_width => 600, - :viewport_height => 600, - :max_redirect_count => 0 - } + define_method("#{m}") do + @options[m] + end end def phantomjs @phantomjs ||= (defined?(Bundler::GemfileError) ? 
`bundle exec which phantomjs` : `which phantomjs`).chomp end + attr_writer :phantomjs end class << self - attr_accessor :configuration + def configuration + @configuration ||= Configuration.new + end + alias_method :config, :configuration + + def configure + yield(configuration) + end end # Configure Phantomjs someplace sensible, @@ -45,11 +60,4 @@ class << self # config.format = 'Letter' # end - def self.configuration - @configuration ||= Configuration.new - end - - def self.configure - yield(configuration) - end end diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb index 96e8050..66fad5d 100644 --- a/lib/shrimp/middleware.rb +++ b/lib/shrimp/middleware.rb @@ -1,76 +1,65 @@ +require 'shrimp/base_middleware' + module Shrimp - class Middleware + class Middleware < BaseMiddleware def initialize(app, options = { }, conditions = { }) - @app = app - @options = options - @conditions = conditions + super @options[:polling_interval] ||= 1 @options[:polling_offset] ||= 1 @options[:cache_ttl] ||= 1 @options[:request_timeout] ||= @options[:polling_interval] * 10 end - def call(env) - @request = Rack::Request.new(env) - if render_as_pdf? #&& headers['Content-Type'] =~ /text\/html|application\/xhtml\+xml/ - if already_rendered? && (up_to_date?(@options[:cache_ttl]) || @options[:cache_ttl] == 0) - if File.size(render_to) == 0 - File.delete(render_to) - remove_rendering_flag - return error_response - end - return ready_response if env['HTTP_X_REQUESTED_WITH'] - file = File.open(render_to, "rb") - body = file.read - file.close - File.delete(render_to) if @options[:cache_ttl] == 0 + def render_as_pdf(env) + if already_rendered? && (up_to_date?(@options[:cache_ttl]) || @options[:cache_ttl] == 0) + if File.size(render_to) == 0 + delete_tmp_files remove_rendering_flag - response = [body] - headers = { } - headers["Content-Length"] = (body.respond_to?(:bytesize) ? 
body.bytesize : body.size).to_s - headers["Content-Type"] = "application/pdf" - [200, headers, response] - else - if rendering_in_progress? - if rendering_timed_out? - remove_rendering_flag - error_response - else - reload_response(@options[:polling_interval]) - end + return error_response + end + return ready_response if env['HTTP_X_REQUESTED_WITH'] + body = pdf_body() + delete_tmp_files if @options[:cache_ttl] == 0 + remove_rendering_flag + headers = pdf_headers(body) + [200, headers, [body]] + else + if rendering_in_progress? + if rendering_timed_out? + remove_rendering_flag + error_response else - File.delete(render_to) if already_rendered? - set_rendering_flag - fire_phantom - reload_response(@options[:polling_offset]) + reload_response(@options[:polling_interval]) end + else + delete_tmp_files if already_rendered? + set_rendering_flag + # Start PhantomJS rendering in a separate thread and then immediately render a web page + # that continuously reloads (polls) until the rendering is complete. + # Using Thread.new instead of Process::detach fork because Process fork will cause + # database disconnection when the forked process ended + Thread.new { + render_pdf + } + reload_response(@options[:polling_offset]) end - else - @app.call(env) end end private - # Private: start phantom rendering in a separate process - def fire_phantom - Process::detach fork { Phantom.new(@request.url.sub(%r{\.pdf$}, ''), @options, @request.cookies).to_pdf(render_to) } - end - - def render_to - file_name = Digest::MD5.hexdigest(@request.path) + ".pdf" - file_path = @options[:out_path] - "#{file_path}/#{file_name}" - end - def already_rendered? 
- File.exists?(render_to) + File.exists?(render_to_done) && File.exists?(render_to) end def up_to_date?(ttl = 30) (Time.now - File.new(render_to).mtime) <= ttl end + def delete_tmp_files + File.delete(render_to) + File.delete(render_to_done) + end def remove_rendering_flag @request.session["phantom-rendering"] ||={ } @@ -82,45 +71,19 @@ def set_rendering_flag @request.session["phantom-rendering"][render_to] = Time.now end - def rendering_timed_out? - Time.now - @request.session["phantom-rendering"][render_to] > @options[:request_timeout] + def rendering_started_at + @request.session["phantom-rendering"][render_to].to_time end - def rendering_in_progress? - @request.session["phantom-rendering"]||={ } - @request.session["phantom-rendering"][render_to] + def rendering_timed_out? + Time.now - rendering_started_at > @options[:request_timeout] end - def render_as_pdf? - request_path_is_pdf = !!@request.path.match(%r{\.pdf$}) - - if request_path_is_pdf && @conditions[:only] - rules = [@conditions[:only]].flatten - rules.any? do |pattern| - if pattern.is_a?(Regexp) - @request.path =~ pattern - else - @request.path[0, pattern.length] == pattern - end - end - elsif request_path_is_pdf && @conditions[:except] - rules = [@conditions[:except]].flatten - rules.map do |pattern| - if pattern.is_a?(Regexp) - return false if @request.path =~ pattern - else - return false if @request.path[0, pattern.length] == pattern - end - end - return true - else - request_path_is_pdf - end + def rendering_in_progress? + @request.session["phantom-rendering"] ||={ } + !!@request.session["phantom-rendering"][render_to] end - def concat(accepts, type) - (accepts || '').split(',').unshift(type).compact.join(',') - end def reload_response(interval=1) body = <<-HTML.gsub(/[ \n]+/, ' ').strip @@ -128,7 +91,7 @@ def reload_response(interval=1) -

Preparing pdf...
+ Preparing PDF file. Please wait...

HTML @@ -146,7 +109,7 @@ def ready_response - PDF ready here + PDF file ready here HTML @@ -162,7 +125,7 @@ def error_response -

Sorry request timed out...
+ Sorry, the request timed out.

HTML diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb index 9b689ff..43e0c3e 100644 --- a/lib/shrimp/phantom.rb +++ b/lib/shrimp/phantom.rb @@ -5,7 +5,7 @@ module Shrimp class NoExecutableError < StandardError def initialize - msg = "No phantomjs executable found at #{Shrimp.configuration.phantomjs}\n" + msg = "No phantomjs executable found at #{Shrimp.config.phantomjs}\n" msg << ">> Please install phantomjs - http://phantomjs.org/download.html" super(msg) end @@ -25,35 +25,75 @@ def initialize(msg = nil) class Phantom attr_accessor :source, :configuration, :outfile - attr_reader :options, :cookies, :result, :error + attr_reader :options, :cookies, :result, :error, :response, :response_headers SCRIPT_FILE = File.expand_path('../rasterize.js', __FILE__) # Public: Runs the phantomjs binary # - # Returns the stdout output of phantomjs + # Returns the stdout output from phantomjs def run @error = nil + puts "Running command: #{cmd}" if options[:debug] @result = `#{cmd}` + if match = @result.match(response_line_regexp) + @response = JSON.parse match[1] + @response_headers = @response['headers'].inject({}) {|hash, header| + hash[header['name']] = header['value']; hash + } + @result.gsub! response_line_regexp, '' + end unless $?.exitstatus == 0 - @error = @result + @error = @result.chomp @result = nil end @result end def run! - @error = nil - @result = `#{cmd}` - unless $?.exitstatus == 0 - @error = @result - @result = nil - raise RenderingError.new(@error) + run.tap { + raise RenderingError.new(error) if error? + } + end + + def response_line_regexp + /^response: (.*)$\n?/ + end + def redirect? + page_load_status_code == 302 + end + def redirect_to + return unless redirect? + response['redirectURL'] if response + end + + def error? + !!error + end + + def match_page_load_error + error.to_s.match /^.* \(HTTP (null|\S+)\).*/ + end + def page_load_error? 
+ !!match_page_load_error + end + def page_load_status_code + if match = match_page_load_error + status_code = match[1].to_s + if status_code =~ /\A\d+\Z/ + status_code.to_i + else + status_code + end end - @result end - # Public: Returns the phantom rasterize command + # Public: Returns the arguments for the PhantomJS rasterize command as a shell-escaped string def cmd + Shellwords.join cmd_array + end + + # Public: Returns the arguments for the PhantomJS rasterize command as an array + def cmd_array cookie_file = dump_cookies format, zoom, margin, orientation = options[:format], options[:zoom], options[:margin], options[:orientation] rendering_time, timeout = options[:rendering_time], options[:rendering_timeout] @@ -65,7 +105,7 @@ def cmd Shrimp.configuration.phantomjs, command_config_file, SCRIPT_FILE, - @source.to_s.shellescape, + @source.to_s, @outfile, format, zoom, @@ -77,7 +117,7 @@ def cmd viewport_width, viewport_height, max_redirect_count - ].join(" ") + ].map(&:to_s) end # Public: initializes a new Phantom Object @@ -94,10 +134,10 @@ def cmd # Returns self def initialize(url_or_file, options = { }, cookies={ }, outfile = nil) @source = Source.new(url_or_file) - @options = Shrimp.configuration.default_options.merge(options) + @options = Shrimp.config.to_h.merge(options) @cookies = cookies @outfile = File.expand_path(outfile) if outfile - raise NoExecutableError.new unless File.exists?(Shrimp.configuration.phantomjs) + raise NoExecutableError.new unless File.exists?(Shrimp.config.phantomjs) end # Public: renders to pdf diff --git a/lib/shrimp/rasterize.js b/lib/shrimp/rasterize.js index 731382e..0106731 100644 --- a/lib/shrimp/rasterize.js +++ b/lib/shrimp/rasterize.js @@ -26,7 +26,7 @@ function print_usage() { } window.setTimeout(function () { - error("Shit's being weird no result within: " + time_out + "ms"); + error("No result within " + time_out + "ms. 
Aborting PhantomJS."); }, time_out); function renderUrl(url, output, options) { @@ -44,27 +44,33 @@ function renderUrl(url, output, options) { // determine the statusCode page.onResourceReceived = function (resource) { if (resource.url == url) { + console.log('response: ' + JSON.stringify(resource)) statusCode = resource.status; } }; page.onResourceError = function (resourceError) { - error(resourceError.errorString + ' (URL: ' + resourceError.url + ')'); + // Log the error but allow normal "Unable to load the page." handling to occur too so that we + // can get and report the actual HTTP status code. (resourceError.errorCode just returns a code + // from http://doc.qt.io/qt-4.8/qnetworkreply.html#NetworkError-enum) + console.log(resourceError.errorString); }; - page.onNavigationRequested = function (redirect_url, type, willNavigate, main) { - if (main) { - if (redirect_url !== url) { - page.close(); - - if (redirects_num-- >= 0) { - renderUrl(redirect_url, output, options); - } else { - error(url + ' redirects to ' + redirect_url + ' after maximum number of redirects reached'); + if (redirects_num > 0) { + page.onNavigationRequested = function (redirect_url, type, willNavigate, main) { + if (main) { + if (redirect_url !== url) { + page.close(); + + if (redirects_num-- >= 0) { + renderUrl(redirect_url, output, options); + } else { + error(url + ' redirects to ' + redirect_url + ' after maximum number of redirects reached'); + } } } - } - }; + }; + } page.open(url, function (status) { if (status !== 'success' || (statusCode != 200 && statusCode != null)) { @@ -77,22 +83,33 @@ function renderUrl(url, output, options) { console.log(e); } - error('Unable to load the URL: ' + url + ' (HTTP ' + statusCode + ')'); + error('Unable to load the page. (HTTP ' + statusCode + ') (URL: ' + url + ')'); } else { + /* Check whether the loaded page overwrites the header/footer setting, + i.e. whether a PhantomJSPrinting object exists. If so, use that instead + of our defaults above. 
+ See https://github.com/ariya/phantomjs/blob/master/examples/printheaderfooter.js#L66 + */ window.setTimeout(function () { - page.render(output + '_tmp.pdf'); - - if (fs.exists(output)) { - fs.remove(output); - } - - try { - fs.move(output + '_tmp.pdf', output); - } catch (e) { - error(e); + if (page.evaluate(function(){return typeof PhantomJSPrinting == "object";})) { + paperSize = page.paperSize; + paperSize.header.height = page.evaluate(function() { + return PhantomJSPrinting.header.height; + }); + paperSize.header.contents = phantom.callback(function(pageNum, numPages) { + return page.evaluate(function(pageNum, numPages){return PhantomJSPrinting.header.contents(pageNum, numPages);}, pageNum, numPages); + }); + paperSize.footer.height = page.evaluate(function() { + return PhantomJSPrinting.footer.height; + }); + paperSize.footer.contents = phantom.callback(function(pageNum, numPages) { + return page.evaluate(function(pageNum, numPages){return PhantomJSPrinting.footer.contents(pageNum, numPages);}, pageNum, numPages); + }); + page.paperSize = paperSize; } - console.log('Rendered to: ' + output, new Date().getTime()); - phantom.exit(0); + page.render(output); + console.log('rendered to: ' + output, new Date().getTime()); + phantom.exit(); }, render_time); } }); @@ -125,8 +142,10 @@ if (system.args.length < 3 || system.args.length > 13) { if (system.args.length > 3 && system.args[2].substr(-4) === ".pdf") { size = system.args[3].split('*'); - page_options.paperSize = size.length === 2 ? { width:size[0], height:size[1], margin:'0px' } - : { format:system.args[3], orientation:orientation, margin:margin }; + header = { height: '1cm', contents: phantom.callback(function(pageNum, numPages) { return ""; }) }; + footer = { height: '1cm', contents: phantom.callback(function(pageNum, numPages) { return ""; }) }; + page_options.paperSize = size.length === 2 ? 
{ width:size[0], height:size[1], margin:'0px', header: header, footer: footer } + : { format:system.args[3], orientation:orientation, margin:margin, header: header, footer: footer }; } if (system.args.length > 4) { page_options.zoomFactor = system.args[4]; diff --git a/lib/shrimp/synchronous_middleware.rb b/lib/shrimp/synchronous_middleware.rb new file mode 100644 index 0000000..8ed7775 --- /dev/null +++ b/lib/shrimp/synchronous_middleware.rb @@ -0,0 +1,34 @@ +require 'shrimp/base_middleware' + +module Shrimp + class SynchronousMiddleware < BaseMiddleware + def render_as_pdf(env) + # Start PhantomJS rendering in the same process (synchronously) and wait until it completes. + render_pdf + return phantomjs_error_response if phantom.error? + + body = pdf_body() + headers = pdf_headers(body, { + disposition: @phantom.response_headers['X-Pdf-Disposition'], + filename: @phantom.response_headers['X-Pdf-Filename'] + }.reject {|k, v| v.nil? } + ) + [200, headers, [body]] + end + + attr_reader :phantom + + private + + def phantomjs_error_response + headers = {'Content-Type' => 'text/html'} + if phantom.page_load_error? + status_code = phantom.page_load_status_code + headers['Location'] = phantom.redirect_to if phantom.redirect? 
+ else + status_code = 500 + end + [status_code, headers, [phantom.error]] + end + end +end diff --git a/shrimp.gemspec b/shrimp.gemspec index 755850d..cd24bb8 100644 --- a/shrimp.gemspec +++ b/shrimp.gemspec @@ -24,4 +24,6 @@ Gem::Specification.new do |gem| gem.add_development_dependency(%q, [">= 2.2.0"]) gem.add_development_dependency(%q, [">= 0.5.6"]) gem.add_development_dependency(%q, ["= 1.4.1"]) + gem.add_development_dependency(%q) + gem.add_development_dependency(%q) end diff --git a/spec/shrimp/middleware_spec.rb b/spec/shrimp/middleware_spec.rb index 48e720f..2f4dfb8 100644 --- a/spec/shrimp/middleware_spec.rb +++ b/spec/shrimp/middleware_spec.rb @@ -1,34 +1,24 @@ require 'spec_helper' -def app; - Rack::Lint.new(@app) -end - -def options - { :margin => "1cm", :out_path => Dir.tmpdir, - :polling_offset => 10, :polling_interval => 1, :cache_ttl => 3600, - :request_timeout => 1 } -end - -def mock_app(options = { }, conditions = { }) - main_app = lambda { |env| - headers = { 'Content-Type' => "text/html" } - [200, headers, ['Hello world!']] - } - - @middleware = Shrimp::Middleware.new(main_app, options, conditions) - @app = Rack::Session::Cookie.new(@middleware, :key => 'rack.session') +shared_context Shrimp::Middleware do + def mock_app(options = { }, conditions = { }) + @middleware = Shrimp::Middleware.new(main_app, options, conditions) + @app = Rack::Session::Cookie.new(@middleware, :key => 'rack.session') + end end - describe Shrimp::Middleware do - before { mock_app(options) } + include_context Shrimp::Middleware + + before { mock_app(middleware_options) } + subject { @middleware } context "matching pdf" do it "should render as pdf" do get '/test.pdf' - @middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end + it "should return 503 the first time" do get '/test.pdf' last_response.status.should eq 503 @@ -42,9 +32,9 @@ def mock_app(options = { }, conditions = { }) last_response.header["Retry-After"].should 
eq "1" end - it "should set render to to outpath" do + it "should set render_to to out_path" do get '/test.pdf' - @middleware.send(:render_to).should match (Regexp.new("^#{options[:out_path]}")) + @middleware.send(:render_to).should start_with middleware_options[:out_path] end it "should return 504 on timeout" do @@ -62,62 +52,83 @@ def mock_app(options = { }, conditions = { }) last_response.status.should eq 503 end - it "should return a pdf with 200 after rendering" do - mock_file = double(File, :read => "Hello World", :close => true, :mtime => Time.now) - File.should_receive(:'exists?').and_return true - File.should_receive(:'size').and_return 1000 - File.should_receive(:'open').and_return mock_file - File.should_receive(:'new').and_return mock_file - get '/test.pdf' - last_response.status.should eq 200 - last_response.body.should eq "Hello World" + describe "when already_rendered? and up_to_date?" do + before { + mock_file = double(File, :read => "Hello World", :close => true, :mtime => Time.now) + File.should_receive(:exists?).at_least(:once).and_return true + File.should_receive(:size).and_return 1000 + File.should_receive(:open).and_return mock_file + File.should_receive(:new).at_least(:once).and_return mock_file + get '/test.pdf' + } + + its(:rendering_in_progress?) { should eq false } + its(:already_rendered?) { should eq true } + its(:up_to_date?) 
{ should eq true } + + it "should return a pdf with 200" do + last_response.status.should eq 200 + last_response.headers['Content-Type'].should eq 'application/pdf' + last_response.body.should eq "Hello World" + end end + describe "requesting a simple path" do + before { get '/test.pdf' } + its(:html_url) { should eq 'http://example.org/test' } + end + describe "requesting a path with a query string" do + before { get '/test.pdf?size=10' } + its(:html_url) { should eq 'http://example.org/test?size=10' } + end end + context "not matching pdf" do it "should skip pdf rendering" do get 'http://www.example.org/test' last_response.body.should include "Hello world!" - @middleware.send(:'render_as_pdf?').should be false + @middleware.send(:render_as_pdf?).should be false end end end -describe "Conditions" do +describe Shrimp::Middleware, "Conditions" do + include_context Shrimp::Middleware + context "only" do - before { mock_app(options, :only => [%r[^/invoice], %r[^/public]]) } + before { mock_app(middleware_options, :only => [%r[^/invoice], %r[^/public]]) } it "render pdf for set only option" do get '/invoice/test.pdf' - @middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end it "render pdf for set only option" do get '/public/test.pdf' - @middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end it "not render pdf for any other path" do get '/secret/test.pdf' - @middleware.send(:'render_as_pdf?').should be false + @middleware.send(:render_as_pdf?).should be false end end context "except" do - before { mock_app(options, :except => %w(/secret)) } + before { mock_app(middleware_options, :except => %w(/secret)) } it "render pdf for set only option" do get '/invoice/test.pdf' - @middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end it "render pdf for set only option" do get '/public/test.pdf' - 
@middleware.send(:'render_as_pdf?').should be true + @middleware.send(:render_as_pdf?).should be true end it "not render pdf for any other path" do get '/secret/test.pdf' - @middleware.send(:'render_as_pdf?').should be false + @middleware.send(:render_as_pdf?).should be false end end end diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb index 9d4978b..2b91498 100644 --- a/spec/shrimp/phantom_spec.rb +++ b/spec/shrimp/phantom_spec.rb @@ -11,44 +11,25 @@ def valid_pdf(io) end describe Shrimp::Phantom do - let(:testfile) { File.expand_path('../test_file.html', __FILE__) } - before do Shrimp.configure { |config| config.rendering_time = 1000 } end - # describe ".quote_arg" do - # subject { described_class } - - # let(:arg) { "test" } - - # it "wraps the argument with single quotes" do - # subject.quote_arg(arg).should eq "'test'" - # end - - # context "when the argument contains single quotes" do - # let(:arg) { "'te''st'" } - - # it "escapes them" do - # %x(echo #{subject.quote_arg(arg)}).strip.should eq arg - # end - # end - # end - it "should initialize attributes" do - phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") - phantom.source.to_s.should eq "file://#{testfile}" + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{tmpdir}/test.pdf") + phantom.source.to_s.should eq "file://#{test_file}" phantom.options[:margin].should eq "2cm" - phantom.outfile.should eq "#{Dir.tmpdir}/test.pdf" + phantom.outfile.should eq "#{tmpdir}/test.pdf" end it "should render a pdf file" do - #phantom = Shrimp::Phantom.new("file://#{@path}") - #phantom.to_pdf("#{Dir.tmpdir}/test.pdf").first should eq "#{Dir.tmpdir}/test.pdf" + phantom = Shrimp::Phantom.new("file://#{test_file}") + phantom.to_pdf("#{tmpdir}/test.pdf").should eq "#{tmpdir}/test.pdf" + phantom.result.should include "rendered to: #{tmpdir}/test.pdf" end it "should accept a local file url" do - phantom = 
Shrimp::Phantom.new("file://#{testfile}") + phantom = Shrimp::Phantom.new("file://#{test_file}") phantom.source.should be_url end @@ -57,27 +38,39 @@ def valid_pdf(io) phantom.source.should be_url end - it "should parse options into a cmd line" do - phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm", :max_redirect_count => 10 }, { }, "#{Dir.tmpdir}/test.pdf") - phantom.cmd.should include "test.pdf A4 1 2cm portrait" - phantom.cmd.should include "file://#{testfile}" - phantom.cmd.should include "lib/shrimp/rasterize.js" - phantom.cmd.should end_with " 10" - end + describe '#cmd' do + it "should generate the correct cmd" do + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm", :max_redirect_count => 10 }, { }, "#{tmpdir}/test.pdf") + phantom.cmd.should include "test.pdf A4 1 2cm portrait" + phantom.cmd.should include "file://#{test_file}" + phantom.cmd.should include "lib/shrimp/rasterize.js" + phantom.cmd.should end_with " 10" + end + + it "should escape arguments" do + phantom = Shrimp::Phantom.new("http://example.com/?something") + phantom.cmd_array.should include "http://example.com/?something" + phantom.cmd. should include "http://example.com/\\?something" - it "should properly escape arguments" do - malicious_uri = "file:///hello';shutdown" - bogus_phantom = Shrimp::Phantom.new(malicious_uri) + phantom = Shrimp::Phantom.new("http://example.com/path/file.html?width=100&height=100") + phantom.cmd_array.should include "http://example.com/path/file.html?width=100&height=100" + phantom.cmd. 
should include "http://example.com/path/file.html\\?width\\=100\\&height\\=100" + end + + it "should properly escape arguments" do + malicious_uri = "file:///hello';shutdown" + bogus_phantom = Shrimp::Phantom.new(malicious_uri) - bogus_phantom.cmd.should_not include malicious_uri + bogus_phantom.cmd.should_not include malicious_uri - Shrimp.configuration.stub(:phantomjs).and_return "echo" - %x(#{bogus_phantom.cmd}).strip.should include malicious_uri + Shrimp.configuration.stub(:phantomjs).and_return "echo" + %x(#{bogus_phantom.cmd}).strip.should include malicious_uri + end end context "rendering to a file" do - before do - phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") + before(:all) do + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf") @result = phantom.to_file end @@ -86,30 +79,32 @@ def valid_pdf(io) end it "should be a valid pdf" do - valid_pdf(@result) + valid_pdf?(@result).should eq true + pdf_strings(@result).should eq "Hello\tWorld!" end end context "rendering to a pdf" do - before do - @phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }) - @result = @phantom.to_pdf("#{Dir.tmpdir}/test.pdf") + before(:all) do + @phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }) + @result = @phantom.to_pdf("#{tmpdir}/test.pdf") end it "should return a path to pdf" do @result.should be_a String - @result.should eq "#{Dir.tmpdir}/test.pdf" + @result.should eq "#{tmpdir}/test.pdf" end it "should be a valid pdf" do - valid_pdf(@result) + valid_pdf?(@result).should eq true + pdf_strings(Pathname(@result)).should eq "Hello\tWorld!" 
end end context "rendering to a String" do - before do - phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }) - @result = phantom.to_string("#{Dir.tmpdir}/test.pdf") + before(:all) do + phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }) + @result = phantom.to_string("#{tmpdir}/test.pdf") end it "should return the File IO String" do @@ -117,39 +112,74 @@ def valid_pdf(io) end it "should be a valid pdf" do - valid_pdf(@result) + valid_pdf?(@result).should eq true + pdf_strings(@result).should eq "Hello\tWorld!" end end - context "Error" do - it "should return result nil" do - phantom = Shrimp::Phantom.new("file://foo/bar") - @result = phantom.run - @result.should be_nil + context "Errors" do + describe "'Unable to load the address' error" do + before { @result = phantom.run } + + context 'an invalid http: address' do + subject(:phantom) { Shrimp::Phantom.new("http://example.com/foo/bar") } + it { @result.should be_nil } + its(:error) { should eq "Error downloading http://example.com/foo/bar - server replied: Not Found\nUnable to load the page. (HTTP 404) (URL: http://example.com/foo/bar)" } + its(:page_load_error?) { should eq true } + its(:page_load_status_code) { should eq 404 } + end + + context 'an http: response that redirects' do + around(:each) do |example| + with_local_server do |server| + server.mount_proc '/' do |request, response| + response.body = 'Home' + raise WEBrick::HTTPStatus::OK + end + server.mount_proc '/redirect_me' do |request, response| + response['Location'] = '/' + raise WEBrick::HTTPStatus::Found + end + example.run + end + end + subject(:phantom) { Shrimp::Phantom.new("http://#{local_server_host}/redirect_me") } + it { @result.should be_nil } + its(:error) { should eq "Unable to load the page. (HTTP 302) (URL: http://localhost:8800/redirect_me)" } + its(:page_load_error?) 
{ should eq true } + its(:page_load_status_code) { should eq 302 } + its('response.keys') { should include 'redirectURL' } + its('response_headers.keys') { should == ['Location', 'Server', 'Date', 'Content-Length', 'Connection'] } + its(:redirect_to) { should eq "http://#{local_server_host}/" } + end + + context 'an invalid file: address' do + subject(:phantom) { Shrimp::Phantom.new("file:///foo/bar") } + it { @result.should be_nil } + its(:error) { should include "Error opening /foo/bar: No such file or directory\nUnable to load the page. (HTTP null) (URL: file:///foo/bar)" } + its(:page_load_error?) { should eq true } + its(:page_load_status_code) { should eq 'null' } + end end + end + context "Errors (using bang methods)" do it "should be unable to load the address" do phantom = Shrimp::Phantom.new("file:///foo/bar") - phantom.run - phantom.error.should include "Error opening /foo/bar: No such file or directory (URL: file:///foo/bar)" - end - - it "should be unable to copy file" do - phantom = Shrimp::Phantom.new("file://#{testfile}") - phantom.to_pdf("/foo/bar/") - phantom.error.should include "Unable to copy file " + expect { phantom.run! }.to raise_error Shrimp::RenderingError end end - context "Error Bang!" do - it "should be unable to load the address" do - phantom = Shrimp::Phantom.new("file:///foo/bar") - expect { phantom.run! 
}.to raise_error Shrimp::RenderingError + context 'test_file_with_page_numbers.html' do + let(:test_file) { super('test_file_with_page_numbers.html') } + + before do + phantom = Shrimp::Phantom.new("file://#{test_file}") + @result = phantom.to_string("#{tmpdir}/test.pdf") end - it "should be unable to copy file" do - phantom = Shrimp::Phantom.new("file://#{testfile}") - expect { phantom.to_pdf!("/foo/bar/") }.to raise_error Shrimp::RenderingError + it "PDF should contain page numbers" do + pdf_strings(@result).should eq "Header:\tPage\t1/2Footer:\tPage\t1/2Hello\tWorld!Hello\tWorld!Header:\tPage\t2/2Footer:\tPage\t2/2" end end end diff --git a/spec/shrimp/synchronous_middleware_spec.rb b/spec/shrimp/synchronous_middleware_spec.rb new file mode 100644 index 0000000..263c471 --- /dev/null +++ b/spec/shrimp/synchronous_middleware_spec.rb @@ -0,0 +1,131 @@ +require 'spec_helper' + +shared_context Shrimp::SynchronousMiddleware do + def mock_app(options = { }, conditions = { }) + @middleware = Shrimp::SynchronousMiddleware.new(main_app, options, conditions) + @app = Rack::Session::Cookie.new(@middleware, :key => 'rack.session') + end +end + +describe Shrimp::SynchronousMiddleware do + include_context Shrimp::SynchronousMiddleware + + before { mock_app(middleware_options) } + subject { @middleware } + + context "matching pdf" do + describe "requesting a simple path" do + before { get '/test.pdf' } + its(:html_url) { should eq 'http://example.org/test' } + its(:render_as_pdf?) { should be true } + it { @middleware.send(:render_to).should start_with middleware_options[:out_path] } + it "should return a 404 status because http://example.org/test does not exist" do + last_response.status.should eq 404 + message = "Error downloading http://example.org/test - server replied: Not Found\nUnable to load the page. (HTTP 404) (URL: http://example.org/test)" + last_response.body. 
should eq message + @middleware.phantom.error.should eq message + end + end + + describe "requesting a path with a query string" do + before { get '/test.pdf?size=10' } + its(:render_as_pdf?) { should be true } + its(:html_url) { should eq 'http://example.org/test?size=10' } + end + + describe "requesting a simple path (and we stub html_url to a file url)" do + before { @middleware.stub(:html_url).and_return "file://#{test_file}" } + before { get '/test.pdf' } + it "should return a valid pdf with 200 status" do + last_response.status.should eq 200 + last_response.headers['Content-Type'].should eq 'application/pdf' + valid_pdf?(last_response.body).should eq true + @middleware.phantom.result.should start_with "rendered to: #{@middleware.render_to}" + end + end + + context 'requesting an HTML resource that sets a X-Pdf-Filename header' do + before { + @middleware.stub(:html_url).and_return "file://#{test_file}" + phantom = Shrimp::Phantom.new(@middleware.html_url) + phantom.stub :response_headers => { + 'X-Pdf-Filename' => 'Some Fancy Report Title.pdf' + } + Shrimp::Phantom.should_receive(:new).and_return phantom + } + before { get '/use_different_filename.pdf' } + it "should use the filename from the X-Pdf-Filename header" do + last_response.status.should eq 200 + last_response.headers['Content-Type'].should eq 'application/pdf' + last_response.headers['Content-Disposition'].should eq %(attachment; filename="Some Fancy Report Title.pdf") + valid_pdf?(last_response.body).should eq true + end + end + + context 'requesting an HTML resource that redirects' do + before { + phantom = Shrimp::Phantom.new('http://example.org/redirect_me') + phantom.should_receive(:to_pdf).and_return nil + phantom.stub :error => "Unable to load the page. 
(HTTP 302) (URL: http://example.org/redirect_me)", + :redirect_to => "http://example.org/sign_in" + Shrimp::Phantom.should_receive(:new).and_return phantom + } + before { get '/redirect_me.pdf' } + it "should follow the redirect that the phantomjs request encountered" do + # This tests the phantomjs_error_response method + last_response.status.should eq 302 + last_response.headers['Content-Type'].should eq 'text/html' + last_response.headers['Location'].should eq "http://example.org/sign_in" + @middleware.phantom.error.should include "Unable to load the page" + end + end + end + + context "not matching pdf" do + it "should skip pdf rendering" do + get 'http://www.example.org/test' + last_response.body.should include "Hello world!" + @middleware.render_as_pdf?.should be false + end + end +end + +describe Shrimp::SynchronousMiddleware, "Conditions" do + include_context Shrimp::SynchronousMiddleware + + context "only" do + before { mock_app(middleware_options, :only => [%r[^/invoice], %r[^/public]]) } + it "render pdf for set only option" do + get '/invoice/test.pdf' + @middleware.render_as_pdf?.should be true + end + + it "render pdf for set only option" do + get '/public/test.pdf' + @middleware.render_as_pdf?.should be true + end + + it "not render pdf for any other path" do + get '/secret/test.pdf' + @middleware.render_as_pdf?.should be false + end + end + + context "except" do + before { mock_app(middleware_options, :except => %w(/secret)) } + it "render pdf for set only option" do + get '/invoice/test.pdf' + @middleware.render_as_pdf?.should be true + end + + it "render pdf for set only option" do + get '/public/test.pdf' + @middleware.render_as_pdf?.should be true + end + + it "not render pdf for any other path" do + get '/secret/test.pdf' + @middleware.render_as_pdf?.should be false + end + end +end diff --git a/spec/shrimp/test_file_with_page_numbers.html b/spec/shrimp/test_file_with_page_numbers.html new file mode 100644 index 0000000..77f995a --- /dev/null 
+++ b/spec/shrimp/test_file_with_page_numbers.html @@ -0,0 +1,21 @@ + + + + + +

Hello World!

+

Hello World!

+ + + diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb index 154a571..7bb6ac4 100644 --- a/spec/spec_helper.rb +++ b/spec/spec_helper.rb @@ -1,7 +1,87 @@ require 'rack/test' require 'shrimp' +require 'webrick' +require 'pdf/inspector' RSpec.configure do |config| include Rack::Test::Methods end +Shrimp.configure do |config| + # If we left this as the default value of true, then we couldn't check things like + # @middleware.render_as_pdf? in our tests after initiating a request with get '/test.pdf', because + # render_as_pdf? depends on @request, which doesn't get set until *after* we call call(env) with the + # request env. But when thread_safe is true, it actually prevents call(env) from changing any + # instance variables in the original object. (In the original object, @request will still be nil.) + config.thread_safe = false +end + +def tmpdir + Shrimp.config.tmpdir +end + +def test_file(file_name = 'test_file.html') + File.expand_path("../shrimp/#{file_name}", __FILE__) +end + +def valid_pdf?(io) + case io + when File + io.read[0...4] == "%PDF" + when String + io[0...4] == "%PDF" || File.open(io).read[0...4] == "%PDF" + end +end +def pdf_strings(pdf) + PDF::Inspector::Text.analyze(pdf).strings.join +end + +# Used by rack-test when we call get +def app + Rack::Lint.new(@app) +end + +def main_app + lambda { |env| + headers = { 'Content-Type' => "text/html" } + [200, headers, ['Hello world!']] + } +end + +def middleware_options + { + :margin => "1cm", + :out_path => tmpdir, + :polling_offset => 10, + :polling_interval => 1, + :cache_ttl => 3600, + :request_timeout => 1 + } +end + +def local_server_port + 8800 +end +def local_server_host + "localhost:#{local_server_port}" +end + +def with_local_server + webrick_options = { + :Port => local_server_port, + :AccessLog => [], + :Logger => WEBrick::Log::new(RUBY_PLATFORM =~ /mswin|mingw/ ? 
'NUL:' : '/dev/null', 7) + } + begin + # The "TCPServer Error: Address already in use - bind(2)" warning here appears to be bogus, + # because it occurs even the first time we start the server, when nothing else is bound to the + # port. + server = WEBrick::HTTPServer.new(webrick_options) + trap("INT") { server.shutdown } + Thread.new { server.start } + yield server + server.shutdown + ensure + server.shutdown if server + end +end
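
The `valid_pdf?` helper added to `spec/spec_helper.rb` above identifies a PDF by its leading `%PDF` bytes. Below is a minimal sketch of the same check, not the helper from this change. The name `pdf_magic?` and the constant `PDF_MAGIC` are hypothetical, and the sketch assumes callers may pass an IO-like object, raw PDF data, or a path. Unlike the helper in the diff, it returns `false` rather than `nil` for unsupported inputs, and it treats a string as a path only when a file actually exists there.

```ruby
require 'stringio'

PDF_MAGIC = "%PDF".b # the four bytes the spec helper looks for

# Hypothetical helper for illustration only; the name and behaviour are
# assumptions and are not part of Shrimp or its spec helpers.
def pdf_magic?(source)
  header =
    if source.respond_to?(:read)
      # IO-like object (File, StringIO, ...): reads from the current position.
      source.read(4)
    elsif source.is_a?(String)
      bytes = source.b
      if bytes.start_with?(PDF_MAGIC)
        PDF_MAGIC # the string itself is PDF data
      elsif !bytes.include?("\0") && File.file?(source)
        File.binread(source, 4) # the string is a path to an existing file
      end
    end
  header.to_s.b == PDF_MAGIC
end

puts pdf_magic?(StringIO.new("%PDF-1.4 ...")) # true
puts pdf_magic?("plain text")                  # false
```

Checking `File.file?` before opening means a string that is neither PDF data nor an existing path yields `false` instead of raising `Errno::ENOENT`, which is what `File.open` would do in that case.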