diff --git a/README.md b/README.md
index 240efd1..d9e0089 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,16 @@
# Shrimp
[](https://travis-ci.org/adjust/shrimp)
-Creates PDFs from URLs using phantomjs
+Creates PDFs from web pages using PhantomJS
-Read our [blogpost](http://big-elephants.com/2012-12/pdf-rendering-with-phantomjs/) about how it works.
+Read our [blog post](http://big-elephants.com/2012-12/pdf-rendering-with-phantomjs/) about how it works.
## Installation
Add this line to your application's Gemfile:
- gem 'shrimp'
+```ruby
+gem 'shrimp'
+```
And then execute:
@@ -18,14 +20,13 @@ Or install it yourself as:
$ gem install shrimp
+### PhantomJS
-### Phantomjs
-
- See http://phantomjs.org/download.html on how to install phantomjs
+See http://phantomjs.org/download.html for instructions on how to install PhantomJS.
## Usage
-```
+```ruby
require 'shrimp'
url = 'http://www.google.com'
options = { :margin => "1cm"}
@@ -33,56 +34,79 @@ Shrimp::Phantom.new(url, options).to_pdf("~/output.pdf")
```
## Configuration
-```
+Here is a list of configuration options that you can set. Unless otherwise noted in comments, the
+value shown is the default value.
+
+Many of these options correspond to a property of the
+[WebPage module](https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage) in PhantomJS. Refer
+to that [documentation](https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage) for more
+information about what those options do.
+
+```ruby
Shrimp.configure do |config|
- # The path to the phantomjs executable
- # defaults to `where phantomjs`
- # config.phantomjs = '/usr/local/bin/phantomjs'
+ # The path to the phantomjs executable. Defaults to the path returned by `which phantomjs`.
+ config.phantomjs = '/usr/local/bin/phantomjs'
- # the default pdf output format
- # e.g. "5in*7.5in", "10cm*20cm", "A4", "Letter"
- # config.format = 'A4'
+ # The paper size/format to use for the generated PDF file. Examples: "5in*7.5in", "10cm*20cm",
+ # "A4", "Letter". (See https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage#papersize-object
+ # for a list of valid options.)
+ config.format = 'A4'
- # the default margin
- # config.margin = '1cm'
+ # The page margin to use (part of paperSize in PhantomJS)
+ config.margin = '1cm'
- # the zoom factor
- # config.zoom = 1
+ # The zoom factor (zoomFactor in PhantomJS)
+ config.zoom = 1
- # the page orientation 'portrait' or 'landscape'
- # config.orientation = 'portrait'
+ # The page orientation ('portrait' or 'landscape') (part of paperSize in PhantomJS)
+ config.orientation = 'portrait'
- # a temporary dir used to store tempfiles
- # config.tmpdir = Dir.tmpdir
+ # The directory where temporary files are stored, including the generated PDF files.
+ config.tmpdir = Dir.mktmpdir('shrimp')
- # the default rendering time in ms
- # increase if you need to render very complex pages
- # config.rendering_time = 1000
+ # How long to wait (in ms) for PhantomJS to load the web page before saving it to a file.
+ # Increase this if you need to render very complex pages.
+ config.rendering_time = 1_000
- # change the viewport size. If you rendering pages that have
- # flexible page width and height then you may need to set this
- # to enforce a specific size
- # config.viewport_width = 600
- # config.viewport_height = 600
+ # The timeout for the phantomjs rendering process (in ms). This must always be higher than
+ # rendering_time. If this timeout expires before the job completes, PhantomJS aborts and exits
+ # with an error.
+ config.rendering_timeout = 90_000
- # the timeout for the phantomjs rendering process in ms
- # this needs always to be higher than rendering_time
- # config.rendering_timeout = 90000
+ # Change the viewport size. If you are rendering a page that adapts its layout based on the
+ # page width and height then you may need to set this to enforce a specific size. (viewportSize
+ # in PhantomJS)
+ config.viewport_width = 600
+ config.viewport_height = 600
- # maximum number of redirects to follow
- # by default Shrimp does not follow any redirects which means that
- # if the server responds with non HTTP 200 an error will be returned
+ # Maximum number of redirects to follow.
+ # By default Shrimp does not follow any redirects, which means that if the server responds with
+ # something other than HTTP 200 (for example, 302), an error will be returned. Setting this to a
+ # value greater than 0 makes Shrimp follow up to that many redirects and raise an error only if
+ # the number of redirects exceeds it.
# config.max_redirect_count = 0
- # the path to a json configuration file for command-line options
- # config.command_config_file = "#{Rails.root.join('config', 'shrimp', 'config.json')}"
+ # The path to a JSON configuration file containing command-line options to be used by PhantomJS.
+ # Refer to https://github.com/ariya/phantomjs/wiki/API-Reference for a list of valid options.
+ # The default options are listed later in this README. To use your own file from
+ # config/shrimp/config.json in a Rails app, you could do this:
+ config.command_config_file = Rails.root.join('config/shrimp/config.json')
+
+ # Enable if you want to see details such as the phantomjs command line that it's about to execute.
+ config.debug = false
end
```
-### Command Configuration
+### Default PhantomJS Command-line Options
-```
+These are the PhantomJS options that will be used by default unless you set the
+`config.command_config_file` option.
+
+See the PhantomJS [API-Reference](https://github.com/ariya/phantomjs/wiki/API-Reference) for a
+complete list of valid options.
+
+```js
{
"diskCache": false,
"ignoreSslErrors": false,
@@ -94,98 +118,159 @@ end
## Middleware
-Shrimp comes with a middleware that allows users to get a PDF view of any page on your site by appending .pdf to the URL.
+Shrimp comes with a middleware that allows users to generate a PDF file of any page on your site
+simply by appending .pdf to the URL.
+
+For example, if your site is [example.com](http://example.com) and you go to
+http://example.com/report.pdf, the middleware will detect that a PDF is being requested and will
+automatically convert the web page at http://example.com/report into a PDF and send that PDF as the
+response.
+
+If you only want to allow this for some pages but not all of them, see below for how to add
+conditions.
### Middleware Setup
**Non-Rails Rack apps**
- # in config.ru
- require 'shrimp'
- use Shrimp::Middleware
+```ruby
+# in config.ru
+require 'shrimp'
+use Shrimp::Middleware
+```
**Rails apps**
- # in application.rb(Rails3) or environment.rb(Rails2)
- require 'shrimp'
- config.middleware.use Shrimp::Middleware
+```ruby
+# in application.rb or an initializer (Rails 3) or environment.rb (Rails 2)
+require 'shrimp'
+config.middleware.use Shrimp::Middleware
+```
**With Shrimp options**
- # options will be passed to Shrimp::Phantom.new
- config.middleware.use Shrimp::Middleware, :margin => '0.5cm', :format => 'Letter'
-
-**With conditions to limit routes that can be generated in pdf**
+```ruby
+# Options will be passed to Shrimp::Phantom.new
+config.middleware.use Shrimp::Middleware, :margin => '0.5cm', :format => 'Letter'
+```
- # conditions can be regexps (either one or an array)
- config.middleware.use Shrimp::Middleware, {}, :only => %r[^/public]
- config.middleware.use Shrimp::Middleware, {}, :only => [%r[^/invoice], %r[^/public]]
+**With conditions to limit which paths can be requested in PDF format**
- # conditions can be strings (either one or an array)
- config.middleware.use Shrimp::Middleware, {}, :only => '/public'
- config.middleware.use Shrimp::Middleware, {}, :only => ['/invoice', '/public']
+```ruby
+# conditions can be regexps (either one or an array)
+config.middleware.use Shrimp::Middleware, {}, :only => %r[^/public]
+config.middleware.use Shrimp::Middleware, {}, :only => [%r[^/invoice], %r[^/public]]
- # conditions can be regexps (either one or an array)
- config.middleware.use Shrimp::Middleware, {}, :except => [%r[^/prawn], %r[^/secret]]
+# conditions can be strings (either one or an array)
+config.middleware.use Shrimp::Middleware, {}, :only => '/public'
+config.middleware.use Shrimp::Middleware, {}, :only => ['/invoice', '/public']
- # conditions can be strings (either one or an array)
- config.middleware.use Shrimp::Middleware, {}, :except => ['/secret']
+# conditions can be regexps (either one or an array)
+config.middleware.use Shrimp::Middleware, {}, :except => [%r[^/prawn], %r[^/secret]]
+# conditions can be strings (either one or an array)
+config.middleware.use Shrimp::Middleware, {}, :except => ['/secret']
+```
### Polling
-To avoid deadlocks Shrimp::Middleware renders the pdf in a separate process retuning a 503 Retry-After response Header.
-you can setup the polling interval and the polling offset in seconds.
+To avoid tying up the web server while waiting for the PDF to be rendered (which could create a
+deadlock), Shrimp::Middleware starts PDF generation in the background in a separate thread and
+returns a 503 (Service Unavailable) response immediately.
+
+It also adds a [Retry-After](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) response
+header, which tells the user's browser that the requested PDF resource is not available yet, but
+will be soon, and instructs the browser to try again after a few seconds. When the same page is
+requested again in a few seconds, it will again return a 503 if the PDF is still in the process of
+being generated. This process will repeat until eventually the rendering has completed, at which
+point the middleware returns a 200 (OK) response with the PDF itself.
+
+You can adjust both the `polling_offset` (how long to wait before the first retry; default is 1
+second) and the `polling_interval` (how long in seconds to wait between retries; default is 1
+second). Example:
- config.middleware.use Shrimp::Middleware, :polling_interval => 1, :polling_offset => 5
+```ruby
+ config.middleware.use Shrimp::Middleware, :polling_offset => 5, :polling_interval => 1
+```
### Caching
-To avoid rendering the page on each request you can setup some the cache ttl in seconds
+To avoid regenerating the PDF file on every request, the middleware caches it. The PDF that was
+generated the *first* time a given URL was requested is reused and sent immediately on later
+requests for the same URL, as long as it still exists on disk and was generated within the
+TTL.
+
+The default TTL is 1 second, but can be overridden by passing a different `cache_ttl` (in seconds)
+to the middleware:
+```ruby
config.middleware.use Shrimp::Middleware, :cache_ttl => 3600, :out_path => "my/pdf/store"
+```
+To disable this caching entirely and force it to re-generate the PDF again each time a request comes
+in, set `cache_ttl` to 0.
+
+### Header/Footer
+
+You can specify a header or footer callback, which can even include page numbers. Example:
+
+```html
+
+
+
+```
### Ajax requests
-To include some fancy Ajax stuff with jquery
+Here's an example of how to initiate an Ajax request for a PDF resource (using jQuery) and keep
+polling the server until it either finishes successfully or returns with a 504 error code.
```js
-
- var url = '/my_page.pdf'
- var statusCodes = {
- 200: function() {
- return window.location.assign(url);
- },
- 504: function() {
- console.log("Shit's being wired")
- },
- 503: function(jqXHR, textStatus, errorThrown) {
- var wait;
- wait = parseInt(jqXHR.getResponseHeader('Retry-After'));
- return setTimeout(function() {
- return $.ajax({
- url: url,
- statusCode: statusCodes
- });
- }, wait * 1000);
- }
+ var url = '/my_page.pdf'
+ var statusCodes = {
+ 200: function() {
+ return window.location.assign(url);
+ },
+ 504: function() {
+ console.log("Sorry, the request timed out.")
+ },
+ 503: function(jqXHR, textStatus, errorThrown) {
+ var wait;
+ wait = parseInt(jqXHR.getResponseHeader('Retry-After'), 10);
+ return setTimeout(function() {
+ return $.ajax({
+ url: url,
+ statusCode: statusCodes
+ });
+ }, wait * 1000);
+ }
}
$.ajax({
url: url,
statusCode: statusCodes
})
-
```
## Contributing
-1. Fork it
+1. Fork this repository
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
-5. Create new Pull Request
+5. Create a pull request (`git pull-request` if you've installed [hub](https://github.com/github/hub))
## Copyright
-Shrimp is Copyright © 2012 adeven (Manuel Kniep). It is free software, and may be redistributed under the terms
-specified in the LICENSE file.
+
+Shrimp is Copyright © 2012 adeven (Manuel Kniep). It is free software, and may be redistributed
+under the terms specified in the LICENSE file.
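For a non-browser client, the polling protocol described in the README text above (503 plus `Retry-After` until a 200 arrives) could be driven from Ruby roughly as follows. This is a minimal sketch and not part of Shrimp: `poll_for_pdf`, `fetch`, `max_attempts`, and `sleeper` are hypothetical names. `fetch` is assumed to be any callable that performs the HTTP request and returns `[status, headers, body]`; it would need to carry the session cookie between calls, because the middleware keeps its rendering flag in the session.

```ruby
# Hypothetical helper (not part of Shrimp): keep requesting the PDF until the
# server stops answering 503, honouring the Retry-After header between tries.
def poll_for_pdf(fetch, max_attempts: 10, sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do
    status, headers, body = fetch.call
    case status
    when 200
      return body
    when 503
      # Retry-After is given in seconds; fall back to 1 if the header is absent.
      sleeper.call(Integer(headers.fetch('Retry-After', 1)))
    else
      raise "PDF request failed with HTTP #{status}"
    end
  end
  raise "PDF was not ready after #{max_attempts} attempts"
end

# Canned responses stand in for a real HTTP client so the sketch runs offline.
canned = [[503, { 'Retry-After' => '2' }, ''], [200, {}, '%PDF-1.4 ...']]
waits = []
pdf = poll_for_pdf(-> { canned.shift }, sleeper: ->(seconds) { waits << seconds })
```

Injecting `fetch` and `sleeper` keeps the loop independent of any particular HTTP library and makes it easy to exercise without real waiting.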
diff --git a/lib/shrimp.rb b/lib/shrimp.rb
index f12222d..3f92c44 100644
--- a/lib/shrimp.rb
+++ b/lib/shrimp.rb
@@ -2,4 +2,5 @@
require 'shrimp/source'
require 'shrimp/phantom'
require 'shrimp/middleware'
+require 'shrimp/synchronous_middleware'
require 'shrimp/configuration'
diff --git a/lib/shrimp/base_middleware.rb b/lib/shrimp/base_middleware.rb
new file mode 100644
index 0000000..cd45387
--- /dev/null
+++ b/lib/shrimp/base_middleware.rb
@@ -0,0 +1,131 @@
+module Shrimp
+ class BaseMiddleware
+ def initialize(app, options = { }, conditions = { })
+ @app = app
+ @options = Shrimp.config.to_h.merge(options)
+ @conditions = conditions
+ end
+
+ def render_as_pdf?
+ request_path_is_pdf = !!@request.path.match(%r{\.pdf$})
+
+ if request_path_is_pdf && @conditions[:only]
+ rules = [@conditions[:only]].flatten
+ rules.any? do |pattern|
+ if pattern.is_a?(Regexp)
+ @request.path =~ pattern
+ else
+ @request.path[0, pattern.length] == pattern
+ end
+ end
+ elsif request_path_is_pdf && @conditions[:except]
+ rules = [@conditions[:except]].flatten
+ rules.map do |pattern|
+ if pattern.is_a?(Regexp)
+ return false if @request.path =~ pattern
+ else
+ return false if @request.path[0, pattern.length] == pattern
+ end
+ end
+ return true
+ else
+ request_path_is_pdf
+ end
+ end
+
+ def call(env)
+ if @options[:thread_safe]
+ dup._call(env)
+ else
+ _call(env)
+ end
+ end
+
+ def _call(env)
+ @request = Rack::Request.new(env)
+ if render_as_pdf?
+ render_as_pdf(env)
+ else
+ @app.call(env)
+ end
+ end
+
+ def render_to
+ file_name = Digest::MD5.hexdigest(@request.path) + ".pdf"
+ file_path = @options[:out_path]
+ "#{file_path}/#{file_name}"
+ end
+
+ def render_to_done
+ "#{render_to}.done"
+ end
+
+ # The URL for the HTML-formatted web page that we are converting into a PDF.
+ def html_url
+ @request.url.sub(%r<\.pdf(\?|$)>, '\1')
+ end
+
+ private
+
+ def render_pdf
+ log_render_pdf_start
+ Phantom.new(html_url, @options, @request.cookies).tap do |phantom|
+ @phantom = phantom
+ phantom.to_pdf(render_to)
+ log_render_pdf_completion
+ File.open(render_to_done, 'w') { |f| f.write('done') } unless @phantom.error?
+ end
+ end
+
+ def log_render_pdf_start
+ return unless @options[:debug]
+ puts %(#{self.class}: Converting web page at #{(html_url).inspect} into a PDF ...)
+ end
+
+ def log_render_pdf_completion
+ return unless @options[:debug]
+ puts "#{self.class}: Finished converting web page at #{(html_url).inspect} into a PDF"
+ if @phantom.error?
+ puts "#{self.class}: Error: #{@phantom.error}"
+ else
+ puts "#{self.class}: Saved PDF to #{render_to}"
+ end
+ end
+
+ def pdf_body
+ file = File.open(render_to, "rb")
+ body = file.read
+ file.close
+ body
+ end
+
+ def default_pdf_options
+ {
+ :type => 'application/octet-stream'.freeze,
+ :disposition => 'attachment'.freeze,
+ }
+ end
+
+ def pdf_headers(body, options = {})
+ { }.tap do |headers|
+ headers["Content-Length"] = (body.respond_to?(:bytesize) ? body.bytesize : body.size).to_s
+ headers["Content-Type"] = "application/pdf"
+
+ # Based on send_file_headers! from actionpack/lib/action_controller/metal/data_streaming.rb
+ options = default_pdf_options.merge(@options).merge(options)
+ [:type, :disposition].each do |arg|
+ raise ArgumentError, ":#{arg} option required" if options[arg].nil?
+ end
+
+ disposition = options[:disposition]
+ disposition += %(; filename="#{options[:filename]}") if options[:filename]
+
+ headers.merge!(
+ 'Content-Disposition' => disposition,
+ 'Content-Transfer-Encoding' => 'binary'
+ )
+ end
+ end
+
+ end
+end
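The `:only` / `:except` matching in `BaseMiddleware#render_as_pdf?` can be tried out in isolation with a standalone restatement like the one below. It is a sketch under the assumption that only the request path matters; `pdf_request?` is a hypothetical helper name, not something this change adds.

```ruby
# Hypothetical standalone version of the rules in BaseMiddleware#render_as_pdf?.
# A path qualifies only if it ends in ".pdf" and passes the :only/:except rules.
def pdf_request?(path, conditions = {})
  return false unless path =~ /\.pdf$/

  # Regexps are matched against the path; strings are treated as path prefixes.
  matches = lambda do |pattern|
    pattern.is_a?(Regexp) ? !!(path =~ pattern) : path[0, pattern.length] == pattern
  end

  if conditions[:only]
    [conditions[:only]].flatten.any?(&matches)
  elsif conditions[:except]
    [conditions[:except]].flatten.none?(&matches)
  else
    true
  end
end
```

As in the middleware, `:only` takes precedence when both keys are given, and a string condition such as `'/public'` also matches `/publications/x.pdf` because it is a plain prefix test.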
diff --git a/lib/shrimp/configuration.rb b/lib/shrimp/configuration.rb
index eacd238..0c47d8d 100644
--- a/lib/shrimp/configuration.rb
+++ b/lib/shrimp/configuration.rb
@@ -2,38 +2,53 @@
module Shrimp
class Configuration
- attr_accessor :default_options
- attr_writer :phantomjs
+ def initialize
+ @options = {
+ :format => 'A4',
+ :margin => '1cm',
+ :zoom => 1,
+ :orientation => 'portrait',
+ :tmpdir => Dir.mktmpdir('shrimp'),
+ :rendering_timeout => 90000,
+ :rendering_time => 1000,
+ :command_config_file => File.expand_path('../config.json', __FILE__),
+ :viewport_width => 600,
+ :viewport_height => 600,
+ :debug => false,
+ :thread_safe => true,
+ :max_redirect_count => 0
+ }
+ end
- [:format, :margin, :zoom, :orientation, :tmpdir, :rendering_timeout, :rendering_time, :command_config_file, :viewport_width, :viewport_height, :max_redirect_count].each do |m|
+ def to_h
+ @options
+ end
+
+ [:format, :margin, :zoom, :orientation, :tmpdir, :rendering_timeout, :rendering_time, :command_config_file, :viewport_width, :viewport_height, :debug, :thread_safe, :max_redirect_count].each do |m|
define_method("#{m}=") do |val|
- @default_options[m]=val
+ @options[m] = val
end
- end
- def initialize
- @default_options = {
- :format => 'A4',
- :margin => '1cm',
- :zoom => 1,
- :orientation => 'portrait',
- :tmpdir => Dir.tmpdir,
- :rendering_timeout => 90000,
- :rendering_time => 1000,
- :command_config_file => File.expand_path('../config.json', __FILE__),
- :viewport_width => 600,
- :viewport_height => 600,
- :max_redirect_count => 0
- }
+ define_method(m) do
+ @options[m]
+ end
end
def phantomjs
@phantomjs ||= (defined?(Bundler::GemfileError) ? `bundle exec which phantomjs` : `which phantomjs`).chomp
end
+ attr_writer :phantomjs
end
class << self
- attr_accessor :configuration
+ def configuration
+ @configuration ||= Configuration.new
+ end
+ alias_method :config, :configuration
+
+ def configure
+ yield(configuration)
+ end
end
# Configure Phantomjs someplace sensible,
@@ -45,11 +60,4 @@ class << self
# config.format = 'Letter'
# end
- def self.configuration
- @configuration ||= Configuration.new
- end
-
- def self.configure
- yield(configuration)
- end
end
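The reworked `Configuration` stores everything in one options hash and generates a reader and a writer per option, so that `to_h.merge(options)` can layer per-call options over the global defaults. The sketch below reproduces that pattern in miniature; `SketchConfiguration` and its three options are illustrative assumptions, not the real class.

```ruby
# Miniature, hypothetical version of the hash-backed configuration pattern.
class SketchConfiguration
  OPTION_NAMES = [:format, :margin, :zoom].freeze

  def initialize
    @options = { :format => 'A4', :margin => '1cm', :zoom => 1 }
  end

  # Exposes the underlying hash; callers merge into a copy rather than mutate it.
  def to_h
    @options
  end

  OPTION_NAMES.each do |name|
    define_method("#{name}=") { |val| @options[name] = val }
    define_method(name) { @options[name] }
  end
end

config = SketchConfiguration.new
config.format = 'Letter'

# Per-call options override the global ones without changing them.
per_call = config.to_h.merge(:margin => '0.5cm')
```

Because `Hash#merge` returns a new hash, the global defaults stay untouched even though `to_h` hands out the internal hash itself.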
diff --git a/lib/shrimp/middleware.rb b/lib/shrimp/middleware.rb
index 96e8050..66fad5d 100644
--- a/lib/shrimp/middleware.rb
+++ b/lib/shrimp/middleware.rb
@@ -1,76 +1,65 @@
+require 'shrimp/base_middleware'
+
module Shrimp
- class Middleware
+ class Middleware < BaseMiddleware
def initialize(app, options = { }, conditions = { })
- @app = app
- @options = options
- @conditions = conditions
+ super
@options[:polling_interval] ||= 1
@options[:polling_offset] ||= 1
@options[:cache_ttl] ||= 1
@options[:request_timeout] ||= @options[:polling_interval] * 10
end
- def call(env)
- @request = Rack::Request.new(env)
- if render_as_pdf? #&& headers['Content-Type'] =~ /text\/html|application\/xhtml\+xml/
- if already_rendered? && (up_to_date?(@options[:cache_ttl]) || @options[:cache_ttl] == 0)
- if File.size(render_to) == 0
- File.delete(render_to)
- remove_rendering_flag
- return error_response
- end
- return ready_response if env['HTTP_X_REQUESTED_WITH']
- file = File.open(render_to, "rb")
- body = file.read
- file.close
- File.delete(render_to) if @options[:cache_ttl] == 0
+ def render_as_pdf(env)
+ if already_rendered? && (up_to_date?(@options[:cache_ttl]) || @options[:cache_ttl] == 0)
+ if File.size(render_to) == 0
+ delete_tmp_files
remove_rendering_flag
- response = [body]
- headers = { }
- headers["Content-Length"] = (body.respond_to?(:bytesize) ? body.bytesize : body.size).to_s
- headers["Content-Type"] = "application/pdf"
- [200, headers, response]
- else
- if rendering_in_progress?
- if rendering_timed_out?
- remove_rendering_flag
- error_response
- else
- reload_response(@options[:polling_interval])
- end
+ return error_response
+ end
+ return ready_response if env['HTTP_X_REQUESTED_WITH']
+ body = pdf_body
+ delete_tmp_files if @options[:cache_ttl] == 0
+ remove_rendering_flag
+ headers = pdf_headers(body)
+ [200, headers, [body]]
+ else
+ if rendering_in_progress?
+ if rendering_timed_out?
+ remove_rendering_flag
+ error_response
else
- File.delete(render_to) if already_rendered?
- set_rendering_flag
- fire_phantom
- reload_response(@options[:polling_offset])
+ reload_response(@options[:polling_interval])
end
+ else
+ delete_tmp_files if already_rendered?
+ set_rendering_flag
+ # Start PhantomJS rendering in a separate thread and then immediately render a web page
+ # that continuously reloads (polls) until the rendering is complete.
+ # Thread.new is used instead of `Process::detach fork` because the database connection is
+ # dropped when a forked process exits.
+ Thread.new {
+ render_pdf
+ }
+ reload_response(@options[:polling_offset])
end
- else
- @app.call(env)
end
end
private
- # Private: start phantom rendering in a separate process
- def fire_phantom
- Process::detach fork { Phantom.new(@request.url.sub(%r{\.pdf$}, ''), @options, @request.cookies).to_pdf(render_to) }
- end
-
- def render_to
- file_name = Digest::MD5.hexdigest(@request.path) + ".pdf"
- file_path = @options[:out_path]
- "#{file_path}/#{file_name}"
- end
-
def already_rendered?
- File.exists?(render_to)
+ File.exist?(render_to_done) && File.exist?(render_to)
end
def up_to_date?(ttl = 30)
(Time.now - File.new(render_to).mtime) <= ttl
end
+ def delete_tmp_files
+ File.delete(render_to)
+ File.delete(render_to_done)
+ end
def remove_rendering_flag
@request.session["phantom-rendering"] ||={ }
@@ -82,45 +71,19 @@ def set_rendering_flag
@request.session["phantom-rendering"][render_to] = Time.now
end
- def rendering_timed_out?
- Time.now - @request.session["phantom-rendering"][render_to] > @options[:request_timeout]
+ def rendering_started_at
+ @request.session["phantom-rendering"][render_to].to_time
end
- def rendering_in_progress?
- @request.session["phantom-rendering"]||={ }
- @request.session["phantom-rendering"][render_to]
+ def rendering_timed_out?
+ Time.now - rendering_started_at > @options[:request_timeout]
end
- def render_as_pdf?
- request_path_is_pdf = !!@request.path.match(%r{\.pdf$})
-
- if request_path_is_pdf && @conditions[:only]
- rules = [@conditions[:only]].flatten
- rules.any? do |pattern|
- if pattern.is_a?(Regexp)
- @request.path =~ pattern
- else
- @request.path[0, pattern.length] == pattern
- end
- end
- elsif request_path_is_pdf && @conditions[:except]
- rules = [@conditions[:except]].flatten
- rules.map do |pattern|
- if pattern.is_a?(Regexp)
- return false if @request.path =~ pattern
- else
- return false if @request.path[0, pattern.length] == pattern
- end
- end
- return true
- else
- request_path_is_pdf
- end
+ def rendering_in_progress?
+ @request.session["phantom-rendering"] ||={ }
+ !!@request.session["phantom-rendering"][render_to]
end
- def concat(accepts, type)
- (accepts || '').split(',').unshift(type).compact.join(',')
- end
def reload_response(interval=1)
body = <<-HTML.gsub(/[ \n]+/, ' ').strip
@@ -128,7 +91,7 @@ def reload_response(interval=1)
- Preparing pdf...
+ Preparing PDF file. Please wait...
html>
HTML
@@ -146,7 +109,7 @@ def ready_response
- PDF ready here
+ PDF file ready here
html>
HTML
@@ -162,7 +125,7 @@ def error_response
- Sorry request timed out...
+ Sorry, the request timed out.
html>
HTML
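The branching in `Middleware#render_as_pdf` is easier to follow when reduced to a pure decision function. The sketch below is one reading of the code in this diff, not an extracted implementation: `polling_action`, the state keys, and the returned symbols are all hypothetical names, and `:fresh` stands for "within `cache_ttl`, or `cache_ttl` is 0".

```ruby
# Hypothetical summary of the decisions made in Middleware#render_as_pdf.
# `state` is a hash of booleans describing the current request.
def polling_action(state)
  if state[:rendered] && state[:fresh]
    return :error if state[:empty_file]  # a zero-byte PDF is discarded and reported
    return :ready_page if state[:ajax]   # X-Requested-With present: answer with the "ready" page
    :serve_pdf
  elsif state[:in_progress]
    state[:timed_out] ? :error : :retry_after_polling_interval
  else
    :start_render_then_retry_after_polling_offset
  end
end
```

Read this way, `polling_offset` applies only to the very first response after rendering starts, and `polling_interval` to every later retry.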
diff --git a/lib/shrimp/phantom.rb b/lib/shrimp/phantom.rb
index 9b689ff..43e0c3e 100644
--- a/lib/shrimp/phantom.rb
+++ b/lib/shrimp/phantom.rb
@@ -5,7 +5,7 @@
module Shrimp
class NoExecutableError < StandardError
def initialize
- msg = "No phantomjs executable found at #{Shrimp.configuration.phantomjs}\n"
+ msg = "No phantomjs executable found at #{Shrimp.config.phantomjs}\n"
msg << ">> Please install phantomjs - http://phantomjs.org/download.html"
super(msg)
end
@@ -25,35 +25,75 @@ def initialize(msg = nil)
class Phantom
attr_accessor :source, :configuration, :outfile
- attr_reader :options, :cookies, :result, :error
+ attr_reader :options, :cookies, :result, :error, :response, :response_headers
SCRIPT_FILE = File.expand_path('../rasterize.js', __FILE__)
# Public: Runs the phantomjs binary
#
- # Returns the stdout output of phantomjs
+ # Returns the stdout output from phantomjs
def run
@error = nil
+ puts "Running command: #{cmd}" if options[:debug]
@result = `#{cmd}`
+ if match = @result.match(response_line_regexp)
+ @response = JSON.parse match[1]
+ @response_headers = @response['headers'].inject({}) {|hash, header|
+ hash[header['name']] = header['value']; hash
+ }
+ @result.gsub! response_line_regexp, ''
+ end
unless $?.exitstatus == 0
- @error = @result
+ @error = @result.chomp
@result = nil
end
@result
end
def run!
- @error = nil
- @result = `#{cmd}`
- unless $?.exitstatus == 0
- @error = @result
- @result = nil
- raise RenderingError.new(@error)
+ run.tap {
+ raise RenderingError.new(error) if error?
+ }
+ end
+
+ def response_line_regexp
+ /^response: (.*)$\n?/
+ end
+ def redirect?
+ page_load_status_code == 302
+ end
+ def redirect_to
+ return unless redirect?
+ response['redirectURL'] if response
+ end
+
+ def error?
+ !!error
+ end
+
+ def match_page_load_error
+ error.to_s.match(/^.* \(HTTP (null|\S+)\).*/)
+ end
+ def page_load_error?
+ !!match_page_load_error
+ end
+ def page_load_status_code
+ if match = match_page_load_error
+ status_code = match[1].to_s
+ if status_code =~ /\A\d+\Z/
+ status_code.to_i
+ else
+ status_code
+ end
end
- @result
end
- # Public: Returns the phantom rasterize command
+ # Public: Returns the arguments for the PhantomJS rasterize command as a shell-escaped string
def cmd
+ Shellwords.join cmd_array
+ end
+
+ # Public: Returns the arguments for the PhantomJS rasterize command as an array
+ def cmd_array
cookie_file = dump_cookies
format, zoom, margin, orientation = options[:format], options[:zoom], options[:margin], options[:orientation]
rendering_time, timeout = options[:rendering_time], options[:rendering_timeout]
@@ -65,7 +105,7 @@ def cmd
Shrimp.configuration.phantomjs,
command_config_file,
SCRIPT_FILE,
- @source.to_s.shellescape,
+ @source.to_s,
@outfile,
format,
zoom,
@@ -77,7 +117,7 @@ def cmd
viewport_width,
viewport_height,
max_redirect_count
- ].join(" ")
+ ].map(&:to_s)
end
# Public: initializes a new Phantom Object
@@ -94,10 +134,10 @@ def cmd
# Returns self
def initialize(url_or_file, options = { }, cookies={ }, outfile = nil)
@source = Source.new(url_or_file)
- @options = Shrimp.configuration.default_options.merge(options)
+ @options = Shrimp.config.to_h.merge(options)
@cookies = cookies
@outfile = File.expand_path(outfile) if outfile
- raise NoExecutableError.new unless File.exists?(Shrimp.configuration.phantomjs)
+ raise NoExecutableError.new unless File.exist?(Shrimp.config.phantomjs)
end
# Public: renders to pdf
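`Phantom#run` now pulls a `response: {...}` line out of the PhantomJS output and turns its header list into a hash. The following sketch shows that parsing step on its own; `split_response` and the sample output are assumptions made for illustration, and unlike `run` it falls back to an empty header hash when no response line is present.

```ruby
require 'json'

RESPONSE_LINE = /^response: (.*)$\n?/

# Hypothetical helper: returns [response, headers, remaining_output].
def split_response(stdout)
  match = stdout.match(RESPONSE_LINE)
  return [nil, {}, stdout] unless match

  response = JSON.parse(match[1])
  # PhantomJS reports headers as a list of {"name" => ..., "value" => ...} pairs.
  headers = (response['headers'] || []).each_with_object({}) do |header, hash|
    hash[header['name']] = header['value']
  end
  [response, headers, stdout.gsub(RESPONSE_LINE, '')]
end

# Made-up sample output in the shape rasterize.js prints.
sample = 'response: {"status":200,"headers":[{"name":"X-Pdf-Filename","value":"report.pdf"}]}' +
         "\n" + "rendered to: /tmp/out.pdf\n"
response, headers, remaining = split_response(sample)
```

Returning an empty hash instead of `nil` for the headers is a design choice of this sketch; it spares callers a nil check before looking up a header.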
diff --git a/lib/shrimp/rasterize.js b/lib/shrimp/rasterize.js
index 731382e..0106731 100644
--- a/lib/shrimp/rasterize.js
+++ b/lib/shrimp/rasterize.js
@@ -26,7 +26,7 @@ function print_usage() {
}
window.setTimeout(function () {
- error("Shit's being weird no result within: " + time_out + "ms");
+ error("No result within " + time_out + "ms. Aborting PhantomJS.");
}, time_out);
function renderUrl(url, output, options) {
@@ -44,27 +44,33 @@ function renderUrl(url, output, options) {
// determine the statusCode
page.onResourceReceived = function (resource) {
if (resource.url == url) {
+ console.log('response: ' + JSON.stringify(resource));
statusCode = resource.status;
}
};
page.onResourceError = function (resourceError) {
- error(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
+ // Log the error but allow normal "Unable to load the page." handling to occur too so that we
+ // can get and report the actual HTTP status code. (resourceError.errorCode just returns a code
+ // from http://doc.qt.io/qt-4.8/qnetworkreply.html#NetworkError-enum)
+ console.log(resourceError.errorString);
};
- page.onNavigationRequested = function (redirect_url, type, willNavigate, main) {
- if (main) {
- if (redirect_url !== url) {
- page.close();
-
- if (redirects_num-- >= 0) {
- renderUrl(redirect_url, output, options);
- } else {
- error(url + ' redirects to ' + redirect_url + ' after maximum number of redirects reached');
+ if (redirects_num > 0) {
+ page.onNavigationRequested = function (redirect_url, type, willNavigate, main) {
+ if (main) {
+ if (redirect_url !== url) {
+ page.close();
+
+ if (redirects_num-- >= 0) {
+ renderUrl(redirect_url, output, options);
+ } else {
+ error(url + ' redirects to ' + redirect_url + ' after maximum number of redirects reached');
+ }
}
}
- }
- };
+ };
+ }
page.open(url, function (status) {
if (status !== 'success' || (statusCode != 200 && statusCode != null)) {
@@ -77,22 +83,33 @@ function renderUrl(url, output, options) {
console.log(e);
}
- error('Unable to load the URL: ' + url + ' (HTTP ' + statusCode + ')');
+ error('Unable to load the page. (HTTP ' + statusCode + ') (URL: ' + url + ')');
} else {
+ /* Check whether the loaded page overrides the header/footer setting,
+ i.e. whether a PhantomJSPrinting object exists. If so, use it instead
+ of our defaults above.
+ See https://github.com/ariya/phantomjs/blob/master/examples/printheaderfooter.js#L66
+ */
window.setTimeout(function () {
- page.render(output + '_tmp.pdf');
-
- if (fs.exists(output)) {
- fs.remove(output);
- }
-
- try {
- fs.move(output + '_tmp.pdf', output);
- } catch (e) {
- error(e);
+ if (page.evaluate(function(){return typeof PhantomJSPrinting == "object";})) {
+ var paperSize = page.paperSize;
+ paperSize.header.height = page.evaluate(function() {
+ return PhantomJSPrinting.header.height;
+ });
+ paperSize.header.contents = phantom.callback(function(pageNum, numPages) {
+ return page.evaluate(function(pageNum, numPages){return PhantomJSPrinting.header.contents(pageNum, numPages);}, pageNum, numPages);
+ });
+ paperSize.footer.height = page.evaluate(function() {
+ return PhantomJSPrinting.footer.height;
+ });
+ paperSize.footer.contents = phantom.callback(function(pageNum, numPages) {
+ return page.evaluate(function(pageNum, numPages){return PhantomJSPrinting.footer.contents(pageNum, numPages);}, pageNum, numPages);
+ });
+ page.paperSize = paperSize;
}
- console.log('Rendered to: ' + output, new Date().getTime());
- phantom.exit(0);
+ page.render(output);
+ console.log('rendered to: ' + output, new Date().getTime());
+ phantom.exit();
}, render_time);
}
});
@@ -125,8 +142,10 @@ if (system.args.length < 3 || system.args.length > 13) {
if (system.args.length > 3 && system.args[2].substr(-4) === ".pdf") {
size = system.args[3].split('*');
- page_options.paperSize = size.length === 2 ? { width:size[0], height:size[1], margin:'0px' }
- : { format:system.args[3], orientation:orientation, margin:margin };
+ header = { height: '1cm', contents: phantom.callback(function(pageNum, numPages) { return ""; }) };
+ footer = { height: '1cm', contents: phantom.callback(function(pageNum, numPages) { return ""; }) };
+ page_options.paperSize = size.length === 2 ? { width:size[0], height:size[1], margin:'0px', header: header, footer: footer }
+ : { format:system.args[3], orientation:orientation, margin:margin, header: header, footer: footer };
}
if (system.args.length > 4) {
page_options.zoomFactor = system.args[4];
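The new error text printed by rasterize.js, `Unable to load the page. (HTTP <code>) (URL: <url>)`, is what `Phantom#page_load_status_code` parses on the Ruby side. The sketch below isolates that parsing so the message format and the regexp can be checked against each other; `status_code_from` is a hypothetical name and the sample messages are invented.

```ruby
PAGE_LOAD_ERROR = /^.* \(HTTP (null|\S+)\).*/

# Hypothetical helper mirroring Phantom#page_load_status_code:
# returns an Integer for numeric codes, the raw string otherwise
# (for example "null"), and nil when the text is not a page load error.
def status_code_from(error)
  match = error.to_s.match(PAGE_LOAD_ERROR)
  return nil unless match

  code = match[1]
  code =~ /\A\d+\z/ ? code.to_i : code
end
```

For example, `status_code_from('Unable to load the page. (HTTP 302) (URL: http://example.com/a)')` yields the integer `302`, which is the value `Phantom#redirect?` compares against.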
diff --git a/lib/shrimp/synchronous_middleware.rb b/lib/shrimp/synchronous_middleware.rb
new file mode 100644
index 0000000..8ed7775
--- /dev/null
+++ b/lib/shrimp/synchronous_middleware.rb
@@ -0,0 +1,34 @@
+require 'shrimp/base_middleware'
+
+module Shrimp
+ class SynchronousMiddleware < BaseMiddleware
+ def render_as_pdf(env)
+ # Start PhantomJS rendering in the same process (synchronously) and wait until it completes.
+ render_pdf
+ return phantomjs_error_response if phantom.error?
+
+ body = pdf_body()
+ headers = pdf_headers(body, {
+ disposition: @phantom.response_headers['X-Pdf-Disposition'],
+ filename: @phantom.response_headers['X-Pdf-Filename']
+ }.reject {|k, v| v.nil? }
+ )
+ [200, headers, [body]]
+ end
+
+ attr_reader :phantom
+
+ private
+
+ def phantomjs_error_response
+ headers = {'Content-Type' => 'text/html'}
+ if phantom.page_load_error?
+ status_code = phantom.page_load_status_code
+ headers['Location'] = phantom.redirect_to if phantom.redirect?
+ else
+ status_code = 500
+ end
+ [status_code, headers, [phantom.error]]
+ end
+ end
+end
diff --git a/shrimp.gemspec b/shrimp.gemspec
index 755850d..cd24bb8 100644
--- a/shrimp.gemspec
+++ b/shrimp.gemspec
@@ -24,4 +24,6 @@ Gem::Specification.new do |gem|
gem.add_development_dependency(%q, [">= 2.2.0"])
gem.add_development_dependency(%q, [">= 0.5.6"])
gem.add_development_dependency(%q, ["= 1.4.1"])
+ gem.add_development_dependency(%q)
+ gem.add_development_dependency(%q)
end
diff --git a/spec/shrimp/middleware_spec.rb b/spec/shrimp/middleware_spec.rb
index 48e720f..2f4dfb8 100644
--- a/spec/shrimp/middleware_spec.rb
+++ b/spec/shrimp/middleware_spec.rb
@@ -1,34 +1,24 @@
require 'spec_helper'
-def app;
- Rack::Lint.new(@app)
-end
-
-def options
- { :margin => "1cm", :out_path => Dir.tmpdir,
- :polling_offset => 10, :polling_interval => 1, :cache_ttl => 3600,
- :request_timeout => 1 }
-end
-
-def mock_app(options = { }, conditions = { })
- main_app = lambda { |env|
- headers = { 'Content-Type' => "text/html" }
- [200, headers, ['Hello world!']]
- }
-
- @middleware = Shrimp::Middleware.new(main_app, options, conditions)
- @app = Rack::Session::Cookie.new(@middleware, :key => 'rack.session')
+shared_context Shrimp::Middleware do
+ def mock_app(options = { }, conditions = { })
+ @middleware = Shrimp::Middleware.new(main_app, options, conditions)
+ @app = Rack::Session::Cookie.new(@middleware, :key => 'rack.session')
+ end
end
-
describe Shrimp::Middleware do
- before { mock_app(options) }
+ include_context Shrimp::Middleware
+
+ before { mock_app(middleware_options) }
+ subject { @middleware }
context "matching pdf" do
it "should render as pdf" do
get '/test.pdf'
- @middleware.send(:'render_as_pdf?').should be true
+ @middleware.send(:render_as_pdf?).should be true
end
+
it "should return 503 the first time" do
get '/test.pdf'
last_response.status.should eq 503
@@ -42,9 +32,9 @@ def mock_app(options = { }, conditions = { })
last_response.header["Retry-After"].should eq "1"
end
- it "should set render to to outpath" do
+ it "should set render_to to out_path" do
get '/test.pdf'
- @middleware.send(:render_to).should match (Regexp.new("^#{options[:out_path]}"))
+ @middleware.send(:render_to).should start_with middleware_options[:out_path]
end
it "should return 504 on timeout" do
@@ -62,62 +52,83 @@ def mock_app(options = { }, conditions = { })
last_response.status.should eq 503
end
- it "should return a pdf with 200 after rendering" do
- mock_file = double(File, :read => "Hello World", :close => true, :mtime => Time.now)
- File.should_receive(:'exists?').and_return true
- File.should_receive(:'size').and_return 1000
- File.should_receive(:'open').and_return mock_file
- File.should_receive(:'new').and_return mock_file
- get '/test.pdf'
- last_response.status.should eq 200
- last_response.body.should eq "Hello World"
+ describe "when already_rendered? and up_to_date?" do
+ before {
+ mock_file = double(File, :read => "Hello World", :close => true, :mtime => Time.now)
+ File.should_receive(:exists?).at_least(:once).and_return true
+ File.should_receive(:size).and_return 1000
+ File.should_receive(:open).and_return mock_file
+ File.should_receive(:new).at_least(:once).and_return mock_file
+ get '/test.pdf'
+ }
+
+ its(:rendering_in_progress?) { should eq false }
+ its(:already_rendered?) { should eq true }
+ its(:up_to_date?) { should eq true }
+
+ it "should return a pdf with 200" do
+ last_response.status.should eq 200
+ last_response.headers['Content-Type'].should eq 'application/pdf'
+ last_response.body.should eq "Hello World"
+ end
end
+ describe "requesting a simple path" do
+ before { get '/test.pdf' }
+ its(:html_url) { should eq 'http://example.org/test' }
+ end
+ describe "requesting a path with a query string" do
+ before { get '/test.pdf?size=10' }
+ its(:html_url) { should eq 'http://example.org/test?size=10' }
+ end
end
+
context "not matching pdf" do
it "should skip pdf rendering" do
get 'http://www.example.org/test'
last_response.body.should include "Hello world!"
- @middleware.send(:'render_as_pdf?').should be false
+ @middleware.send(:render_as_pdf?).should be false
end
end
end
-describe "Conditions" do
+describe Shrimp::Middleware, "Conditions" do
+ include_context Shrimp::Middleware
+
context "only" do
- before { mock_app(options, :only => [%r[^/invoice], %r[^/public]]) }
+ before { mock_app(middleware_options, :only => [%r[^/invoice], %r[^/public]]) }
it "render pdf for set only option" do
get '/invoice/test.pdf'
- @middleware.send(:'render_as_pdf?').should be true
+ @middleware.send(:render_as_pdf?).should be true
end
it "render pdf for set only option" do
get '/public/test.pdf'
- @middleware.send(:'render_as_pdf?').should be true
+ @middleware.send(:render_as_pdf?).should be true
end
it "not render pdf for any other path" do
get '/secret/test.pdf'
- @middleware.send(:'render_as_pdf?').should be false
+ @middleware.send(:render_as_pdf?).should be false
end
end
context "except" do
- before { mock_app(options, :except => %w(/secret)) }
+ before { mock_app(middleware_options, :except => %w(/secret)) }
it "render pdf for set only option" do
get '/invoice/test.pdf'
- @middleware.send(:'render_as_pdf?').should be true
+ @middleware.send(:render_as_pdf?).should be true
end
it "render pdf for set only option" do
get '/public/test.pdf'
- @middleware.send(:'render_as_pdf?').should be true
+ @middleware.send(:render_as_pdf?).should be true
end
it "not render pdf for any other path" do
get '/secret/test.pdf'
- @middleware.send(:'render_as_pdf?').should be false
+ @middleware.send(:render_as_pdf?).should be false
end
end
end
diff --git a/spec/shrimp/phantom_spec.rb b/spec/shrimp/phantom_spec.rb
index 9d4978b..2b91498 100644
--- a/spec/shrimp/phantom_spec.rb
+++ b/spec/shrimp/phantom_spec.rb
@@ -11,44 +11,25 @@ def valid_pdf(io)
end
describe Shrimp::Phantom do
- let(:testfile) { File.expand_path('../test_file.html', __FILE__) }
-
before do
Shrimp.configure { |config| config.rendering_time = 1000 }
end
- # describe ".quote_arg" do
- # subject { described_class }
-
- # let(:arg) { "test" }
-
- # it "wraps the argument with single quotes" do
- # subject.quote_arg(arg).should eq "'test'"
- # end
-
- # context "when the argument contains single quotes" do
- # let(:arg) { "'te''st'" }
-
- # it "escapes them" do
- # %x(echo #{subject.quote_arg(arg)}).strip.should eq arg
- # end
- # end
- # end
-
it "should initialize attributes" do
- phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf")
- phantom.source.to_s.should eq "file://#{testfile}"
+ phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{tmpdir}/test.pdf")
+ phantom.source.to_s.should eq "file://#{test_file}"
phantom.options[:margin].should eq "2cm"
- phantom.outfile.should eq "#{Dir.tmpdir}/test.pdf"
+ phantom.outfile.should eq "#{tmpdir}/test.pdf"
end
it "should render a pdf file" do
- #phantom = Shrimp::Phantom.new("file://#{@path}")
- #phantom.to_pdf("#{Dir.tmpdir}/test.pdf").first should eq "#{Dir.tmpdir}/test.pdf"
+ phantom = Shrimp::Phantom.new("file://#{test_file}")
+ phantom.to_pdf("#{tmpdir}/test.pdf").should eq "#{tmpdir}/test.pdf"
+ phantom.result.should include "rendered to: #{tmpdir}/test.pdf"
end
it "should accept a local file url" do
- phantom = Shrimp::Phantom.new("file://#{testfile}")
+ phantom = Shrimp::Phantom.new("file://#{test_file}")
phantom.source.should be_url
end
@@ -57,27 +38,39 @@ def valid_pdf(io)
phantom.source.should be_url
end
- it "should parse options into a cmd line" do
- phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm", :max_redirect_count => 10 }, { }, "#{Dir.tmpdir}/test.pdf")
- phantom.cmd.should include "test.pdf A4 1 2cm portrait"
- phantom.cmd.should include "file://#{testfile}"
- phantom.cmd.should include "lib/shrimp/rasterize.js"
- phantom.cmd.should end_with " 10"
- end
+ describe '#cmd' do
+ it "should generate the correct cmd" do
+ phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm", :max_redirect_count => 10 }, { }, "#{tmpdir}/test.pdf")
+ phantom.cmd.should include "test.pdf A4 1 2cm portrait"
+ phantom.cmd.should include "file://#{test_file}"
+ phantom.cmd.should include "lib/shrimp/rasterize.js"
+ phantom.cmd.should end_with " 10"
+ end
+
+ it "should escape arguments" do
+ phantom = Shrimp::Phantom.new("http://example.com/?something")
+ phantom.cmd_array.should include "http://example.com/?something"
+ phantom.cmd. should include "http://example.com/\\?something"
- it "should properly escape arguments" do
- malicious_uri = "file:///hello';shutdown"
- bogus_phantom = Shrimp::Phantom.new(malicious_uri)
+ phantom = Shrimp::Phantom.new("http://example.com/path/file.html?width=100&height=100")
+ phantom.cmd_array.should include "http://example.com/path/file.html?width=100&height=100"
+ phantom.cmd. should include "http://example.com/path/file.html\\?width\\=100\\&height\\=100"
+ end
+
+ it "should properly escape arguments" do
+ malicious_uri = "file:///hello';shutdown"
+ bogus_phantom = Shrimp::Phantom.new(malicious_uri)
- bogus_phantom.cmd.should_not include malicious_uri
+ bogus_phantom.cmd.should_not include malicious_uri
- Shrimp.configuration.stub(:phantomjs).and_return "echo"
- %x(#{bogus_phantom.cmd}).strip.should include malicious_uri
+ Shrimp.configuration.stub(:phantomjs).and_return "echo"
+ %x(#{bogus_phantom.cmd}).strip.should include malicious_uri
+ end
end
context "rendering to a file" do
- before do
- phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf")
+ before(:all) do
+ phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { }, "#{Dir.tmpdir}/test.pdf")
@result = phantom.to_file
end
@@ -86,30 +79,32 @@ def valid_pdf(io)
end
it "should be a valid pdf" do
- valid_pdf(@result)
+ valid_pdf?(@result).should eq true
+ pdf_strings(@result).should eq "Hello\tWorld!"
end
end
context "rendering to a pdf" do
- before do
- @phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { })
- @result = @phantom.to_pdf("#{Dir.tmpdir}/test.pdf")
+ before(:all) do
+ @phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { })
+ @result = @phantom.to_pdf("#{tmpdir}/test.pdf")
end
it "should return a path to pdf" do
@result.should be_a String
- @result.should eq "#{Dir.tmpdir}/test.pdf"
+ @result.should eq "#{tmpdir}/test.pdf"
end
it "should be a valid pdf" do
- valid_pdf(@result)
+ valid_pdf?(@result).should eq true
+ pdf_strings(Pathname(@result)).should eq "Hello\tWorld!"
end
end
context "rendering to a String" do
- before do
- phantom = Shrimp::Phantom.new("file://#{testfile}", { :margin => "2cm" }, { })
- @result = phantom.to_string("#{Dir.tmpdir}/test.pdf")
+ before(:all) do
+ phantom = Shrimp::Phantom.new("file://#{test_file}", { :margin => "2cm" }, { })
+ @result = phantom.to_string("#{tmpdir}/test.pdf")
end
it "should return the File IO String" do
@@ -117,39 +112,74 @@ def valid_pdf(io)
end
it "should be a valid pdf" do
- valid_pdf(@result)
+ valid_pdf?(@result).should eq true
+ pdf_strings(@result).should eq "Hello\tWorld!"
end
end
- context "Error" do
- it "should return result nil" do
- phantom = Shrimp::Phantom.new("file://foo/bar")
- @result = phantom.run
- @result.should be_nil
+ context "Errors" do
+ describe "'Unable to load the address' error" do
+ before { @result = phantom.run }
+
+ context 'an invalid http: address' do
+ subject(:phantom) { Shrimp::Phantom.new("http://example.com/foo/bar") }
+ it { @result.should be_nil }
+ its(:error) { should eq "Error downloading http://example.com/foo/bar - server replied: Not Found\nUnable to load the page. (HTTP 404) (URL: http://example.com/foo/bar)" }
+ its(:page_load_error?) { should eq true }
+ its(:page_load_status_code) { should eq 404 }
+ end
+
+ context 'an http: response that redirects' do
+ around(:each) do |example|
+ with_local_server do |server|
+ server.mount_proc '/' do |request, response|
+ response.body = 'Home'
+ raise WEBrick::HTTPStatus::OK
+ end
+ server.mount_proc '/redirect_me' do |request, response|
+ response['Location'] = '/'
+ raise WEBrick::HTTPStatus::Found
+ end
+ example.run
+ end
+ end
+ subject(:phantom) { Shrimp::Phantom.new("http://#{local_server_host}/redirect_me") }
+ it { @result.should be_nil }
+ its(:error) { should eq "Unable to load the page. (HTTP 302) (URL: http://localhost:8800/redirect_me)" }
+ its(:page_load_error?) { should eq true }
+ its(:page_load_status_code) { should eq 302 }
+ its('response.keys') { should include 'redirectURL' }
+ its('response_headers.keys') { should == ['Location', 'Server', 'Date', 'Content-Length', 'Connection'] }
+ its(:redirect_to) { should eq "http://#{local_server_host}/" }
+ end
+
+ context 'an invalid file: address' do
+ subject(:phantom) { Shrimp::Phantom.new("file:///foo/bar") }
+ it { @result.should be_nil }
+ its(:error) { should include "Error opening /foo/bar: No such file or directory\nUnable to load the page. (HTTP null) (URL: file:///foo/bar)" }
+ its(:page_load_error?) { should eq true }
+ its(:page_load_status_code) { should eq 'null' }
+ end
end
+ end
+ context "Errors (using bang methods)" do
it "should be unable to load the address" do
phantom = Shrimp::Phantom.new("file:///foo/bar")
- phantom.run
- phantom.error.should include "Error opening /foo/bar: No such file or directory (URL: file:///foo/bar)"
- end
-
- it "should be unable to copy file" do
- phantom = Shrimp::Phantom.new("file://#{testfile}")
- phantom.to_pdf("/foo/bar/")
- phantom.error.should include "Unable to copy file "
+ expect { phantom.run! }.to raise_error Shrimp::RenderingError
end
end
- context "Error Bang!" do
- it "should be unable to load the address" do
- phantom = Shrimp::Phantom.new("file:///foo/bar")
- expect { phantom.run! }.to raise_error Shrimp::RenderingError
+ context 'test_file_with_page_numbers.html' do
+ let(:test_file) { super('test_file_with_page_numbers.html') }
+
+ before do
+ phantom = Shrimp::Phantom.new("file://#{test_file}")
+ @result = phantom.to_string("#{tmpdir}/test.pdf")
end
- it "should be unable to copy file" do
- phantom = Shrimp::Phantom.new("file://#{testfile}")
- expect { phantom.to_pdf!("/foo/bar/") }.to raise_error Shrimp::RenderingError
+ it "PDF should contain page numbers" do
+ pdf_strings(@result).should eq "Header:\tPage\t1/2Footer:\tPage\t1/2Hello\tWorld!Hello\tWorld!Header:\tPage\t2/2Footer:\tPage\t2/2"
end
end
end
diff --git a/spec/shrimp/synchronous_middleware_spec.rb b/spec/shrimp/synchronous_middleware_spec.rb
new file mode 100644
index 0000000..263c471
--- /dev/null
+++ b/spec/shrimp/synchronous_middleware_spec.rb
@@ -0,0 +1,131 @@
+require 'spec_helper'
+
+shared_context Shrimp::SynchronousMiddleware do
+ def mock_app(options = { }, conditions = { })
+ @middleware = Shrimp::SynchronousMiddleware.new(main_app, options, conditions)
+ @app = Rack::Session::Cookie.new(@middleware, :key => 'rack.session')
+ end
+end
+
+describe Shrimp::SynchronousMiddleware do
+ include_context Shrimp::SynchronousMiddleware
+
+ before { mock_app(middleware_options) }
+ subject { @middleware }
+
+ context "matching pdf" do
+ describe "requesting a simple path" do
+ before { get '/test.pdf' }
+ its(:html_url) { should eq 'http://example.org/test' }
+ its(:render_as_pdf?) { should be true }
+ it { @middleware.send(:render_to).should start_with middleware_options[:out_path] }
+ it "should return a 404 status because http://example.org/test does not exist" do
+ last_response.status.should eq 404
+ message = "Error downloading http://example.org/test - server replied: Not Found\nUnable to load the page. (HTTP 404) (URL: http://example.org/test)"
+ last_response.body. should eq message
+ @middleware.phantom.error.should eq message
+ end
+ end
+
+ describe "requesting a path with a query string" do
+ before { get '/test.pdf?size=10' }
+ its(:render_as_pdf?) { should be true }
+ its(:html_url) { should eq 'http://example.org/test?size=10' }
+ end
+
+ describe "requesting a simple path (and we stub html_url to a file url)" do
+ before { @middleware.stub(:html_url).and_return "file://#{test_file}" }
+ before { get '/test.pdf' }
+ it "should return a valid pdf with 200 status" do
+ last_response.status.should eq 200
+ last_response.headers['Content-Type'].should eq 'application/pdf'
+ valid_pdf?(last_response.body).should eq true
+ @middleware.phantom.result.should start_with "rendered to: #{@middleware.render_to}"
+ end
+ end
+
+ context 'requesting an HTML resource that sets an X-Pdf-Filename header' do
+ before {
+ @middleware.stub(:html_url).and_return "file://#{test_file}"
+ phantom = Shrimp::Phantom.new(@middleware.html_url)
+ phantom.stub :response_headers => {
+ 'X-Pdf-Filename' => 'Some Fancy Report Title.pdf'
+ }
+ Shrimp::Phantom.should_receive(:new).and_return phantom
+ }
+ before { get '/use_different_filename.pdf' }
+ it "should use the filename from the X-Pdf-Filename header" do
+ last_response.status.should eq 200
+ last_response.headers['Content-Type'].should eq 'application/pdf'
+ last_response.headers['Content-Disposition'].should eq %(attachment; filename="Some Fancy Report Title.pdf")
+ valid_pdf?(last_response.body).should eq true
+ end
+ end
+
+ context 'requesting an HTML resource that redirects' do
+ before {
+ phantom = Shrimp::Phantom.new('http://example.org/redirect_me')
+ phantom.should_receive(:to_pdf).and_return nil
+ phantom.stub :error => "Unable to load the page. (HTTP 302) (URL: http://example.org/redirect_me)",
+ :redirect_to => "http://example.org/sign_in"
+ Shrimp::Phantom.should_receive(:new).and_return phantom
+ }
+ before { get '/redirect_me.pdf' }
+ it "should follow the redirect that the phantomjs request encountered" do
+ # This tests the phantomjs_error_response method
+ last_response.status.should eq 302
+ last_response.headers['Content-Type'].should eq 'text/html'
+ last_response.headers['Location'].should eq "http://example.org/sign_in"
+ @middleware.phantom.error.should include "Unable to load the page"
+ end
+ end
+ end
+
+ context "not matching pdf" do
+ it "should skip pdf rendering" do
+ get 'http://www.example.org/test'
+ last_response.body.should include "Hello world!"
+ @middleware.render_as_pdf?.should be false
+ end
+ end
+end
+
+describe Shrimp::SynchronousMiddleware, "Conditions" do
+ include_context Shrimp::SynchronousMiddleware
+
+ context "only" do
+ before { mock_app(middleware_options, :only => [%r[^/invoice], %r[^/public]]) }
+ it "render pdf for set only option" do
+ get '/invoice/test.pdf'
+ @middleware.render_as_pdf?.should be true
+ end
+
+ it "render pdf for set only option" do
+ get '/public/test.pdf'
+ @middleware.render_as_pdf?.should be true
+ end
+
+ it "not render pdf for any other path" do
+ get '/secret/test.pdf'
+ @middleware.render_as_pdf?.should be false
+ end
+ end
+
+ context "except" do
+ before { mock_app(middleware_options, :except => %w(/secret)) }
+ it "render pdf for paths not excluded by the except option" do
+ get '/invoice/test.pdf'
+ @middleware.render_as_pdf?.should be true
+ end
+
+ it "render pdf for paths not excluded by the except option" do
+ get '/public/test.pdf'
+ @middleware.render_as_pdf?.should be true
+ end
+
+ it "not render pdf for any other path" do
+ get '/secret/test.pdf'
+ @middleware.render_as_pdf?.should be false
+ end
+ end
+end
diff --git a/spec/shrimp/test_file_with_page_numbers.html b/spec/shrimp/test_file_with_page_numbers.html
new file mode 100644
index 0000000..77f995a
--- /dev/null
+++ b/spec/shrimp/test_file_with_page_numbers.html
@@ -0,0 +1,21 @@
+
+
+
+
+
+ Hello World!
+ Hello World!
+
+
+
diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb
index 154a571..7bb6ac4 100644
--- a/spec/spec_helper.rb
+++ b/spec/spec_helper.rb
@@ -1,7 +1,87 @@
require 'rack/test'
require 'shrimp'
+require 'webrick'
+require 'pdf/inspector'
RSpec.configure do |config|
include Rack::Test::Methods
end
+Shrimp.configure do |config|
+ # If we left this as the default value of true, then we couldn't check things like
+ # @middleware.render_as_pdf? in our tests after initiating a request with get '/test.pdf', because
+ # render_as_pdf? depends on @request, which doesn't get set until *after* we call call(env) with the
+ # request env. But when thread_safe is true, it actually prevents call(env) from changing any
+ # instance variables in the original object. (In the original object, @request will still be nil.)
+ config.thread_safe = false
+end
+
+def tmpdir
+ Shrimp.config.tmpdir
+end
+
+def test_file(file_name = 'test_file.html')
+ File.expand_path("../shrimp/#{file_name}", __FILE__)
+end
+
+def valid_pdf?(io)
+ case io
+ when File
+ io.read[0...4] == "%PDF"
+ when String
+ io[0...4] == "%PDF" || File.binread(io, 4) == "%PDF"
+ end
+end
+def pdf_strings(pdf)
+ PDF::Inspector::Text.analyze(pdf).strings.join
+end
+
+# Used by rack-test when we call get
+def app
+ Rack::Lint.new(@app)
+end
+
+def main_app
+ lambda { |env|
+ headers = { 'Content-Type' => "text/html" }
+ [200, headers, ['Hello world!']]
+ }
+end
+
+def middleware_options
+ {
+ :margin => "1cm",
+ :out_path => tmpdir,
+ :polling_offset => 10,
+ :polling_interval => 1,
+ :cache_ttl => 3600,
+ :request_timeout => 1
+ }
+end
+
+def local_server_port
+ 8800
+end
+def local_server_host
+ "localhost:#{local_server_port}"
+end
+
+def with_local_server
+ webrick_options = {
+ :Port => local_server_port,
+ :AccessLog => [],
+ :Logger => WEBrick::Log::new(RUBY_PLATFORM =~ /mswin|mingw/ ? 'NUL:' : '/dev/null', 7)
+ }
+ begin
+ # The "TCPServer Error: Address already in use - bind(2)" warning here appears to be bogus,
+ # because it occurs even the first time we start the server and nothing else is bound to the
+ # port.
+ server = WEBrick::HTTPServer.new(webrick_options)
+ trap("INT") { server.shutdown }
+ Thread.new { server.start }
+ yield server
+ server.shutdown
+ ensure
+ server.shutdown if server
+ end
+end