I have a rake task that runs great on my local machine, but after deploying my app to a VPS, I can no longer run the task.
I run the task using --
RAILS_ENV=production bundle exec rake db:insert_properties
and the output I get is --
(in /home/deployer/apps/nsrosu/releases/20131230151646)
Killed
Anyone have an idea as to why this may be happening? I have double and triple checked that the XML file I am using to pull data for the rake task does exist in the proper directory.
Additionally, instead of using a file stored on the server, I have tried pulling the XML from an external source, but Nokogiri says that the file does not exist when I try it this way. A solution to either one of these problems would be excellent :)
Also, here is the rake task, in case that will help answer any questions --
# SET RAKE TASK NAMESPACE
namespace :db do
# RAKE TASK DESCRIPTION
desc "Fetch property information and insert it into the database"
# RAKE TASK NAME
task :insert_properties => :environment do
# REQUIRE LIBRARIES
require 'nokogiri'
require 'open-uri'
# OPEN THE XML FILE
mits_feed = File.open("app/assets/xml/mits.xml")
# OUTPUT THE XML DOCUMENT
doc = Nokogiri::XML(mits_feed)
# FIND PROPERTIES OWNED BY NORTHSTEPPE AND CYCLE THROUGH THEM
doc.xpath("//Property[PropertyID/Identification/@OrganizationName = 'northsteppe' ]").each do |property|
# SET UP EMPTY IMAGES ARRAY
@images =[]
# INSERT EACH IMAGE INTO THE IMAGES ARRAY
property.xpath("File").each do |image|
@images << image.at_xpath("Src/text()").to_s
end
# SET UP EMPTY AMENITIES ARRAY
@amenities = []
# INSERT EACH AMENITY DESCRIPTION INTO THE AMENITIES ARRAY
property.xpath("ILS_Unit/Amenity").each do |amenity|
@amenities << amenity.at_xpath("Description/text()").to_s
end
# GATHER EACH PROPERTY'S INFORMATION
information = {
"street_address" => property.at_xpath("PropertyID/Address/AddressLine1/text()").to_s,
"city" => property.at_xpath("PropertyID/Address/City/text()").to_s,
"zipcode" => property.at_xpath("PropertyID/Address/PostalCode/text()").to_s,
"short_description" => property.at_xpath("PropertyID/MarketingName/text()").to_s,
"long_description" => property.at_xpath("Information/LongDescription/text()").to_s,
"rent" => property.at_xpath("Information/Rents/StandardRent/text()").to_s,
"application_fee" => property.at_xpath("Fee/ApplicationFee/text()").to_s,
"bedrooms" => property.at_xpath("Floorplan/Room[@RoomType='Bedroom']/Count/text()").to_s,
"bathrooms" => property.at_xpath("Floorplan/Room[@RoomType='Bathroom']/Count/text()").to_s,
"vacancy_status" => property.at_xpath("ILS_Unit/Availability/VacancyClass/text()").to_s,
"month_available" => property.at_xpath("ILS_Unit/Availability/MadeReadyDate/@Month").to_s,
"latitude" => property.at_xpath("ILS_Identification/Latitude/text()").to_s,
"longitude" => property.at_xpath("ILS_Identification/Longitude/text()").to_s,
"images" => @images,
"amenities" => @amenities
}
# SHOW RAW DATA IN TERMINAL TO MAKE SURE EVERYTHING IS WORKING
p information
# CREATE NEW PROPERTY WITH INFORMATION HASH CREATED ABOVE
if Property.create!(information)
puts "yay!"
else
puts "oh no! this sucks!"
end
end # ENDS XPATH EACH LOOP
end # ENDS INSERT_PROPERTIES RAKE TASK
end # ENDS NAMESPACE DECLARATION
================================ UPDATE =================================
So it seems that the best approach is to run this through a SAX system, and SAXMachine is already set up to work with Nokogiri, but the documentation for both of these technologies is pretty horrible. I was hoping to get some direction on how to set up a task that does the identical thing my above task does, but using SAXMachine. Please :)
I've posted an example of one of the XML entries below --
<Property IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<PropertyID>
<Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/>
<Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>
<MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
<WebSite>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</WebSite>
<Address AddressType="property">
<Description>Address of Available Listing</Description>
<AddressLine1>1689 N 4th St </AddressLine1>
<City>Columbus</City>
<State>OH</State>
<PostalCode>43201</PostalCode>
<Country>US</Country>
</Address>
<Phone PhoneType="office">
<PhoneNumber>(614) 299-4110</PhoneNumber>
</Phone>
<Email>northsteppe.nsr@gmail.com</Email>
</PropertyID>
<ILS_Identification ILS_IdentificationType="Apartment" RentalType="Market Rate">
<Latitude>39.997694</Latitude>
<Longitude>-82.99903</Longitude>
<LastUpdate Month="11" Day="11" Year="2013"/>
</ILS_Identification>
<Information>
<StructureType>Standard</StructureType>
<UnitCount>1</UnitCount>
<ShortDescription>Spacious House Central Campus OSU, available fall</ShortDescription>
<LongDescription>One of our favorites! This great house is perfect for students or a single family. With huge living and sleeping rooms, there is plenty of space. The kitchen is totally modernized with new appliances, and the bathroom has been updated. Natural woodwork and brick accents are seen within the house, and the decorative mantles. Ceiling fans and mini-blinds are included, as well as a FREE stack washer and dryer. The front and side deck. On site parking available.</LongDescription>
<Rents>
<StandardRent>2000.00</StandardRent>
</Rents>
<PropertyAvailabilityURL>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</PropertyAvailabilityURL>
</Information>
<Fee>
<ProrateType>Standard</ProrateType>
<LateType>Standard</LateType>
<LatePercent>0</LatePercent>
<LateMinFee>0</LateMinFee>
<LateFeePerDay>0</LateFeePerDay>
<NonRefundableHoldFee>0</NonRefundableHoldFee>
<AdminFee>0</AdminFee>
<ApplicationFee>30.00</ApplicationFee>
<BrokerFee>0</BrokerFee>
</Fee>
<Deposit DepositType="Security Deposit">
<Amount AmountType="Actual">
<ValueRange Exact="2000.00" Currency="USD"/>
</Amount>
</Deposit>
<Policy>
<Pet Allowed="false"/>
</Policy>
<Phase IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<Name/>
<Description/>
<UnitCount>1</UnitCount>
<RentableUnits>1</RentableUnits>
<TotalSquareFeet>0</TotalSquareFeet>
<RentableSquareFeet>0</RentableSquareFeet>
</Phase>
<Building IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<Name/>
<Description/>
<UnitCount>1</UnitCount>
<SquareFeet>0</SquareFeet>
</Building>
<Floorplan IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<Name/>
<UnitCount>1</UnitCount>
<Room RoomType="Bedroom">
<Count>4</Count>
<Comment/>
</Room>
<Room RoomType="Bathroom">
<Count>1</Count>
<Comment/>
</Room>
<SquareFeet Min="0" Max="0"/>
<MarketRent Min="2000" Max="2000"/>
<EffectiveRent Min="2000" Max="2000"/>
</Floorplan>
<ILS_Unit IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<Units>
<Unit>
<Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="UL Portfolio"/>
<MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
<UnitBedrooms>4</UnitBedrooms>
<UnitBathrooms>1.0</UnitBathrooms>
<MinSquareFeet>0</MinSquareFeet>
<MaxSquareFeet>0</MaxSquareFeet>
<SquareFootType>internal</SquareFootType>
<UnitRent>2000.00</UnitRent>
<MarketRent>2000.00</MarketRent>
<Address AddressType="property">
<AddressLine1>1689 N 4th St </AddressLine1>
<City>Columbus</City>
<PostalCode>43201</PostalCode>
<Country>US</Country>
</Address>
</Unit>
</Units>
<Availability>
<VacateDate Month="7" Day="23" Year="2014"/>
<VacancyClass>Unoccupied</VacancyClass>
<MadeReadyDate Month="7" Day="23" Year="2014"/>
</Availability>
<Amenity AmenityType="Other">
<Description>All new stainless steel appliances! Refinished hardwood floors</Description>
</Amenity>
<Amenity AmenityType="Other">
<Description>Ceramic tile</Description>
</Amenity>
<Amenity AmenityType="Other">
<Description>Ceiling fans</Description>
</Amenity>
<Amenity AmenityType="Other">
<Description>Wrap-around porch</Description>
</Amenity>
<Amenity AmenityType="Dryer">
<Description>Free Washer and Dryer</Description>
</Amenity>
<Amenity AmenityType="Washer">
<Description>Free Washer and Dryer</Description>
</Amenity>
<Amenity AmenityType="Other">
<Description>off-street parking available</Description>
</Amenity>
</ILS_Unit>
<File Active="true" FileID="820982141">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/31077069-6e81-4373-8a89-508c57585543/medium.jpg</Src>
<Width>360</Width>
<Height>300</Height>
<Rank>1</Rank>
</File>
<File Active="true" FileID="820982145">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/84e1be40-96fd-4717-b75d-09b39231a762/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>2</Rank>
</File>
<File Active="true" FileID="820982149">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/cd419635-c37f-4676-a43e-c72671a2a748/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>3</Rank>
</File>
<File Active="true" FileID="820982152">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/6b68dbd5-2cde-477c-99d7-3ca33f03cce8/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>4</Rank>
</File>
<File Active="true" FileID="820982155">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/17b6c7c0-686c-4e46-865b-11d80744354a/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>5</Rank>
</File>
<File Active="true" FileID="820982157">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/3545ac8b-471f-404a-94b2-fcd00dd16e25/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>6</Rank>
</File>
<File Active="true" FileID="820982160">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/02471172-2183-4bf1-a3d7-33415f902c1c/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>7</Rank>
</File>
</Property>
Here's a start for a conversion; it should be enough to get you going. It's untested, and it's been a long time since I've written SAX code, so beware.
The first part is a clean-up of your original code to make it more like I'd write DOM code:
require 'nokogiri'
require 'open-uri'
# doc = Nokogiri::XML(File.open("app/assets/xml/mits.xml"))
# doc.xpath("//Property[PropertyID/Identification/@OrganizationName = 'northsteppe']").each do |property|
# images = property.xpath("File").map { |image|
# image.at_xpath("Src/text()").to_s
# }
# amenities = property.xpath("ILS_Unit/Amenity").map { |image|
# image.at_xpath("Description/text()").to_s
# }
# information = {
# "street_address" => property.at_xpath("PropertyID/Address/AddressLine1/text()").to_s,
# "city" => property.at_xpath("PropertyID/Address/City/text()").to_s,
# "zipcode" => property.at_xpath("PropertyID/Address/PostalCode/text()").to_s,
# "short_description" => property.at_xpath("PropertyID/MarketingName/text()").to_s,
# "long_description" => property.at_xpath("Information/LongDescription/text()").to_s,
# "rent" => property.at_xpath("Information/Rents/StandardRent/text()").to_s,
# "application_fee" => property.at_xpath("Fee/ApplicationFee/text()").to_s,
# "bedrooms" => property.at_xpath("Floorplan/Room[@RoomType='Bedroom']/Count/text()").to_s,
# "bathrooms" => property.at_xpath("Floorplan/Room[@RoomType='Bathroom']/Count/text()").to_s,
# "vacancy_status" => property.at_xpath("ILS_Unit/Availability/VacancyClass/text()").to_s,
# "month_available" => property.at_xpath("ILS_Unit/Availability/MadeReadyDate/@Month").to_s,
# "latitude" => property.at_xpath("ILS_Identification/Latitude/text()").to_s,
# "longitude" => property.at_xpath("ILS_Identification/Longitude/text()").to_s,
# "images" => images,
# "amenities" => amenities
# }
# p information
# if Property.create!(information)
# puts "yay!"
# else
# puts "oh no! this sucks!"
# end
# end
This is the start of SAX code:
class MitsDocument < Nokogiri::XML::SAX::Document
I define some class variables to keep track of the images and amenities:
@@images = []
@@amenities = []
Each time Nokogiri descends into a tag it calls start_element:
def start_element(tag_name, attributes=[])
tag_attributes = attributes.to_h # Nokogiri passes attributes as [name, value] pairs
# set up some flags to track the current state...
@in_property = true if (tag_name == 'Property')
@in_property_id = true if (tag_name == 'PropertyID')
@in_identification = true if (tag_name == 'Identification')
@organization_is_northsteppe = true if (tag_attributes['OrganizationName'] == 'northsteppe')
@in_file = true if (tag_name == 'File')
@in_source = true if (tag_name == 'Src')
@in_ils_unit = true if (tag_name == 'ILS_Unit')
@in_amentiy = true if (tag_name == 'Amenity')
@in_description = true if (tag_name == 'Description')
end
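A note on the attributes argument: as far as I know, current Nokogiri releases hand start_element an array of [name, value] pairs rather than a flat list, so the conversion to a hash needs to treat it that way. Here's a tiny sketch using values from the sample entry. The array literal is written by hand to mimic what I'd expect Nokogiri to pass for the first Identification tag, so treat that shape as an assumption and check it against your installed version:

```ruby
# Assumed shape of the attributes argument: an array of [name, value] pairs.
attributes = [
  ['IDValue', '642da00e-9be3-4a7c-bd50-66a4f0d70af8'],
  ['OrganizationName', 'northsteppe'],
  ['IDType', 'property']
]

# Array#to_h turns a list of pairs into a hash keyed by attribute name.
tag_attributes = attributes.to_h

puts tag_attributes['OrganizationName']
```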
When a text node is encountered, characters gets called. If Nokogiri has descended far enough, which we can check by testing for certain flag combinations, the text will be pushed onto the appropriate array:
def characters(str)
if [@in_file, @in_source].all?
@@images << str
end
if [@in_ils_unit, @in_amentiy, @in_description].all?
@@amenities << str
end
end
When Nokogiri exits a node it calls end_element with the name of the tag:
def end_element(tag_name)
@in_property = false if (tag_name == 'Property')
@in_property_id = false if (tag_name == 'PropertyID')
@in_identification = false if (tag_name == 'Identification')
@organization_is_northsteppe = false if (tag_name == 'Identification')
If Nokogiri is ready to exit a particular tag, it's time to do something with the aggregated results of its sub-tags. This is how to deal with the class variables being tracked:
if (tag_name == 'File')
# do something with @@images
@in_file = false
end
@in_source = false if (tag_name == 'Src')
if (tag_name == 'ILS_Unit')
# do something with @@amenities
@in_ils_unit = false
end
@in_amentiy = false if (tag_name == 'Amenity')
@in_description = false if (tag_name == 'Description')
end
When the end of the document is reached, you'd clean up DB connections, files, or wherever you're storing your content:
def end_document
end
end
parser = Nokogiri::XML::SAX::Parser.new(MitsDocument.new)
# Feed the parser some XML
parser.parse(File.open("app/assets/xml/mits.xml"))
It's late, and I'm tired, so that might not be right, but it looks like the beginnings. You'll need to add code to track the tags in your information hash, but that will be similar to what's above. I'd also probably switch to using case/when statements instead of lists of if statements, to make setting and clearing the flags a bit cleaner, but, like I said, I'm tired so I won't bother right now.
On "real iron", as opposed to a virtual machine, you'd possibly be able to add enough RAM to handle loading a 7M+ line XML file. Without the whole file I can't begin to guess how much RAM that'd take up in real life, but that's somewhat beside the point. SAX is designed to handle files of arbitrary size, because SAX processing breaks the overall XML down into smaller chunks you can process more easily.
DOM is convenient for most things; a lot of the time we see XML representing a single object, or a small extract from a database. I'm guessing you're dealing with a large, to huge, extract, or maybe even a complete database dump. DOM isn't really the tool to use in that case, but SAX is.
Having the capability in Nokogiri to handle both is the nice thing.