Producing PDF and ZIM locally
https://github.com/ajithhub/education-justice-project
At the moment there is a problem using PediaPress to produce the ZIM and PDF files.
For now I am experimenting with producing them myself, using mwlib in a local virtualenv:
virtualenv-2.6 mwlibenv
source mwlibenv/bin/activate
pip install -i http://pypi.pediapress.com/simple/ mwlib
pip install -i http://pypi.pediapress.com/simple/ pil
pip install -i http://pypi.pediapress.com/simple/ mwlib.rl
pip install -i http://pypi.pediapress.com/simple/ mwlib.zim
# this one is for table of contents preparation
sudo yum install pdftk
# this one is to silence the warnings about RTL
pip install -i http://pypi.pediapress.com/simple/ pyfribidi
mw-zip -c http://ejpdocs.cuforrent.com/ -o test2.zip "Prepare Bootable EJP Recovery USB Drive" "Entering the BIOS Setup" "Configure HP Integrated Lights Out" "Security Considerations" "Windows Multipoint Installation" "Change Unidentified Networks to Private" "VSpace Server Installation"
mw-render -c test2.zip -o test2.2.pdf -w rl
mw-render -c test2.zip -o test2.2.zim -w zim
cp *.{pdf,zim} /u/aantony/public_html/wikibooks/
chmod a+r /u/aantony/public_html/wikibooks/*
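The collect-and-render steps above could be scripted so that the article list lives in one place. The following is only a sketch: `build_commands` and `run_all` are hypothetical helper names, and the only command-line flags used are the ones that already appear in the commands above (`-c`, `-o`, `-w`). The output file names are derived from a base name, which is an assumption.

```python
import subprocess


def build_commands(wiki_url, titles, basename):
    """Return argv lists: one mw-zip call, then one mw-render call per format."""
    zip_path = basename + ".zip"
    # Step 1: collect the articles into a zip file.
    commands = [["mw-zip", "-c", wiki_url, "-o", zip_path] + list(titles)]
    # Step 2: render the zip once per writer ("rl" gives PDF, "zim" gives ZIM).
    for writer, ext in (("rl", "pdf"), ("zim", "zim")):
        output = "%s.%s" % (basename, ext)
        commands.append(["mw-render", "-c", zip_path, "-o", output, "-w", writer])
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.check_call(argv)


# Example (needs mw-zip and mw-render on PATH, i.e. an activated mwlibenv):
# run_all(build_commands("http://ejpdocs.cuforrent.com/",
#                        ["Security Considerations"], "test2"))
```

Passing the titles as list items avoids the shell quoting needed for article names that contain spaces.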
The problem was query string length
The default Dreamhost maximum GET value length was set to 512, which was too short for the MediaWiki API calls that fetch image metadata; those query strings ended up being pretty large. I managed to override the value, so we can use PediaPress again:
$ cat - > ~/.php/5.3/phprc
[suhosin]
# 3/10/2013 override get max for mediawiki api
suhosin.get.max_value_length = 10000
# screw it just turn suhosin off
suhosin.simulation = On
$ killall php53.cgi
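To check whether a given API request would trip a limit like this, something along the following lines might help. It is a rough sketch only: `oversized_params` is a hypothetical helper, the example URL is made up, and lengths are measured after URL-decoding, which may not match exactly how Suhosin counts. It uses Python 3's `urllib.parse`, so it would run outside the Python 2.6 mwlib environment.

```python
from urllib.parse import parse_qsl, urlsplit


def oversized_params(url, limit=512):
    """Return (name, length) for every query value longer than `limit`."""
    query = urlsplit(url).query
    return [
        (name, len(value))
        for name, value in parse_qsl(query, keep_blank_values=True)
        if len(value) > limit
    ]


# Made-up request with one very long "titles" value.
example_url = "http://example.org/w/api.php?action=query&titles=" + "A" * 600
print(oversized_params(example_url))
print(oversized_params(example_url, limit=10000))
```

With the default limit of 512 the first call reports the `titles` parameter; with the raised limit of 10000 nothing is reported.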
Throttle requests
The wiki is presently hosted on a Dreamhost shared account, which is not super fast. By default the image download pool size is 10, but we need to turn it down to 1 to be more reliable:
./mwlibenv/lib/python2.6/site-packages/mwlib/net/fetch.py
< self.image_download_pool = gevent.pool.Pool(10)
> self.image_download_pool = gevent.pool.Pool(1)
Also, for some reason, lots of requests get a 500 status. It appears that merely retrying once is sufficient to fix them:
./mwlibenv/lib/python2.6/site-packages/mwlib/net/sapi.py
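def _fetch(self, url):
    try:
        f = self.opener.open(url)
    except Exception as e:
        # First attempt failed (usually a 500): log it and retry once.
        print "ERROR in %s" % url
        print "Try again...."
        try:
            f = self.opener.open(url)
        except Exception as e:
            # Second failure: give up and re-raise with the original traceback.
            print "ERROR twice in %s" % url
            raise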
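The same retry-once idea can be written as a small standalone helper, which may be easier to reason about than the patch to mwlib. This is a sketch, not mwlib code: `fetch_with_retry`, `retries`, and `delay` are hypothetical names, and the optional pause between attempts is an addition that the patch above does not have.

```python
import time


def fetch_with_retry(fetch, url, retries=1, delay=0.0):
    """Call fetch(url); on any exception, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return fetch(url)
        except Exception:
            if attempt >= retries:
                # Out of retries: let the last error propagate.
                raise
            attempt += 1
            # Optional pause so a slow shared host can recover.
            time.sleep(delay)
```

With `retries=1` this matches the behaviour of the patched `_fetch`: one extra attempt, then the error propagates. A non-zero `delay` might be worth trying if the 500 responses come from the host being overloaded, though that is a guess.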