Showing posts with label cli. Show all posts

Tuesday, May 18, 2010

Convert .html to .pdf in gnu/linux

There are various options for converting .html files to .pdf in a gnu/linux operating system. Your choice of methods will depend on the complexity of the file you wish to convert, and your familiarity with the tools a gnu/linux system provides.

What you'll need:

  • Gnu/linux operating system

  • Html file

  • Web browser


Optional:

  • Openoffice.org office suite

  • wget

  • html2ps

  • ps2pdf


Simply "Print to file"
One very simple option for creating a .pdf file from an .html file is to open the file in your browser and choose "Print". When the print dialog appears, choose "Print to File", and indicate "PDF" as the output format. This will write the html file out to pdf format.
html to pdf conversion: print to file

Here is a pdf of this article generated in this fashion: converthtml2pdfgnulinux.pdf

OpenOffice.org

"Print to File" works well for basic html files with simple text and some images. If the html file in question has more complex formatting, this option may not always produce the best results. Luckily, other options exist.

Save the html file to your computer (if you haven't already done so), and open it with OpenOffice.org's html editor (ooweb). Then simply go to the "File" menu, and choose "Export". OpenOffice.org will then offer you the usual options for saving a file, such as where to save it and what to name it, and, presto-magico, will produce a .pdf file from your .html file.

Command Line

Of course, no linux how-to article would be complete without instructions on how to accomplish your task using only the magical Bash command line interface. For those so inclined, then, the following is a complete process for acquiring an .html file and converting it to a .pdf file. In order to proceed with this method, the following software must be installed on your computer: wget, html2ps, and ps2pdf. These programs are either already a part of most gnu/linux distributions by default, or can be easily acquired with your favorite package manager (apt, yum, pacman, portage, etc.).
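
Before going any further, it can help to confirm that those three tools are actually installed. The following is only a minimal sketch: the function name missing_tools is made up for illustration and is not part of any of these packages.

```shell
#!/bin/bash

# missing_tools (hypothetical helper): print the name of each given
# command that cannot be found on this system, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can find the command
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: list whichever of the three tools still need installing
# missing_tools wget html2ps ps2pdf
```

If the example call prints nothing, all three tools were found. Anything it does print can be installed with your package manager.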

First, let's save the file to your computer:
wget http://www.somesite.com/yourfile.html

Next, let's convert the .html file to a postscript or .ps file:
html2ps yourfile.html > yourfile.ps

Then, we'll convert the postscript file, finally, to a .pdf file:
ps2pdf yourfile.ps

Voila!
You should now have "yourfile.pdf".
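
The two conversion steps can also be chained with a pipe, so that no intermediate .ps file is left lying around. This is a sketch, assuming that html2ps writes its postscript to standard output (as the redirect above relies on) and that ps2pdf reads from standard input when given "-" as its input file. The function name html_to_pdf is hypothetical.

```shell
#!/bin/bash

# html_to_pdf (hypothetical helper): convert a local .html file to .pdf
# without leaving an intermediate .ps file behind.
# Usage: html_to_pdf yourfile.html [output.pdf]
html_to_pdf() {
    local input="$1"
    # default output name: swap a trailing .html for .pdf
    local output="${2:-${input%.html}.pdf}"
    # html2ps prints postscript on stdout; "-" tells ps2pdf to read stdin
    html2ps "$input" | ps2pdf - "$output"
}
```

Called as html_to_pdf yourfile.html, this should leave "yourfile.pdf" next to the original file.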

This could, of course, all be scripted.

#!/bin/bash

# convert webpages to pdf files
# get url
echo "Enter the url of the page to be converted:"
read -r page
# download page (stop here if the download fails)
wget "$page" || exit 1

file=$(basename "$page")
# name the output after the page, minus any .html extension
name="${file%.html}"
# convert to postscript
html2ps "$file" > "$name.ps"
# convert to pdf, giving the output name explicitly
# so that no renaming is needed afterwards
ps2pdf "$name.ps" "$name.pdf"
# clean up extraneous files
rm -f "$file" "$name.ps"

echo "done"

exit
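
The script above trusts basename to match the name that wget saves the page under, which may not hold for addresses that end in a slash or carry a query string. One possible workaround, sketched here with a hypothetical helper named page_name, is to choose the file name yourself and hand it to wget with its -O option.

```shell
#!/bin/bash

# page_name (hypothetical helper): derive a base file name from a url,
# e.g. http://www.somesite.com/yourfile.html?x=1 gives "yourfile"
page_name() {
    local name="${1%%[?#]*}"   # drop any query string or fragment
    name="${name%/}"           # drop a trailing slash
    name="${name##*/}"         # keep only the last path component
    name="${name%.*}"          # drop the extension, if any
    echo "${name:-page}"       # fall back to "page" if nothing is left
}

# Example use (assumes wget, html2ps and ps2pdf are installed):
# name=$(page_name "$page")
# wget -O "$name.html" "$page"
# html2ps "$name.html" > "$name.ps"
# ps2pdf "$name.ps" "$name.pdf"
```

Since every file name is now built from the same variable, the clean-up step can remove "$name.html" and "$name.ps" without guessing.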


Here is a pdf of this article, generated via this command line method: convertweb2pdflinux.pdf
Notice, it is different from the above pdf created with "Print to file".
One difference, which, depending on your goals, may be either advantageous or undesired, is that text in this file can be selected and copied, which is not true of the first file.

XHTML2PDF

In many cases, you may wish to create a pdf file from a complex .html or .xhtml file that relies on .css (cascading style sheets) or other elements. The above methods may not render such a file so that it appears as it does on the Internet.

For those cases, there is a program called xhtml2pdf. This program is less likely to be a part of most gnu/linux distributions by default, or to be available from said distributions' repositories. As such, you may have to download and install it by hand. Thankfully, the site for this program is easily enough found at http://www.xhtml2pdf.com/, and, of course, the program is free, open source software.
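
Once installed, xhtml2pdf is run from the command line much like the tools above. What follows is a rough sketch, assuming the xhtml2pdf command is on your PATH and takes a source file followed by a destination file; check xhtml2pdf --help on your own system, since options may differ between versions. The wrapper name styled_html_to_pdf is made up for this example.

```shell
#!/bin/bash

# styled_html_to_pdf (hypothetical helper): convert an .html or .xhtml
# file to .pdf with xhtml2pdf.
# Usage: styled_html_to_pdf yourfile.html [output.pdf]
styled_html_to_pdf() {
    local input="$1"
    # default output name: replace the extension with .pdf
    local output="${2:-${input%.*}.pdf}"
    # assumed calling convention: source file first, destination second
    xhtml2pdf "$input" "$output"
}
```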

And, of course, here is a pdf of this article generated with xhtml2pdf: xhtml2pdfconversion.pdf

There's more!

Yet other methods exist for generating .pdf files from .html files, of course, and an attempt to compile an exhaustive list, with instructions for each, would be beyond the scope of this article.

Tuesday, February 16, 2010

a thousand fireflies, lighting up the cat5 wires (or, my scattered and befuddled web presence becomes yet more obfuscated and confusing...)

I've been busy for this past week, slaving over the translation of some academic papers from Brazil.
Nonetheless, I have been frequently distracted by hackery and geekery, as is not uncommon in my doings. As such, I did want to share with you a couple of links.
First, I have decided to start posting my poetry online. I'm not going to make this a poetry blog...no way...and, I've even decided that a blog is decidedly NOT the best way to publish my poetry.
Instead, I have made a wiki for this purpose. I can add poetry at my leisure, and they will not appear chronologically, as they would in a blog, but, rather, will have each their own page.
Also, being a wiki, it is easier for me to separate the poetry into sections by language (since I write poetry in 3 or 4 languages, anymore).
The wiki is also simple and easy to maintain.
Oh, yes, you can find it here: tony baldwin | poetry.
That was fun.

The idea came as I began introducing my kid to the world of wiki-ing, in an attempt to begin to educate her on how to master and conquer the internets, to scratch out her domain there, knowing that in the future, our entire lives will be spent online...(future? gosh, I've been living a virtual life for over a decade, and no IRL life to speak of...)
Well, I figured learning wiki-code would be useful, and will set us up for the next step, as I teach her to write html and css and develop her own site, thus effectively closing the casket on any real social aspirations she may have been harboring, while, hopefully, planting the seed for future, marketable skills. (Next I'll start her on php, perl, python scripting, and, before you know it, we'll conquer the internets and be on our way to world domination. Never say I didn't warn you...)
I've been giving her little assignments for pages or sections to build into her wiki, mostly just to annoy her, since she's on February break this week. (Not like I want her doing fun things, like playing in the snow, or talking on the phone...she'll never learn to hack the fed and move us some funds around doing that stuff...)

Now, I have also created a one page mini profile, as a sort of catch-all basin and minirepository of data on all that is tony baldwin here: tonybaldwin.info.
You know, because, I just know that all of my adoring fans out there have all been dying for some consolidation of the scattered, nebulous universe that is tony baldwin on the internets (self absorbed much?).

So, I've spent more time on livejournal than I have in some time. I have to say, I miss the days when LJ was the axis of my internet social experience. I believe that a social networking site centered on blogging gave greater depth to "social networking" on the internet than sites that impose 140 character limits, or focus more on animated fish tanks than on producing profound discourse and promoting sincere self-expression. As such, I've been ignoring the visage tome and interacting more with LJ, again. This had me thinking about blogging, not surprisingly, and, consequently, about creating a blog of my own, on my own server, etc., as opposed to using someone else's free blogging service (i.e., LJ or blogger, etc.). This led me to exploring nanoblogger, a nifty little command line tool for creating, editing, and managing a blog from the bash command line! Very nifty.
I create/edit/manage a blog on my local machine with this tool, on the command line, and then just ftp the whole dir full o' stuff up onto my server, which I've done, thus creating the baldwinsoftware.com / tony baldwin / nanoblog.
I figure that one will be primarily used to document my adventures in hackery and geekery (similar to the tonytraductor livejournal).
With nanoblog, of course, I can make numerous such blogs and load them to my server. I may yet find other uses for that. The nanoblog, unfortunately, doesn't lend itself to community blogging, however, since I can only manage it on my machine and load to the remote host, and it doesn't feature commenting or other feedback/participating/community resources. It does enable both rss and atom feeds, though, so my nanoblog can be followed on google reader or other rss/atom aggregating apparatuses.

Of course, at this juncture, I have a completely schizophrenic web presence, with numerous blogs and community profiles on countless sites, some highlighting my artistic whack-a-doodle-ry, others emphasizing my hackery-geekery, others focusing on my professional pursuits, while still others, such as this here blogspot, are of a more general and scattered nature, and yet others are far more personal.
I question, sometimes, whether it is best to have so fractured a web presence, while at other times I question the value in having any generalized outlets (such as this one), and wonder if I shouldn't do a better job of organizing my distinct efforts in various diverse fields to develop each more specifically, yet more fully (i.e., perpetuate the scattered presence, and fully explore each of those avenues, being 1) art/poetry/music, 2) geekery and hackery, and 3) professional matters pertaining to the international market and translation industry).
In any case, it seems that each of those areas is more likely to find an audience than just the befuddled and scattered ramblings I post here, and that, if I wanted to truly develop such audiences, I should step up efforts in one or more of those fields and be more conscientious about maintaining whatever efforts I realize in those specific fields (in other words, post regularly on the specific blogs).

What do you think?

Well, with that, me dr00gies, I must get back to slaving away at these Brazilian academic documents.