PDF Creation and Manipulation

Ghostscript

unpaper

gs -r300 -dBATCH -sDEVICE=pgmraw -sOutputFile=page-%03d.pgm -dNOPAUSE input.pdf

(or for PNG files, use png16m)

unpaper --layout double --input-pages 1 --output-pages 2 --no-mask-scan
--no-border-scan --border 30,30,30,30 --deskew-scan-deviation 2
--middle-wipe 80 --sheet-size 3312,2562 page-%03d.pgm unpaper-%03d.pgm

for i in unpaper-*.pgm; do pnmtotiff $i > $i.tiff; echo $i; done
tiffcp -c zip *.tiff big.tiff

tiff2pdf -z -t"Title goes here" -a"Authors" -s"Subject" -k"Keywords" -o
big.pdf big.tiff

pdfoutline big.pdf outline.txt big-with-outline.pdf

Scan Post-Processing

Current workflow:

gs -r300 -dBATCH -sDEVICE=png16m -sOutputFile=page-%03d.png -dNOPAUSE input.pdf
for f in page-*.png; do convert -colorspace gray -level 0,80% $f bw/$f; echo $f; done
for f in page-*.png; do convert $f -colorspace gray -auto-level -threshold "90%" thresh-90/$f; echo $f; done
# Reducing to B&W and tweaking the brightness and contrast
for f in page*.png; do convert -colorspace gray -level 0,80% $f 1-$f; echo $f; done
for f in page-*.png; do echo $f; convert -brightness-contrast -15x25 -level 0,80% $f _scans/$f; done

# convert input images to colour-corrected greyscale images with an alpha channel (levels better for text)
for f in ALC-???.png; do convert -colorspace gray -level 0,90%,0.25 -alpha Set $f gscc-0-90-0.25-$f; done

# convert input images to colour-corrected greyscale images with an alpha channel (levels better for photos)
for f in ALC-???.png; do convert -colorspace gray -level 0,90% -alpha Set $f gscc-0-90-$f; done

# edit the text files to create transparent mattes where the too-dark-photos are

# merge them
for f in ALC-???.png; do convert gscc-0-90-$f gscc-0-90-0.25-$f -composite output-$f; done



# subtract common backgrounds from images
for f in ??.png; do convert -composite -compose difference $f background.png -negate neg-$f; done

# invert the negative images (not sure why the -negate above doesn't work)
for f in ??.png; do convert neg-$f -negate output-$f; done


# Compositing in a mask
composite -compose Dst_Over page.png mask.png output.png


# apply mask
for (( p=0; p<10; p++)); do composite -compose Dst_Over ../PADM-1\ $p.png ../PADM\ Mask.png PADM\ 00$p-1.png; done

# increase contrast
for f in page-*.png; do echo $f; convert -brightness-contrast -15x25 -level 0,80% $f done/${f%.png}.tiff; done

# composite greyscale / colour masked images
for f in *.tiff; do composite -compose Dst_Over ../$f $f composited/$f; done


# Converting / remapping colours to an input palette file
convert input.png +dither -remap palette.png out.png

OCR

Then run the PDF file through something like Acrobat Professional to add non-destructive OCR layer under bitmap image (apparently, pdfocr or gscan2pdf can also be used to embed searchable text layers into scanned PDF files)