Thursday, November 9, 2023

convert pdf to plain text

I wanted to spice up my typing practice on monkeytype by using PDFs.
Here's a Python script that takes the file paths as inputs and spits out a plain text version you can copy into monkeytype.

import PyPDF2

def pdf_to_text(pdf_path, output_text_path):
    text = ""
    
    with open(pdf_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        
        for page in pdf_reader.pages:
            # Fall back to "" in case a page (e.g. a scanned image) yields no text
            text += page.extract_text() or ""
    
    with open(output_text_path, 'w', encoding='utf-8') as output_file:
        output_file.write(text)

# Take input for the PDF file and output text file
pdf_path = input("Enter the path to the PDF file: ")
output_text_path = input("Enter the path for the output text file: ")

pdf_to_text(pdf_path, output_text_path)

print(f'Text extracted from PDF and saved to {output_text_path}')

Okay, so after I did that I found an easier way:
some free software called calibre, which can convert ebook formats from the CLI. It's giving me some trouble though, so I'll have to come back to this later.

  1. I set an alias for ebook-convert
  2. I'm trying to get a bash script to take its input args as variables for the input and output paths.
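
While the bash version is still on hold, here is a rough sketch of the same idea in Python instead: pass the input and output paths straight through to ebook-convert. It assumes calibre is installed and ebook-convert is on the PATH as a real executable (a shell alias is not visible to a script launched this way). The names build_command and convert are made up for this example.

```python
import shutil
import subprocess

# Assumption: calibre is installed and its ebook-convert tool is on PATH.
EBOOK_CONVERT = "ebook-convert"


def build_command(input_path, output_path):
    """Return the argument list for: ebook-convert INPUT OUTPUT."""
    return [EBOOK_CONVERT, input_path, output_path]


def convert(input_path, output_path):
    """Run ebook-convert; raises if the tool is missing or exits with an error."""
    if shutil.which(EBOOK_CONVERT) is None:
        raise FileNotFoundError(f"{EBOOK_CONVERT} not found on PATH")
    # Passing a list (no shell) means paths with spaces need no extra quoting.
    subprocess.run(build_command(input_path, output_path), check=True)
```

Something like convert("book.pdf", "book.txt") should then do the conversion, with ebook-convert picking the output format from the file extension.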
