Short, kinda useful Python scripts

Written on February 8th, 2024 by sudoyashi

These are small utilities for future me to use when I want to do things the lazy way with scripts. As with all scripts, run at your own risk. :) I don’t develop or write Python for optimization; I write it enough that it works for my specific scenarios.

Scripts found here.

Create folders with whatever is in a .csv

Wherever the script is, create a file called directory.csv and hard write the values ‘CDs, software, drivers’ in rows. Then, read that .csv and create folders based on those values. To create other folders, just edit the list of names in directory.write

# Creates a list of directories if they do not exist. This creates
# the CDs, software, and drivers directories, as indicated by directory.csv
import os
import csv
from pathlib import Path

# create the file directory.csv

with open('directory.csv', 'w') as directory:
    directory.write('CDs,software,drivers')

# setup variable that loads all *.csv information
    
pathName = os.getcwd()
directoryFiles = os.listdir(pathName)
csvFiles = []
for file in directoryFiles:
    if file.endswith(".csv"):
        csvFiles.append(file)

with open('directory.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        for column in row:
            print(column)
            p = pathName + '\\' + column
            Path(p).mkdir(parents=True, exist_ok=True)

Removing Title and Author metadata from .docx files

After saving a .docx file as an Adobe PDF, the pdf tab would show the document Title rather than the actual filename. This was a reoccuring issue because users like to copy and paste files for templates, so hundrerds of files had this Title misnomer over the years. One of our users thought this was a phishing attempt or virus. Fair enough. Thankfully, it was just a slight problem with metadata.

The script opens the current directory and checks all folders and files for .docx files. Then, we edit the properties to change the Author and Title. You can change any of the properties you need to, just modify the core_properties variable. The script does NOT cover .doc files; you would have to convert the .doc files into .docx.

# Open all the documents and check to see if the document has a title, if so, remove it.
# Open all the documents and change the Author
import os
import fnmatch
from docx import Document

# Select all the docx
cwd = os.getcwd()
for dirpath, dirs, files in os.walk(cwd):
    for filename in fnmatch.filter(files, '*.docx'):
        # use print if you want to see what files will be modified
        # print (os.path.join(dirpath, filename)) 
        editFile = (os.path.join(dirpath, filename))
        document = Document(editFile)
        core_properties = document.core_properties
        # If the author filed is not my email, change it
        if (core_properties.author != 'joshua...email'):
            core_properties.author = 'joshua...newEmail'
        # Check if title is not empty, make it empty
        if (core_properties.title != ''):
            core_properties.title = ''
        # Save the file with the original name
        document.save(editFile)
        print ('Saved ', editFile)	

You will have issues if these are server files!

If you’re modifying server files, like me, the packages will not load properly because of the UNC path. You need to add the package to the server’s local directory and then do a local import. The following submodules .py were affected and had to be modified:

api.py
phys_pkg.py
package.py
pkgreader.py

For each file above, append the following to the beginning list of imports:

# mymodule.py
import os
import sys
# Add the parent directory of mypackage to the Python path
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

Then you can import the package using:

from docx import Document

Depending on the directory setup, it may take a couple of tries to read the errors. You may also have to replace any __ init__.py with empty files. For more information, read the issue on Attempted relative import with no known parent package.

Convert XML files to Excel (.xlsx)

Find all .xml files in the current working directory, convert them to .xlsx, and save the converted files in an existing folder called .xlsx.

Our VoIP phones use .xml files to create a directory; however, reading raw .xml is atrocious work. We can convert them to .xlsx for quick viewing to make sure there are no parsing or spelling errors.

# This script finds all files that end with .xml and puts them into a list. In that list,
# extract the xml value pairs, remove duplicates and write the pairs into sheet1 
# of a new Excel sheet within columns B:Z
# Finally, save the XML as an XLSX into the xlsx directory.

import xml.etree.ElementTree as ETree
import pandas as pd
import os
import pathlib
from os import listdir, path
from pathlib import Path

cwd = os.getcwd()
files = []
for file in listdir(cwd):
    if file.endswith('.xml'):
        files.append(path.join(cwd, file))

for xml in files:
    fileroot = Path(xml).stem
    tree = ETree.parse(xml)
    root = tree.getroot()
    valuePairs = []  # empty list assigned to A
    for elements in root:
        valuePair = {}  # empty array; store data in key:value pair
        for element in list(elements):
            valuePair.update({element.tag: element.text})  # updating dictionary with(tag -> Columns, text -> Rowdata)
            valuePairs.append(valuePair)  # Append key:value pair to A list
        df = pd.DataFrame(valuePairs)  # Create dataframe (df)
        df.drop_duplicates(keep='first', inplace=True)  # Only keep first, ignore all others
        df.reset_index(drop=True, inplace=True)
        writer = pd.ExcelWriter(cwd + '\\' + 'xlsx' + '\\' + fileroot + '.xlsx', engine='xlsxwriter')
        df.to_excel(writer, sheet_name='sheet1')
        worksheet = writer.sheets['sheet1']
        worksheet.set_column('B:Z', 30)  # Set column char to 30 for columns from B to Z
        writer._save()
    print(df)
    print('XML file has been parsed. Open at ' + fileroot + '.xlsx...')

Snakes are aight when they’re on your side

I am not trying to edit 300+ files within each directory to rename the file to something else. That’s why you make a script! I could optimize them, but at the tiny scale I have them, I don’t need to right now. Thank goodness for modern hardware.