
Data is coming in faster than expected, there is no rigid structure, and you struggle to remember what’s where and where to put new stuff. Before you know it, you can’t find what you need. Sound familiar? This post may help you out.


If you have encountered an unexpected influx of all sorts of digital stuff yourself, you know that tracking it in a global spreadsheet soon becomes a waste of time, since you (or other people) need to move folders and have no time to update the spreadsheet. An alternative is to stick a README.txt file containing a simple description in every folder, but then you lose the global overview. Or do you…?

Let’s combine the two! All you need is Python.

By the way, if you are a desktop Windows user and don’t have Python installed, ask yourself whether you actually need that PC at all.

I have written a Python script you can preview and download from GitHub. With default settings, the script walks through a directory tree from a specified root folder down and searches for files called BROWSEME.html. It parses each BROWSEME.html file for the content of the tags with id="browseme" and id="browsemetoo" and stores the content of these tags in an output file (as two columns next to the file path).
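The directory walk described above can be sketched in a few lines with the standard library. This is a minimal illustration, not the code from the GitHub script: the function name `find_browseme_files` and its `target_name` parameter are made up for this example.

```python
import os


def find_browseme_files(root, target_name="BROWSEME.html"):
    """Yield the full path of every file called target_name below root.

    Hypothetical helper for illustration; the real script may be
    organised differently.
    """
    # os.walk visits root and every folder beneath it, top down.
    for dirpath, dirnames, filenames in os.walk(root):
        if target_name in filenames:
            yield os.path.join(dirpath, target_name)
```

Calling `list(find_browseme_files("some/root/folder"))` would give you the first column of the output table. The match is exact and case-sensitive, so a file named browseme.html would be skipped in this sketch.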

In the end, the output file is a table with three columns. Each row stores the path to one BROWSEME.html in column one, the content of the tag with id="browseme" in that file in column two, and the content of the tag with id="browsemetoo" in that file in column three. Open the output file, press Ctrl+F and type in what you’re looking for. If your file names and the content of the tags you have been using are at least somewhat logical, you have a good chance of finding what you need. Note that you can also search the content of the BROWSEME.html files using your operating system’s search functionality, but if you have everything in one file, you can easily send it or put it online for somebody who wants to search but doesn’t have access to the full folder structure.
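To make the three-column layout concrete, here is a rough sketch of how such a table could be written. The tab-separated format, the header row, and the name `write_table` are assumptions for this example; the actual script may use a different delimiter and no header.

```python
import csv


def write_table(rows, out_path):
    """Write (path, browseme, browsemetoo) rows as a tab-separated table.

    Hypothetical helper: delimiter and header row are assumptions.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["path", "browseme", "browsemetoo"])
        for path, first, second in rows:
            # Collapse line breaks and repeated spaces so that each
            # BROWSEME.html ends up on exactly one row.
            writer.writerow([path,
                             " ".join(first.split()),
                             " ".join(second.split())])
```

Keeping every record on a single line is what makes the Ctrl+F approach work: one hit is one folder.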

The BROWSEME.html file may look something like this:


<!DOCTYPE html>
<html>
<head><title>A Title</title></head>
<body>
<div id="browseme">This content will appear in the output file.</div>
<div id="browsemetoo">This content will appear as another column in the output file.</div>
</body>
</html>

 

See comments in the code for details of what parameters can be changed.

So why use BROWSEME.html rather than README.txt? You can do well with both, but an *.html file is more flexible. Open the output file in Excel, use the HYPERLINK function, and you get a “clickable” table that can open any BROWSEME.html in a web browser. The BROWSEME.html can contain, for example, images and links to other stuff. Just imagine…
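If you would rather not type the HYPERLINK formulas by hand, they could be generated along with the table. The sketch below is an assumption of mine, not a feature of the script: `hyperlink_formula` is a hypothetical helper that builds the formula text for one path.

```python
def hyperlink_formula(path, label="open"):
    """Return the text of an Excel HYPERLINK formula pointing at path.

    Hypothetical helper for illustration only.
    """
    # Inside an Excel string literal, a double quote is escaped by doubling it.
    safe_path = path.replace('"', '""')
    safe_label = label.replace('"', '""')
    return f'=HYPERLINK("{safe_path}","{safe_label}")'
```

Writing this string into an extra column should give a clickable cell once the file is opened in Excel. Depending on your regional settings, Excel may expect a semicolon rather than a comma between the two arguments.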

This solution is quick, lightweight, and simple enough to put in place if you don’t have time to set up and populate a complex descriptive metadata catalog.

It doesn’t get easier than that!

Note: currently the tag with the specified id cannot contain any inner tags. For example, if you use something like <div id="browseme">This content <b>will appear</b> in the output file.</div>, only "This content" will be retrieved.
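One possible way around this limitation is to count nesting depth while parsing, so that text inside inner tags is collected too. The sketch below uses Python’s standard html.parser module and is not the code from the script; `IdTextExtractor` and `extract_by_id` are names invented for this example, and it assumes the inner tags are properly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Collect all text inside the elements whose id is in wanted_ids."""

    def __init__(self, wanted_ids):
        super().__init__(convert_charrefs=True)
        self.text = {wanted: "" for wanted in wanted_ids}
        self._current = None  # id of the element being captured, if any
        self._depth = 0       # nesting depth inside that element

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            # Already capturing: an inner tag only deepens the nesting.
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        element_id = dict(attrs).get("id")
        if element_id in self.text and tag not in VOID_TAGS:
            self._current = element_id
            self._depth = 1

    def handle_endtag(self, tag):
        if self._current is not None and tag not in VOID_TAGS:
            self._depth -= 1
            if self._depth == 0:
                # The element that started the capture has been closed.
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.text[self._current] += data


def extract_by_id(html_text, wanted_ids=("browseme", "browsemetoo")):
    """Return a dict mapping each wanted id to the text found inside it."""
    parser = IdTextExtractor(wanted_ids)
    parser.feed(html_text)
    parser.close()
    return parser.text
```

With this approach the inner tags themselves are dropped and only their text is kept, which is usually what you want in a searchable table.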