Posted by & filed under computing.

Please read these valuable tips on how to work with spreadsheets effectively.

Why are spreadsheets so hated and praised at the same time? Everybody loves spreadsheets for their flexibility. However, this flexibility can turn a spreadsheet that’s perfectly clear to you into something very difficult to query when it grows beyond the size acceptable for searching in it by human eyes and you need an analyst or programmer to help you. Programmers want to know whether values in a column are numbers, text, date, and so on, which is not clear with spreadsheets. Often, analysts need to import the data into some software other than Excel (which generally involves OLE DB or ODBC technologies), and requires a nicely formatted spreadsheet. Here are six critical issues to look out for.

  1. Folder, file, sheet, or column names contain non-alphanumeric characters
  2. Multiple tables on one sheet or one table split to multiple sheets
  3. Cell style indicates value
  4. Mixed data types in one column
  5. Merged cells and fancy headers
  6. Inconsistent identifiers

1) Folder, file, sheet, or column names contain non-alphanumeric characters
Description: Non-alphanumeric character is anything else then a letter or a number. Strings like “£ spent in 2013”, “J.Smith”, or “Tick & Tack – Toe” are all wrong. Note that space ” ” is non-alphanumeric character so avoid using it too!
Why is it a problem? Non-alphanumeric characters are generally not allowed in column headings in database systems and can lead to errors when importing or analysing data.
Resolution: Use only letters, numbers, and potentially underscores (preferably not it column headings) in folder names, file names, sheet names, and column names! Start with a letter, and keep it short. The examples above can be changed to “GBP_spent_2013”, “J_Smith”, “TickTackToe”.
2) Multiple tables on one sheet or one table split to multiple sheets
Description: It has to do with database design and the way we decompose real world into a set of tables. In any case, one sheet should only have one table on it. Say you keep records of parcels sent out to customers. It is a good idea to have one sheet called “parcels” to record properties of parcels (trackID, weight, colour, etc.), and one sheet called “customers” to record properties of customers (customerID, Name, Address, etc.). The parcels sheet will also have a column called for example cutomerFK where you need to record corresponding custormerID from the customers sheet. Now say you started this spreadsheet in November 2012. Many people would add extra two sheets in January 2013 to continue keeping records in 2013. There is no need to add new sheets! Just add an extra column called for example “yearShipped” into your parcels table to record the year when each parcel was shipped.
Why is it a problem? Imagine, in the example above, that you add new sheets every year. One day, your manager needs to know which customers were sent parcels heavier than 1 kg regardless of the year. You would have to go to individual sheets (instead of just one) to find out. While the delay may seem negligible in this trivial example, violating the concept can lead to significant delays in real situations.
Resolution: To decide whether to put bits of information into one or split it into more sheets, picture the real objects you keep records for and try to group them into classes. The classes you see most clearly probably deserve they own sheets (i.e. tables). One row in a table will represent one real object and one column will record one property of that object. To decide whether two objects should be in different tables, ask yourself: “Have these two objects different properties?”, in other words “Do I need the same columns for these two objects?” If your answer is “No” or “Just a little”, your best bet is to stick it into one table. If your answer is “Yes” or “Pretty much”, put them into separate tables. It is difficult to see optimal split of your data into tables, especially if you start keeping records more or less spontaneously, so you may need to merge or split your sheets as you workbook grows and as the bigger picture becomes clearer. A helpful introduction to database design by Microsoft is Database Design Basics.
3) Cell style indicates value
Description: We all know that; green, amber, red. Colours definitely help to make things stand out and there is nothing wrong with it. However, …
Why is it a problem? Information about cell style (colour, border, font, etc.) is usually lost when the table is imported to some other software. Then there is no way of telling what the value was.
Resolution: Enter actual values into cells (e.g. type in “G”, “A”, “R” for green, amber, and red, respectively) and use conditional formatting to change cell style.
4) Mixed data types in one column
Description: Values like 10, 20, 10-20, 10, Unknown, £10 in one column.
Why is it a problem? Imagine you want total sum of the values in the description. Such column would be imported as text so no sum in the usual sense would exist.
Resolution: If you want to use numerical functions on a column, enter only numbers into that column. Use quantities in the same units and don’t enter units into cells. The example from the description could be solved by having column minGBP (values 10, 20, 10, 10, , 10) and maxGBP (values 10, 20, 20, 10, , 10). It can get complicated when you need to account for values like “Unknown”, “N/A”, “NoData”, etc. Sometimes values like -9999 are used to indicate missing data, but this obviously affects your summaries. Bear in mind that if multiple rows at the top of your sheet have blank cells in one column, the whole column may be imported to other software as text, not as a number. It is good practice to specify the format of cells in a column (in MS Excel, right-click on the column header, go to Format Cells…, Number tab, and define the desired format).
5) Merged cells and fancy headers
Description: Cells merged to indicate that one value applies to more than one columns or rows.
Why is it a problem? Merged cells may affect summaries across columns or rows. Furthermore, when importing to another software, only one cell may contain a value, the rest of cells may be blank!
Resolution: Do not merge cells! Merged cells in the body of a table are unforgivable. Sometimes it is useful to merge cells in complex table headers, but this often makes it impossible to directly import the sheet into another software. From data manipulation perspective, table header should be a single row with as many cells as there are columns in the table, each cell should have a unique name compliant with point 1. If you need a complex header with merged cells, add an extra header row with no merged cells (and with unique values) before the first data row, the analyst-friendly table can be then outlined using named range.
6) Inconsistent identifiers
Description: Values that identify records in one table differ from values in related tables. In the example with parcels and customers in point 2, values in column customerID in customers table must match values in customerFK in the related rows in parcels table, otherwise there is no way of telling which parcel was sent to which customer.
Why is it a problem? It is obvious. If you want to link two tables based on a column, the values in that columns must match for related rows. This fundamental condition surprisingly often does not hold and is generally very time consuming to fix.
Resolution: There is no resolution other than being more pedantic. Instead of typing values into cells, copy and paste them if possible. Make sure each row in each table has one column (or a set of columns) that uniquely identify the row and values match corresponding values in related tables. Cases when the differences are systematic (e.g. “ID001” in one table links to “001” in another table) can be solved quickly, but non-systematic differences usually require manual editing.

If you follow recommendations outlined in this post, you are on your way from being enslaved by your spreadsheets to let your spreadsheets work for you. However, there are many intricacies to storing data in spreadsheets and you may run into issues specific for your situation. My final recommendation is to adhere to one key principle: separate data from data representation.