Step by Step: How to Prepare A Test Data Set in Excel?

Efe Buyuk
4 min readOct 15, 2020
Photo by Mika Baumeister on Unsplash

In this section, we are going to concentrate on preparing a test data set in Excel. As you know, in this data validation series we have said that we will be using Excel-XML-XSD-VBA, so called the Holy Rectangle. For the sake of creating an example for this, we started by Excel, it’s very basic definition and how to use it in data validation. These were from the previous section. If you haven’t already read it, you can do so by clicking this link.

What Is A Test Data Set?

First things first and we will be doing another basic definition for this title. As the title sounds very familiar, we can say that a test data set is a data set where we can do whatever want on it. Because of its test nature, we are relief that there will be no harm adding, deleting or updating data in that set. In test data sets, we want to create a data set which is almost the same with the original data set as its structures. What do we mean by almost same? Let me show you. For example, our original data set looks like the one which is shown below;

How to Prepare A Test Data Set?

Above table shows the original data set and it is called EMPLOYEE table and we created it in Excel. When we want to create a test data set, we don’t always want all the columns for our testing purposes. For example, we may want only EMP_ID, EMP_NAME, MANAGER_ID, SALARY and DEP_ID columns because we may want to test integer data from the original data set and we may need EMP_NAME field for checking purposes. So, according to this simple plan, we create our test data set. When creating a test data set, you will generally have three options.

  1. Copy all data along with the columns you want to your test table.
  2. Copy only the column names (we also call them as features) and generate your own data for these columns.
  3. Pick a middle way and for some columns you copy all data and for other columns you generate your own data.

Of course, we know that copying directly from the original data set it much easier but there are reasons why we generate our own data. One of the most important reasons for is data security. Maybe you need to send your work outside of your company and that is because you should keep your original data private. In this type of a scenario, we generate our own data.

In above simple scenario, we can keep integer fields as they are because they don’t mean anything without real employee names, and we need to change EMP_NAME field’s values for the sake of security reasons. So, in this case, we are applying a hybrid solution which is number 3 in our above list. Here’s our test data set’s fist version;

Basically, first we copied and pasted all the data, then we deleted unnecessary columns for our case. Second, we deleted EMP_NAME column’s values so that we can generate our own values instead.

In order to create our own values, again we have some options.

  1. If you have a little amount of data (as in our scenario now), then you are lucky. You can just type your own values. Unfortunately, that isn’t always the case.
  2. If you have a huge amount of data (which probably you will), then you need to use Excel’s properties to complete your task.

However, we will use Excel’s properties to complete our task because otherwise you already know what to do. In our case, we are going to use the formula below;

=CHAR(RANDBETWEEN(65,90))

In the above formula, RANDBETWEEN function will pick a random number between 65 and 90. After that, CHAR function will show the upper-case letter which is selected as its number correspondence. We will not go into more detail about Excel’s formulas and functions but if want to learn more about them, you can always check Microsoft’s support pages about it. Here is the link for formulas and functions.

After applying above formula for our column EMP_NAME, we have the below table.

As you can see now that there are random characters as values of EMP_NAME and that solves our problem with privacy.

In this section, we saw how to prepare a test data set in Excel step by step. Of course, there are other ways to prepare a test data set and we should be aware of our technology ecosystem (such as where will we copy our original data from, etc.) before getting started for this type work.

Note: You can also check this blog post and some other interesting topics on my personal web page.

Next: What Is an XML File? How Do We Use It in Data Validation?

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Efe Buyuk
Efe Buyuk

Written by Efe Buyuk

Use all your data to make the right decision at the right time, every time. Follow me on LinkedIn https://www.linkedin.com/in/efebuyuk/

No responses yet

Write a response