Goal
The goal of this lab was to learn how to geocode by geocoding the locations of sand mines in Wisconsin and comparing the results to the actual locations of the mines.
The data used for this lab came from the Wisconsin DNR, and it was incredibly messy. Locations were recorded as street addresses, as Public Land Survey System (PLSS) addresses, or as both, and the formats varied from record to record. Each of us was assigned 16 of the 129 mines and had to normalize the data for them. Once normalization was done, the table was imported into the geocoding toolbar in ArcMap. After the 16 mines were geocoded, the actual mine locations were imported and compared to the ones we had geocoded. We also had to compare our own 16 mines with those of the classmates who had been assigned the same mines. The final step was to create an error table showing how close our geocoded mines were to the actual mine locations.
Methods
The first step in this lab was to figure out which 16 mines were mine. I then had to create a new table with normalized data so that it could be imported into ArcMap. The records for my 16 mines had been entered in quite different ways, so normalization was definitely needed. Figure 1.1 below shows the data as I received it. Figure 1.2 shows the data after I had normalized it.
Figure 1.1: How I received the data from the DNR.
Figure 1.2: Normalized data for my 16 mines.
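To make the normalization step more concrete, here is a minimal sketch of how it could be scripted instead of done by hand. I did not do this in the lab. The abbreviation map and the function name `normalize_address` are assumptions for illustration only, and a real table would need rules that match whatever address style the ArcMap address locator expects.

```python
import re

# Hypothetical abbreviation map (an assumption, not the DNR's or Esri's rules).
SUFFIXES = {
    "STREET": "ST",
    "ROAD": "RD",
    "AVENUE": "AVE",
    "HIGHWAY": "HWY",
}


def normalize_address(raw: str) -> str:
    """Uppercase the address, drop periods and commas, collapse extra
    whitespace, and abbreviate common street suffixes."""
    text = raw.upper()
    text = re.sub(r"[.,]", " ", text)  # punctuation becomes a space
    words = text.split()               # split() also collapses whitespace
    words = [SUFFIXES.get(word, word) for word in words]
    return " ".join(words)


if __name__ == "__main__":
    # Made-up example address, not one of the actual mines.
    print(normalize_address("  123 Main Street, "))
```

The idea is simply that every record ends up in the same shape before it goes to the geocoder, which is what I did manually for my 16 mines.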
After normalizing my data, it was time to geocode the addresses in ArcMap. I entered the normalized table, and 11 of my addresses were matched while the other 5 were unmatched. I added the points to the map along with an imagery basemap to make it easier to see whether the mines were in the correct locations. I went through the 11 matched addresses first to confirm each match, zooming in to each matched location to see whether a sand mine was visible in that general area. For most of them the address was right and I could see the sand mine on the basemap. A few, though, were in completely wrong areas. To fix these I had to look at their PLSS addresses, so I added the Wisconsin townships and sections layers to better locate the mines. For the 5 unmatched mines I also had to look at the PLSS address. These mines either did not have a street address or their PLSS address was not normalized well enough for ArcMap to match it. It took quite a bit of time and struggle to estimate where these mines were, but I eventually geocoded all 16 mines.
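Reading the PLSS addresses was the slowest part of this step. As a rough sketch, the section, township, and range could be pulled out of a description with a regular expression before looking them up on the townships and sections layers. The input format shown here (something like "Sec. 12, T25N R10W") and the function name `parse_plss` are assumptions; the DNR table used several different formats, and this pattern would not cover all of them.

```python
import re
from typing import Optional

# Assumed format: a section number, then township and range with directions.
PLSS_PATTERN = re.compile(
    r"S(?:EC(?:TION)?)?\.?\s*(?P<section>\d{1,2})\D+?"
    r"T(?P<township>\d{1,2})\s*(?P<tdir>[NS])\D*?"
    r"R(?P<range>\d{1,2})\s*(?P<rdir>[EW])",
    re.IGNORECASE,
)


def parse_plss(description: str) -> Optional[dict]:
    """Return the section, township, and range found in a PLSS description,
    or None if the text does not fit the assumed format."""
    match = PLSS_PATTERN.search(description)
    if match is None:
        return None
    return {
        "section": int(match.group("section")),
        "township": match.group("township") + match.group("tdir").upper(),
        "range": match.group("range") + match.group("rdir").upper(),
    }


if __name__ == "__main__":
    # Made-up description, not one of the actual mines.
    print(parse_plss("NW 1/4 Sec. 12, T25N R10W"))
```

A description that returns None would still have to be located by hand, which is what I ended up doing for the hardest records.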
Next I needed to compare my geocoded mines with their actual locations and with my classmates' geocoded mines. The actual locations layer was imported with all 129 mines, though, so I needed to select just my 16 to get a better understanding of where my mines actually were. I created a new layer for the actual locations of my mines and did the same for the points from my classmates who geocoded the same mines as I did. Figure 2.1 below compares my mines with the actual locations. Figure 2.2 compares my geocoded mines with my classmates' mines.
Figure 2.1: My geocoded mines compared to their actual locations.
Figure 2.2: My geocoded mines compared to my classmates' geocoded mines.
For the most part, my mines were in the right location compared to both their actual locations and my classmates' mines. I created an error table showing how far off I was, in meters, from the actual locations and from my classmates' locations. Only about three were completely off, which was my error. The rest are all in the same location. The numbers may make it look like I was completely off, but usually I had the right location and had simply placed my point down the street or a few meters away from where the DNR and my classmates had placed theirs. A distance below 2,500 meters means that I had the right location but the points are in different spots. The table is pictured below in Figure 3.
Figure 3: Meters my mines were off from actual locations and classmates' mines.
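For anyone curious how distances like those in the error table can be calculated, here is a minimal sketch using the haversine formula on latitude and longitude pairs. This is not necessarily how the numbers in Figure 3 were produced. The function names and the example coordinates are assumptions, and the 2,500 meter cutoff is the rule of thumb described above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def classify(distance_m: float, threshold_m: float = 2500.0) -> str:
    """Label an error distance using the 2,500 m rule of thumb."""
    if distance_m < threshold_m:
        return "right location, different point"
    return "wrong location"


if __name__ == "__main__":
    # Made-up coordinates, not actual mine locations.
    d = haversine_m(44.30, -91.20, 44.31, -91.20)
    print(round(d), classify(d))
```

The haversine formula treats the Earth as a sphere, which is accurate enough for separating "down the street" from "completely wrong area".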
Discussion
When I first received the data, I could tell there were many errors across the mines in the table. There are three types of errors that can occur in data: gross errors, systematic errors, and random errors. The first type I saw right away in this data was gross error. These are simply mistakes or oversights. They can be reduced by training employees and data collectors to follow a standardized procedure, and it's obvious that this data was not collected in that manner. Systematic errors weren't as prominent as gross errors. This type of error typically comes from instruments that are not properly calibrated or from changing environmental conditions. These were a possibility, but I did not see them in the data. Random errors are the leftover ones that don't fit into the first two categories. All data has random errors, but they are often small and can be easily fixed. I believe these errors were either fixed before I received the data or I just didn't notice them.
There are also inherent and operational errors. Inherent errors occur because real-world phenomena cannot be perfectly represented in data and models. The representations are generalized and sometimes incomplete, because the real world is too complicated to capture fully in data and modeling. Operational errors occur when the data is actually being collected. This is also known as user error. These two types of error occur in all data, which means they are present in the data used for this lab. This leads to the question: how do we know which data is accurate and which is not? By following set rules and guidelines while collecting and processing data, we can help ensure that our data keeps its integrity. If all data were collected and distributed like the DNR mine data, many errors would have to be dealt with and much more work would be needed to produce accurate results. By following standardized procedures, we can avoid unnecessary errors and collect and produce accurate data.
Conclusions
This lab was quite frustrating, but in the end I learned the importance of collecting standardized data. Collecting data that is organized and understandable is vital in producing accurate findings. We can't use data that is full of errors and expect to get reliable results. It is very important to be vigilant throughout the whole scientific process.
Sources
http://resources.arcgis.com/en/help/
Lo, C.P., and Albert K. W. Yeung. "Chapter 4: Data Quality and Data Standards." Concepts and Techniques of Geographic Information Systems. Upper Saddle River, NJ: Pearson Prentice Hall, 2003. 103-134. Print.