Recently I ran into an interesting scenario while converting data to numeric in R.
Here is the scenario.
Step#1:: Created a simple data frame with 2 factor variables as shown below.
 a <- as.factor(c(4, 9, 6))
 b <- as.factor(c(2.5, 3, 5.1))
 df <- data.frame(a, b)
 str(df)
Step#2:: Converted the observations in the existing variables to numeric using the as.numeric function, and stored the values in new variables.
 df$a_1 <- as.numeric(df$a)  
 df$b_1 <- as.numeric(df$b)  
What I observed is that the values changed when the as.numeric function was applied: a_1 came out as 1, 3, 2 and b_1 as 1, 2, 3, instead of the original values.
Analysis: This happened because the variables are factors. Internally, a factor is stored as a vector of integer codes plus a table of level labels, and the codes are positions in that sorted table. Calling as.numeric directly on a factor therefore returns the internal integer codes, not the labels. To avoid this issue, convert the variable to character first, then apply the as.numeric function.
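To make the analysis concrete, here is a small sketch that inspects the two parts of the factor from Step#1: the level labels and the integer codes. It only uses base R functions; the variable names are just for illustration.

```r
# Same factor as variable a in Step#1
a <- as.factor(c(4, 9, 6))

# The label table: unique values, sorted, stored as character strings
a_levels <- levels(a)      # "4" "6" "9"

# The internal integer codes: each value's position in the label table
a_codes <- as.integer(a)   # 1 3 2

# Indexing the label table with the codes recovers the original values as text
a_labels <- a_levels[a_codes]   # "4" "9" "6"

print(a_levels)
print(a_codes)
print(a_labels)
```

The codes 1, 3, 2 are exactly what as.numeric returned in Step#2, which is why the original values 4, 9, 6 seemed to change.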
Step#3:: Converted the observations in the existing variables to character using as.character, then applied the as.numeric function, and stored the values in new variables. This time the new variables hold the original values.
 df$a_2 <- as.numeric(as.character(df$a))  
 df$b_2 <- as.numeric(as.character(df$b))  
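As a quick check, the sketch below compares the direct conversion with the character-first conversion on the b values from Step#1. It also shows an alternative form, as.numeric(levels(f))[f], which the R help page for factor mentions as slightly more efficient. The variable names b_codes, b_vals and b_alt are only illustrative.

```r
# Same factor as variable b in Step#1
b <- as.factor(c(2.5, 3, 5.1))

# Direct conversion returns the internal integer codes
b_codes <- as.numeric(b)                 # 1 2 3

# Converting to character first returns the actual values
b_vals <- as.numeric(as.character(b))    # 2.5 3.0 5.1

# Alternative: convert the level labels once, then index them by the factor
b_alt <- as.numeric(levels(b))[b]        # 2.5 3.0 5.1

print(b_codes)
print(b_vals)
print(b_alt)
```

Both forms give the same result; the alternative converts each distinct level only once, which can matter for long vectors with few levels.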