Plotting Likert Variables in R


I am working with a dataset that contains a large number of variables from a survey of people who use public transport.

I have attached the dataset as a CSV file, together with a picture of its variables.

Link to Data: https://drive.google.com/open?id=1MvfnwR4IkUyUzSnCuAOL8fxAYiDjBoBi

Code to read the dataset:

df <- read.csv("PublicTransportSurvey.csv", sep = ";", header = TRUE, stringsAsFactors = TRUE)
# Drop the row-number column, then display the dataset
df <- subset(df, select = -Row_Num)
View(df)

The variables can be summarised as follows:

The items are on a Likert scale (1-5) with the possible responses: strongly disagree (1), disagree (2), neutral (3), agree (4) and strongly agree (5).

Perceived Usefulness and Ease of Use
PU1: PT information is easily accessible
PU2: PT infrastructure is easily accessible 
PU3: The maps on PT infrastructure are helpful and clear
PU4: PT tickets are easy to purchase 
PU5: PT connections in Adelaide are well integrated
PU6: Waiting times for PT services are reasonable


Perceived Enjoyment 
ENJ1: The views from PT in Adelaide are scenic
ENJ2: Fellow passengers on PT in Adelaide are friendly 

Quality 
QU1: PT in Adelaide is reliable
QU2: PT in Adelaide supports disabled travellers
QU3: PT in Adelaide offers free wi-fi
QU4: PT in Adelaide has a low carbon footprint
QU5: PT in Adelaide is clean


Safety and Security
SS1: PT is safe in Adelaide
SS2: Adelaide PT drivers handle unruly passengers
SS3: PT shelters in Adelaide are well-lit at night-time 

Use Behaviour 
USE1: I use PT in the mornings only
USE2: I use PT during off-peak times
USE3: I use PT only during the evening
USE4: I use PT during the week
USE5: I use PT at the weekend


PT Incentives 
INC1: I use PT to save money
INC2: I use PT to protect the environment
INC3: I use PT to exercise more
INC4: I use PT to experience the city firsthand

Information Access
INF1: I access PT timetables and information using a mobile device 
INF2: I access PT timetables and information from a hotel concierge
INF3: I access PT timetables and information on the platform
INF4: I access PT timetables and information from a newsagency
INF5: I access PT timetables and information from other commuters

However, as the attached picture of the dataset variables shows, some of them also contain values outside the 1-5 range.

I have been stuck on this problem for the last 4 hours, searching for a solution.

My ultimate objective is to remove the outliers (values above 5) from the above variables and then draw a Likert plot. Could somebody please suggest how to solve this?

Thanks in advance.


Solution

  • My solution:

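    library(likert)
    
    # read.csv2() defaults to sep = ";", matching the file in the question
    df <- read.csv2("PublicTransportSurvey.csv")
    
    # Keep only the Likert item columns
    df <- df[,12:54]
    # Convert factor and character columns to numeric
    df[sapply(df, is.factor)] <- lapply(df[sapply(df, is.factor)], function(x) as.numeric(as.character(x)))
    df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)], function(x) as.numeric(as.character(x)))
    
    # Replace out-of-range values (above 5) with NA
    df <- data.frame(apply(df, 2, function(x) ifelse(x > 5, NA, x)))
    # likert() needs factors; fixed levels keep every item on the same 1-5 scale
    df <- data.frame(lapply(df, function(x) factor(x, levels = 1:5)))
    
    likert_df <- likert(df)
    plot(likert_df)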
    

    First of all, I removed the columns that are not Likert variables. Then I converted the factor and character columns to numeric and replaced all values greater than 5 with NA, because, as far as I know, the likert package ignores missing values.
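To see the cleaning step in isolation, here is a minimal sketch on a made-up data frame. The column names and values are only illustrative and are not taken from the real survey file. It also treats values below 1 as invalid, which is an assumption that goes slightly beyond the "above 5" rule in the question.

```r
# Toy data standing in for a few survey columns (values are made up)
toy <- data.frame(
  PU1 = c("1", "5", "7", "3"),
  QU1 = factor(c("2", "9", "4", "4")),
  stringsAsFactors = FALSE
)

# Convert every column to numeric, going through character so that
# factor columns yield their labels rather than their internal codes
toy[] <- lapply(toy, function(x) as.numeric(as.character(x)))

# Anything outside the valid 1-5 range becomes NA
toy[] <- lapply(toy, function(x) replace(x, !is.na(x) & (x < 1 | x > 5), NA))

# Number of discarded responses per item
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Checking the NA counts per item is a quick way to see how many responses were thrown away before plotting.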

    Then I converted all the columns back to factors, because the likert function requires that. The code then draws the Likert plot for all items.
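If you want the plot to show the response labels instead of the numbers 1-5, the factor conversion can carry them. The sketch below uses made-up values and assumes the columns are named after the item codes in the question (PU1, QU1 and so on), which I have not checked against the real file. As far as I know, likert() also expects every item to have the same set of levels, so the levels are fixed explicitly rather than inferred per column.

```r
# Made-up cleaned responses; nobody chose 1 or 5 for PU2
clean <- data.frame(
  PU1 = c(1, 5, NA, 3),
  PU2 = c(2, 3, 4, 4),
  QU1 = c(4, 4, 2, 5)
)

scale_labels <- c("Strongly disagree", "Disagree", "Neutral",
                  "Agree", "Strongly agree")

# Give every item the same five labelled levels, including unused ones
clean[] <- lapply(clean, factor, levels = 1:5, labels = scale_labels)

# Pick the items of one construct by name prefix instead of by position
pu_items <- clean[, grep("^PU", names(clean)), drop = FALSE]

print(sapply(clean, nlevels))
print(names(pu_items))
```

A subset such as pu_items could then be passed to likert() on its own to get one plot per construct.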