Tuesday, May 5, 2020
Analysis of the BLITZ Employee Survey Data Set
Question: Scenario We continue with the analysis of the BLITZ employee survey data set. As introduced before BLITZ is a department store which employees over 3,000 people. Management is concerned about the wide variation in productivity between employees and whether employee job satisfaction and their sense of job security plays any part in this problem. The organisation also has staff planning issues that need addressing, including planning to replace workers who retire, undertaking a recruitment drive for new employees and providing adequate promotion opportunities within the company. Also, Management is concerned about the wide variation in productivity between employees, and suspects the amount of unpaid overtime hours worked may play a part in this problem. Management has asked you to investigate, and if their suspicions are realised, to develop a model to help predict productivity based on the number of unpaid overtime hours worked. Q1. Is there a relationship between JOB SECURITY and PRODUCTIVITY? In this exercise we are investigating the existence of a relationship between productivity and job security. Before checking for an existence of relationship, we need to establish the type of variables we are dealing with and which variable is the dependent variable out of the two variables. What types of variables are Job Security and Productivity? Which of the two is the dependent variable for this exercise? Interpret the following output: a) With the following table of comparative summary measures, highlight the major differences in the summary measures of productivity across the three different levels of job security. Productivity Insecure Productivity Secure Productivity Very secure Mean 94.22 98.80 103.59 Standard Error 0.49 0.56 0.85 Median 95 98.15 104.15 Mode 95 98 107 Standard Deviation 1.84 2.50 3.19 Sample Variance 3.38 6.24 10.15 Kurtosis -1.31 -0.48 1.41 Skewness -0.42 0.17 -1.36 Range 5.3 8.9 10.4 Minimum 91.2 94.3 96.6 Maximum 96.5 103.2 107 Sum 1319.1 1975.9 1450.3 Count 14 20 14 Q1 92.63 97.50 102.98 Q3 95.40 101.08 105.70 IQR 2.78 3.58 2.72 b) With the following multiple box plot of productivity vs. job security, give a brief description of the major differences in productivity across the three groups of workers. (c) Based on what you have found in (a) and (b), do you think there is a relationship between productivity and job security? Q2. Is there a relationship between GENDER and PRODUCTIVITY? This time you will produce the statistical output comparing the productivity of females and males. a) Generate comparative summary measures of productivity for females and males. In Excel we first need to split the productivity data into two groups (for females and males). To do so, go to the BLITZ_EmployeeSurvey_Data worksheet, select any cell in row 1, press the Data tab, and then press the Filter button . In cell B1, click on the drop down box and select only Female employees (see Figure 1a). Next copy all female productivity figures in column D and paste into the Working worksheet in cell C7. Repeat for male productivity figures and paste in cell D7. While still in the Workings worksheet, click on the Data tab and then press Data Analysis. From the pop-up window select Descriptive Statistics and press OK. Enter the Input Range, Output Range, check Labels in First Row, and select Summary Statistics as shown in Figure 1b. Press OK. Underneath your summary measures table, calculate the 1st quartile, 3rd quartile and IQR in rows 22 24. See tutorial 3, step 3(a) for formulas if necessary. (b) In your spreadsheet, highlight the major differences in the summary measures of productivity across females and males. (c) Draw a multiple box plot of productivity vs. gender (by hand). (d) Based on your answers from (a) (c), is there a relationship between productivity and gender? Q3. Cross Tabulations - Productivity vs. Department Produce a frequency cross-tabulation using Microsoft Excel. While in the Workings worksheet, select the Insert tab and click on Pivot Table. From the pop-up window, enter the Table/Range and Location as shown in Figure 2(a).This creates a blank cross-tabulation (pivot table). From the Pivot Table Field List: click-and-drag the Productivity variable to the column fields of the table. click-and-drag the Department variable to the row fields of the table. click-and-drag the Employee variable to the data items of the table. Next we need to group the productivity figures into classes. To do so: right-click anywhere in the productivity column and select Group (see Figure 2b) In the pop-up window, set the classes as shown in Figure 2c. Press OK (b) Produce a row percentage cross-tabulation using Excel. Right-click anywhere in the body of your pivot table, select Summarise Data By, select More options (see Figure 3a). Select the Show values as tab, choose % of row and then press OK (see Figure 3b). (c) Which department appears to be performing the worst in terms of employees meeting their productivity targets? Q4. Scatter diagrams and Correlation analysis Draw a scatter diagram of Productivity vs. Age In Excel, it is easiest to plot a scatter diagram if you have the two variables beside each other. As such, copy the Days Absent data from the BLITZ_EmployeeSurvey_Data worksheet (column L) and paste to the Workings sheet (in cell C76). Similarly, copy and paste the Productivity data to cell D76 on the Workings sheet. Highlight the data on the Workings worksheet. From the Excel menu, select the Insert tab and then choose Scatter (see Figure 4). Tidy up your diagram by adding labels to both axes (make sure the diagram is selected, from the Chart Tools menu item, choose the Layout tab, then select the appropriate options). Draw a scatter diagram of Productivity vs. DaysAbsent From your scatter diagram in part (b), describe the relationship (if any) between Productivity and DaysAbsent. Add a trendline to your scatter diagram. Choose Chart Tools, Layout tab, then select Trendline | More Trendline Options (see Figure 5a). Experiment with different trendlines (exponential, linear, etc). Make sure you check the Display Equation and Display R-squared checkboxes as shown in Figure 5b. Which trendline gives the best fit to the data? From your scatter diagram in parts (d) and (e), describe the relationship (if any) between Productivity and DaysAbsent. Q5. Regression Analysis (a) Build a regression model between Productivity and Unpaid Overtime. Make sure you are on the BLITZ_CustomerSurvey_Data worksheet. From the Data tab, select Data Analysis and then choose Regression. The pop-up window in Figure 1 will appear. Enter the Y-Range, X-Range, select the Labels checkbox and Output Range as shown in Figure 1. Press OK. (b) Explain, in practical terms, the values of b0 and b1 in the above equation. From your regression output, write down the practical interpretations of for R2 and syx. Q6. Residual Analysis (a) Check the model you created in question 3 does not violate any of the underlying assumptions of regression by doing a residual plot(s). Make sure you are on the BLITZ_CustomerSurvey_Data worksheet. Repeat the steps from Q3(a). This time set the Output Range to Working!$B$4 and select the Residuals, Standardized Residuals and Residual Plots checkboxes (see Figure 2). (b) Interpret the residual plot. (c) Are there any outliers (influential values)? Answers: Purpose The research study is conducted to analyze situations regarding Blitz departmental store with respect to different parameters. As the department has a large number of employees there are lot of scenarios like job satisfaction, job security, retirement, productivity e.t.c. concerning the employees that needed careful intervention. This played a major role behind conduction of the study as for proper functioning of the whole department all the issues must be in control. Therefore to identify the issues and also the positive factors was the objective of the analysis. The analysis is conducted in three parts a explorative summary data analysis, the prediction analysis and time series analysis. Through these analysis a complete vision of the scenario of the department can be judged. Data The data file for exploratory data analysis and regression are operated by the file Blitz Employee Survey Data and for the time series data the time series data file. The three types of analysis are carried out in their respective files. A sample of 48 employees are selected for data collection by random sampling method. Methodology For analysis of the employees data summary statistics and cross tabulation methods and scatter plots have been used. For the prediction purpose regression analysis is used and time series analysis is conducted by trend, seasonal indices and forecasting analysis. Results Topic 1 1. The first purpose of this study is get the association between job security and productivity. Job security is a categorical variable because 1 meant insecure, 2 meant secure and 3 meant very secure i.e. the data represents categories and productivity is a numerical variable. Productivity is taken as the dependent variable and Job security as the independent variable. From the comparative summary table it is seen that the people who are very secure are most productive followed by only secure and insecure. All the three categories shows little difference between productivity which is evident from the standard deviation and variance. The median and mode also gives the same interpretation as the mean. There is a little difference between the maximum and minimum values between all the three categories and therefore the range shows smaller values. The distribution of the productivity of insecure employees are little negatively skewed and platykurtic,are nearly symmetric and little platykurtic, negatively skewed and leptokurtic. The values of the First quartile representing the middle value of the lesser part of the distribution and Third quartile representing average of the greater part of the distribution and the interquartile range showing the middle 50 percent of the distribution are compared with the box plot analysis, it is seen that the more an employee is secure more is his production capacity since the mean values are higher and also as security tends to grow some of the employees may become less productivity because with increase in security feeling as scatteredness in data increases with more security as is evident from the box plot. However these results show that there is a relationship between productivity and job security which can be properly justified with further analysis. 2. The males employees seems to be a little more productive than the female employees with a greater mean and median although not the case in mode. But since mean and median are considered more reliable measure of average for this data set therefore the results corresponding to them is considered to be more justified. Also more consistency regarding productivity are seen more among the male employees than their female counterparts which is evident from the standard deviation and variance being little less for the male employees. Both the variables being symmetric and although leptokurtic can be approximated by a normal distributions with little modifications. From the interquartile range of the females being higher than the males and the box plot analysis that there is some variability between male and female production rate and a relation can be predicted between them. 3. Productivity being the main concern in this section it is has been analyzed which department are more productive using cross tabulation methods in Excel. The rows representing the departments and the columns representing the class intervals of productivity. The data represented the percentage of employees falling within each groups. Maximum number of employees are found in the Distn department with 90-100 productivity level, i.e. low productivity level making it the least productive department. Comparing the tables it can be said that the sales department is the most performing among the three with more productive employees. 4. The scatter diagram shows the association between two variables, here between the variables productivity and number of days absent. Here productivity is taken as the dependent variable on the y axis and number of days absent as the independent variable. The points in the scatter diagram show a bit dispersed points, therefore a trend line is fitted to the scatter diagram. From the trend line it is seen that there is a positive association between number of days absent and productivity but inspire of a linear relationship the trend equation shows a curvilinear relationship. 5. In this part of the analysis a prediction equation via regression is established between the variables productivity and unpaid overtime. The coefficient of unpaid overtime is -1.28 shows that there these two variables are negatively correlated, i.e. if an employee gives overtime without payment then his productivity is affected badly, the constant being 107.96. The value of R square being .65 shows that 65 percent of productivity is explainable via unpaid overtime however negatively. 6. From the residual plot it is seen that the equation is well fitted as the predicted values does not seem to deviate largely from the original values the deviations being less only with a few exceptions where the deviation seems to be greater than 2which can be treated as outliers. Also some of the predictions overestimate the original values while some of the predictions underestimate the original values showing both positive and negative deviations. However the absolute value of the residuals and the residual plot graph shows that the regression equation is a good fit. Topic 2 1. The productivity of an employee who works 10 hours of unpaid overtime is predicted to be 95.31 based on the regression equation. The productivity of an employee giving 10 hours overtime without pay can vary from 94.026 to 96.245 with 95% confidence level from his original productivity. The productivity of an employee giving 10 hours overtime without pay can vary from 89.743 to 100.528 with 95% confidence level from his predicted productivity of 95.31. From the data it is seen that the maximum number of hours of unpaid overtime is 12 and less than that. An overtime of 20 hours is much a data outside the original data range. However as the fit of the regression line was good therefore it may be used to predict the data. 2. From the scatter diagram of productivity vs. job satisfaction it is seen that there is a positive relation between productivity and job satisfaction, i.e. as job satisfaction increases productivity also increases because the scatter diagram shows a positive trend. Also the scatter diagram between weekly salary and productivity shows that both the variables are positively associated the more the weekly salary the more is the production. A correlation matrix is created between the variables showing positive correlation between productivity and Age, jobsatisfaction,weekly salary and gender and negative correlations between productivity and unpaid overtime. A strong negative correlation is observed between unpaid overtime and jobsatisfaction,weekly salary and positive correlations are found between weekly salary and age, job satisfaction. From the correlation matrix most of the strong correlations between productivity are job satisfaction, weekly salary and unpaid overtime . However due to strong correlation between job satisfaction and unpaid overtime , i.e. one of the variable is sufficient for prediction , the variable job satisfaction is replaced by gender and therefore the final selected variables are weekly salary , unpaid overtime and gender. 3. The regression equation is Productivity = 102.51 + .003*Wk Salary -.953*UovTime +.446*Gender The model seems to be a good fit as the level of significance is less than .05 and therefore the equation is a good predictive model. The significance value of weekly salary is less than .05 showing that the variable is a relevant predictor variable, the significance level of UovTime is much less than .05 also showing that it is a relevant predictor variable but the significance level of gender is greater than .05 showing that it is not a relevant predictor variable. The value of R square being .711 shows that 71% of productivity can be explained by these predictor variables(Chatterjee Price, 1977). The residual analysis shows that the predicted values shows some deviation from the originals as is seen from the value of the residuals showing both positive and negative deviations. From the standard residuals it is found that three values are over exceeding the original . Also from the residual plot it can be justified that the regression equation is a good fit because of the proximity between the predicted and the observed values. 4. From the regression model the productivity of a 35 year old male employee with an overtime without pay for 8 hours earning thousand dollar and a job satisfaction rating 15 is 98.07. The productivity of a 35 year old male employee with an overtime without pay for 8 hours earning thousand dollar and a job satisfaction rating 15 can vary from 97.29 to 98.85 from his original productivity and the productivity of a 35 year old male employee with an overtime without pay for 8 hours earning thousand dollar and a job satisfaction rating 15 can vary from 93.12 to 103.01 from his predicted productivity of 98.07 at 95% confidence interval. Topic 3 1. This part of analysis represents the time series analysis. A three monthly moving average is first calculated from the total quantity of kids items sold for 24 consecutive months followed by a five monthly moving average and a four year centered moving average. A comparison of the three moving averages is given in the chart showing graphical plot of all the three moving averages. The lines show little variation from each other which signifies the three moving averages smoothes the data almost evenly. The smoothing proved helpful because the graph shows evenness throughout the series in the diagram. After the moving average method of smoothing an exponential smoothing is conducted for the data with = 0.2, = 0.3 and = 0.9. From the plotted series The MAPE values give small error rates for all the three values of = .2, .3 and .9. with highest error rate with = .2 and lowest error rate with = .9. 2. A trend analysis has been observed for the quarterly data of sales. A linear trend line has been fitted to the data with the help of regression analysis and average percentage error has been calculated for all the series. The values for the next four quarters was estimated by the trend equation and it was compared to the given four values using the mean absolute percentage error. A slight deviation has been found between the values with a MAPE of 16%(Ostrom, 1990). 3. In a multiplicative model Yt = Trend (Tt)*Seasonal Fluctuations(St)*Irregularity(It) For this method at first the data is smoothed four quarterly centered moving averages. Next the normalized seasonal indexes has been calculated and then the deseasonalized values. From the deseasonalized values the trend line has been fitted. Mean Average percentage error have been calculated to compare the trend fitted values with the given forecasted values. The MAPE was 17%. The MAPE calculated from the trend values was 16% and the MAPE calculated from the multiplicative model was 17%, therefore the linear model was a slightly better forecasting model. Conclusion The concern of the management of the company was regarding productivity of the employees due to different factors. From the analysis it is found that all the variables had their effect on productivity. But the most effective variables were selected for the regression analysis. Also from the compact time series analysis future forecasts can be made easily. References Chatterjee, S., Price, B. (1977). Regression analysis by example. New York: Wiley. Ostrom, C. (1990). Time series analysis. Beverly Hills, Calif.: Sage Publications.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.