python - Using IMDB data with scikit-learn regression models when feature variables contain text values
I have a CSV file containing IMDB movie ratings data. The file has 27 features and 1 target variable. I have attached sample data, and the full data set can be downloaded from Kaggle. I have learnt that Python's sklearn package requires numeric data. How can I use this data for regression analysis? Right now I am using the code below, and it fails with an error saying "some director name" can't be converted to float.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Raw string so the backslashes in the Windows path are not read as escapes ("\f" is a form feed)
df = pd.read_csv(r'd:\machine learning\final\movie_metadata.csv')

feature_cols = ["director_facebook_likes", "cast_total_facebook_likes", "movie_facebook_likes",
                "facenumber_in_poster", "gross", "num_critic_for_reviews", "num_voted_users",
                "num_user_for_reviews", "duration", "title_year", "content_rating", "budget",
                "director_name"]
# "content_rating" and "director_name" hold text, which is what triggers the float-conversion error
x = df[feature_cols]
y = df.imdb_score

lm = LinearRegression()
lm.fit(x, y)
print(lm.intercept_)
print(lm.coef_)
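To see which of the selected feature columns cause the conversion error, one can ask pandas for the non-numeric columns. This is a minimal sketch using a tiny made-up DataFrame in place of movie_metadata.csv; the rows and the shortened feature list are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for movie_metadata.csv (values are made up).
df = pd.DataFrame({
    "budget": [100.0, 250.0, 80.0],
    "duration": [120.0, 150.0, 95.0],
    "content_rating": ["PG-13", "R", "PG"],
    "director_name": ["Director A", "Director B", "Director C"],
    "imdb_score": [6.5, 7.8, 5.9],
})

feature_cols = ["budget", "duration", "content_rating", "director_name"]

# Keep only the feature columns whose dtype is not numeric; these are the
# columns LinearRegression.fit() cannot convert to float.
text_cols = df[feature_cols].select_dtypes(exclude="number").columns.tolist()
print(text_cols)
```

Running the same `select_dtypes` call on the real file lists every text column that needs encoding before fitting.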
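The simplest approach is pd.get_dummies(), which replaces each text column with numeric 0/1 indicator columns, one per distinct value. You may also see this technique called one-hot encoding.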
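A minimal sketch of the pd.get_dummies() approach follows. It again uses a small made-up DataFrame instead of the real CSV (an assumption for illustration), encodes the two text columns, and fits LinearRegression on the result.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data; in the question this would come from movie_metadata.csv.
df = pd.DataFrame({
    "budget": [100.0, 250.0, 80.0, 300.0, 150.0],
    "duration": [120.0, 150.0, 95.0, 170.0, 110.0],
    "content_rating": ["PG-13", "R", "PG", "PG-13", "R"],
    "director_name": ["Director A", "Director B", "Director A", "Director C", "Director B"],
    "imdb_score": [6.5, 7.8, 5.9, 8.1, 7.0],
})

feature_cols = ["budget", "duration", "content_rating", "director_name"]

# One-hot encode the text columns: each distinct value becomes its own 0/1 column,
# e.g. content_rating -> content_rating_PG, content_rating_PG-13, content_rating_R.
# Numeric columns are passed through unchanged and come first.
X = pd.get_dummies(df[feature_cols], columns=["content_rating", "director_name"], dtype=float)
y = df["imdb_score"]

lm = LinearRegression()
lm.fit(X, y)
print(lm.intercept_)
# Pair each coefficient with its (possibly one-hot) column name.
print(dict(zip(X.columns, lm.coef_)))
```

Passing `drop_first=True` to `pd.get_dummies` drops one level per encoded column, so the indicator columns are not perfectly collinear with the intercept. If the real CSV contains missing values, `LinearRegression.fit` will reject them, so those rows would need to be dropped or imputed first. When the same encoding must be applied to new data later, scikit-learn's `OneHotEncoder` keeps the learned categories fixed, which `get_dummies` does not.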