Data Modeling with Python: Classification
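In a feature space with a distance metric, if most of a sample's k nearest neighbors belong to one class, the sample is assigned to that class as well. This is the k-nearest neighbors (KNN) rule.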
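The rule can be sketched in a few lines of plain Python. This is a minimal illustration, not how scikit-learn implements it: the helper name `knn_predict`, the use of Euclidean distance, and k=5 are assumptions made for the example. The six (fight, kiss) points are the movie data used in this post.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Hypothetical helper: majority vote among the k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [math.dist(p, query) for p in train_points]
    # Indices of the k training points with the smallest distances
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Count the labels of those neighbors and return the most common one
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = [(3, 104), (2, 100), (1, 81), (101, 10), (99, 5), (98, 2)]
labels = ['Romance', 'Romance', 'Romance', 'Action', 'Action', 'Action']

# 3 of the 5 nearest neighbors of (18, 90) are Romance
print(knn_predict(points, labels, (18, 90), k=5))  # Romance
```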
Examples: movie classification / plant classification
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    # Jupyter/IPython magic; remove this line when running as a plain script
    %matplotlib inline

    from sklearn import neighbors  # KNN classification module
    import warnings
    warnings.filterwarnings('ignore')  # suppress warnings

    # Create the data: counts of fight scenes and kiss scenes per movie
    data = pd.DataFrame({'name': ['北京遇上西雅图', '喜欢你', '疯狂动物城', '战狼2', '力王', '敢死队'],
                         'fight': [3, 2, 1, 101, 99, 98],
                         'kiss': [104, 100, 81, 10, 5, 2],
                         'type': ['Romance', 'Romance', 'Romance', 'Action', 'Action', 'Action']})
    print(data)
    print('-------')
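    # Load the data and build the KNN classification model
    knn = neighbors.KNeighborsClassifier()  # get a KNN classifier (default n_neighbors=5)
    knn.fit(data[['fight', 'kiss']], data['type'])

    # Predict an unseen sample: 18 fight scenes, 90 kiss scenes
    print('Predicted movie type:', knn.predict([[18, 90]]))

Predicted movie type: ['Romance']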
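To see why the model answers 'Romance', one can ask a fitted classifier which training points it consulted. The sketch below rebuilds the same six-movie model from plain lists and calls `kneighbors`, which returns the distances and row indices of the nearest training samples (5 here, since the classifier is created with its default `n_neighbors=5`). The names `X`, `y`, `dist`, `idx`, and `neighbor_labels` are illustrative and do not appear in the original post.

```python
from collections import Counter
from sklearn import neighbors

# Same (fight, kiss) values and labels as the movie table
X = [[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]]
y = ['Romance', 'Romance', 'Romance', 'Action', 'Action', 'Action']

knn = neighbors.KNeighborsClassifier()  # default n_neighbors=5
knn.fit(X, y)

# Distances and training-row indices of the 5 nearest neighbors, closest first
dist, idx = knn.kneighbors([[18, 90]])
neighbor_labels = [y[i] for i in idx[0]]
for d, label in zip(dist[0], neighbor_labels):
    print(round(float(d), 1), label)

# Tally of the votes that decide the prediction
print(Counter(neighbor_labels))
```

With only six training samples and k=5, every query is decided by a 3-to-2 vote, so a smaller `n_neighbors` may be worth trying on data this small.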
    # Plot the training data and the predicted sample
    plt.scatter(data[data['type'] == 'Romance']['fight'],
                data[data['type'] == 'Romance']['kiss'],
                color='r', marker='o', label='Romance')
    plt.scatter(data[data['type'] == 'Action']['fight'],
                data[data['type'] == 'Action']['kiss'],
                color='g', marker='o', label='Action')
    plt.grid()
    plt.legend()

    # Mark the predicted movie with a red cross
    plt.scatter(18, 90, color='r', marker='x', label='Romance')
    plt.ylabel('kiss')
    plt.xlabel('fight')
    plt.text(18, 90, '《你的名字》', color='r')
    # Create random data and classify it with the fitted model
    data2 = pd.DataFrame(np.random.randn(100, 2) * 50, columns=['fight', 'kiss'])
    data2['type'] = knn.predict(data2)
    print(data2.head())
    print('------')
    # Plot the training data (circles) and the classified random data (crosses)
    plt.scatter(data[data['type'] == 'Romance']['fight'],
                data[data['type'] == 'Romance']['kiss'],
                color='r', marker='o', label='Romance')
    plt.scatter(data[data['type'] == 'Action']['fight'],
                data[data['type'] == 'Action']['kiss'],
                color='g', marker='o', label='Action')
    plt.grid()

    plt.scatter(data2[data2['type'] == 'Romance']['fight'],
                data2[data2['type'] == 'Romance']['kiss'],
                color='r', marker='x', label='Romance')
    plt.scatter(data2[data2['type'] == 'Action']['fight'],
                data2[data2['type'] == 'Action']['kiss'],
                color='g', marker='x', label='Action')
    plt.legend()
    plt.ylabel('kiss')
    plt.xlabel('fight')
    from sklearn import datasets

    # Load the data
    iris = datasets.load_iris()
    print(iris.keys())
    print('Number of samples: %i' % len(iris['data']))

    print(iris.feature_names)
    print(iris.target_names)
    #print(iris.target)
    print(iris.data[:5])
    # 150 samples
    # feature_names: sepal length, sepal width, petal length, petal width
    # Target classes: Iris setosa, Iris versicolor, Iris virginica.
    # Convert the feature values to a DataFrame
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['target'] = iris.target
    ty = pd.DataFrame({'target': [0, 1, 2],
                       'target_names': iris.target_names})
    # Attach the class name to each row by joining on the numeric target
    df = pd.merge(df, ty, on='target')

    knn = neighbors.KNeighborsClassifier()  # get a KNN classifier
    knn.fit(iris.data, df['target_names'])  # build the classification model

    # Predict the class of a new sample
    pre_data = [[0.1, 0.2, 0.3, 0.4]]
    print('Prediction:', knn.predict(pre_data))

    df.head()  # show the data
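The prediction above is for a single hand-written sample, which says little about how well the model generalizes. One common check, sketched here as an assumption rather than something the original post does, is to hold out part of the iris data and score the classifier on it. The 30% test size, `random_state=0`, and the stratified split are arbitrary choices for the illustration.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Hold out 30% of the samples, keeping the class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target)

knn = neighbors.KNeighborsClassifier()  # default n_neighbors=5
knn.fit(X_train, y_train)

# Fraction of held-out samples whose class is predicted correctly
accuracy = knn.score(X_test, y_test)
print('Held-out accuracy: %.3f' % accuracy)
```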