UserID  LabelID  Prior    Likelihood  Posterior
1       1        71/206   15/71       .073
1       2        27/206   2/27        .010
1       3        108/206  1/108       .005
2       1        71/206   21/71       .102
2       2        27/206   15/27       .073
2       3        108/206  7/108       .034
3       1        71/206   35/71       .170
3       2        27/206   0/27        0
3       3        108/206  100/108     .485
4       1        71/206   0/71        0
4       2        27/206   10/27       .049
4       3        108/206  0/108       0
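Posterior ∝ Prior * Likelihood(category) (unnormalized; dividing by the sum over all labels for the same user gives the true posterior)
71/206 * 15/71 = 15/206 ≈ .073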
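The table can be reproduced from raw counts. Below is a minimal sketch, assuming the likelihood numerators are per-(user, label) counts out of 206 observations (they add up to 71, 27 and 108 per label, matching the priors). Because the 71, 27 and 108 cancel, prior × likelihood is simply count/206. The `posterior` helper then divides by the evidence (the sum over labels for one user) to get a proper P(label | user). The names `counts`, `unnormalized_posterior` and `posterior` are illustrative, not from the original post.

```python
from fractions import Fraction

# Assumed raw counts: counts[user][label] is read off the likelihood
# numerators in the table (e.g. user 1 / label 1 -> 15 of label 1's 71).
counts = {
    1: {1: 15, 2: 2, 3: 1},
    2: {1: 21, 2: 15, 3: 7},
    3: {1: 35, 2: 0, 3: 100},
    4: {1: 0, 2: 10, 3: 0},
}
labels = [1, 2, 3]

# Per-label totals give the prior denominators: 71, 27, 108 (sum 206).
label_totals = {l: sum(counts[u][l] for u in counts) for l in labels}
n = sum(label_totals.values())


def unnormalized_posterior(user, label):
    """Prior * likelihood = P(label) * P(user | label) = P(user, label)."""
    prior = Fraction(label_totals[label], n)
    likelihood = Fraction(counts[user][label], label_totals[label])
    return prior * likelihood


def posterior(user):
    """Bayes' rule: divide by the evidence P(user) so the labels sum to 1."""
    scores = {l: unnormalized_posterior(user, l) for l in labels}
    evidence = sum(scores.values())
    return {l: s / evidence for l, s in scores.items()}


for user in counts:
    joint = [round(float(unnormalized_posterior(user, l)), 3) for l in labels]
    post = [round(float(p), 3) for p in posterior(user).values()]
    print(user, joint, post)
```

Using `Fraction` keeps the arithmetic exact, so the rounding to three decimals happens only at print time.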
Tuesday, June 1, 2010
Prior and Posterior probability
Wednesday, April 28, 2010
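Taobao Mall: Status and Positioning (Zhang Yong, Taobao CFO and head of Taobao Mall)
1: Taobao Mall does B2C; it is like a fortified special economic zone,
2: Current state of Taobao Mall: 20 Taobao Mall sections (categories),
3: How merchants/companies position themselves on Taobao Mall is very important: "1.
* What data mining can do: analyze Taobao users' shopping needs,
4: A very important issue in e-commerce is the back end, i.e. the e-commerce solution,
5: Companies' strategic allocation of resources on Taobao Mall
6: Restocking
* Data mining: should be able to predict likely sales volume well,
7: Taobao Mall's annual revenue grew 500% in one year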
Wednesday, April 21, 2010
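Installing Hive
Introduction from my senior schoolmate at the Taobao Data Platform team
I am recording the problems I ran into while installing Hive, so that those who come after can learn from them.
First I tried the official tutorial: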
http://wiki.apache.org/hadoop/Hive/GettingStarted#Hive_introduction_videos_From_Cloudera
$ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive
$ cd hive
$ ant package
$ cd build/dist
$ ls
README.txt
bin/ (all the shell scripts)
lib/ (required jar files)
conf/ (configuration files)
examples/ (sample input and query files)
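But while running ant, it kept printing ivy:retrieve ..... messages. I guessed it needed to download something from the Internet, and after looking carefully at the install output I found:
[ivy:retrieve] downloading http://mirror.facebook.net/facebook/hive-deps/hadoop/core/hadoop-0.17.2.1/hadoop-0.17.2.1.tar.gz ...
Behind China's great LAN, you want to download something from Facebook? You have to get over the Great Firewall first.
Since I have SSH access, I used proxychains and naively thought that just running the build through proxychains would fix everything:
proxychains ant package
It failed with the same problem as before.
I don't know whether proxychains was at fault or some other configuration I hadn't thought of.
In the end I gave up on that approach. As the saying goes, a tree that is moved dies, but a person who moves on thrives.
Then I remembered that the Taobao Data Platform blog has installation steps (I'm interning at Taobao this summer, in that same department, and the blog is my senior schoolmate's work):
Taobao Data Platform
Sure enough: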
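Downloading, configuring and installing Hive
Please refer to the Getting Started guide; here is only the most basic outline:
* Install and configure Hadoop.
* Install and configure a database (MySQL, etc.).
* Get the Hive source code or binaries: wget http://www.apache.org/dist/hadoop/hive/hive-0.5.0/hive-0.5.0-bin.tar.gz
* tar xzf hive-0.5.0-bin.tar.gz
* cd hive-0.5.0
* Configure how Hive accesses the database and how it accesses Hadoop.
* Run Hive.
When you see the Hive prompt 'hive>', congratulations: you can start your Hive journey.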
Following this method, I downloaded the binary release, extracted it with tar, and set $HADOOP_HOME.
Done:
hive>
Wednesday, March 31, 2010
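Can't "Think", Only "Criticize": Beware of Online Public Opinion Turning into a "Bitter Complainer"
February 26, 2010 02:27 Source: Qiao Bao (The China Press) Author: Nan Qiao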
Wednesday, November 25, 2009
3 dimensions of behavioral targeting
* CRM Dimension : Customer Retention
* Branding Dimension : Brandwashing
* Direct response : Customer Acquisition
Tuesday, October 27, 2009
Data Mining Tools
Weka
R
Excel
What's funny is the following exchange:
A:
SAS Base does a great job there, and so does SPSS Modeler. A trial of SPSS Statistics is available at http://www.spss.com.
Java is cool, but you are wasting time with programming. A data miner has more important tasks to do than generating tons of code.
Doesn’t (s)he?
B:
I disagree, for two reasons:
- I can give you an understandable one-minute description of more algorithms than I can count, but in a real data analysis you will meet special cases where you have to know EXACTLY how the algorithm is implemented. That is why I could never work with non-open-source programs.
- If you are not able to write code (at least to change the behavior of existing algorithms or to create new ones), you restrict yourself to what's available. Are you sure your data mining environment is prepared for every possible data analysis problem?
@tools: You forgot RapidMiner (formerly YALE), which does an excellent job of handling large datasets and data preparation (its key focus). It is free, it is open source, and it is written in Java.
- Data into results » Data mining tools
Clusters of Twitter users
Using the k-means method to extract the clusters.
The clusters include: common people, geeks, professional managers, online addicts, and maybe more.
The post also provides the results:
http://www.dataintoresults.
"A Twitter users segmentation"
- Data into results » A Twitter users segmentation
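To make the k-means step concrete, here is a minimal sketch of Lloyd's k-means in plain Python. The features (tweets per day, followers in thousands), the six user names and k=3 are all made up for illustration; the segmentation post's actual features and data are not reproduced here. Seeding uses farthest-first traversal, a deterministic choice of this sketch, not necessarily what the post used.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly take the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: returns (centroids, labels), where labels[i] is the
    cluster index of points[i]."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical users: (tweets per day, followers in thousands).
names = ["alice", "bob", "carol", "dave", "erin", "frank"]
points = [(2.0, 0.1), (3.0, 0.2), (40.0, 0.3), (45.0, 0.2), (5.0, 12.0), (4.0, 10.0)]

centroids, labels = kmeans(points, k=3)
for name, lab in zip(names, labels):
    print(name, lab)
```

In practice the features would be scaled first (tweets per day and follower counts live on very different ranges), and k would be chosen by comparing several runs rather than fixed in advance.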