常用命令

  • celery 启动

celery -A worker worker -l debug -Q japan --concurrency 1

  • 查找文件

find . -name "*.log"|xargs rm -rf

  • 认证登录

/usr/local/mongodb/bin/mongo --port 28017 airline --authenticationDatabase=admin -u root -p 123456

  • Mongostat 统计

/usr/local/mongodb/bin/mongostat --port 28017 --authenticationDatabase=admin -u root -p 123456

  • find语句 explain分析

db.lcc.find({"segments.grabChannel":9113}).explain("executionStats")

  • 常用业务

db.lcc.find({ "timestamp":{"$gte": 1488470400}, "segments.grabChannel":9113, "segments.origin": "BJS", "segments.destination": "HKG"})

  • 导出语句mongoexprort json格式

/usr/local/mongodb/bin/mongoexport -d lcc -c lcc -o lcc.json --type json --authenticationDatabase=admin -u root -p 123456 --port 28017 --query '{"segments.grabChannel":9113, "segments.origin": "BJS", "segments.destination": "HKG"}'

  • 查询+sort

db.lcc.find({"segments.grabChannel":9113, "segments.origin": "BJS", "segments.destination": "HKG"}).sort({"segments.departureDate.month": 1})

  1. [ ] 启用硬盘

, {'allowDiskUse': true}

  • [ ] 查询出发时间的所有count
db.lcc.aggregate([
            {
                $group: {
                    _id: {
                        "year": "$segments.departureDate.year",
                        "month": "$segments.departureDate.month",
                        "day": "$segments.departureDate.day"
                    },
                    total: {
                        $sum: 1
                    }
                }
            },
            {
                $sort: {
                        "_id.year":1,
                        "_id.month":1,
                        "_id.day":1,
                    }
            },
            { $out: 'departureDate_col3' }
        ],
        {
                       allowDiskUse: true,
              explain: true
                }
    )
  • mongoexport导出 指定字段

/usr/local/mongodb/bin/mongoexport -d lcc -c departureDate_col -o date_search.csv --type csv --authenticationDatabase=admin -u root -p 123456 --port 28017 -f _id.year,_id.month,_id.day,total

  • 10.聚合分析,启用硬盘
db.lcc.aggregate([
            {
                $group: {
                    _id: {
                        "year": "$segments.departureDate.year",
                        "month": "$segments.departureDate.month",
                        "day": "$segments.departureDate.day"
                    },
                    total: {
                        $sum: 1
                    }
                }
            }
         ],
                {
                      allowDiskUse: true,explain: true
                }
)
# 统计每日采集的各渠道 采集数量

db.lcc.aggregate(
[
{
    $match:
        {'timestamp':{$gt:1490930153}}
},
{
  $project:{
        'segments.grabChannel':1,
        '_id':0
  }  
},
{
    $group:{ '_id':'$segments.grabChannel',
             'sum':{$sum:1}
    },
},
],
{
  allowDiskUse: true 
}
)
  • 运行debug scrapy

scrapy crawl airticket_v2 --logfile log_airticket.log -L INFO

  • [ ] 12: #创建表上限: 250GB、 1kw条数据

db.createCollection("lcc", { capped : true, size : 268435456000, max : 10000000 } )

# hutchgo 欢乐购航空

# 要求数据格式必须是 user:pwd@ip:port

  • [ ] 13:redis push 代理

lpush proxy:hutchgo test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080

lpush proxy:mytrip test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080 test:[email protected]:8080

  • [ ] 14.

/usr/local/redis/bin/redis-cli monitor

results for ""

    No results matching ""