Simple Chinese-English mini dictionary iDict v1.0 released [open source on GitHub]

Date: 2018-01-28 ┊ Views: 20,899 ┊ Tags: development, programming, design

I wrote a small dictionary app in WPF for my own use.

Features:

Simple word lookup
Minimize to the system tray
Optional launch at startup

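The word data was scraped from iciba with Python, and it is good enough for everyday use.
Below is the Python source that scrapes the dictionary. I learned Python as I went; it really is easy to pick up.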

# -*- coding:utf-8 -*-
import re
import time
import codecs
import socket
import urllib.parse
import urllib.request

# Give up on any single network operation after 60 seconds.
socket.setdefaulttimeout(60)


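def grabDict(strWord):
    """Fetch the iciba entry for strWord and return it as plain text.

    Sections are joined with the '~^~' separator. Returns None when the
    response has no 'dict.innerHTML' line.
    """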
    strWord = strWord.strip()
    strWordUrl = urllib.parse.quote(strWord)
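    boolRetry = True
    # Keep retrying until the request succeeds, pausing 5 seconds after each failure.
    while boolRetry:
        try:
            response = urllib.request.urlopen('http://open.iciba.com/huaci/dict.php?word=' + strWordUrl)
        except Exception:
            # Exception instead of a bare except, so Ctrl+C still stops the script.
            print("request failed or timed out, waiting 5 seconds...")
            time.sleep(5)
        else:
            boolRetry = False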
    # Matches any HTML tag.
    pattern = re.compile(r'<[^>]+>', re.S)
    # Matches Chinese text in angle brackets (e.g. <口>), which is content rather than a tag.
    regexStr = ".*?(<[\u4E00-\u9FA5]+>+)"
    for line in response:
        strLine = line.decode('utf-8')
        strLine = strLine.strip()
        if strLine.startswith("dict.innerHTML"):
            strLine = strLine.replace('\\"', '"')
            strLine = strLine.replace('\\\'s', "'s")
            # strLine = strLine.replace(strWord, '', 1)
            strLine = strLine.replace("dict.innerHTML='", "")
            # Temporarily swap <中文> for 〈中文〉 so the tag stripper further down keeps it.
            matchStr = re.match(regexStr, strLine)
            while matchStr:
                matchWord = matchStr.group(1)
                replaceWd = matchWord.replace('<', '〈')
                replaceWd = replaceWd.replace('>', '〉')
                strLine = strLine.replace(matchWord, replaceWd)
                matchStr = re.match(regexStr, strLine)
            strLine = strLine.replace("生词本</a>", "")
            strLine = strLine.replace("详细释义</a>", "")
            strLine = strLine.replace("';", "")
            strLine = strLine.replace("]</strong>", "]~^~")
            strLine = strLine.replace("</p>", "~^~")
            strLine = strLine.replace("~^~；", "；")
            strLine = strLine.strip()
            strLine = pattern.sub('', strLine)
            # strLine = ''.join(strLine.strip().split())
            strLine = strLine.replace("\t", "").strip()
            # Drop a trailing or leading separator.
            if strLine.endswith('~^~'):
                strLine = strLine[:-3]
            if strLine.startswith('~^~'):
                strLine = strLine[3:]
            strLine = strLine.replace('〈', '<')
            strLine = strLine.replace('〉', '>')
            # Trim whitespace around every section.
            strLine = "~^~".join(strSec.strip() for strSec in strLine.strip().split("~^~"))
            # strLine = strLine.replace("                    ", "")
            strLine = re.sub(r'\s+', ' ', strLine)
            if strLine.startswith(strWord):
                strLine = strLine.replace(strWord, '', 1)
            strLine = strLine.strip()
            return strLine


# strDict = grabDict("斯")
# print(strDict)
file_in = open('iDict.bin')
idx = 0
for linef in file_in:
    strLineWord = linef.split('\t')[0].strip()
    strUrlWord = grabDict(strLineWord)
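    # "以上为百度翻译结果" means "the above is a Baidu Translate result". In that case,
    # or when nothing was parsed, keep the definition already in iDict.bin.
    if not strUrlWord or strUrlWord == "以上为百度翻译结果":
        strLineFull = strLineWord + '\t' + linef.split('\t')[1].strip() + '\r\n'
    else:
        strLineFull = strLineWord + '\t' + strUrlWord + '\r\n'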
    with codecs.open('iDict_new.bin', 'a', 'utf-8') as file_ot:
        file_ot.write(strLineFull)
    idx += 1
    print('we have grabbed ' + str(idx) + ' words.')
    time.sleep(0.2)
file_in.close()
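
To see what the cleaning steps inside grabDict do without touching the network, here is a condensed sketch run on a made-up input string. The names `clean_entry`, `TAG`, `CJK_BRACKET` and `SEP` are hypothetical, the sample HTML is invented for illustration (it is not real iciba output), and the sketch drops empty sections, which the script above does not do.

```python
import re

TAG = re.compile(r'<[^>]+>', re.S)
# Chinese text wrapped in angle brackets, e.g. <口>, is content, not an HTML tag.
CJK_BRACKET = re.compile(r'<[\u4E00-\u9FA5]+>')
SEP = '~^~'


def clean_entry(html):
    """Protect <中文> labels, mark paragraph ends with SEP, strip tags,
    then normalise whitespace inside each section."""
    # Swap <中文> for 〈中文〉 so the tag stripper below leaves it alone.
    text = CJK_BRACKET.sub(lambda m: '〈' + m.group(0)[1:-1] + '〉', html)
    # Remember where paragraphs ended before the tags disappear.
    text = text.replace('</p>', SEP)
    text = TAG.sub('', text)
    # Restore the protected labels.
    text = text.replace('〈', '<').replace('〉', '>')
    parts = [re.sub(r'\s+', ' ', p).strip() for p in text.split(SEP)]
    return SEP.join(p for p in parts if p)


# Invented sample, only for illustration.
sample = '<p><span>n.</span> <口>苹果  </p>\n<p><span>adj.</span> 红色的</p>'
print(clean_entry(sample))
```

The placeholder swap is the important design choice: without it, a label such as <口> would be removed together with the real HTML tags.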

The dictionary app itself is written in C# with a WPF view; I will put the source on GitHub when I have time.
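
The app's own source is not part of this post. As a rough illustration of simple lookup over the tab-separated file the scraping script writes, here is a sketch in Python rather than C#. The names `parse_line` and `load_dict` are hypothetical, and it assumes the file is UTF-8 with one `word<TAB>definition` entry per line and `~^~` between sections, as the script writes it.

```python
SEP = '~^~'


def parse_line(line):
    """Split one 'word<TAB>section~^~section...' line into (word, [sections])."""
    word, _, body = line.rstrip('\r\n').partition('\t')
    sections = [s.strip() for s in body.split(SEP) if s.strip()]
    return word.strip(), sections


def load_dict(path):
    """Read the whole file into a plain dict for exact-match lookup."""
    entries = {}
    with open(path, encoding='utf-8') as fh:
        for line in fh:
            word, sections = parse_line(line)
            if word:
                entries[word] = sections
    return entries


# Parsing a single line in the assumed format:
print(parse_line('apple\tn. 苹果~^~adj. 苹果的\r\n'))
```

With the file in place, `load_dict('iDict_new.bin').get(word, [])` would return the sections for an exact match, or an empty list when the word is missing.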

Permalink: https://www.amkevin.com/301.html｜冰峰雪晴｜Please credit the source when reposting, thanks
